Log in / create account | Login with OpenID
DocForge
Programmer's Wiki

Garbage collection

From DocForge

Garbage collection is the disposal of virtual memory that has fallen out of scope. In other words, it's memory given back to the operating system as available memory when an application no longer needs it. In programming languages such as C and C++, garbage collection is manually performed by the programmer. In more modern languages, such as C#, Java, and D, garbage collection is performed automatically during run time with no additional explicit work necessary by the programmer.

Contents

[edit] Manual Garbage Collection

The best example of manual garbage collection is in C++, where heap memory objects (most structs and classes) are abandoned when they fall out of scope, causing a memory leak. It is therefore important to explicitly call the destructor of the objects as they fall out of scope. When one instance is referenced multiple times, this becomes increasingly difficult to ascertain when an object has truly fallen out of scope.

To this end, the first - and perhaps easiest to understand - form of manual garbage collection was developed, known as reference counting. Reference counting tracks the number of references to an instance (excluding the reference counting mechanism itself, of course), and when the number of references is equal to zero the reference counter calls the objects destructor, which will free the memory used by the object, preventing memory leaks.

The disadvantage of this is that it has the potential to force a destructor to be called during heavy load, which can further strain the running system. Good reference counting tools will postpone object destruction either until non-paged memory runs out, or until the system is no longer under load.

An example of a reference counting manual garbage collector is Boost C++'s Smart Pointers. Smart pointers are not a replacement for the practice of keeping track of when heap memory has fallen out of scope locally - such as during a routine in another instance. It is a tool for preventing the premature destruction of an instance, so that it is only destroyed when all references to it are gone. It is still very easy to experience a memory leak even with reference counting, since if you forget to remove your reference to an object - even once - the reference count will never equal zero, and so the instance will never be destroyed.

[edit] Automatic Garbage Collection

Automatic garbage collection is very convenient to the programmer, mainly because it makes the possibility of a memory leak nearly impossible. Perhaps the best example of a garbage collecting language is the Java programming language, which uses an automatic reference counting garbage collector. Automatic reference counting garbage collection basically automates the process of reference counting by heuteristically examining code and automatically performing the reference counting chore by marking when objects leave scope.[1]

The tradeoff is that automatic garbage collection is invariably slower than manual garbage collection. It also allows for memory leaks to persist until the garbage collector is run, which will clean all instances that have fallen out of scope from memory. This can be advantageous, however, since deferred garbage collection can prevent the garbage collection from further slowing down a system under load.

Garbage collection in most languages can be forced, such as with Java's System.gc() call. This is useful for forcing garbage collection do free up memory when the system has not yet reached the mark at which it will automatically run the garbage collector.

[edit] Memory Inside of Routines

Garbage collection is only applicable to heap memory objects, which are by definition not on the stack. However, during the execution of some routines it is very important to pay attention to memory usage, especially during the execution of loops and recursive function calls.

Memory allocated for local variables inside of a method are not subject to garbage collection, and they are freed when the function returns - with the exception of variables that are returned by the method, or if the runtime is from a compiler which does not support this, as many compilers for embedded systems do. You do not need to explicitly free local variables, only variables that have scope outside of a given method.

[edit] Variables and Loops

One of the most common mistakes in programming is the allocate and define variables inside of loops. An example of this common mistake would be:

while(!exit_now){
    char[] response=tango.io.Console.Cin.get;
    if(tango.text.Ascii.compare(response, "true"))
        exit_now=true;
}

The mistake lies in the second line. That is literally directing the compiler to allocate a new pointer for a new dynamic array and then size it to the size of the string returned from the user. If this were in a loop not dependent upon the typing speed of the user, you run the risk of spending large amounts of time allocating memory. This is especially dangerous in languages which do not support dynamically sized arrays, such as C, C++, and Java, though most dangerously Java. Java allows for string concatenation as a very low-level, easy operation. Java Strings, however, are an immutable type. Every time you change a Java String, you have to reallocate all the memory. You can avoid having to reallocate memory that is perhaps still usable by only assigning values to the variable inside the loop.

char[] response;
while(!exit_now){
    response=tango.io.Console.Cin.get;
    if(tango.text.Ascii.compare(response, "true"))
        exit_now=true;
}

The difference can significantly increase the speed of your applications if they are performing many operations inside of loops.

Java programmers should also consider using a StringBuilder, which is not an immutable type and does not incur all the overhead of reallocating all the memory and copying it to the new location at every modification.

[edit] Recursive Functions and Memory

Recursive programming is nitorious for running out of memory, and the chances that you have personally run out of memory are nearly 100% if you have every worked on a recursive algorithm of any sort. Every function call creates a new entry on the run time call stack. That stack contains such things as function arguments, but more importantly it is the history of functions necessary to return to the last function to resume execution. Recursive programs often abuse this capacity to store memory in the call stack, and this is extremely effective in most situations. However, it is not ideal for longer-running algorithms since the call stack is greatly limited in size, since it is normally limited to register memory only. Once the register memory is gone, the program crashes because it has no more memory available to it in order to continue to execute. This is most often manifested as a call stack overflow error or exception of some sort.

If your algorithm requires more memory, consider writing your own stack or using an existing stack (other than the call stack) to simulate the recursive behavior, pushing those variables to those stacks - which aren't limited by register memory. This has the effect of slowing down your application, but it increases stability since it is less likely to run out of memory.

Do you have information or insights to contribute to this article? Please feel free to edit this page. Ask questions or contribute to the discussion on this article's talk page.