Notes on the CLR Garbage Collector

A few days ago, some of us in my group were discussing the GC mechanism used in the CLR. It was an interesting discussion, and like most discussions, inconclusive. So I decided to put (almost) everything interesting to know about the GC on paper. Some of the people who read this, particularly Vinod – he is sick of my long mails not converting into articles – suggested that I share this with a broader audience and put it on the blog.

While writing, I was actually vacillating – more than enough has been written about GC, so why write another piece? But then, I think no single article captures everything, and since there are so many anyway, why not another one?? 🙂 As far as I know, there is nothing new here – everything has already been said by the likes of Patrick Dussud, Brian Harry, Chris Brumme, Maoni Stephens, Rico Mariani, Jeff Richter – this is just a compilation, and not a very comprehensive one at that, esp. on the Best Practices, so I have included a Further Reading section. For whatever it is worth, here it is!!

1. The Problem Domain – Memory Management

1) Applications composed of components / libraries written by different developers at different points in time
2) These components have no idea how resources are allocated / released by each other – they are only aware of how they manage resources themselves
3) The substrate needs to provide a common memory management infrastructure which would let these components work with each other
4) The mechanism used should
* Be optimal on memory utilization
* Provide strong object locality for cache efficiency, taking advantage of CPU registers & caches
* Ensure code correctness
* Minimize the burden on the developer
* Support scenarios like Finalization, Weak References and Pinning

2. Memory Management – What are the Choices?

2.1 Reference Counting
Each component keeps a count of the number of clients using it and destroys itself when the last client releases it
* Example: COM, a number of large-scale C++ projects
* Depends on components and clients keeping a correct count; mistakes can lead to a memory leak (accumulation of unreferenced objects) or premature release
* Difficult to resolve circular references
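The circular reference problem can be made concrete with a toy C# simulation of reference counting, in the spirit of COM's AddRef / Release (all type and member names here are invented for illustration):

```csharp
// Toy reference counting: each object tracks its own count and
// "destroys" itself when the count drops to zero.
class RefCounted
{
    public int Count { get; private set; } = 1; // creator holds one reference
    public RefCounted Other;                    // a reference this object holds
    public bool Destroyed { get; private set; }

    public void AddRef() => Count++;

    public void Release()
    {
        if (--Count == 0) { Destroyed = true; Other?.Release(); }
    }
}

static class RefCountDemo
{
    // Two objects that point at each other: after the clients release
    // their references, each count is still 1, so neither is ever destroyed.
    public static (bool, bool) LeakCycle()
    {
        var a = new RefCounted();
        var b = new RefCounted();
        a.Other = b; b.AddRef();   // a keeps b alive
        b.Other = a; a.AddRef();   // b keeps a alive
        a.Release();               // client done with a
        b.Release();               // client done with b
        return (a.Destroyed, b.Destroyed); // both false: the cycle leaks
    }
}
```

A tracing collector has no such problem: the cycle is unreachable from the roots and gets collected as a unit.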

2.2 Garbage Collection – Mark and Sweep
* Example: C Runtime Heap
* Initially, allocate objects on the heap sequentially
* At some stage, mark the objects that are dead and can be removed
* Free the dead object slots at some stage
1) In C/C++ the programmer has to do it since the runtime can be misled about object size by type-casting, void pointers, etc.
2) Environments like CLR where type safety is guaranteed are always aware of object size and can free memory on their own
* Maintain a list of free slots (Free List)
* On next allocation request traverse the list and give the smallest possible slot that can fulfill the memory allocation request
* Advantage: Minimal house-keeping overhead (just one freelist)
* Disadvantage: Every allocation request requires a walk through the freelist, which makes allocations slow
* Disadvantage: Heap fragmentation – there may not be a single contiguous block to accommodate a request for a large block
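The freelist scheme above can be sketched in a few lines of C# (a toy simulation with invented names; a real allocator is far more sophisticated):

```csharp
using System.Collections.Generic;

// Toy best-fit free-list allocator: freed slots go on a list, and each
// allocation walks the whole list for the smallest slot that fits.
class FreeListAllocator
{
    private readonly List<(int Offset, int Size)> free = new();

    public void Free(int offset, int size) => free.Add((offset, size));

    // The full walk on every request is what makes free-list allocation
    // slow compared to pointer-bump allocation.
    public int? Allocate(int size)
    {
        int best = -1;
        for (int i = 0; i < free.Count; i++)
            if (free[i].Size >= size && (best < 0 || free[i].Size < free[best].Size))
                best = i;
        if (best < 0) return null; // fragmentation: no single slot big enough
        var slot = free[best];
        free.RemoveAt(best);
        if (slot.Size > size) free.Add((slot.Offset + size, slot.Size - size));
        return slot.Offset;
    }
}
```

Note how a request larger than every individual slot fails even if the total free space would suffice – that is exactly the fragmentation disadvantage.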

2.3 Garbage Collection – Copy and Collect
* Keep two heaps
* Allocate only from one heap
* When collection is triggered on the heap, copy all alive objects to the second heap
* Switch the roles of heaps
* Advantage: Conceptual Simplicity
* Disadvantage: Copy operation needs to be done for all objects
* Disadvantage: Blocks a lot of memory unnecessarily

2.4 Garbage Collection – Mark and Compact
* Example: CLR, some JVM implementations
* Same as Mark and Sweep except that as free space from dead objects is claimed, the alive objects are compacted together to leave a single contiguous block of free memory
* Disadvantage: Compacting memory can be expensive for large objects
* Advantage: Allocation requires just a pointer to keep track of the top of the heap, so it is lightning fast
* Advantage: No heap fragmentation

3. What Happens in the CLR

3.1 Marking, Sweeping and Compacting

3.1.1 Large and Small Objects
There are actually two GC heaps: one for large objects and another for small objects. Large objects are kept in a separate heap called the Large Object Heap (LOH), which is marked and swept but never compacted, since copying large objects causes extra overhead.
* The definition of large object used to be >= 20000 bytes in v1.0
* It changed to >= 85,000 bytes in 1.1
* It may change in future, so don’t make hard assumptions on the size

Smaller objects are kept in multiple heaps which are compacted from time to time.

3.1.2 Marking a heap (all GC heaps)
* GC assumes that all heap is garbage.
* It then starts building a graph of all live objects starting from the application’s roots.
* Roots are references to live objects. They exist in:
1) The call stack of a thread
2) Static and global objects
3) Freachable queue (see Finalization)
4) Places like the CPU registers
* The JIT compiler tracks the roots and gives a list to the GC
* All objects that are reachable are considered alive, everything else is dead and can be collected
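The mark phase is essentially a graph traversal from the roots. A minimal sketch, modeling objects as integer ids and references as id arrays (names invented for illustration):

```csharp
using System.Collections.Generic;

static class MarkPhase
{
    // Walk the object graph from the roots; everything reached is live,
    // everything left unmarked is garbage.
    public static HashSet<int> Mark(Dictionary<int, int[]> refs, int[] roots)
    {
        var marked = new HashSet<int>();
        var stack = new Stack<int>(roots);
        while (stack.Count > 0)
        {
            int obj = stack.Pop();
            if (!marked.Add(obj)) continue;       // already marked, skip
            foreach (int r in refs[obj]) stack.Push(r);
        }
        return marked; // any id not in this set is dead
    }
}
```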

3.1.3 Sweeping a heap (all GC Heaps)
* Merge adjacent freed blocks of memory into one.
* Maintain a FreeList of all space that is free.
* Allocation would now need to walk the FreeList

3.1.4 Compacting a heap (only the small object heaps)
* Slide objects so that all alive objects get compacted into one contiguous block
* Update references to each object so that they now point to new memory locations
* Move the heap pointer (pointer to the next object location) to the top of the heap
* Allocation would now happen at the heap pointer
* Each allocation moves the pointer further up
* LOH is only swept and never compacted
* Small Object heaps are frequently swept and less frequently compacted

3.2 Generational GC

3.2.1 Memory Usage Observations
Generational GC or ephemeral GC is an optimization based on the following observations:
* Newer objects tend to live for a short duration
* Older objects tend to live for longer durations
* Newer objects have strong relationships with each other and are frequently accessed around the same time

3.2.2 The Three Generations
The GC keeps small objects in three generations:
* Gen 0: Newly created objects which have not seen a collection yet
* Gen 1: Objects which have survived one collection
* Gen 2: Objects that have survived two collections

* Gen 0 and Gen 1 are also called ephemeral generations, and they always live together in a single segment called the ephemeral segment (see Segments later)
* GC tries to fit Gen 0 in the L2 Cache of the CPU
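The generations can be observed directly from managed code via GC.GetGeneration, GC.Collect and GC.MaxGeneration (real APIs; exact promotion behavior is an implementation detail and may vary by runtime):

```csharp
using System;

static class GenDemo
{
    // A freshly allocated small object starts in Gen 0; surviving a
    // collection moves it up a generation, up to GC.MaxGeneration (2).
    public static (int Before, int After, int Max) Promote()
    {
        object o = new object();
        int before = GC.GetGeneration(o);  // fresh allocation: Gen 0
        GC.Collect();                      // o is still referenced, so it survives
        int after = GC.GetGeneration(o);   // typically promoted to Gen 1
        return (before, after, GC.MaxGeneration);
    }
}
```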

3.2.3 Collections by Generations
* Every time Gen 0 heap crosses a threshold (called a “budget” – see allocation and collection below), a GC is triggered
* Gen 0 is collected very frequently.
* Frequency of Gen 0 collections > Gen 1 Collections > Gen 2 collections (Gen 1: Gen 2 being 10 : 1 is healthy)
* Collection for Gen N also collects all Gens < N, so
1) A Gen 1 collection also causes a Gen 0 collection.
2) A Gen 2 collection also causes a Gen 1 and Gen 0 collection. It also causes a LOH collection. This is called a full collection.
* If memory pressure is high, GC is more aggressive in collection and Gen 1 and Gen 2 collections may happen more frequently.

3.2.4 Generational GC Benefits
* No need to compact a single gigantic heap – multiple smaller heaps of different sizes are compacted at different frequencies – this is faster
* Objects which have a similar lifespan stay closer together. This improves object locality. Object locality helps improve perf with modern CPUs where proc speed >> memory speed
* The time required in the Mark phase can be significantly reduced by ignoring the inner references of older generation objects
* However, an older object may have been written to and may hold a reference to a newer object
* To track this, the GC maintains an internal structure called the “card table”
1) Card Table consists of entries: 1 bit for 128 bytes of memory. This means one DWORD corresponds to 4k, which incidentally is the size of a page
2) So you can think about the card table as an array of DWORDs with every entry referring to a page and further split into every bit tracking a 128 byte range
3) Every time an object gets modified, the JIT emits code to write to the object as well as the card table
a) This is possible because of what is called the “write barrier” – instead of letting the code write directly, the JIT emits a helper that performs the write and updates the card table
b) Another approach is to use the GetWriteWatch Win32 API that tells you what has changed
4) To map an object to the address range, another internal look up dictionary like structure is used.
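The card-table arithmetic above is easy to check: one bit covers 128 bytes, so one 32-bit DWORD of card bits covers 128 × 32 = 4096 bytes, exactly one page (CardTable is an illustrative helper, not a CLR type):

```csharp
static class CardTable
{
    public const int BytesPerBit = 128;
    public const int BitsPerDword = 32;
    public const int BytesPerDword = BytesPerBit * BitsPerDword; // 4096 = one page

    // Which card bit (within the whole table) covers a given address?
    public static long BitIndex(long address) => address / BytesPerBit;

    // Which DWORD (i.e. which page) covers a given address?
    public static long DwordIndex(long address) => address / BytesPerDword;
}
```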

3.3 Segments
GC reserves memory in segments. A segment is a unit of reservation.

3.3.1 Generations and Segments
* To begin with, GC creates two segments – one for Gen 0 and Gen 1, and the other for LOH
* Gen 0 and Gen 1 share a single segment throughout the lifecycle of the program. This is called the ephemeral segment.
* Gen 2 may have zero or more segments of its own
* LOH has one or more segments of its own

3.3.2 Segment Size
* As of CLR v2, each segment typically = 16 MB (but not fixed, dynamically tuned)
* Each Heap Segment has a Reserved Size and a Committed Size. Committed <= Reserved
* Initially reserve, commit on allocation on demand

3.3.3 Heap Expansion and Contraction
A heap expands or contracts by adding / deleting heap segments. A segment is allocated whenever there is a need for more memory. This can happen if
1) Existing LOH segments cannot satisfy an object allocation request. A new LOH segment is created.
2) Objects getting promoted from Gen 1 during a Mark Phase cannot be accommodated any more in the existing Gen 2 segments. A new Gen 2 segment is created.

3.3.3.1 Segment Allocation
* To allocate a new segment, GC calls VirtualAlloc.
* If there is no free contiguous block in the heap large enough for a segment
1) Segment Allocation fails
2) The allocation request also fails
3) GC returns NULL to the Execution Engine (EE)
4) EE throws an OOM exception

3.3.3.2 Segment Deletion
A segment which is not in use is deleted in a full collection and memory is returned to the OS
* This can be prevented with a feature introduced with CLR 2.0 called VM Hoarding which can be turned on via hosting APIs
* In fact with hosting, the CLR host can do pretty much anything. As an example, SQL Server may not want to report the memory to CLR in a transparent way to control its behavior, or fail allocations, or reserve / commit memory in advance, etc. There is a lot to memory control for a CLR host, but a full discussion of hosting is outside the scope of this article.

3.3.4 Relationship between Heaps, Segments and Generations
1) To a program, the CLR exposes a single contiguous heap
* For MP machines, when Server GC is turned on, there is a full heap and a dedicated GC thread per CPU, but to the app, there is still only one heap across all CPUs (see GC Flavors later)
2) Segments are units in which the GC internally reserves and commits virtual memory, so a heap consists of multiple segments
3) Generations are object lifetime groupings and are used by the GC to trigger collections for small objects, so logically, they are orthogonal to segments. Physically:
* Gen 0 and Gen 1 always live in the same segment called the ephemeral segment
* Gen 2 lives in zero or more segments
* LOH lives in one or more segments

3.4 Allocation and Collection
At a macro level, during allocation, depending on the object size:
* Small object (< 85,000 bytes) – object gets created on the ephemeral segment in Gen 0 – just move the heap pointer forward
* Large object (>= 85,000 bytes) – object gets created on the LOH (no correlation with generations), so scan the FreeList for a large enough slot, or allocate a new LOH segment

3.4.1 Allocation Context
* As mentioned earlier, memory is not allocated upfront, memory is only reserved at a segment level
* For optimization, the Allocator doles out memory in 8 kb chunks – each chunk is called an Allocation Context
* An allocation context belongs to a thread
* Allocation of each chunk requires a lock for a UP machine (GC uses lightweight spin locks) and is lock free on MP machines (see GC flavors later)

3.4.2 Zeroing the Memory
When a chunk is given, GC zeroes the memory. This is done for two reasons:
1) Security – you do not stumble upon left over garbage
2) It helps with Fail Fast. Code referencing a null pointer fails faster than code referencing some random data
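The zeroing guarantee is easy to observe from C# – fields of fresh objects and elements of fresh arrays always read as zero / null, never leftover data:

```csharp
static class ZeroDemo
{
    class Node { public Node Next; public int Value; }

    // Newly allocated reference fields are null, value fields are 0,
    // and array elements are all zero.
    public static bool FreshMemoryIsZeroed()
    {
        var node = new Node();
        var buffer = new int[1024];
        bool allZero = true;
        foreach (int i in buffer) if (i != 0) allZero = false;
        return node.Next == null && node.Value == 0 && allZero;
    }
}
```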

3.4.3 Allocating Memory to an Object
When it comes to assigning memory to an object
* If it is within the current allocation context, the GC just needs to move the pointer
* If it is outside the allocation context, the GC needs to allocate another allocation context

3.4.4 Generation Budget
For each generation, the GC keeps a “budget” – a threshold value which when crossed triggers a collection in that generation
* When the GC needs to come up with an allocation context and realizes it has crossed the generation budget, a collection is triggered for that generation.
* For Gen 0, the budget is much smaller than the ephemeral segment
* The budget is dynamically adjusted by the GC over the lifetime of an app

3.5 Finalization

3.5.1 The Finalization Problem
Objects that need to explicitly release resources as a part of their cleanup (typically unmanaged resources like database connections, file handles, etc.), cannot rely on GC alone to do a cleanup.
* It is desirable that the calling code can do the cleanup imperatively
* It is also desirable that during collection, the object gets a chance to clean itself up if the calling code has not done so imperatively earlier

3.5.2 Finalization in CLR

3.5.2.1 Imperative Finalization
The calling code can call some specific method (typically called “Close” or “Dispose”) and clean up the object imperatively. Types are supposed to implement the IDisposable interface for this (see IDisposable later)

3.5.2.2 Finalization by the Runtime
To indicate to the GC that there is a cleanup required, the object can implement a method called Finalize
* The method called Finalize is exposed as a destructor in C#.
* A destructor in C# is not really a destructor, it just expands to a method called Finalize which in turn calls the base class’ Finalize method

3.5.2.3 Finalization Mechanism
The system keeps track of objects with a Finalize method in a structure called the Finalization queue. During the mark phase, after the graph of reachable objects has been built, the Finalization queue is checked, and if there is an object in the Finalization queue that is not on the graph:
1) GC puts it on a separate queue called the F-Reachable queue, and adds the object to the graph of reachable objects
* Since the object is now reachable, it does not get collected
* Since this object survives a collection it gets promoted to a higher generation.
* The same happens to all the objects referenced by this object
2) At some stage, GC invokes a special runtime thread to call Finalize on each object in the Freachable queue
* No guarantees on when Finalize would be called
* No specific order in which objects are finalized
* Exception in a Finalize method is interpreted by the GC as a proper method exit and not an exception
3) Finalized objects are removed from the Freachable queue and become unreachable and would be collected in the next collection
* So a Finalizable object, and each object that it references, takes two collections to be freed, and ends up in a higher generation than otherwise
* If an object has been imperatively cleaned up, this overhead should be avoided
* Therefore a method called GC.SuppressFinalize is provided which should be called by an object if it has been cleaned up imperatively to avoid being put on the FReachable queue. This also prevents a Finalizable object from being promoted to the next gen (unless already in Gen 2)

3.5.3 Resurrection
* Since a Finalizable object after dying becomes alive again when put on the Freachable list, the phenomenon is called Resurrection. This is short-lived since the GC collects it shortly thereafter
* However, Resurrection can be more permanent if in the Finalize method, the object puts a pointer to itself in a global / static variable. Now the object is reachable even after being removed from the Freachable queue, so are the other Finalized objects to which this object may have held references.
* The resurrected object now needs to be put back on the Finalization list so that when it eventually gets collected, it gets Finalized. For this, the object should call GC.ReRegisterForFinalize()
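The resurrection sequence above can be sketched in code (a contrived example – finalization timing is not guaranteed, and real code should avoid this pattern; all names are invented for illustration):

```csharp
using System;
using System.Runtime.CompilerServices;

class Phoenix
{
    // A static field is a root: assigning to it from the finalizer
    // makes the dying object reachable again.
    public static Phoenix Resurrected;

    ~Phoenix()
    {
        Resurrected = this;                 // back from the dead
        GC.ReRegisterForFinalize(this);     // so a later death is finalized too
    }

    // NoInlining keeps the temporary from being kept alive by the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CreateAndAbandon() => new Phoenix();
}
```

After CreateAndAbandon(), a GC.Collect() followed by GC.WaitForPendingFinalizers() runs the finalizer, and Phoenix.Resurrected again holds a live reference to the object.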

3.5.4 Implementing a Finalizable Object

3.5.4.1 Imperative Cleanup
The mechanism an object uses to enable imperative cleanup is called the “Dispose Pattern”. The Dispose pattern is realized by implementing the IDisposable interface, which looks as follows:

public interface IDisposable
{
 void Dispose();
}

Note that the parameterless Dispose() is the interface’s only member – the Dispose(bool) overload used in the pattern below is not part of IDisposable, but a protected virtual helper added by convention.

* Typically the object would keep track of whether Dispose has been called or not by using a private flag
* The Dispose() method may be called imperatively. Typically Dispose() calls Dispose(bool) with a value of true and then calls GC.SuppressFinalize(this)
* Depending on what value is passed to the Dispose(bool), there are two possibilities:
1) Dispose(true) is called by the user code directly / indirectly. Both managed and unmanaged resources can be cleaned up here
2) Dispose(false) is called by the runtime from inside the Finalizer. Here, since the managed resources of the object may have already been cleaned up by the GC, no managed cleanup is done, and only unmanaged resources are cleaned up.

3.5.4.2 Runtime Cleanup
To enable the runtime to clean up the object, the Finalize method needs to be implemented. This is typically combined with implementing the Dispose pattern. Here’s sample code for writing a Finalizable Type (adapted from the MSDN documentation)

class Base : IDisposable
{
 protected bool disposed = false;

 ~Base() // expands to Finalize, called by runtime
 { Dispose(false); }

 public void Dispose() // called by user code
 {
  Dispose(true);
  GC.SuppressFinalize(this);
 }

 // virtual so that derived classes can extend the cleanup
 protected virtual void Dispose(bool disposing)
 {
  if (disposed) return;

  if (disposing)
  {
   // clean up managed as well as unmanaged resources
  }
  else
  {
   // clean up unmanaged resources only
  }

  disposed = true;
 }

 // some objects (like Database Connections, Files, etc.) prefer Close over Dispose
 public void Close() { Dispose(); }

 void DoSomething()
 {
  if (disposed) throw new ObjectDisposedException(GetType().Name);
  // rest of the code for the method
 }

 // more code for the class
}

class Derived : Base
{
 protected override void Dispose(bool disposing)
 {
  if (disposed) return;

  try
  {
   if (disposing)
   {
    // clean up managed & unmanaged resources added by this class only
   }
   else
   {
    // clean up unmanaged resources added by this class only
   }
  }
  finally
  {
   base.Dispose(disposing);
  }
 }

 // more code for the class
}
        

3.6 Weak References
A weak reference (as against the usual reference called a strong reference) gives access to an object without stopping the GC from collecting it
* It is useful when you do not want to tie up a lot of memory and are therefore willing to have the object collected, but would still like to use it in case it has not been collected yet
* So this gives a way for the GC to reuse the heap, which is very useful in memory intensive applications (e.g. when working with images)
* It is only relevant during the period when there are no strong references to the object and the object has not yet been marked for collection
* To use an object that has a weak reference to it, the calling code first needs to obtain a strong reference

3.6.1 Kinds of weak references
1) Short weak references: weak references to non-Finalizable objects, or Finalizable objects whose resurrection is not getting tracked
2) Long weak references: weak references to Finalizable objects whose resurrection is tracked

3.6.2 Using Weak References

 LargeObject lo = new LargeObject(); // lo is a strong reference

 // use lo

 WeakReference wr = new WeakReference(lo, false); // not tracking resurrection
 lo = null; // remove strong reference

 // do some work

 // need lo again
 if (wr.IsAlive)
 {
  lo = (LargeObject)wr.Target; // obtain strong reference
 }

 if (lo == null) lo = new LargeObject(); // collected in the meantime, recreate

 // use lo
        

3.6.3 How GC works with weak referenced objects
* Obtaining a weak reference to an object adds an entry into either of the two weak reference tables – Short Weak Reference Table (SWRT) or Long Weak Reference Table (LWRT) – internal structures used by the GC to work with weak references
* During the mark phase, the GC after building a graph of all reachable objects, scans the SWRT and if it finds a pointer to an object that is not in the graph, it deems the object as unreachable and sets the pointer to NULL
* As mentioned earlier, the GC scans the Finalization queue and if it finds an object not already on the reachable graph, it adds the object to the Freachable queue and to the graph of reachable objects.
* GC scans the LWRT and if it finds an entry to an object that is not there on the graph of reachable objects (which now also includes freachable objects), it deems the object as unreachable and sets the entry to NULL
* So unlike strong references, objects that have only weak references to them are considered unreachable, and the GC marks them as garbage

3.7 Pinning
* There are situations in which it is undesirable that the GC move a block of memory during the Compact phase.
* For such situations, the GC allows that objects be pinned. A pinned object is not moved.
* Pinning can happen when:
1) Managed Code calls unmanaged code (P/Invoke, COM Interop)
2) Use of the fixed keyword in C#
* Pinned objects prevent compaction and therefore cause fragmentation (see best practices later)
* Since LOH is never compacted, pinning does not matter in the LOH, but that is an implementation detail and should not be considered in app design
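Besides the fixed keyword, managed code can pin an object explicitly via GCHandle (real APIs); a small sketch:

```csharp
using System;
using System.Runtime.InteropServices;

static class PinDemo
{
    // Pin a buffer so its address stays stable, e.g. to hand to native
    // code, then free the handle to allow compaction again.
    public static bool PinRoundTrip()
    {
        byte[] buffer = new byte[256];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject(); // stable while pinned
            return address != IntPtr.Zero;
        }
        finally
        {
            handle.Free(); // always unpin: long-lived pins cause fragmentation
        }
    }
}
```

Keeping the pinned span as short as this try/finally is exactly the best practice – a long-lived pin in the ephemeral segment blocks compaction.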

3.8 Object Layout
* Each object before its instance data has two pointers
1) One to a Method Table (offset zero)
2) Second to a Sync Block structure (negative offset)
* The lower two bits of the Method Table pointer are overlaid with Pinned and Marked flags.
1) Pinned bit tells the GC whether an object has been pinned
2) Marked bit tells the GC whether the object has been marked as reachable during a collection
* The method table itself has a structure called GCDesc at a negative offset. This is used to group together all the references contained in a type. This grouping helps the GC in building a graph of reachable objects quickly.

3.9 Thread Suspension
For a collection to happen, it is important that the threads are in a known state and do not modify the heap in a way that cannot be tracked by the GC. So typically, all running threads are suspended, the collection done and threads resumed. Thread suspension services are provided by the EE and GC simply calls SuspendEE and RestartEE.

3.9.1 Suspension for Native Code
* A thread running unmanaged code can have a reference to a managed object. But in that case, the object would be alive and pinned. So it would not be collected anyway.
* To prevent an unmanaged thread from entering into managed code, a barrier is created
1) Basically the JIT finds out the stack frame where the thread went from managed code to native
2) The return address of this stack frame is hijacked and made to point to a stub which suspends the thread
3) It is ok for this thread to continue running native code as long as it wants to

So we can ignore native code altogether

3.9.2 Suspension for Managed Code
For managed code there are three ways of suspension:

1) Fully Interruptible Code: JIT may generate code in such a way that it tracks all register motion – this is called fully interruptible code.
* Such a thread can be directly hijacked at any time, its IP being adjusted to point to a stub which suspends the thread
* Producing all the tracking information bloats the code and is therefore avoided so fully interruptible code sections are rare.
2) Return Address Hijacking: For code which is making frequent function calls, instead of generating fully interruptible code, the same technique is used as for native code – the return address on the stack is hijacked
3) GC Safe Points: The JIT can insert instructions in the code to run a function which checks to see if a GC is pending and if yes suspends the thread. The points in the code where the JIT inserts these instructions are called GC Safe Points.

3.10 Overall GC Algorithm
The GC algorithm consists of 5 phases:
1) Mark: Separate the live objects from dead objects for the generation being collected. At the end of this phase:
* All live small objects have their Mark bit set, and if required the Pin bit also set
* Finalizable objects are put on the FReachable queue
* Weak pointers to dead objects are nulled
* All large dead objects are put on the FreeList
2) Plan:
* Grow or shrink the heap as needed
* Choose whether Compacting would happen or not. If yes, then calculate new addresses of objects that will be moved.
3) Sweep: Put all dead objects on a free list
4) Relocate: Only for a Compacting Collection. Pointers of objects being moved are updated to point to new addresses (except of course for pinned objects)
5) Compact: Only for a Compacting Collection. Objects are moved to the new addresses.

3.11 Flavors of GC
There are three flavors of GC in the CLR

3.11.1 Workstation GC with Concurrent GC off
* Meant for high throughput on a uni-processor (UP) machine
* All that is described above is for this flavor

3.11.2 Workstation GC with Concurrent GC on
* Meant for interactive applications where response time is critical
* Gen 2 collections are concurrent – the GC does not suspend the threads for the entire duration of the collection, but only for very short durations very few times during the collection
* Gen 0 budget is far more than in non-concurrent GC so that threads can allocate even when GC is running. However, if the Gen 0 budget gets exhausted and the collection is still not over, a running managed thread requesting more memory is blocked till the GC finishes
* Has a slightly higher working set

3.11.3 Server GC
1) Meant for high throughput and high scalability of server-side applications running on multi-processor (MP) machines
* Based on affinities between threads, CPUs and heaps
* Best case scenario is a pool of threads, all doing similar jobs, with similar memory utilization patterns and very little or zero sharing of memory
2) For each CPU, GC creates one full managed heap (Gen 0, 1, 2 and LOH) and a separate GC thread
* The GC thread is affinitized to that CPU
* However, the per-CPU heaps are not fully independent GCs. Remember that there will be cross references across heaps. So for example, the Mark phase on all CPUs must finish before the GC threads can move to the next phase
3) On a particular CPU, when a thread on a memory request triggers a collection, it signals an event to wake the GC threads and waits on it to finish. But this happens only on that CPU. All threads on all other CPUs keep running
* All managed threads are suspended just like in Workstation GC with Concurrent GC Off.
* GC threads finish collection and signal an event
* All managed threads resume

3.11.4 Configurations
* By default you get Workstation GC with Concurrent GC off on UP as well as MP machines
* To get Workstation GC with Concurrent GC on, specify the gcConcurrent setting in App.Config
* To get Server GC on an MP machine, specify the gcServer setting in App.Config
* ASP.NET and SQLCLR use Server GC automatically on MP machines
* If you ask for a Server GC on a UP machine, you get a Workstation GC with Concurrent GC off since that is optimized for high throughput
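For reference, the two settings look like this in App.Config (per the CLR runtime configuration schema; normally you would specify only one of them):

```xml
<configuration>
  <runtime>
    <!-- Workstation GC with Concurrent GC on -->
    <gcConcurrent enabled="true"/>
    <!-- Server GC (takes effect only on an MP machine) -->
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```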

3.12 Dynamic Tuning
* The GC dynamically makes several choices. These include:
1) Budget for a generation
2) Permissible fragmentation in a generation (so whether to compact, or just sweep)
3) Size of a generation
4) Size of a GC Segment
* The GC arrives at these choices over a period as it observes the behavior of an application. For a well designed server, workloads stabilize after some time; this helps the GC tune itself, and the tuning then does not change much over time

3.13 Compact Framework GC
Compact Framework 2.0 targets Windows CE 5.0. Windows CE 5.0 is designed for embedded systems and hence needs to run on very constrained hardware. This means that the architecture of Windows CE is very different from the mainstream desktop / server Windows NT OS.

3.13.1 Windows CE Memory Model
1) There are 32 process slots
* Slot zero is reserved for the currently executing app
* So there can only be 31 apps
2) Each app gets 32 MB of virtual address space. This contains
* Code pages of the EXE
* Statics, Globals, etc.
* Stack of each thread
* Heaps allocated by the app
3) Memory allocated outside 32 MB is from the 1 GB High Memory Area
* This memory is global, not private to the app
* All memory mapped files (DLLs, etc.) are loaded here
* VirtualAlloc calls for more than 2 MB allocate here
4) The OS code is loaded in a separate System Code Space (32 MB)
5) For more details see
* Programming Windows CE by Doug Boling
* Effective Memory, Storage and Power Management in Windows Mobile 5.0

3.13.2 Compact Framework Basics
Compact Framework is also very different from CLR for full framework. From a memory perspective:
* Native code for CLR system DLLs (mscoree.dll, mscoree2_0.dll) is loaded into the System Code Space
* IL Code for the app and the libraries it uses is loaded into the High Memory Area
* As the application executes, the JITed native code, the GC heap, the metadata structures, all go in the private 32 MB application address space

For more details see “Design of the .Net Compact Framework CLR” by Steven Pratschner:
* Part – I: Overview and Background
* Part – II: JIT Compiler Design Considerations
* Part – III: GC Heap Management

3.13.3 GC Heap on CF

3.13.3.1 Code Pitching
Besides the GC Heap, the JIT Heap is also in a limited 32 MB address space, so unlike the full CLR, in CF the JITed native code can at times be removed to free memory. This is called Code Pitching.

3.13.3.2 Single GC Heap
* There are no generations, no LOH, just a single heap.
* This is mostly swept, sometimes compacted

3.13.3.3 Allocations
Allocations on the GC heap take place in segments
* Typical segment size = 64k
* But if you ask for more than 64k memory, segment allocated would be > 64k

3.13.3.4 Regular Collections
* Trigger: 1 MB of objects have been allocated since last collection
* Returning memory to OS:
1) After each collection, Collector keeps 1 MB of 64k segments
2) All empty segments beyond 1 MB are returned to the OS

3.13.3.5 Full Collection
* Triggers:
1) Failure to allocate memory / resource
2) App moved to background
3) OS going to hibernation
* Process:
1) Memory from Unreferenced Objects released
2) Heap compacted
3) All free segments released, including the 1 MB reserve

4. Best Practices

4.1 Basics
1) Avoid too many allocations.
a) When the LOH is full or when Gen 2 is full, another segment allocation takes place, and this would kill perf.
b) Allocation cost is proportional to allocation volume, not number of times you make allocation requests. So do not
* Follow the practice of allocating in chunks (speculative allocation) like in malloc
* Round off memory requests like with malloc – ask for exactly what you need
c) Avoid using functions like String.Split in a loop since they can result in creation of a lot of objects
d) Avoid using string concatenation in a loop since this too involves creation of a number of objects. Use a StringBuilder instead
2) Fewer GCs are better
3) As few high generation GCs as possible
4) If you are not using an object, set the reference to null so that the object breaks from the roots. This ensures maximum coverage during the Mark phase of a GC.
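A minimal sketch of points 1(c) and 1(d) above. The sample strings and loop are purely illustrative; the point is the allocation pattern:

```csharp
using System;
using System.Text;

class ConcatDemo
{
    static void Main()
    {
        string[] parts = { "alpha", "beta", "gamma" }; // sample data

        // Bad: strings are immutable, so every '+' in the loop allocates a
        // brand-new string. A loop of n concatenations creates O(n)
        // short-lived objects for the GC to collect.
        string slow = "";
        foreach (string p in parts)
            slow = slow + p + ";";

        // Better: StringBuilder grows a single internal buffer, so the loop
        // allocates far fewer objects.
        var sb = new StringBuilder();
        foreach (string p in parts)
            sb.Append(p).Append(';');
        string fast = sb.ToString();

        Console.WriteLine(slow == fast); // both produce "alpha;beta;gamma;"
    }
}
```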

4.2 Type Usage
1) Avoid storing references to young objects in old objects
2) Try to reuse a large object rather than creating several temporary large objects, because:
* An LOH / Gen 2 collection is a full collection, and its overhead is quite high
* The LOH is never compacted, so heavy creation and destruction can lead to fragmentation
3) If child objects need lifetimes similar to their parents', try to create the parents and children together so that they are laid out close to each other and get better object locality
4) Avoid writing too often to an object, especially an old one. This can slow down the marking phase of a Gen 2 collection, because of Write Watching (see Generational GC Benefits)

4.3 Type Design

4.3.1 Finalizable Types vs. Non-Finalizable Types
1) A finalizable object takes two collections to get collected, and so do all the objects it references
2) This drives up the cost of GC
3) So a finalizable type should be designed so that it does not reference too many non-finalizable objects
4) This is especially true for types whose objects will live for a long time. Imagine a Gen 2 object getting finalized!
5) For a finalizable type, always implement both IDisposable and a Finalizer. This ensures that the calling code gets a chance to clean up imperatively, while the runtime still finalizes the object if the calling code does not.
* In the Dispose implementation, ensure that GC.SuppressFinalize is invoked, so that the finalizer does not run again after imperative cleanup.
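The standard shape of this pattern, as a sketch. The type and its handle field are hypothetical stand-ins for a real native resource; the IDisposable-plus-finalizer idiom itself is the documented one:

```csharp
using System;

// Hypothetical finalizable type wrapping a native resource.
class ResourceHolder : IDisposable
{
    private IntPtr handle = new IntPtr(1); // pretend native handle
    public bool Disposed { get; private set; }

    // Imperative cleanup path, called by client code or a using block.
    public void Dispose()
    {
        Cleanup();
        // The finalizer no longer needs to run, so the object can be
        // reclaimed in one collection instead of two.
        GC.SuppressFinalize(this);
    }

    // Safety net: runs only if the client never called Dispose.
    ~ResourceHolder()
    {
        Cleanup();
    }

    private void Cleanup()
    {
        if (Disposed) return;
        handle = IntPtr.Zero; // release the native resource here
        Disposed = true;
    }
}

class Program
{
    static void Main()
    {
        using (var r = new ResourceHolder())
        {
            // use r ...
        } // Dispose runs here and suppresses finalization
        Console.WriteLine("cleaned up imperatively");
    }
}
```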

4.3.2 Value Types vs. Reference Types
* Value types live on the stack (or inline inside their containing object) and carry no GC overhead of their own
* Frequent boxing and unboxing, however, is worse than just keeping the type a reference type
* Very small types should preferably be value types; reference types carry an 8-byte overhead by default (method table pointer and sync block index, see the object layout above)
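A small sketch of the boxing cost mentioned above. The element counts are illustrative; the contrast between the non-generic and generic collections is the point:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        // Boxing: a non-generic collection stores object references, so
        // adding an int wraps each value in a heap object, one allocation
        // per element, all of which the GC later has to collect.
        ArrayList boxed = new ArrayList();
        for (int i = 0; i < 3; i++)
            boxed.Add(i);            // boxes i on every iteration
        int first = (int)boxed[0];   // unboxes on every read

        // No boxing: a generic list stores the ints inline in its backing
        // array, with no per-element heap objects.
        List<int> unboxed = new List<int> { 0, 1, 2 };

        Console.WriteLine(first == unboxed[0]); // True
    }
}
```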

4.3.3 Embedded references
Avoid having too many references to other objects, for two reasons:
1) Too many references lead to too many writes, which causes Write-Watch overhead (see Type Usage above)
2) Too many references dilute object locality, and this hurts perf

4.4 Miscellaneous

4.4.1 Lifetime
* Avoid having too many roots (globals, statics, etc.), especially ones pointing to Gen 0 objects; this slows down Gen 0 collections
* Avoid creating objects that suffer from the mid-life crisis: neither too long lived nor too short lived. Such objects inflate Gen 2, and Gen 2 collections are expensive.
* To avoid this, clean up (set references to null) objects as much as possible before blocking on a long operation (a DB call, a Web Service call, etc.)

4.4.2 GC.Collect
* You typically do not know better than the GC when a collection should happen
* Calling it completely messes up the GC's self-tuning
* The only scenario for using it: some rare, non-repeating event kills off a lot of old objects, and without GC.Collect the GC would not reclaim them for long enough to hurt the app
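A sketch of that one narrow scenario. The cache array and the event are made up for illustration; the shape of the call is the point:

```csharp
using System;

class RareEventDemo
{
    // Long-lived (Gen 2) references; purely illustrative.
    static object[] cache = new object[1000];

    // Hypothetical handler for a rare, non-repeating event (say, a user
    // session ending) after which a large set of old objects dies at once.
    static void OnRareEvent()
    {
        Array.Clear(cache, 0, cache.Length); // drop the old objects

        // Only here does an explicit collection make sense: the dead objects
        // sit in Gen 2, and without a hint the GC might not reclaim them for
        // a long time.
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }

    static void Main()
    {
        cache[0] = new object();
        OnRareEvent();
        Console.WriteLine(cache[0] == null); // True
    }
}
```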

4.4.3 Perf Mon
* Use the minimum possible interval (1 sec) and capture the log for a few minutes
* Look out for
1) % Time in GC: a high value indicates too many Gen 2 collections; try to modify the allocation pattern
2) # of GC Handles: should not rise steadily; a steady rise indicates a memory leak, especially in a server workload

5. Further Reading
* Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins
* Garbage Collection on Wikipedia
* Resource Management (and Deterministic Finalization in the .Net Framework) by Brian Harry
* Garbage Collection: Automatic Memory Management in the Microsoft .Net Framework by Jeff Richter
* Garbage Collection – Part 2: Automatic Memory Management in the Microsoft .Net Framework by Jeff Richter
* Garbage Collection Basics and Performance Hints by Rico Mariani
* CLR Inside Out: Investigating Memory Issues by Claudio Caldato and Maoni Stephens
* CLR Inside Out: Using Concurrency for Scalability by Joe Duffy
* An Overview of .Net CF GC by Steven Pratschner
* The Design of the .Net Compact Framework CLR Part III: GC by Steven Pratschner
* Explanatory Notes on Rotor’s GC by Joel Pobar
* Traversing the GC Heap by MVStanton
* OutOfMemoryException and Pinning by Yun Jin
* Two Things to Avoid for Better Memory Usage by Rico Mariani
* When to Call GC.Collect by Rico Mariani
* Mid-Life Crisis by Rico Mariani
* Tracking Down Managed Memory Leaks by Rico Mariani
* The Perils of GC Collect (for compact framework) by Scott Holden
* Maoni Stephens' Blog
* Chris Brumme’s Blog
* Patrick Dussud’s Blog
* CLI Specs
* Shared Source CLI
* Java Hotspot Garbage Collection
* Tuning Garbage Collection with 1.4.2 Java VM
