Beware of the IDictionary<TKey, TValue>
Background
As it is the case in most scenarios the source of latency is usually due to the .NET Garbage Collection, so we always try very hard to minimize object allocations specially in our performance-critical components.
We have a messaging API which is used by many of our micro services. Some of those services use the API to publish thousands of messages per second to the market so it is important to keep the latency as low as possible.
In my quest to reduce latency and CPU consumption of our components, I came across a very interesting and probably not so well known performance gotcha!
Problem
The API has a method which accepts an IDictionary<string, string>
and publishes them to a message bus like so:
public void Publish(IDictionary<string, string> data)
{
foreach (var item in data)
{
_publisher.Publish(item);
}
}
Nothing really complicated right? WRONG!
What is going on?
Let’s do a quick and dirty profiling of an even simpler example:
void Main()
{
Dictionary<int, int> dic = new();
var sw = Stopwatch.StartNew();
for (var i = 0; i < 100000000; i++)
{
foreach (var item in dic) { ; }
}
sw.Stop();
// compose the result
StringBuilder result = new();
result.AppendFormat("[Time]\t{0}", sw.Elapsed);
result.AppendLine();
result.AppendFormat("[Gen-0]\t{0}", GC.CollectionCount(0));
result.AppendLine();
result.AppendFormat("[Gen-1]\t{0}", GC.CollectionCount(1));
result.AppendLine();
result.AppendFormat("[Gen-2]\t{0}", GC.CollectionCount(2));
Console.WriteLine(result.ToString());
}
What we have here is an empty dictionary where both the keys and values are of type int
. We are then simply enumerating the dictionary a hundred million times. Note we are not doing anything in the body of the foreach so that we can focus only on the enumeration.
Finally, we record the total time it took for the enumeration as well as the number of times GC performed collections of our GEN 0 to GEN 2. Running the above on a machine with a Quad-Core 3.4GHz and 8GB of RAM prints:
[Time] 00:00:01.8128806
[Gen-0] 0
[Gen-1] 0
[Gen-2] 0
So it took almost 2 seconds and resulted in no garbage collection, so far so good. Now let us try again but this time change our dic pointer to an IDictionary<int, int>
instead:
... same as above
IDictionary<int, int> dic = new Dictionary<int, int>();
... same as above
Which prints:
[Time] 00:00:04.3902405
[Gen-0] 765
[Gen-1] 2
[Gen-2] 1
This time it took more than twice than enumerating a Dictionary<int, int>
and resulted in 765 collections of GEN 0, 2 collections of GEN 1 and 1 collection of GEN 2. This is MADNESS!
And here is why
Let us look at the IL generated in the case when we are using a Dictionary<int, int>
:
... not important
IL_0013: ldloc.0 // dic
IL_0014: callvirt System.Collections.Generic.Dictionary<System.Int32,System.Int32>.GetEnumerator
IL_0019: stloc.s 04
IL_001B: br.s IL_0029
IL_001D: ldloca.s 04
IL_001F: call System.Collections.Generic.Dictionary<System.Int32,System.Int32>+Enumerator.get_Current
IL_0024: stloc.s 05 // item
...
IL_0029: ldloca.s 04
IL_002B: call System.Collections.Generic.Dictionary<System.Int32,System.Int32>+Enumerator.MoveNext
IL_0030: brtrue.s IL_001D
IL_0032: leave.s IL_0043
IL_0034: ldloca.s 04
IL_0036: constrained. System.Collections.Generic.Dictionary<,>.Enumerator
... not important
Here we can see a call to System.Collections.Generic.Dictionary<System.Int32,System.Int32>+Enumerator.get_Current
to get the current Enumerator
and on every iteration we have a call to System.Collections.Generic.Dictionary<System.Int32,System.Int32>+Enumerator.MoveNext
to move the Enumerator
forward.
Let us compare that with the IL for the case of an IDictionary<int, int>
:
... not important
IL_0013: ldloc.0 // dic
IL_0014: callvirt System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<System.Int32,System.Int32>>.GetEnumerator
IL_0019: stloc.s 04
IL_001B: br.s IL_0029
IL_001D: ldloc.s 04
IL_001F: callvirt System.Collections.Generic.IEnumerator<System.Collections.Generic.KeyValuePair<System.Int32,System.Int32>>.get_Current
IL_0024: stloc.s 05 // item
...
IL_0029: ldloc.s 04
IL_002B: callvirt System.Collections.IEnumerator.MoveNext
IL_0030: brtrue.s IL_001D
IL_0032: leave.s IL_0041
... not important
Very interesting! This time we have a callvirt
to System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<System.Int32,System.Int32>>.GetEnumerator
followed by more callvirt
to System.Collections.Generic.IEnumerator
.
Aha! Allow me to explain in case it is not that obvious.
Virtual Call / Interface Dispatch
The first issue is the Interface Dispatch (the parts that do a non-inlineable callvirt
to System.Collections.Generic.IEnumerator
) which is in contrast to the case of Dictionary<int, int>
where we are doing fast, non-virtual and inlineable calls.
Boxing
The second issue and by far the most significant performance hit is due to _boxing. Instead of returning a Dictionary<int, int>.Enumerator
, we are boxing the struct
and return an IEnumerator<KeyValuePair<int, int>>
which explains the high number of GC collections across our heap.
How about IList, IReadOnlyList, ICollection & IEnumerable vs List?
IList<T>
and IReadOnlyList<T>
implement ICollection<T>
which in turn implements IEnumerable<T>
and therefore using any of those interfaces for the enumeration incurs the similar performance cost we covered above.
This is even more clear if you look at the source of List<T>
:
public Enumerator GetEnumerator() => new Enumerator(this);
IEnumerator<T> IEnumerable<T>.GetEnumerator() => new Enumerator(this);
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => new Enumerator(this);
You can see in the last two GetEnumerator()
s the Enumerator struct
is being boxed to a generic an a non generic IEnumerator
.
Summary
It is a good idea to abstract yourself from an implementation, by that I mean programming against an interface rather than a concrete implementation however there are times where this can have a performance impact and what we covered in this article was one of such examples. In our case the problem was fixed by simply switching to a Dictionary<string, string>
which is the cost we are prepared to pay in return of a more efficient API.