Performance Testing the LINQ to Objects Except Method
When .NET introduced LINQ and lambda expressions, I was blown away by how much more efficiently I could write code. But after reading countless articles online about how horribly slow and inefficient LINQ (even LINQ to Objects) was compared with plain for-loops, I’ve always felt guilty about using it.
The other day, however, I wanted to see how much slower LINQ would be at comparing two lists of objects to find all of the items from list1 that were not in list2. LINQ has an extension method called "Except" for exactly this.
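Before getting to the object lists, here is what Except does in its simplest form (a minimal illustration with ints, not the test code from this post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ExceptDemo
{
    static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4 };
        var b = new List<int> { 2, 4 };

        // Except returns the distinct elements of the first sequence
        // that do not appear in the second sequence.
        var onlyInA = a.Except(b);

        Console.WriteLine(string.Join(",", onlyInA)); // prints "1,3"
    }
}
```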
I coded the solution in LINQ, and then coded a few “solutions” using pre-LINQ techniques. Spoiler alert: LINQ was the best for large sets of objects. The objects in the lists look like this:
private class TempFileInfo : IEquatable<TempFileInfo>
{
    public ulong Id { get; set; }
    public string Name { get; set; }
    public ulong ParentId { get; set; }
    public string VirtualPath { get; set; }
    public string FileName { get; set; }

    public bool Equals(TempFileInfo other)
    {
        if (other == null) return false;
        if (!EqualityComparer<ulong>.Default.Equals(Id, other.Id)) return false;
        if (!EqualityComparer<string>.Default.Equals(Name, other.Name)) return false;
        if (!EqualityComparer<ulong>.Default.Equals(ParentId, other.ParentId)) return false;
        if (!EqualityComparer<string>.Default.Equals(VirtualPath, other.VirtualPath)) return false;
        if (!EqualityComparer<string>.Default.Equals(FileName, other.FileName)) return false;
        return true;
    }

    public override int GetHashCode()
    {
        int hash = 0;
        hash ^= EqualityComparer<ulong>.Default.GetHashCode(Id);
        hash ^= EqualityComparer<string>.Default.GetHashCode(Name);
        hash ^= EqualityComparer<ulong>.Default.GetHashCode(ParentId);
        hash ^= EqualityComparer<string>.Default.GetHashCode(VirtualPath);
        hash ^= EqualityComparer<string>.Default.GetHashCode(FileName);
        return hash;
    }
}
And then I have a test method that does all of the comparisons:
[TestMethod]
public void perfTest()
{
    var localList = new List<TempFileInfo>();
    var remoteList = new List<TempFileInfo>();
    var diffs = new Dictionary<TempFileInfo, bool>();

    var listLength = 1000;

    for (int counter = 0; counter < listLength; counter++)
    {
        var guidText = Guid.NewGuid().ToString();
        localList.Add(new TempFileInfo { Id = (ulong)counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path" });
        remoteList.Add(new TempFileInfo { Id = (ulong)counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path" });
    }

    var uniqueName = "different";
    var differentFile = new TempFileInfo { Id = 0, Name = uniqueName, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path" };
    localList.Add(differentFile);

    var watch = new Stopwatch();
    // LINQ way
    watch.Start();
    var linqDiffs = localList.Except(remoteList).ToList();
    watch.Stop();
    var linqDuration = watch.Elapsed.Duration();
    watch.Reset();
    // dictionary way
    watch.Start();
    foreach (var a in localList) diffs.Add(a, true);
    foreach (var b in remoteList) diffs.Remove(b);
    watch.Stop();
    var dictionaryDuration = watch.Elapsed.Duration();
    watch.Reset();
    diffs.Clear();
    // contains way
    watch.Start();
    foreach (var localInfo in localList)
    {
        if (!remoteList.Contains(localInfo))
        {
            diffs.Add(localInfo, true);
        }
    }
    watch.Stop();
    var containsDuration = watch.Elapsed.Duration();
    watch.Reset();
    diffs.Clear();
    // double loop (note: unlike the other approaches, this one
    // compares by Name only rather than full object equality)
    watch.Start();
    foreach (var localInfo in localList)
    {
        var contains = false;
        foreach (var remoteInfo in remoteList)
        {
            contains = localInfo.Name == remoteInfo.Name;
            if (contains) break;
        }
        if (!contains) diffs.Add(localInfo, true);
    }
    watch.Stop();
    var foreachDuration = watch.Elapsed.Duration();
    watch.Reset();
    diffs.Clear();
    // remove way (only works if localList can be modified)
    watch.Start();
    foreach (var remoteFi in remoteList)
    {
        localList.Remove(remoteFi);
    }
    watch.Stop();
    var removeDuration = watch.Elapsed.Duration();
}
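Why does Except pull ahead on big lists? LINQ to Objects buffers the second sequence into a hash set and then streams the first sequence through it, so the whole diff costs roughly O(N + M) hash lookups instead of the O(N × M) equality comparisons the looping approaches make. This is also exactly why the Equals/GetHashCode overrides on TempFileInfo matter. A rough sketch of the idea (my approximation of the behavior, not the actual BCL source; the real Except additionally de-duplicates the first sequence, which this sketch skips):

```csharp
using System.Collections.Generic;
using System.Linq;

static class DiffSketch
{
    // Hash-set based difference: one pass to build a set from the
    // second list, one pass to filter the first. Relies on the
    // TempFileInfo Equals/GetHashCode overrides shown earlier.
    public static List<TempFileInfo> HashSetExcept(
        List<TempFileInfo> first, List<TempFileInfo> second)
    {
        var secondSet = new HashSet<TempFileInfo>(second);   // O(M) to build
        return first.Where(f => !secondSet.Contains(f))      // O(N) lookups
                    .ToList();
    }
}
```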
Let’s review the results
For just 1K items (listLength variable), LINQ isn’t very impressive
For 10K items…hmm, LINQ is working its way up to #1
If you plan to run this yourself, prepare to wait a while… the nested-looping techniques are effectively O(N²), so their running time grows quadratically: 10× the items means roughly 100× the time.
And what about 100K items? LINQ wins hands down
If you have time to kill, you can try it with higher numbers… but waiting 5+ minutes for the .Contains() method was long enough for me!
Conclusion
No need to feel guilty about using LINQ… where it matters (huge sets), it’s the best choice anyway.