The missing link: The LinqEqualityComparer
There are a few Linq extension methods, such as Except, Distinct and Intersect, that compare items only through the Equals and GetHashCode methods of an object or through a custom EqualityComparer. That becomes a limitation when, for example, you want to select all items from a list that are not in a second list, based on the value of a single property.
Look at the following code:
class MyClass
{
    public string MyProperty { get; set; }
}

var l1 = new List<MyClass>
{
    new MyClass { MyProperty = "shared" },
    new MyClass { MyProperty = "Item in l1" }
};
var l2 = new List<MyClass>
{
    new MyClass { MyProperty = "shared" },
    new MyClass { MyProperty = "Item in l2" }
};
When you want all items of l1 that are not in l2, compared on the value of MyProperty, you have 3 options:
- Implement the Equals and GetHashCode methods of the object, but that is hard to justify when comparing on this property is not the common case;
- Create a custom EqualityComparer, which is also an option when you do not control MyClass;
- Don’t use the Except method, but use a combination of other Linq extension methods instead:
l1.Where(m1 => !l2.Any(m2 => m1.MyProperty == m2.MyProperty));
I think this last option is used most, but it may carry a performance penalty because l2 is scanned again for every item in l1. When performance is important, this can be reduced by building a lookup of the l2 values first. I have now also created a 4th option: the LinqEqualityComparer. It is a very simple class that compares two objects based on a function that is provided via the constructor. You also have to supply a function that produces a hash code. Hash-based operations such as Except call GetHashCode as a quick pre-comparison before calling the more “expensive” Equals method, so two objects that should be equal must return the same hash code. If you cannot come up with a decent hash code, it is also possible to return a constant value. This ensures that the Equals method always gets called, at the cost of losing the speed benefit of hashing.
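using System;
using System.Collections.Generic;

public class LinqEqualityComparer<T> : EqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _hash;

    public LinqEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        _equals = equals;
        _hash = hash;
    }

    // Delegates the comparison to the function passed to the constructor.
    public override bool Equals(T x, T y)
    {
        return _equals(x, y);
    }

    // Delegates the hash code calculation to the function passed to the constructor.
    public override int GetHashCode(T obj)
    {
        return _hash(obj);
    }
}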
Now we can use the Except and Distinct methods with our new equality comparer:
var comparer = new LinqEqualityComparer<MyClass>(
    (x, y) => x.MyProperty == y.MyProperty,
    m => m.MyProperty.GetHashCode());
var result = l1.Except(l2, comparer);
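Remark: It is very important that the hash code is derived from the same value that Equals compares. If you use the GetHashCode method of MyClass itself, which by default differs per instance, objects with the same MyProperty get different hash codes and the comparer will treat them all as unequal!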
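To make the remark concrete, here is a minimal sketch that runs Except with three different hash functions. The comparer class is repeated from above so the sketch compiles on its own; the variable names keyHash, constantHash and objectHash are made up for this illustration, and it assumes MyClass does not override GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyClass
{
    public string MyProperty { get; set; }
}

// Repeated from the article so that this sketch is self-contained.
class LinqEqualityComparer<T> : EqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _hash;

    public LinqEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        _equals = equals;
        _hash = hash;
    }

    public override bool Equals(T x, T y) => _equals(x, y);
    public override int GetHashCode(T obj) => _hash(obj);
}

static class Program
{
    static void Main()
    {
        var l1 = new List<MyClass>
        {
            new MyClass { MyProperty = "shared" },
            new MyClass { MyProperty = "Item in l1" }
        };
        var l2 = new List<MyClass>
        {
            new MyClass { MyProperty = "shared" },
            new MyClass { MyProperty = "Item in l2" }
        };

        Func<MyClass, MyClass, bool> sameProperty = (x, y) => x.MyProperty == y.MyProperty;

        // Hash derived from the compared property: equal items get equal hash codes.
        var keyHash = new LinqEqualityComparer<MyClass>(sameProperty, m => m.MyProperty.GetHashCode());

        // Constant hash: still correct, but every item ends up in the same bucket.
        var constantHash = new LinqEqualityComparer<MyClass>(sameProperty, m => 0);

        // Hash of the object itself: differs per instance, so Equals is (almost) never consulted.
        var objectHash = new LinqEqualityComparer<MyClass>(sameProperty, m => m.GetHashCode());

        Console.WriteLine(l1.Except(l2, keyHash).Count());      // 1
        Console.WriteLine(l1.Except(l2, constantHash).Count()); // 1
        Console.WriteLine(l1.Except(l2, objectHash).Count());   // almost certainly 2: "shared" is not removed
    }
}
```

The first two comparers agree on the result; the constant hash only gives up speed on larger lists. The third compiles and runs without complaint but quietly returns the wrong items, which is what makes this mistake easy to miss.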
The next step is to create a factory class that creates a comparer and derives the equals and hash code functions from a selected property. I also added an extra overload that takes an IEqualityComparer as input. This makes it possible to plug in an extra comparer for the actual comparison, for example the StringComparer.
static class LinqEqualityComparer
{
    // Compare items by the selected key, using the default equality comparer of the key type.
    // That comparer calls the key's own Equals and GetHashCode and also copes with null keys.
    public static LinqEqualityComparer<T> Create<T, TKey>(Func<T, TKey> keySelector)
    {
        return Create(keySelector, EqualityComparer<TKey>.Default);
    }

    // Compare items by the selected key, using the supplied comparer for both Equals and GetHashCode.
    public static LinqEqualityComparer<T> Create<T, TKey>(Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        return new LinqEqualityComparer<T>(
            (x, y) => comparer.Equals(keySelector(x), keySelector(y)),
            m => comparer.GetHashCode(keySelector(m)));
    }
}
This is how it looks in action:
var comparer = LinqEqualityComparer.Create<MyClass, string>(m => m.MyProperty);
var result = l1.Except(l2, comparer);
and the extra overload:
var comparer = LinqEqualityComparer.Create<MyClass, string>(m => m.MyProperty, StringComparer.OrdinalIgnoreCase);
var result = l1.Except(l2, comparer);
The last step is a class with extension methods that add overloads to the existing Linq extension methods. These methods only forward the call to the original Linq methods while plugging in the new equality comparer. I only implemented the Except method, but Distinct and Intersect are not that different.
public static class LinqExtensions
{
    // Except, comparing items by the selected key with the key type's default equality.
    public static IEnumerable<T> Except<T, TKey>(this IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector)
    {
        return first.Except(second, LinqEqualityComparer.Create(keySelector));
    }

    // Except, comparing items by the selected key with the supplied key comparer.
    public static IEnumerable<T> Except<T, TKey>(this IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        return first.Except(second, LinqEqualityComparer.Create(keySelector, comparer));
    }
}
The first method can be used like this:
var result = l1.Except(l2, m => m.MyProperty);
And the second method like this, where the casing of the selected property is ignored:
var result = l1.Except(l2, m => m.MyProperty, StringComparer.OrdinalIgnoreCase);
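Distinct and Intersect overloads could follow the same pattern. The sketch below is one possible shape, not part of the original code: KeyComparer and KeyedLinqExtensions are hypothetical names, with KeyComparer as a compact stand-in for the LinqEqualityComparer factory so that the example compiles on its own. It also folds the two overloads into one by making the key comparer optional, which is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyClass
{
    public string MyProperty { get; set; }
}

// Hypothetical stand-in for LinqEqualityComparer.Create: compares items by a selected key.
sealed class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _comparer;

    public KeyComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer = null)
    {
        _keySelector = keySelector;
        // Fall back to the key type's default equality when no comparer is given.
        _comparer = comparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y) => _comparer.Equals(_keySelector(x), _keySelector(y));
    public int GetHashCode(T obj) => _comparer.GetHashCode(_keySelector(obj));
}

// Hypothetical extension class: forwards to the built-in Linq methods with a key-based comparer.
public static class KeyedLinqExtensions
{
    public static IEnumerable<T> Distinct<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer = null)
    {
        return source.Distinct(new KeyComparer<T, TKey>(keySelector, comparer));
    }

    public static IEnumerable<T> Intersect<T, TKey>(
        this IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer = null)
    {
        return first.Intersect(second, new KeyComparer<T, TKey>(keySelector, comparer));
    }
}

static class Program
{
    static void Main()
    {
        var l1 = new List<MyClass> { new MyClass { MyProperty = "shared" }, new MyClass { MyProperty = "Item in l1" } };
        var l2 = new List<MyClass> { new MyClass { MyProperty = "shared" }, new MyClass { MyProperty = "Item in l2" } };

        // Only the "shared" item occurs in both lists.
        Console.WriteLine(l1.Intersect(l2, m => m.MyProperty).Count()); // 1

        var names = new List<MyClass>
        {
            new MyClass { MyProperty = "a" },
            new MyClass { MyProperty = "A" },
            new MyClass { MyProperty = "b" }
        };

        // "a" and "A" collapse into one item when casing is ignored.
        Console.WriteLine(names.Distinct(m => m.MyProperty, StringComparer.OrdinalIgnoreCase).Count()); // 2
    }
}
```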
I don’t know if this new comparison class will be very helpful. In the situation where I came up with it, it turned out that I wasn’t interested in the whole object, but only in the specific property I was filtering on. So I found out there was a 5th method to solve this issue that might be more elegant and less work in the first place:
var result = l1.Select(m => m.MyProperty).Except(l2.Select(m => m.MyProperty));
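Note that this variant returns the property values, not the MyClass objects. As a side-by-side sketch, and assuming the project targets .NET 6 or later, the built-in ExceptBy method takes the keys of the second list plus a key selector and returns the whole objects. The variable names below are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyClass
{
    public string MyProperty { get; set; }
}

static class Program
{
    static void Main()
    {
        var l1 = new List<MyClass> { new MyClass { MyProperty = "shared" }, new MyClass { MyProperty = "Item in l1" } };
        var l2 = new List<MyClass> { new MyClass { MyProperty = "shared" }, new MyClass { MyProperty = "Item in l2" } };

        // The 5th method from the text: project to the property first, then compare the strings.
        IEnumerable<string> keysOnlyInL1 = l1.Select(m => m.MyProperty).Except(l2.Select(m => m.MyProperty));

        // ExceptBy (assumes .NET 6+): same comparison on the key, but the result keeps the MyClass objects.
        IEnumerable<MyClass> itemsOnlyInL1 = l1.ExceptBy(l2.Select(m => m.MyProperty), m => m.MyProperty);

        Console.WriteLine(string.Join(", ", keysOnlyInL1));   // Item in l1
        Console.WriteLine(itemsOnlyInL1.Single().MyProperty); // Item in l1
    }
}
```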
But I can imagine that there are other situations where an EqualityComparer must be used and the LinqEqualityComparer might help… At least it was a nice exercise for me 🙂