Outer Join at LINQ

LINQ - how much has been invented in C # just so that we can enjoy the delights of Language Integrated Query. Namely:

  • Generics
  • Extension methods
  • Lamda expressions
  • Expression trees
  • Anonumus types
  • Object initializers
  • Type inferring

And that’s all so that we can write something like this:
	var query = 
		from itemA in listA
		join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};
	

One can not disagree - impresses.

And in the midst of all this syntactic sugar there was a spoon of dew which prevented me from getting enough sleep :)

This is a total lack of support for OUTER JOIN. But as it turned out tar easily turns ... turns ... turns ...

... into another syntactic sugar.

Those who tried to find a solution for LEFT OUTER JOIN on the Internet, probably know a similar solution:

	var query = 
		from itemA in listA
		join itemB in listB
			on itemA.Key equals itemB.Key into outer
		from itemO in outer.DefaultIfEmpty()
		select new {itemA, itemO};
	

Such a design clearly confuses the understanding by an order of magnitude and complicates an already simple construction. And this is just a replacement for INNER JOIN with LEFT OUTER JOIN. In order not to continue to shock, I will not give an example with a FULL OUTER JOIN.

It would seem as if it would be simple if we could write like this:

	var query = 
		from itemA in listA
		left join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};
	

or so

	var query = 
		from itemA in listA
		full join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};
	

But no. The authors of C # did not provide us such pleasure. Well, it doesn’t matter. Nevertheless, they will allow us to do this on our own, although not in such a beautiful way.

To begin with, if someone tells you that LINQ and the System.Collections.Generic.IEnumerable interface have something in common and cannot exist separately, you can safely laugh in person ...

Design

	var query = 
		from itemA in listA
		join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};

it is simply translated by the compiler into the following sequence of characters:
	var query = listA.Join(listB, itemA => itemA.Key, itemB => itemB.Key, (itemA, itemB) => new {itemA, itemB});

and it doesn’t matter what type of variables listA and listB are. Suppose listA is a variable of type TypeA, and the permalable itemB is of type TypeB. So, if TypeA and TypeB contain a property or field called Key, TypeA contains the Join () method with 4 arguments. This LINQ query compiles freely.

When using variables in LINQ that implement the standard IEnumerable interface, the extension method is used

public class System.Linq.Enumerable
{
		public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector, Func<TOuter, TInner, TResult> resultSelector) {...}
}

Actually this method also produces the well-known INNER JOIN. And now street magic begins. In order to implement the LEFT / RIGHT / FULL OUTER JOIN (or JOIN which will please your soul), it is necessary to replace the call of the standard method with the one implemented by us. To do this, we need to convert the variable listA in some way to a type that we can control.

By implementing the following two classes:

public class JoinedEnumerable<T> : IEnumerable<T>
{
	public readonly IEnumerable<T> Source;
	public bool IsOuter;

	public JoinedEnumerable(IEnumerable<T> source) { Source = source; }

	IEnumerator<T> IEnumerable<T>.GetEnumerator() { return Source.GetEnumerator(); }
	IEnumerator IEnumerable.GetEnumerator() { return Source.GetEnumerator(); }
}

public static class JoinedEnumerable
{
	public static JoinedEnumerable<TElement> Inner<TElement>(this IEnumerable<TElement> source)
	{
		return Wrap(source, false);
	}

	public static JoinedEnumerable<TElement> Outer<TElement>(this IEnumerable<TElement> source)
	{
		return Wrap(source, true);
	}

	public static JoinedEnumerable<TElement> Wrap(IEnumerable<TElement> source, bool isOuter)
	{
		JoinedEnumerable<TElement> joinedSource 
			= source as JoinedEnumerable<TElement> ?? 
				new JoinedEnumerable<TElement>(source);
		joinedSource.IsOuter = isOuter;
		return joinedSource;
	}
}

we can easily write the next LINQ query

	var query = 
		from itemA in listA.Outer()
		join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};

and now implementing the Join extension method for the JoinedEnumerable class in the way we need, we get everything we need.

And here are the extension methods:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(this JoinedEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector, Func<TOuter, TInner, TResult> resultSelector, IEqualityComparer<TKey> comparer = null)
{
	if (outer == null) throw new ArgumentNullException("outer");
	if (inner == null) throw new ArgumentNullException("inner");
	if (outerKeySelector == null) throw new ArgumentNullException("outerKeySelector");
	if (innerKeySelector == null) throw new ArgumentNullException("innerKeySelector");
	if (resultSelector == null) throw new ArgumentNullException("resultSelector");

	bool leftOuter = outer.IsOuter;
	bool rightOuter = (inner is JoinedEnumerable<TInner>) && ((JoinedEnumerable<TInner>)inner).IsOuter;

	if (leftOuter && rightOuter)
		return FullOuterJoin(outer, inner, outerKeySelector, innerKeySelector, resultSelector, comparer);

	if (leftOuter)
		return LeftOuterJoin(outer, inner, outerKeySelector, innerKeySelector, resultSelector, comparer);

	if (rightOuter)
		return RightOuterJoin(outer, inner, outerKeySelector, innerKeySelector, resultSelector, comparer);

	return Enumerable.Join(outer, inner, outerKeySelector, innerKeySelector, resultSelector, comparer);
}

public static IEnumerable<TResult> LeftOuterJoin<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector, Func<TOuter, TInner, TResult> resultSelector, IEqualityComparer<TKey> comparer = null)
{
	var innerLookup = inner.ToLookup(innerKeySelector, comparer);

	foreach (var outerItem in outer)
		foreach (var innerItem in innerLookup[outerKeySelector(outerItem)].DefaultIfEmpty())
			yield return resultSelector(outerItem, innerItem);
}

public static IEnumerable<TResult> RightOuterJoin<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector, Func<TOuter, TInner, TResult> resultSelector, IEqualityComparer<TKey> comparer = null)
{
	var outerLookup = outer.ToLookup(outerKeySelector, comparer);

	foreach (var innerItem in inner)
		foreach (var outerItem in outerLookup[innerKeySelector(innerItem)].DefaultIfEmpty())
			yield return resultSelector(outerItem, innerItem);
}

public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector, Func<TOuter, TInner, TResult> resultSelector, IEqualityComparer<TKey> comparer = null)
{
	var outerLookup = outer.ToLookup(outerKeySelector, comparer);
	var innerLookup = inner.ToLookup(innerKeySelector, comparer);

	foreach (var innerGrouping in innerLookup)
		if (!outerLookup.Contains(innerGrouping.Key))
			foreach (TInner innerItem in innerGrouping)
				yield return resultSelector(default(TOuter), innerItem);

	foreach (var outerGrouping in outerLookup)
		foreach (var innerItem in innerLookup[outerGrouping.Key].DefaultIfEmpty())
			foreach (var outerItem in outerGrouping)
				yield return resultSelector(outerItem, innerItem);
}

Voila ...

Beautiful LEFT OUTER JOIN:

	var query = 
		from itemA in listA.Outer()
		join itemB in listB
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};

Beautiful RIGHT OUTER JOIN:

	var query = 
		from itemA in listA.Inner()
		join itemB in listB.Outer()
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};

Beautiful FULL OUTER JOIN:

	var query = 
		from itemA in listA.Outer()
		join itemB in listB.Outer()
			on itemA.Key equals itemB.Key
		select new {itemA, itemB};

Now, if you wish, you can use your own approach - as the field for imagination is enormous here. I have several more interesting solutions for implementing goodies in the zashashnik. There will be time to share them.

Thank you for the attention.

May POWER be with you!

All Articles