HotDog's Blog

Hotdog (Robert Verpalen) about C# and vb.net

This blog hosted by:
http://blogs.vbcity.com      

Introduction

Linq, the wonderful new world of querying anything you can create a wrapper around :) By default Linq supports any IEnumerable<> implementation, which means every collection, list, array and a whole range of other objects implementing IEnumerable<> can be used as a source for Linq queries. On top of that, the new framework has implementations for querying Sql Server (DLinq) and XML (XLinq) to make the life of programmers a whole lot easier.

Lots of documentation already exists on the usage of Linq, but one of the holes I stumbled into was how to implement custom querying for our own classes. In my specific case: I had just finished our own ORM mapping structure, with all updates, queries etc, but wanted the ease of use of Linq as well. Being new to Linq, the first stop of course was searching the internet. First research seemed to point to the IQueryable (or IOrderedQueryable) interface, but that seemed so .... verbose. Luckily, before I spent too much time trying to implement the mentioned interface, I stumbled upon the Query Expression Pattern.

The first time I saw the pattern mentioned was in the C# Version 3.0 specification. Mind you, that version dated from 2005 and did not contain the entire pattern; the complete pattern can be found on this msdn page.
No need to implement any interfaces, just include the functions you need for your objects and you're good to go. The functions can be added straight into your classes or as extension methods on any existing class. That is also the way the default IEnumerable<> Linq behaviour is made: with extension methods on that interface.
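
To make that concrete, here is a minimal sketch of a class that can be queried purely because it has a Where and a Select method of the right shape. The class NumberBox and everything in it is made up for illustration; it deliberately implements no interface at all, and System.Linq is not even imported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical source type: no interfaces, only pattern methods.
public class NumberBox
{
    private readonly List<int> items;

    public NumberBox(params int[] values) { items = new List<int>(values); }

    // Called by the compiler for a 'where' clause.
    public NumberBox Where(Func<int, bool> predicate)
    {
        var result = new NumberBox();
        foreach (int i in items)
            if (predicate(i)) result.items.Add(i);
        return result;
    }

    // Called by the compiler for a 'select' clause.
    public List<T> Select<T>(Func<int, T> selector)
    {
        var result = new List<T>();
        foreach (int i in items) result.Add(selector(i));
        return result;
    }
}

public static class PatternDemo
{
    public static List<string> Run()
    {
        var box = new NumberBox(1, 2, 3, 4, 5);
        // Compiles to box.Where(n => n > 2).Select(n => "#" + n)
        return from n in box
               where n > 2
               select "#" + n;
    }

    public static void Main()
    {
        foreach (string s in Run()) Console.WriteLine(s); // #3, #4, #5
    }
}
```

Adding an orderby or join clause to that query would not compile, simply because NumberBox has no matching method.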

A quick look at the Linq basics

Back to the beginning: any IEnumerable<> implementation can be queried. The default examples use Lists, so I'll use a string, which implements IEnumerable<char>. Keep in mind while reading, though, that the same can be applied to any IEnumerable<> implementation.

    string s = "Some string. Some numbers: 1,2,3,4,5";
    var res = from p in s
              where char.IsLetter(p) && p > 'o'
              select char.ToUpper(p);

This is all default Linq syntax. The compiler breaks the query up into lambda expressions and calls the matching methods of the Query Expression Pattern. How exactly the parts are broken up is well documented in several articles on the internet, so I won't go there :p Lambda expressions find their roots in functional programming. The short definition is something like "a function that takes a single parameter". For the longer official description take a look at Wikipedia ;)
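
For a quick impression of that break-up, the sketch below puts the query next to the method calls the compiler turns it into. The class and method names are only there for the demo.

```csharp
using System;
using System.Linq;

public static class TranslationDemo
{
    // The query as written in the article.
    public static string QuerySyntax(string s)
    {
        var res = from p in s
                  where char.IsLetter(p) && p > 'o'
                  select char.ToUpper(p);
        return new string(res.ToArray());
    }

    // The same query written as the method calls it is translated into.
    public static string MethodSyntax(string s)
    {
        var res = s.Where(p => char.IsLetter(p) && p > 'o')
                   .Select(p => char.ToUpper(p));
        return new string(res.ToArray());
    }

    public static void Main()
    {
        string s = "Some string. Some numbers: 1,2,3,4,5";
        Console.WriteLine(QuerySyntax(s));  // STRURS
        Console.WriteLine(MethodSyntax(s)); // STRURS
    }
}
```
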
In the end, what matters for the implementations is that the lambdas are a sort of anonymous delegates that are only executed when needed. The 'when needed' part in turn is decided by the code implementing the pattern. In other cases they are never actually executed, but rather inspected. For example: Dlinq does not execute the lambdas, but translates them into SQL.
One other difference between the Linq extensions and Dlinq is that the first handles the query per part, each method wrapping the result of the previous one and running its own lambda, while Dlinq only collects information and sends a single SQL statement once the result is enumerated. (The IEnumerable<> versions of Where and Select are lazy as well: their lambdas only run when the result is enumerated.) This has nothing to do with the Linq syntax itself, but purely with how it is implemented.
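
A small experiment makes the timing visible for the IEnumerable<> extensions: the lambda is handed to Where right away, but it only runs once the result is enumerated. The counter and the names below are just for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the lambda has actually been executed.
    public static int Calls;

    public static IEnumerable<char> BuildQuery(string s)
    {
        return s.Where(c => { Calls++; return char.IsLetter(c); });
    }

    public static void Main()
    {
        var query = BuildQuery("a1b2");
        Console.WriteLine(Calls);      // 0: nothing has been executed yet

        int letters = query.Count();   // enumerating runs the lambda per character
        Console.WriteLine(Calls);      // 4
        Console.WriteLine(letters);    // 2
    }
}
```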

Not very interesting, but for completeness: the results of the query above can be shown in a Console application by adding:

    foreach (var c in res)
    {
        Console.WriteLine(c);
    }
    Console.ReadLine();

Extending

One of the new features in C# 3.0 is extension methods. How they work is well documented in that large obscure place called the internet, so if you don't know what they are, read up, because they are certainly worth knowing :) As in all overloading, the methods with the strongest match take precedence. That means that if you extend an IEnumerable<> class (either by using extension methods or by defining the methods in the class itself), you can override the default behaviour.
Of course, instead of on IEnumerable<>, you could create those methods on a whole new class too, but it's easier to start by extending the known, so it becomes clear how easily the pattern can be used.
The following example shows how you can create a Where, specifically for a string:

    static class LinqExamples
    {
        public static IEnumerable<char> Where(this string s, Func<char, bool> predicate)
        {
            return ((IEnumerable<char>)s).Where(predicate);
        }
    }

The 'trick' of casting to the IEnumerable<char> interface first is very useful, because then the standard extension methods become visible. The code here does nothing more than call the normal Linq operation of IEnumerable<char>, so you'll see no difference in the results. However, if you place a breakpoint on this function and execute the query above, you'll see that this 'Where' is called first.
NOTE: extending string is convenient when you only care about strings, but it's safer to extend IEnumerable<char> instead, because as soon as multiple 'where' clauses are used, the later ones operate on the IEnumerable<char> returned by the first rather than on a string. I extended string here because it avoids confusion between the different parameters used. The Select example further below does use the 'proper' syntax.

If you executed the code and examined the predicate, you'll have noticed that you don't get much info on what exactly the criterion was. This is because the compiled lambda function is passed directly and you don't see the expression that describes it. This is one of those hardly documented features, but it is easy to get the expression itself. (You'll need to include a using System.Linq.Expressions; in order to make the code below work.)

    public static IEnumerable<char> Where(this string s, Expression<Func<char, bool>> predicate)
    {
        return ((IEnumerable<char>)s).Where(predicate.Compile());
    }

The only difference in the method's definition is that Expression<Func<char, bool>> is used instead of Func<char, bool>. If you look at the 'Where' method in the Query Expression Pattern, the lambda parameter is defined as Func<T, bool>. T is the type of the object you compare with in the where part, so in this case 'char'. The second generic argument is the type that will be returned by the expression, which in case of a Where statement is logically a boolean. All lambda parameters in the pattern methods can be replaced by an Expression<> in the same way.
If you place a breakpoint now, you'll get a more accurate view of what's going on. The predicate parameter will show: p => (IsLetter(p) && (Convert(p) > 111)).
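
If you'd rather not use the debugger, the same information can be read from code. The sketch below assigns the lambda to an Expression<> variable and walks the top of the tree; the class name is made up, and the exact text printed for the whole expression differs between framework versions, so only the node types are annotated.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Assigning a lambda to Expression<...> makes the compiler build a tree
    // instead of a delegate.
    public static readonly Expression<Func<char, bool>> Predicate =
        p => char.IsLetter(p) && p > 'o';

    public static void Main()
    {
        Console.WriteLine(Predicate);                // textual form of the lambda

        var body = (BinaryExpression)Predicate.Body;
        Console.WriteLine(body.NodeType);            // AndAlso
        Console.WriteLine(body.Left.NodeType);       // Call (char.IsLetter)
        Console.WriteLine(body.Right.NodeType);      // GreaterThan

        // Compile() turns the tree back into an ordinary delegate.
        Func<char, bool> compiled = Predicate.Compile();
        Console.WriteLine(compiled('s'));            // True
        Console.WriteLine(compiled('a'));            // False
    }
}
```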

Extending, the next step

Using this kind of overloading you could completely override what is returned, or prefilter the results. For example, if you have a morbid fear of the letter 'u', you could make sure it's filtered out in any Linq query on strings:

    public static IEnumerable<char> Select(this IEnumerable<char> s, Func<char, char> predicate)
    {
        foreach (char c in System.Linq.Enumerable.Select(s, predicate))
        {
            if (c != 'u' && c != 'U') yield return c;
        }
    }

Notice that this time we extend IEnumerable<char>, rather than string. If we extended string, this Select would no longer be used as soon as a 'where' clause is added. This is because the 'where' functions return an IEnumerable<char> and not a string, and 'select' in turn is resolved on that returned value.
The query will return the same letters it did earlier, but without the letter 'U'. Granted, not much use for this, but the example is meant to show how easily you can extend existing IEnumerable<> constructs. By evaluating the expression first, you could decide whether you want to filter or not, or change the behaviour of certain criteria. In most cases it will be better to include where statements (calling custom functions) which perform the filtering for you, but who knows, it may come in handy someday.
It also shows that you could circumvent having to implement the entire pattern, by creating an IEnumerable<> object and overriding only those parts you wish to adapt. Of course you can always filter inside the GetEnumerator function of your custom class, which is safer in the sense that all enumerations are filtered, but when you know you'll be using Linq, you can add specific functionality based on specific queries.
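
For comparison, this is roughly what the GetEnumerator variant could look like. NoULetters is a hypothetical wrapper class; because the filtering sits in the enumerator, it applies to a plain foreach as well as to any Linq query, without a custom Select or Where.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: every enumeration skips 'u' and 'U'.
public class NoULetters : IEnumerable<char>
{
    private readonly string text;

    public NoULetters(string text) { this.text = text; }

    public IEnumerator<char> GetEnumerator()
    {
        foreach (char c in text)
            if (c != 'u' && c != 'U') yield return c;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class EnumeratorDemo
{
    public static string Run()
    {
        var source = new NoULetters("Uncut fuzz");
        // The default Linq operators are used; the filter happens underneath them.
        var res = from c in source
                  where char.IsLetter(c)
                  select char.ToUpper(c);
        return new string(res.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // NCTFZZ
    }
}
```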

The Pattern Types

Where

If you look at the Where function above, it returns an IEnumerable<char> and takes a lambda expression Func<char, bool> (the string parameter is only there for the extension and is not relevant to Linq). The returned type can be changed, as long as it implements the query expression pattern. You can (mis)use the returned object to change which object will handle the Linq clauses that follow 'Where' (in this case 'Select'). The funny thing is that if the 'Where' method returns an object of a different type than the one the original 'Select' is defined on, the result of the Linq query differs depending on whether or not you use a where clause in the query.
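
A stripped-down sketch of that effect, with two made-up classes whose Select methods only report who handled the call:

```csharp
using System;

// Hypothetical types: Where hands the rest of the query to another class.
public class Source
{
    public Filtered Where(Func<int, bool> predicate) { return new Filtered(); }
    public string Select<T>(Func<int, T> selector) { return "Source.Select"; }
}

public class Filtered
{
    public string Select<T>(Func<int, T> selector) { return "Filtered.Select"; }
}

public static class HandlerDemo
{
    public static string WithoutWhere()
    {
        // Translates to new Source().Select(x => x + 1)
        return from x in new Source() select x + 1;
    }

    public static string WithWhere()
    {
        // Translates to new Source().Where(x => x > 0).Select(x => x + 1)
        return from x in new Source() where x > 0 select x + 1;
    }

    public static void Main()
    {
        Console.WriteLine(WithoutWhere()); // Source.Select
        Console.WriteLine(WithWhere());    // Filtered.Select
    }
}
```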

In the pattern the lambda is represented as Func<T, bool>: a function that takes a parameter of type T and returns a boolean value.
In case of 'Where' it means that T is the type the comparison is executed on AND, at the same time, usually the type that is returned (enumerated). Mostly T will be the same type as the one returned, but you could choose to return a different type altogether. You can use several types to overload specific behaviour. One important aspect, though, is that you can't mix those types in one where statement. If you use multiple types, you'll have to use multiple where statements. Pretty vague eh? :D
Well, to still use a string as enumerable source: normally you enumerate characters and the criteria used are characters. Now suppose you also want to compare with a string. Why? I have no idea, but let's do it anyway. Ok, to make up some reason: you want to be able to compare with a range. That's really a bs reason, because you can use Contains with default Linq:

    string s = "abcdefghijklmnopqrstuvwxyz";
    var res = from c in s
              where "aeiou".Contains(c)
              select c;

But you've decided that you want to do:

    string s = "abcdefghijklmnopqrstuvwxyz";
    var res = from c in s
              where c == "aeiou"
              select c;

If you compile the code above as is, you will get a compiler error telling you that == cannot be applied to char and string. Now that does sound logical, but all you need is an extra overload. Add a function with the following signature:

    public static IEnumerable<char> Where(this string s, Func<string, bool> predicate)

To test if it compiles, you can simply have the function return null. If you compile again, the query is suddenly ok (of course you will still have to add your own code to make it work, otherwise you'll get a runtime error, but the actual implementation doesn't matter for this example). Now comes the tricky part: now that you have that overload, you can compare with char (no matter if using the default or a custom 'where') and also with string, but you can't put them in the same where clause. If you change the where clause to:

    where c == "aeiou" && c > 'd'

you won't be able to compile, because the two cannot be combined with each other. But you can use multiple where clauses to get around that problem:

    string s = "abcdefghijklmnopqrstuvwxyz";
    var res = from c in s
              where c == "aeiou"
              where c > 'd'
              select c;

Using multiple where clauses has the same effect as using && (And) between criteria. I don't know if there is a way to make them || (Or), but it doesn't look like it.
The compiler decides the type of 'c' based on what it is compared to. If you hover over 'c' in both 'where' clauses in Visual Studio, the first 'c' will be identified as string, the second as char. That in turn means you can use any string method on the first 'c'.
Of course this again is an example with no real world value in itself, but the idea is that you can use overloading on the type used in your criteria, which opens new possibilities for creating query sources.
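
For those who want to see the string overload do something, here is one possible way to fill it in. This is only a sketch under a few assumptions of mine: it takes an Expression<> instead of a plain Func so the literal can be read from the tree, it only understands the exact form c == "literal", and it interprets that as "c is one of these characters". The class names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class StringCriteria
{
    // Hypothetical implementation: c == "aeiou" means "c is one of a, e, i, o, u".
    public static IEnumerable<char> Where(this string s,
        Expression<Func<string, bool>> predicate)
    {
        var body = predicate.Body as BinaryExpression;
        var constant = body == null ? null : body.Right as ConstantExpression;
        if (body == null || body.NodeType != ExpressionType.Equal || constant == null)
            throw new NotSupportedException("Only c == \"literal\" is handled in this sketch.");

        string allowed = (string)constant.Value;
        // Cast first, so the default Where on IEnumerable<char> does the actual work.
        return ((IEnumerable<char>)s).Where(ch => allowed.IndexOf(ch) >= 0);
    }
}

public static class StringCriteriaDemo
{
    public static string Run()
    {
        string s = "abcdefghijklmnopqrstuvwxyz";
        var res = from c in s
                  where c == "aeiou"   // handled by StringCriteria.Where, c is a string
                  where c > 'd'        // handled by the default Where, c is a char
                  select c;
        return new string(res.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // eiou
    }
}
```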

Promoting IEnumerable<>

Some remarks in-between about using the IEnumerable<> interface. Personally, I've been a fan of the IEnumerable possibilities since the day I laid eyes on them. You can pass an IEnumerable object, or the result of a function returning one, to any other function (sound anything like functional programming? ;-) ).
Now that Linq exists, using this interface becomes even more attractive. In effect it means you can query functions. In a way this is exactly what the IEnumerable<> implementation of Linq does: enumerating the enumeration results of the previous function, etc.
Using IEnumerables as parameters can create great reusability. I realize this is a very abstract thing to say, but the general line is that I mostly prefer creating a function with IEnumerable<> as parameter over a function with, for example, List<>. Both can be used for Linq, but by using IEnumerable<>, you can also use another function that returns IEnumerable<> as a parameter of the first function. This creates great freedom in reusing functionality.
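
To make that a bit less abstract, a small sketch with made-up function names: one function accepts any IEnumerable<int>, and it works equally well on a List<>, an array or the output of an iterator function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReuseDemo
{
    // Iterator function: produces its values one by one while being enumerated.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++) yield return i * i;
    }

    // Accepts any sequence, whether it is a list, an array or another function's result.
    public static int SumOfEven(IEnumerable<int> numbers)
    {
        return (from n in numbers where n % 2 == 0 select n).Sum();
    }

    public static void Main()
    {
        Console.WriteLine(SumOfEven(new List<int> { 1, 2, 3, 4 })); // 6
        Console.WriteLine(SumOfEven(new[] { 10, 11 }));             // 10
        Console.WriteLine(SumOfEven(Squares(4)));                   // 4 + 16 = 20
    }
}
```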

Doing it ourselves

The extending examples above aren't very useful in themselves, but they were the build-up for creating our own queries. Now comes the part where we want to use Linq for our own objects. Mostly this will be for disconnected scenarios. The web already contains examples on e.g. querying the filesystem and querying Active Directory, but they use the I(Ordered)Queryable interface. If you are planning to handle complex structures and expression trees, you may or may not be better off delving into those, but I prefer the query expression pattern myself for a quicker implementation with an overview that's easier on the eye. You only add the methods that you need, and the compiler tells you when a query uses something you haven't implemented. So if you include only the Select and Where functions, you can use where and select. If you try to use something like a group by or join, the compiler will report an error because no matching method exists.
The IQueryable interface covers the same ground, but by handing you an expression with the node type 'Call'. You would have to check the method name to find out which part is called. Using the Query Expression Pattern imho is easier on the eye and gives the advantage that you can call the methods directly. Of course, calling them directly might not always have the same effect in a disconnected scenario ;-)
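
To show what that looks like from the IQueryable side, the sketch below uses the in-memory provider behind AsQueryable() and reads the method names back from the expression. A custom provider would receive the same kind of tree; the class name is made up.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    private static MethodCallExpression BuildCall()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();
        IQueryable<int> query = from n in source
                                where n > 1
                                select n * 2;
        // The whole query is one nested tree of method calls.
        return (MethodCallExpression)query.Expression;
    }

    // Name of the last operator applied.
    public static string OuterCall()
    {
        return BuildCall().Method.Name;
    }

    // The first argument of the outer call is the call it was applied to.
    public static string InnerCall()
    {
        return ((MethodCallExpression)BuildCall().Arguments[0]).Method.Name;
    }

    public static void Main()
    {
        Console.WriteLine(OuterCall()); // Select
        Console.WriteLine(InnerCall()); // Where
    }
}
```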

To show a simple example, here is a very, very basic class for browsing Active Directory. Not much comment on it; with everything mentioned above the theory should be straightforward enough: a class that implements 'Select' and 'Where' of the pattern is enough to make it queryable. The functionality is extremely basic and for actual usage there are much more extensive projects available on the web (one on this blog even ;) ). But implementing only the basic functionality is done on purpose, so the main point of implementing the Query Expression Pattern remains in focus.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;
    using System.Text;
    using System.DirectoryServices; //needs to be added as a reference

    namespace LinqTest
    {
        public class LDAP
        {
            DirectorySearcher searcher = new DirectorySearcher();
            StringBuilder sb = new StringBuilder();

            public LDAP Where(Expression<Func<string, bool>> predicate)
            {
                AddExpression(predicate);
                return this;
            }

            public IEnumerable<DirectoryEntry> Select<S>(Expression<Func<DirectoryEntry, S>> selector)
            {
                searcher.Filter = sb.ToString();
                foreach (SearchResult sr in searcher.FindAll())
                {
                    yield return sr.GetDirectoryEntry();
                }
            }

            bool AddExpression(Expression e)
            {
                if (e is LambdaExpression)
                    return AddExpression((e as LambdaExpression).Body);
                else if (e is MethodCallExpression)
                {
                    var m = (MethodCallExpression)e;
                    if (m.Method.Name == "Equals")
                    {
                        AddCrit(m, "=");
                        return true;
                    }
                    //you could implement other methods too, such as "StartsWith" and "Contains"
                }
                else if (e is BinaryExpression)
                {
                    var b = (BinaryExpression)e;
                    sb.Append("(");
                    if (e.NodeType == ExpressionType.AndAlso)
                        sb.Append("&");
                    else if (e.NodeType == ExpressionType.OrElse)
                        sb.Append("|");
                    else
                        throw new NotSupportedException();
                    AddExpression(b.Left);
                    AddExpression(b.Right);
                    sb.Append(")");
                    return true;
                }
                throw new NotSupportedException();
            }

            void AddCrit(MethodCallExpression e, string comp)
            {
                sb.Append("(")
                  .Append(GetValue(e.Object))
                  .Append(comp)
                  .Append(GetValue(e.Arguments[0]))
                  .Append(")");
            }

            string GetValue(Expression e)
            {
                if (e is ConstantExpression)
                    return GetValue((e as ConstantExpression).Value);
                return e.ToString();
            }

            string GetValue(object o)
            {
                //here alterations would have to be made to types such as guids and datetimes
                return o.ToString();
            }
        }
    }

For a "real" project I'd create wrapper classes around the properties, together with entity classes, and use those to build the query classes on. Here I've relied on the fact that methods inside an expression are not executed until the implementing code executes them (in this case: never).
That way, I can use strings to describe the properties:
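
    var ldap = new LDAP();
    // The Select above ignores its selector, so what is selected here doesn't matter.
    // Selecting 'entry' itself would not work: the compiler then drops the Select call.
    var res = from entry in ldap
              where "objectClass".Equals("user")
                    && "sn".Equals("a*")
              select ldap;

    foreach (var c in res)
    {
        Console.WriteLine(c.Name);
    }
    Console.ReadLine();
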
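
If you don't have an Active Directory at hand, the translation part can be tried in isolation. The sketch below is a hypothetical helper (LdapFilter is my own name, not part of any library) that mirrors the AddExpression logic of the class above and simply returns the filter string instead of running a search.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// Hypothetical helper: builds the LDAP filter text without touching a directory.
public static class LdapFilter
{
    public static string From(Expression<Func<string, bool>> predicate)
    {
        var sb = new StringBuilder();
        Append(predicate.Body, sb);
        return sb.ToString();
    }

    private static void Append(Expression e, StringBuilder sb)
    {
        var call = e as MethodCallExpression;
        var binary = e as BinaryExpression;
        if (call != null && call.Method.Name == "Equals")
        {
            // "attribute".Equals("value") becomes (attribute=value)
            sb.Append("(").Append(Value(call.Object)).Append("=")
              .Append(Value(call.Arguments[0])).Append(")");
        }
        else if (binary != null && (e.NodeType == ExpressionType.AndAlso
                                 || e.NodeType == ExpressionType.OrElse))
        {
            // LDAP filters are prefix notation: (&(a)(b)) or (|(a)(b))
            sb.Append("(").Append(e.NodeType == ExpressionType.AndAlso ? "&" : "|");
            Append(binary.Left, sb);
            Append(binary.Right, sb);
            sb.Append(")");
        }
        else
        {
            throw new NotSupportedException(e.NodeType.ToString());
        }
    }

    private static string Value(Expression e)
    {
        var constant = e as ConstantExpression;
        return constant != null ? constant.Value.ToString() : e.ToString();
    }
}

public static class LdapFilterDemo
{
    public static void Main()
    {
        // The lambda is never executed, only inspected.
        Console.WriteLine(LdapFilter.From(
            entry => "objectClass".Equals("user") && "sn".Equals("a*")));
        // (&(objectClass=user)(sn=a*))
    }
}
```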
Not very easy on the eye, I'll give you that, but I hope that this article illustrates how easy it can be to create your own datasource using the Query Expression Pattern. Maybe someday I'll need the IQueryable interface, but in the meantime I've got the feeling that the pattern can handle all my needs and is easier to use too.
Have fun in the world of Linq :)
posted on Thursday, November 08, 2007 9:51 AM

Feedback

# Query Expression Pattern: creating custom Linq sources 11/8/2007 9:55 AM HotDog's Blog


# Query Expression Pattern: article on creating custom Linq sources 11/12/2007 2:11 AM HotDog's Blog


# re: Query Expression Pattern: implementing your own Linq source 2/5/2008 7:42 AM Hotdog
Comments can be posted on http://rverpalen.blogspot.com/
