Wednesday, April 16, 2008

Deferred query execution in LINQ

One of the most important features of LINQ is lazy evaluation: a query is not executed until its results are actually needed. This property lets LINQ avoid unnecessary work when querying collections, which can improve performance.

Consider the following example:

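using System;
using System.Collections.ObjectModel;
using System.Linq;

// The User type is not shown in the original listing; this is its assumed shape.
class User
{
    public string UserName { get; set; }
    public Int32 Salary { get; set; }
    public string Country { get; set; }
}

class Program
{
    private static Collection<User> GetUsers()
    {
        return new Collection<User> {
            new User { UserName = "Sreejith P.R", Salary = 5000, Country = "India" },
            new User { UserName = "John Samuels", Salary = 3450, Country = "UK" },
            new User { UserName = "Kemp Davis", Salary = 6733, Country = "USA" },
            new User { UserName = "Rahul Bose", Salary = 9000, Country = "India" },
            new User { UserName = "Micheal Johnson", Salary = 4566, Country = "USA" },
            new User { UserName = "Kim David", Salary = 7822, Country = "Denmark" }
        };
    }

    // Writes a message on every call, so we can see exactly when the query runs it.
    static Int32 IncreaseSalary(Int32 salary)
    {
        Console.WriteLine("\nIncreasing Salary...");
        return salary + 2000;
    }

    static void Main(string[] args)
    {
        // Only the query definition is built here; nothing is filtered or projected yet.
        var userSalaries =
            from user in GetUsers()
            where user.Salary > 5000 // Executes when the foreach statement iterates the query.
            select new { Name = user.UserName,
                         Salary = IncreaseSalary(user.Salary) };

        // The query runs here, one element at a time.
        foreach (var user in userSalaries)
        {
            Console.WriteLine(user.Name + " " + user.Salary.ToString());
        }

        Console.ReadKey();
    }
}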

The output of this sample is:

Increasing Salary...
Kemp Davis 8733

Increasing Salary...
Rahul Bose 11000

Increasing Salary...
Kim David 9822

Looking at the output, you can see that the query is not executed all at once. Execution is deferred until the data is actually requested, and each element is then filtered and projected one at a time, which is why every "Increasing Salary..." message is immediately followed by its result. In our example the where user.Salary > 5000 line runs only when the foreach statement requests the data. This property of LINQ is called deferred execution or deferred evaluation.

When you declare a query variable, it does not actually contain the result of the query; instead it captures a data structure that describes the query. This data structure holds the details of what you want to query.
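One way to see that the query variable holds a description rather than results is to change the source collection after declaring the query but before iterating it. The following is a minimal sketch; the DeferredQueryDemo class and the HighSalariesAfterLateAdd helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredQueryDemo
{
    // Hypothetical helper: declares a query, then changes the source before enumerating it.
    public static List<int> HighSalariesAfterLateAdd()
    {
        var salaries = new List<int> { 5000, 6733, 9000 };

        // Only the query definition is stored here; nothing is filtered yet.
        IEnumerable<int> highSalaries = salaries.Where(s => s > 5000);

        // Added after the query was declared, but before it runs.
        salaries.Add(7822);

        // Enumeration happens now, so the late addition is included.
        return highSalaries.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", HighSalariesAfterLateAdd()));
        // prints: 6733, 9000, 7822
    }
}
```

If the query had stored its results at declaration time, 7822 would be missing from the output.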

It is this lazy evaluation that lets LINQ improve the performance of queries. For example, when you are querying a large amount of data and want to stop once a particular condition is met, deferred execution helps by not loading all the results into memory. The collection is only queried until the condition is met.
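The early-exit behaviour described above can be sketched by counting how many elements the filter actually inspects before the first match is found. The EarlyExitDemo class and the FirstSalaryAbove helper are hypothetical names, and the salary values are simply the ones from the earlier sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EarlyExitDemo
{
    // Hypothetical helper: returns the first salary above the limit and
    // how many source elements the filter had to inspect to find it.
    public static (int Found, int Inspected) FirstSalaryAbove(IEnumerable<int> salaries, int limit)
    {
        int inspected = 0;

        var query = salaries.Where(s =>
        {
            inspected++; // counts every element the filter actually sees
            return s > limit;
        });

        // First() stops enumerating as soon as one element matches.
        int found = query.First();
        return (found, inspected);
    }

    static void Main()
    {
        var salaries = new[] { 5000, 3450, 6733, 9000, 4566, 7822 };
        var result = FirstSalaryAbove(salaries, 5000);
        Console.WriteLine(result.Found + " found after inspecting " + result.Inspected + " of " + salaries.Length);
        // prints: 6733 found after inspecting 3 of 6
    }
}
```

The remaining three elements are never touched, because nothing asked the query for more than one result.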

Deferred execution is the default behaviour of LINQ queries. If you want to bypass deferred execution, you can call a method that forces immediate execution, such as .ToList(), on the query:

var userSalaries =
    (from user in GetUsers()
     where user.Salary > 5000
     select new { Name = user.UserName,
                  Salary = IncreaseSalary(user.Salary) }).ToList();

This makes the output look like:

Increasing Salary...

Increasing Salary...

Increasing Salary...
Kemp Davis 8733
Rahul Bose 11000
Kim David 9822

Reason:

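A normal query is an object of type IEnumerable<T> (or IQueryable<T>) that is evaluated only when it is iterated. The .ToList() method iterates the query immediately and copies the results into a List<T>, so the whole query, including every call to IncreaseSalary, has already run before the foreach loop starts.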
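A related consequence is that a deferred query runs again every time it is iterated, while a list produced by .ToList() is computed once and then reused. The sketch below counts calls to a simplified IncreaseSalary; the MaterializeDemo class and the CallsWhenEnumeratedTwice helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MaterializeDemo
{
    static int calls;

    // Simplified version of IncreaseSalary that counts its own invocations.
    static int IncreaseSalary(int salary)
    {
        calls++;
        return salary + 2000;
    }

    // Hypothetical helper: iterates the results twice and reports how many
    // times IncreaseSalary ran in total.
    public static int CallsWhenEnumeratedTwice(bool materialize)
    {
        calls = 0;
        var salaries = new[] { 6733, 9000, 7822 };

        IEnumerable<int> raised = salaries.Select(s => IncreaseSalary(s));
        if (materialize)
        {
            raised = raised.ToList(); // runs the query once, right here
        }

        int total = 0;
        foreach (var s in raised) total += s; // first pass
        foreach (var s in raised) total += s; // second pass
        return calls;
    }

    static void Main()
    {
        Console.WriteLine("Deferred: " + CallsWhenEnumeratedTwice(false) + " calls");
        // prints: Deferred: 6 calls
        Console.WriteLine("Materialized: " + CallsWhenEnumeratedTwice(true) + " calls");
        // prints: Materialized: 3 calls
    }
}
```

Calling .ToList() is therefore worth considering when the same results are needed more than once and the projection is expensive or has side effects.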
