SSW Foursquare

Rules to Better LINQ - 6 Rules

Language Integrated Query, or LINQ for short, is described by Microsoft as "a set of technologies based on the integration of query capabilities directly into the C# language". When used right, it can result in some very powerful and readable code, delivering value and solutions faster.

LINQ is an important part of the .NET ecosystem and regularly receives large improvements. Performance was a major focus for LINQ in the .NET 7 release, which Nick Chapsas summarised:

Video: Nick Chapsas summarised the upgrades included in .NET 7 (11 min)

Here is a series of Rules on how to get the most out of LINQ.

  1. Do you avoid materializing an IEnumerable?

    This is the golden rule of LINQ. You generally have no idea how big the IEnumerable might be, so where possible just iterate through it. Don't force it into an array or list first.

    The primary reason for this is that your input stream might not fit into RAM. Materializing it also causes unnecessary object creation and garbage collection, which consumes a lot of CPU on top of eating memory for breakfast.

    foreach (var product in products.ToList())
    {
        // Do something
    }

    Figure: Bad example - This creates a list with all of the products in RAM before iterating over them. This can very easily cause an OutOfMemoryException.

    foreach (var product in products)
    {
        // Do something
    }

    Figure: Good example - Doesn't force the data into memory before iterating. Items are processed one at a time, so memory use stays flat even for a very large or infinite enumerable.

    Don't materialize an IEnumerable, just iterate it, i.e. don't call ToList or ToArray on it until it has been filtered. Don't assume that the input stream fits in RAM.
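    The sketch below illustrates the rule with a hypothetical infinite sequence, Naturals(), standing in for an input of unknown size (the helper and the produced counter are illustrative assumptions, not part of any API). Filtering and limiting happen while streaming, and ToList is only called at the very end, on the five items that are actually needed.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Counts how many items the source has produced so far
    var produced = 0;

    // Hypothetical infinite source, standing in for a stream of unknown size
    IEnumerable<int> Naturals()
    {
        for (var n = 1; ; n++)
        {
            produced++;
            yield return n;
        }
    }

    // Filter and limit while streaming; materialize only the final, small result
    var firstEvens = Naturals().Where(n => n % 2 == 0).Take(5).ToList();

    Console.WriteLine(string.Join(", ", firstEvens)); // 2, 4, 6, 8, 10
    Console.WriteLine(produced);                      // 10

    // By contrast, Naturals().ToList() could never complete: it tries to hold an endless sequence in memory
    ```

    Only ten source items are ever produced, because Take stops pulling from the chain as soon as it has its five matches.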

  2. Do you avoid iterating multiple times?

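    Because LINQ expressions are lazily executed, it's important to avoid re-evaluating the expression. This is critically important in methods that accept an IEnumerable, such as LINQ extension methods, because the method has no control over what sits behind the sequence it is given.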

    There are 2 problems with multiple evaluations:

    • It is needlessly expensive
    • It may not be possible

    Some IEnumerables are tied to something like a Stream that doesn't support seeking, so they can only be enumerated once. This is true of the Content stream of a web request: starting the enumeration again would require seeking back to the start, which that stream can't do.
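    To see what can go wrong, the sketch below uses a hypothetical ReadLines helper that lazily yields lines from a TextReader, with a StringReader standing in for a forward-only stream (the helper and the sample text are assumptions for illustration). Note that the second enumeration doesn't throw; it silently finds nothing, because the reader has already been consumed.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    // Hypothetical helper: lazily yields lines from a reader that can't rewind
    IEnumerable<string> ReadLines(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            yield return line;
        }
    }

    var lines = ReadLines(new StringReader("apple\nbanana\ncherry"));

    var firstPass = lines.Count();  // reads all three lines
    var secondPass = lines.Count(); // the reader is already at the end, so nothing is left

    Console.WriteLine($"First pass: {firstPass}, second pass: {secondPass}"); // First pass: 3, second pass: 0
    ```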

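    public void UpdateStockLevels(IEnumerable<Product> products)
    {
        // Any() starts an enumeration to look for a first item...
        if (products.Any())
        {
            IfAnyItemsExist();
        }
    
        // ...then foreach starts a second enumeration from the beginning
        foreach (var product in products)
        {
            OnEachItem(product);
        }
    }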

    Figure: Bad example - Calls Any(), which starts an enumeration to fetch the first item, and then foreach, which forces a second enumeration

    public void UpdateStockLevels(IEnumerable<Product> products)
    {
        var first = true;
        foreach (var product in products)
        {
            if (first)
            {
                // Runs once, and only if at least one item exists
                IfAnyItemsExist();
                first = false;
            }
            OnEachItem(product);
        }
    }

    Figure: Good example - Only enumerates once

    The worst part about this is that you may test it against an array or some other input, and it may seem fine. This is especially true with unit testing, where an incoming stream of objects is typically simulated by providing a fixed array or list. These support re-enumeration, whereas the real input in your application may not.
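    One way to catch this in unit tests is to feed your method a sequence that refuses a second enumeration, rather than a plain array. The sketch below uses a hypothetical ReadOnce helper (not a framework API) that throws InvalidOperationException if it is enumerated twice, and runs it against simplified int versions of the bad and good patterns above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical test helper: wraps a source so it can only be enumerated once,
    // like a forward-only stream
    IEnumerable<int> ReadOnce(IEnumerable<int> source)
    {
        var started = false;
        return Iterate();

        IEnumerable<int> Iterate()
        {
            if (started)
            {
                throw new InvalidOperationException("This sequence can only be enumerated once.");
            }
            started = true;
            foreach (var item in source)
            {
                yield return item;
            }
        }
    }

    // Bad: Any() starts one enumeration and foreach starts a second
    int SumWithAnyCheck(IEnumerable<int> items)
    {
        if (!items.Any()) return -1; // -1 is an arbitrary "empty" marker for this sketch
        var total = 0;
        foreach (var item in items) total += item;
        return total;
    }

    // Good: a single pass that also tracks whether any items were seen
    int SumSinglePass(IEnumerable<int> items)
    {
        var seenAny = false;
        var total = 0;
        foreach (var item in items)
        {
            seenAny = true;
            total += item;
        }
        return seenAny ? total : -1;
    }

    var data = new[] { 1, 2, 3 };

    Console.WriteLine(SumWithAnyCheck(data));         // 6: an array happily re-enumerates
    Console.WriteLine(SumSinglePass(ReadOnce(data))); // 6

    try
    {
        SumWithAnyCheck(ReadOnce(data));
    }
    catch (InvalidOperationException ex)
    {
        Console.WriteLine(ex.Message); // This sequence can only be enumerated once.
    }
    ```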

  3. Do you use LINQ as a query language?

    LINQ won't execute any part of a query until it needs to.

    It simply builds up the query: a chain of iterators for LINQ to Objects, or an expression tree for an IQueryable provider. This can result in the side effects described below if you break the 'don't modify anything' rule.

    One of the core tenets of LINQ is that it is a query language designed for retrieving data, not updating it.

    There are a few rules around what not to do:

    • Project data in a Select, don't modify it
    • ForEach is not a LINQ method (do not use it in a LINQ query)
    • Remember LINQ executes lazily, so if you don't force the evaluation, it won't happen.

    In the example below, we have a list of products and want to increase each product's StockOnHand by 5 (if a product is oversold, it may still be below 0).

    List<Product> products = ...
    var outOfStock = products
        .Select(x => { x.StockOnHand += 5; return x; })
        .Where(x => x.StockOnHand <= 0);
    
    var inStockCount = products.Count(x => x.StockOnHand > 0);
    var outOfStockCount = outOfStock.Count();

    Figure: Bad example - StockOnHand has not been updated when inStockCount is calculated, because the Select only runs when outOfStock is enumerated. Every further enumeration of outOfStock adds another 5. None of this is obvious from a quick glance at the code.

    The example above modifies data in a Select, so later calls don't behave as expected.

    List<Product> products = ...
    foreach (var product in products)
    {
        product.StockOnHand += 5;
    }
    
    var inStockCount = products.Count(x => x.StockOnHand > 0);
    var outOfStockCount = products.Count(x => x.StockOnHand <= 0);

    Figure: Good example - StockOnHand is updated for each item in products before it is used in any LINQ queries.

    It's important to be aware of this to avoid unexpected side effects like this.
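    Here's a runnable version of the same trap. As a simplifying assumption for the sketch, the products are reduced to an int array where each entry is one product's StockOnHand, so no Product class is needed. It shows the update inside Select not running until the query is enumerated, and then running again on every enumeration.

    ```csharp
    using System;
    using System.Linq;

    // Simplified stand-in for the product list: each entry is one product's StockOnHand
    int[] stockOnHand = { -10, 3, 0 };

    // Bad: the update is hidden inside a deferred Select
    var outOfStock = Enumerable.Range(0, stockOnHand.Length)
        .Select(i => { stockOnHand[i] += 5; return stockOnHand[i]; })
        .Where(level => level <= 0);

    var inStockBefore = stockOnHand.Count(level => level > 0); // 1: the Select hasn't run yet
    outOfStock.Count(); // runs the Select: levels become -5, 8, 5
    outOfStock.Count(); // runs it again: levels become 0, 13, 10

    Console.WriteLine(string.Join(", ", stockOnHand)); // 0, 13, 10

    // Good: apply the update explicitly, then only query
    int[] updatedStock = { -10, 3, 0 };
    for (var i = 0; i < updatedStock.Length; i++)
    {
        updatedStock[i] += 5;
    }

    var inStockCount = updatedStock.Count(level => level > 0);     // 2
    var outOfStockCount = updatedStock.Count(level => level <= 0); // 1
    ```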

  4. Do you use the filtering parameter in LINQ methods?

    Many LINQ methods, such as Count, Any, First and FirstOrDefault, have an overload that takes a filter (a predicate). It's normally much more readable to use it than to add an extra call to Where.

    .Where(x => x < 5).Count()
    .Where(x => x < 5).FirstOrDefault()

    Figure: Bad example - More code that requires extra thought to understand.

    .Count(x => x < 5)
    .FirstOrDefault(x => x < 5)

    Figure: Good example - Shorter and easier to read.
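    A quick check that both forms give the same answer, using a sample int array (the values are arbitrary):

    ```csharp
    using System;
    using System.Linq;

    int[] numbers = { 7, 3, 9, 1, 4 };

    // Where + Count versus Count with a predicate
    var countWithWhere = numbers.Where(x => x < 5).Count();
    var countWithPredicate = numbers.Count(x => x < 5);

    // Where + FirstOrDefault versus FirstOrDefault with a predicate
    var firstWithWhere = numbers.Where(x => x < 5).FirstOrDefault();
    var firstWithPredicate = numbers.FirstOrDefault(x => x < 5);

    Console.WriteLine($"{countWithWhere} {countWithPredicate}"); // 3 3
    Console.WriteLine($"{firstWithWhere} {firstWithPredicate}"); // 3 3
    ```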

  5. Do you know how LINQ has evolved?

    How LINQ has evolved:

    In .NET 1.1 - ArrayLists

    (System.Collections)

    ArrayList - Implements the IList interface using an array whose size is dynamically increased as required.

    Example:

    ArrayList greeks = new ArrayList();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");
     
    //and
    ArrayList names = new ArrayList();
    foreach (string g in greeks)
    {
        if (g.IndexOf("opoulos") > -1)
        {
            names.Add(g);
        }
    }

    In .NET 2.0 - Generic Lists: enforce type safety, are more OO, and reduce code when working with different types

    (System.Collections.Generic)

    List<T> : IList<T> - The List<T> class is the generic equivalent of the ArrayList class. It implements the IList<T> generic interface using an array whose size is dynamically increased as required.

    Example:

    List<string> greeks = new List<string>();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");
     
    //and
    List<string> names = new List<string>();
    foreach (string g in greeks)
    {
        if (g.Contains("opoulos"))
        {
            names.Add(g);
        }
    }

    In .NET 3.5 - nicer to query

    (System.Linq)

    IQueryable<out T> : IEnumerable<T>, 
             IQueryable, IEnumerable

    The IQueryable<T> interface is intended for implementation by query providers. This interface inherits the IEnumerable<T> interface so that if it represents a query, the results of that query can be enumerated. Enumeration forces the expression tree associated with an IQueryable object to be executed.

    List<string> greeks = new List<string>();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");
     
    //and
    IEnumerable<string> opoulos = greeks.Where(x => x.Contains("opoulos"));
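    Because the query is deferred, the filter runs when opoulos is enumerated, not when it's declared. In the sketch below, a name added to the list after the query is defined still shows up in the results ("Papadopoulos" is an extra sample name, not part of the original example).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    var greeks = new List<string> { "Alexopoulos", "Gianopoulos", "Michaelides" };

    // Declaring the query doesn't run it
    IEnumerable<string> opoulos = greeks.Where(x => x.Contains("opoulos"));

    // An item added after the query is declared...
    greeks.Add("Papadopoulos");

    // ...is still picked up, because the filter only runs when the query is enumerated
    var matches = opoulos.ToList();
    Console.WriteLine(string.Join(", ", matches)); // Alexopoulos, Gianopoulos, Papadopoulos
    ```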

    In .NET 4.0 (thread safe)

    (System.Collections.Concurrent)

    (The System.Collections.Concurrent namespace provides several thread-safe collection classes that should be used in place of the corresponding types in the System.Collections and System.Collections.Generic namespaces whenever multiple threads are accessing the collection concurrently.)

    public class ConcurrentBag<T> : IProducerConsumerCollection<T>, 
             IEnumerable<T>, ICollection, IEnumerable
     
    Represents a thread-safe, unordered collection of objects.
     
    using System;
    using System.Collections.Concurrent;

    class Program
    {
        // Demonstrates:
        //      ConcurrentBag<T>.Add()
        //      ConcurrentBag<T>.IsEmpty
        //      ConcurrentBag<T>.TryTake()
        //      ConcurrentBag<T>.TryPeek()
        static void Main()
        {
            // Construct and populate the ConcurrentBag
            ConcurrentBag<string> cb = new ConcurrentBag<string>();
            cb.Add("Alexopoulos");
            cb.Add("Gianopoulos");
            cb.Add("Michaelides");

            // Consume the items in the bag (item must match the bag's element type, string)
            string item;
            while (!cb.IsEmpty)
            {
                if (cb.TryTake(out item))
                    Console.WriteLine(item);
                else
                    Console.WriteLine("TryTake failed for non-empty bag");
            }

            // Bag should be empty at this point
            if (cb.TryPeek(out item))
                Console.WriteLine("TryPeek succeeded for empty bag!");
        }
    }
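    The example above only touches the bag from one thread. The sketch below shows the scenario ConcurrentBag<T> is designed for: many threads adding at once via Parallel.For (the item count of 1,000 is arbitrary). Doing the same with a plain List<int> isn't safe, because List<T> doesn't support concurrent writes.

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;

    var bag = new ConcurrentBag<int>();

    // Many threads add to the same bag at once; ConcurrentBag<T> handles the synchronisation
    Parallel.For(0, 1000, i => bag.Add(i));

    Console.WriteLine(bag.Count); // 1000

    // The bag is unordered, so sort before checking that every item arrived exactly once
    var allPresent = bag.OrderBy(x => x).SequenceEqual(Enumerable.Range(0, 1000));
    Console.WriteLine(allPresent); // True
    ```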

    In .NET 4.5 (read only - covariant interfaces)

    (System.Collections.Generic)

    public class List<T> : IList<T>, ICollection<T>, 
             IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, 
             IEnumerable

    The Microsoft .NET Framework 4.5 includes the IReadOnlyList<T>, IReadOnlyDictionary<TKey, TValue> and IReadOnlyCollection<T> generic interfaces. The main benefit is that the new interfaces are covariant, except for IReadOnlyDictionary<TKey, TValue>. This means you can use a derived type as the generic argument when passing a collection into a method that's defined for a base type. If you have a Dog class, for example, that derives from Animal, you can have a method that accepts an IReadOnlyList<Animal> and pass it an IReadOnlyList<Dog>.

    public class Greek : Person
    {
        // ...
    }
     
    List<Greek> greeks = new List<Greek>()
    {
        new Greek() { LastName = "Alexopoulos" },
        new Greek() { LastName = "Gianopoulos" },
        new Greek() { LastName = "Michaelides" },
    };
    // IReadOnlyList<T> is covariant, so a List<Greek> can be used as an IReadOnlyList<Person>
    IReadOnlyList<Person> people = greeks;
    Person first = people[0];

  6. Do you know how to get the best performance out of LINQ?

    LINQ is a super powerful toolkit, letting you implement your business logic in .NET using a standard set of operations. While it may be tempting to use it in every scenario, it's important to profile and benchmark any code that is time-sensitive for your application.

    Getting started with profiling

    Once you've identified that you've got a performance issue in your application, rather than assuming what the problem is, it's time to profile. Profiling your application gives method-level insights into what's being called, how often, and how long each call takes.

    Armed with your profiling results, you can now identify any hot spots in the application. Eliminating them is far easier when you benchmark the alternative fixes against each other.

    Benchmarking your options

    Once the cause of the slowness has been identified, it should be isolated so that multiple fixes can be tried and compared through benchmarking. Much like profiling pinpoints the exact cause of the slowness instead of relying on intuition, benchmarking shows which fix actually helps the most.

    While you can set up your benchmarking manually, it's much easier when using a library like BenchmarkDotNet.

    The entire benchmarking process is showcased well by Nick Chapsas:

    Video: Stop using LINQ to order your primitive collections in C# - Nick Chapsas (14 min)

    The above video clearly shows one of the cases where LINQ may not be the optimal solution. Here, LINQ provides the very easy-to-use Order and OrderBy methods, but there may be better implementations available, depending on the collection that needs sorting.
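    As a rough sketch of the idea, the code below compares LINQ's OrderBy against copying the array and sorting the copy in place with Array.Sort. The data size and random seed are arbitrary assumptions, and OrderBy is used instead of Order so it also runs on versions before .NET 7. A single Stopwatch run is only indicative (no warm-up, no repeated runs, no statistics), which is exactly what a library like BenchmarkDotNet takes care of. The code also checks that both approaches produce the same result, which should always come before comparing their speed.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Linq;

    var random = new Random(42);
    int[] numbers = Enumerable.Range(0, 100_000).Select(_ => random.Next()).ToArray();

    // LINQ: builds and returns a new sorted sequence
    var stopwatch = Stopwatch.StartNew();
    int[] linqSorted = numbers.OrderBy(x => x).ToArray();
    stopwatch.Stop();
    Console.WriteLine($"OrderBy: {stopwatch.ElapsedMilliseconds} ms");

    // Alternative: copy once, then sort the copy in place
    stopwatch.Restart();
    int[] arraySorted = (int[])numbers.Clone();
    Array.Sort(arraySorted);
    stopwatch.Stop();
    Console.WriteLine($"Array.Sort: {stopwatch.ElapsedMilliseconds} ms");

    // Confirm the alternatives agree before trusting any timing comparison
    Console.WriteLine(linqSorted.SequenceEqual(arraySorted)); // True
    ```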

    LINQ may often be the easiest solution to implement, but if your application needs better performance, you may need to inspect your LINQ usage and investigate whether there are better, more appropriate alternatives.
