Rules to Better LINQ - 5 Rules

  1. Do you avoid materializing an IEnumerable?

    This is the golden rule of LINQ. You generally have no idea how big the IEnumerable might be, so where possible just iterate through it; don't force it into an array or list first.

    The primary reason for this is that your input stream might not fit into RAM. Materializing it also causes unnecessary object creation and garbage collection, which consume a lot of CPU on top of eating memory for breakfast.

    foreach(var product in products.ToList())
    {
        // Do something
    }

    Figure: Bad example - This creates a list with all of the products in RAM before iterating over them. This can very easily cause an OutOfMemoryException.

    foreach(var product in products)
    {
        // Do something
    }

    Figure: Good example - Doesn't force the data to be read into memory before iterating. This will behave nicely even for an infinite enumerator.

    Don't materialize an IEnumerable, just iterate it, i.e. don't call ToList or ToArray on it until it has been filtered. Do not assume that the input stream fits in RAM.
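
    As a rough sketch of "filter first, materialize last": the class and method names below (StreamingSketch, Naturals, FirstEvenNumbers) are made up for illustration. The source is deliberately unbounded, so calling ToList on it directly would never finish, while the filtered and truncated query materializes only a handful of items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Hypothetical unbounded source: keeps yielding integers for as long as the caller asks.
    public static IEnumerable<int> Naturals()
    {
        for (var i = 0; ; i++)
        {
            yield return i;
        }
    }

    // Filters and truncates lazily, then materializes only the small result.
    public static List<int> FirstEvenNumbers(int count)
    {
        return Naturals()
            .Where(n => n % 2 == 0) // evaluated one item at a time
            .Take(count)            // stops pulling from the source after 'count' matches
            .ToList();              // at most 'count' items are held in memory
    }
}
```

    Calling StreamingSketch.FirstEvenNumbers(3) returns a list containing 0, 2 and 4, even though the underlying sequence never ends.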

  2. Do you avoid iterating multiple times?

    Because LINQ expressions are lazily executed, it is important to avoid re-evaluating the expression. For LINQ extension methods, this is critically important.

    There are 2 problems with multiple evaluations:

    • It is needlessly expensive
    • It may not be possible

    Some IEnumerables may be tied to something like a Stream that doesn't support seeking, such as the web request Content stream. In this case, the enumeration can only occur once, because starting it again would require seeking back to the start.

    public IEnumerable<Product> UpdateStockLevels(IEnumerable<Product> products)
    {
        if (products.Any())
        {
            ... IfAnyItemsExist()
        }
    
        foreach (var product in products)
        {
            ... OnEachItem()
        }
    }

    Figure: Bad example - Calls Any, which enumerates the first item, and then foreach, which forces a second evaluation

    public IEnumerable<Product> UpdateStockLevels(IEnumerable<Product> products)
    {
        var first = true;    
        foreach (var product in products)
        {
            if (first)
            {
                ... IfAnyItemsExist()
            }
            ... OnEachItem()
            first = false;
        }
    }

    Figure: Good example - Only enumerates once

    The worst part about this is that you may test it against an array or some other in-memory input and it may seem fine. This is especially true with unit testing, where an incoming stream of objects is typically simulated by providing a fixed array or list. These support re-enumeration, whereas the real input in your application may not.
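
    One way to catch this in unit tests is to wrap the test data in a sequence that refuses to be enumerated twice. The following is only a sketch: SinglePassEnumerable is a hypothetical helper, not something LINQ or the .NET Framework provides.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical test helper: wraps a sequence so that it can be enumerated only once,
// mimicking a forward-only source such as a non-seekable stream.
public sealed class SinglePassEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private bool _enumerated;

    public SinglePassEnumerable(IEnumerable<T> source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_enumerated)
        {
            // A second call means the code under test re-evaluated the sequence.
            throw new InvalidOperationException("This sequence can only be enumerated once.");
        }

        _enumerated = true;
        return _source.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

    Passing new SinglePassEnumerable<Product>(testProducts) into a method that calls Any() and then runs a foreach makes the test fail with an InvalidOperationException, instead of silently passing as it would with a plain array.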

  3. Do you use LINQ as a query language?

    LINQ won't execute any of the subcalls until it needs to. It simply builds up a deferred query (an expression tree, in the case of IQueryable providers). This can result in the side effects described below if you break the 'don't modify anything' rule.

    One of the core tenets of LINQ is that it is a query language designed for retrieving data, not updating it.

    There are a few rules around what not to do:

    • Project data in a Select, don't modify it
    • ForEach is not a LINQ method (do not use it in a LINQ query)
    • Remember that LINQ executes lazily, so if you don't force the evaluation, it won't happen.

    In the example below we have a list of products. I want to increase the StockOnHand of each product by 5 (if it's oversold this may still be less than 0).

    List<Product> products = ...
    var outOfStock = products
        .Select(x => {x.StockOnHand += 5; return x;})
        .Where(x => x.StockOnHand <= 0);

    var inStockCount = products.Count(x => x.StockOnHand > 0);
    var outOfStockCount = outOfStock.Count();

    Figure: Bad example - StockOnHand is not updated when the query is declared; the Select only runs when outOfStock is enumerated. inStockCount is therefore calculated from the old stock levels, and every further enumeration of outOfStock adds 5 again. This is unexpected from a quick glance at the code.

    The above example modified data in a Select, meaning subsequent calls didn't behave as expected.

    List<Product> products = ...
    foreach(var product in products)
    {
        product.StockOnHand += 5;
    }

    var inStockCount = products.Count(x => x.StockOnHand > 0);
    var outOfStockCount = products.Count(x => x.StockOnHand <= 0);

    Figure: Good example - StockOnHand is updated for each item in products before being used in any LINQ queries.

    It's important to be aware of lazy execution so that you avoid unexpected side effects like these.
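
    To make the timing visible, here is a small self-contained sketch. The nested Product class is a stand-in with only the one property, and the class and method names are made up for illustration. It records the stock level after declaring the query, after the first Count() and after the second Count().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Stand-in for the Product type used above; only StockOnHand is modelled.
    public sealed class Product
    {
        public int StockOnHand { get; set; }
    }

    // Returns the stock level observed at three points in time.
    public static int[] Observe()
    {
        var products = new List<Product> { new Product { StockOnHand = -7 } };

        // Declaring the query runs nothing yet.
        var outOfStock = products
            .Select(x => { x.StockOnHand += 5; return x; })
            .Where(x => x.StockOnHand <= 0);

        var afterDeclaration = products[0].StockOnHand; // still -7

        outOfStock.Count();                             // first enumeration adds 5
        var afterFirstCount = products[0].StockOnHand;  // -2

        outOfStock.Count();                             // second enumeration adds 5 again
        var afterSecondCount = products[0].StockOnHand; // 3

        return new[] { afterDeclaration, afterFirstCount, afterSecondCount };
    }
}
```

    Observe() returns -7, -2 and 3: the update is applied once per enumeration of the query rather than once per product.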

  4. Do you use the filter parameter in LINQ methods?

    Many LINQ methods like Count, First and so on include an optional filter parameter. It's normally much more readable to use this than to add an extra call to Where.

    .Where(x => x < 5).Count()
    .Where(x => x < 5).FirstOrDefault()

    Figure: Bad example - More code that requires extra thought to understand.

    .Count(x => x < 5)
    .FirstOrDefault(x => x < 5)

    Figure: Good example - Shorter and easier to read.

  5. Do you know how LINQ has evolved?

    How LINQ has evolved:

    In .NET 1.1 - ArrayLists

    (System.Collections)

    ArrayList - Implements the IList interface using an array whose size is dynamically increased as required.

    Example:

    ArrayList greeks = new ArrayList();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");

    //and
    ArrayList names = new ArrayList();
    foreach(string g in greeks)
    {
        if(g.IndexOf("opoulos") > -1)
        {
            names.Add(g);
        }
    }

    In .NET 2.0 - Generic Lists - enforce types, more OO, less code when handling different types

    (System.Collections.Generic)

    List<T> : IList<T> - The List<T> class is the generic equivalent of the ArrayList class. It implements the IList<T> generic interface using an array whose size is dynamically increased as required.

    Example:

    List<string> greeks = new List<string>();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");
     
    //and
    List<string> names = new List<string>();
    foreach(string g in greeks)
    {
        if(g.Contains("opoulos"))
        {
            names.Add(g);
        }
    }  

    In .NET 3.5 - nicer to query

    (System.Linq)

    IQueryable<out T> : IEnumerable<T>, 
             IQueryable, IEnumerable

    The IQueryable interface is intended for implementation by query providers. This interface inherits the IEnumerable interface so that if it represents a query, the results of that query can be enumerated. Enumeration forces the expression tree associated with an IQueryable object to be executed.

    List<string> greeks = new List<string>();
    greeks.Add("Alexopoulos");
    greeks.Add("Gianopoulos");
    greeks.Add("Michaelides");
     
    //and
    IEnumerable<string> opoulos = greeks.Where(x => x.Contains("opoulos"));

    In .NET 4.0 (thread safe)

    (System.Collections.Concurrent)

    (The System.Collections.Concurrent namespace provides several thread-safe collection classes that should be used in place of the corresponding types in the System.Collections and System.Collections.Generic namespaces whenever multiple threads are accessing the collection concurrently.)

    public class ConcurrentBag<T> : IProducerConsumerCollection<T>, 
             IEnumerable<T>, ICollection, IEnumerable
     
    Represents a thread-safe, unordered collection of objects.
     
        // Demonstrates: 
        //      ConcurrentBag<T>.Add() 
        //      ConcurrentBag<T>.IsEmpty 
        //      ConcurrentBag<T>.TryTake() 
        //      ConcurrentBag<T>.TryPeek() 
        static void Main()
        {
            // Construct and populate the ConcurrentBag
            ConcurrentBag<string> cb = new ConcurrentBag<string>();
            cb.Add("Alexopoulos");
            cb.Add("Gianopoulos");
            cb.Add("Michaelides");
     
            // Consume the items in the bag 
            string item; // must match the bag's element type
            while (!cb.IsEmpty)
            {
                if (cb.TryTake(out item))
                    Console.WriteLine(item);
                else
                    Console.WriteLine("TryTake failed for non-empty bag");
            }
     
            // Bag should be empty at this point 
            if (cb.TryPeek(out item))
                Console.WriteLine("TryPeek succeeded for empty bag!");
        }

    In .NET 4.5 (casting backwards - read only)

    (System.Collections.Generic)

    public class List<T> : IList<T>, ICollection<T>, 
             IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, 
             IEnumerable

    The Microsoft .NET Framework 4.5 includes the IReadOnlyList<T>, IReadOnlyDictionary<TKey, TValue> and IReadOnlyCollection<T> generic interfaces. The main benefit is that the new interfaces are covariant, except for IReadOnlyDictionary<TKey, TValue>. This means that you can use a derived type as the generic parameter when passing a collection into a method that's defined for a base type. If you have a Dog class, for example, that derives from Animal, you can have a method that accepts an IReadOnlyList<Animal> and pass it an IReadOnlyList<Dog>.

    public class Greek : Person
    {
    ..
    }

    List<Greek> greeks = new List<Greek>()
    {
        new Greek() { LastName = "Alexopoulos" },
        new Greek() { LastName = "Gianopoulos" },
        new Greek() { LastName = "Michaelides" },
    };
    // IReadOnlyList<T> supports covariance
    IReadOnlyList<Person> people = greeks;
    Person first = people[0];
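
    The same idea applied to a method call, as a minimal sketch: JoinLastNames is a hypothetical method defined for the base type, and the nested Person and Greek classes model only a LastName property. A List<Greek> can be passed in directly because IReadOnlyList<T> is covariant.

```csharp
using System;
using System.Collections.Generic;

public static class CovarianceSketch
{
    // Minimal stand-ins for the Person and Greek types used above.
    public class Person
    {
        public string LastName { get; set; }
    }

    public class Greek : Person
    {
    }

    // Defined for the base type and only reads from the list.
    public static string JoinLastNames(IReadOnlyList<Person> people)
    {
        var names = new string[people.Count];
        for (var i = 0; i < people.Count; i++)
        {
            names[i] = people[i].LastName;
        }
        return string.Join(", ", names);
    }

    public static string Demo()
    {
        var greeks = new List<Greek>
        {
            new Greek { LastName = "Alexopoulos" },
            new Greek { LastName = "Gianopoulos" },
            new Greek { LastName = "Michaelides" },
        };

        // List<Greek> converts implicitly to IReadOnlyList<Person>; no copy or cast is needed.
        return JoinLastNames(greeks);
    }
}
```

    The equivalent call with a parameter of type IList<Person> or List<Person> would not compile, since those types are not covariant.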