Secret ingredients to quality software

SSW Foursquare

Rules to Better Entity Framework - 15 Rules

  1. Do you know why to use Entity Framework?

    Entity Framework allows you to provide a strongly typed object framework (ORM) over your database. This helps abstract the database away allowing it to be replaced with other technologies as needed.

    using (SqlConnection conn = new SqlConnection())
    {
        conn.ConnectionString = "Data Source=(local);Initial Catalog=Northwind;Integrated Security=True";
        conn.Open();
      
        SqlCommand cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT * FROM Customers WHERE CompanyName LIKE '" + companyNameTextbox.Text + "%'";
      
        bindingSource1.DataSource = cmd.ExecuteReader();
    }

    Figure: Bad example - using ADO.NET and not strongly typed

    var results = dbContext.Customers
        .Where(c => c.CompanyName.StartsWith(companyNameTextbox.Text));
    customersBindingSource.DataSource = results;
      
    // Or even
    var results = dbContext.Customers
        .Where(c => c.CompanyName.StartsWith(companyNameTextbox.Text))
        .Select(c => new
        {
            c.CompanyName,
            c.Phone
        });
    customersBindingSource.DataSource = results;

    Figure: Good example - at least 4 good reasons below

    1. Making queries that are independent from specific Database engine
    2. Strongly typed fields - SQL tables/entities have intellisense
    3. More readable and less code
    4. It's easy to chain more operation like OrderBy, GroupBy, FirstOrDefault, Count, Any and many more
    5. Developers are generally poor at writing SQL, so using code constructs makes writing queries much easier
  2. Do you only get the rows you need?

    It's expensive retrieving data from a database, as such it's important to only ask for the rows you require when getting data.

    Entity Framework nicely translates the various filters like the Where method into SQL WHERE clauses. This makes it really easy to write nice readable code that is also very efficient.

    List<Sale> sales1 = context.Sales.ToList();
    foreach(var sale in sales1)
    {
        if(sale.ProductId == request.ProductId)
        {
            // Do stuff
        }
    }

    Bad example - Retrieved all the data instead of items that matched the product id.

    List<Sale> sales1 = context
          .Sales
          .Where(sale => sale.ProductId == Request.ProductId)
          .ToList();
          
    foreach(var sale in sales1)
    {
        // Do stuff
    }

    Good example - Only the data required was retrieved from the database

  3. When you cast IQueryable to IEnumerable and then query the data from there, Entity Framework must collect the data at the point you do the cast. This can result in very significant extra database load, and extra processing on the client side.

    NOTE: Using .AsEnumerable() achieves the same effect.

    Counting

    // All examples below will result in a SQL query similar to:
    // SELECT * FROM Sales
    
    // The ToList generates a list of all records client side and then counts them.
    int count1 = context.Sales
        .ToList()
        .Count();
    
    // This implicitly treats the sales as an enumerable and enumerates all the items to count them.
    IEnumerable<Sale> sales = context.Sales;
    int count2 = sales.Count;	
    
    // EF Core will evaluate everything before `.AsEnumerable()` and after that line, everything is in-memory.
    int count3 = context.Sales
        .AsEnumerable()
        .Count();
        
    // This is the most common source of `IEnumerable` casting which can cause significant performance issues.
    public IEnumerable<Sale> GetSales() => context.Sales;
    
    // The code on the first glance looks alright but in fact it fetches the entire table from SQL Server
    // because it receives the query as `IEnumerable` before running `.Count()`.
    int count4 = GetSales().Count();

    Bad example - All these examples read the entire table instead of just returning the count from the database.

    // All of the examples below will result in SQL query:
    // SELECT COUNT(*) FROM Sales
    int count1 = context.Sales.Count();
    
    IQueryable<Sale> query = _context.Sales;
    int count2 = query.Count();
    
    public IQueryable<Sale> GetSales() => context.Sales;
    int count3 = GetSales().Count();

    Good example - Only the count is returned by the query

    Where

    // All of the examples below will result in a SQL query like:
    // SELECT * FROM Sales
    
    List<Sale> sales1 = context.Sales
        .AsEnumerable()
        .Where(x => x.Id == 5)
        .ToList();
    
    private IEnumerable<Sale> Sales { get { return context.Sales; } }
    List<Sale> sales2 = Sales
        .Where(x => x.Id == 5)
        .ToList();

    Bad example - The whole table is returned from the database and then discarded in code.

    // All Examples will result in a SQL query like:
    // SELECT * FROM Sales WHERE Id = 5
    
    List<Sale> sales1 = context.Sales
        .Where(x => x.Id == 5)
        .ToList();
    
    private IQueryable<Sale> Sales { get { return context.Sales; } }
    List<Sale> sales2 = Sales
        .Where(x => x.Id == 5)
        .ToList();

    Good example - Filtering is done on the database before returning data.

  4. Do you use AsNoTracking for readonly queries?

    One of EF Core's best features is the fact it tracks any changes you make to entities after you retrieve them. However this comes with a cost, if the data is only being read and the returned entities will never be modified then you can use the AsNoTracking method to inform EF Core not to bother tracking changes.

    This results in fairly significant memory and CPU improvements on the client side.

    return context.Sales.AsNoTracking().Where(x => x.Id == 5).ToList();

    Figure: Using AsNoTracking to save CPU and memory for a read only query

  5. Do you only project properties you need?

    When retrieving data it's much more efficient to only collect the data you need. It saves computation and IO on the database and also saves memory and CPU on the calling side.

    IEnumerable<string> GetProductGuids(string category)
    {
        IEnumerable<Product> products = context.Products
            .Where(x => x.Category == category)
            .ToList();
      
        return products.Select(x => x.ProductGuid);
    }

    Figure: Bad example - Retrieved the whole product record when we only needed 1 property

    IEnumerable<string> GetProductGuids(string category)
    {
        IEnumerable<string> productGuids = context.Products
            .Where(x => x.Category == category)
            .Select(x => x.ProductGuid)
            .ToList();
          
        return productGuids;
    }

    Figure: Good example - Retrieved only the required property.

  6. Do you avoid using update?

    The Update method on an entity in EF Core marks all of its fields as dirty. This will result in all the fields being written back to the database.

    Writing the entire record to the database can cause locking issues in the database server if there are foreign key relationships involving the entity being modified.

    var entity = context
        .Products
        .FirstOrDefault(item => item.ProductID == id);
            
        if (entity != null)
        {
            entity.Name = "New name";    
            context.Products.Update(entity);
                
            context.SaveChanges();
        }

    Figure: Bad example - The whole record is written back to the database.

    var entity = context
        .Products
        .FirstOrDefault(item => item.ProductID == id);
            
        if (entity != null)
        {
            entity.Name = "New name";    
            context.SaveChanges();
        }

    Figure: Good example - Only the modified fields are written back to the database.

  7. Often developers will include all the related entities in a query to help with debugging. Always remember to take these out. They cause excessive database load.

    If you need the related entities, then that is what Include is for.

    var query = _dbContext
            .Sales
            .Include(x => x.SalesPerson);

    Figure: Bad example - Retrieved the sales records and the salesperson, even though we don't intend to use the salesperson record.

    var query = _dbContext
            .Sales;

    Figure: Good example - Retrieved only the sales records themselves

  8. Do you do pagination database side?

    Pagination can be expensive if all the pages are retrieved from the database before grabbing the relevant page. It's much more efficient to get only the page number requested back from the database.

    var query = context
        .Sales
        .AsNotTracking()
        .Where(x => x.SalesPersonId == salesPersonId);
    var result = await query.ToListAsync(ct);
    int count = result.Count;
    
    result = result
        .Skip(page * pageSize)
        .Take(pageSize);
    return (count, result);

    Figure: Bad example - Reads all the data from the database, counts the records and filters down to the page

    var query = context
        .Sales
        .AsNotTracking()
        .Where(x => x.SalesPersonId == salesPersonId);
    
    int count = await query.CountAsync(ct);
    
    query = query
        .Skip(page * pageSize)
        .Take(pageSize);
    var result = await query.ToListAsync(ct);    
    return (count, result);

    Figure: Good example - Reads only the count and 1 page of data from the database

  9. Do you use TagWith?

    TagWith adds comments to the generated SQL. This makes it easier to identify queries when they run on the database.

    This is very useful when debugging issues as there are often multiple pieces of code that generate similar statements and as such it's hard to identify what is executing particular queries.

    var list = await context
        .Sales
        .TagWith("Get All Sales")
        .ToListAsync(ct);

    Figure: Code to add tagging

    -- Get All Sales
    
    select * from sales

    Figure: SQL generated by the above code

  10. To avoid embarrassing failures in Production, it is important to ensure that your development systems are as similar as possible to what's expected in Production.

    Modifying and querying database tables is very dependent on the amount of data in the table. Often developers will run their code in a database without sufficient data in the tables and therefore the queries are nice and fast. The problem is when there's millions of transactions already in the database, all the queries turn out to be far too slow.

    So it is an important part of the development process to seed your development databases with a reasonable amount of representative data.

  11. Do you benchmark?

    Benchmarking your system's performance is important. This is making sure you have information on how your system performs with a known set of data and load.

    Benchmarking allows you to then optimize code and actually know that things improved.

  12. Do you know the best benchmarking tools?

    There are plenty of good benchmarking tools for .Net solutions.

    • BenchmarkDotNet is good because it monitors memory consumption and timings.
    • Bombardier is a simple CLI load testing tool.
    • NBomber is good for automating load tests
    • RedLine13 uses AWS spot instances to provide really cheap enormous scale for load testing

    Try these out and there are more available. Which one suits will depend on your solution and what information you want.

  13. Do you bulk process in chunks?

    Databases are slow at doing bulk updates. It's generally significantly faster to break bulk processing down into manageable chunks. It also avoids other database users experiencing significant blocking issues.

    Linq include the Chunk method to make this process nice and easy.

    var productIds = context.ProductIds;
    foreach(var chunk in productIds.Chunk(10))
    {
        // Do stuff
    }
  14. Do you know when to use raw sql?

    Raw SQL comes with risks but sometimes it is the best solution.

    Using raw SQL involves taking care of SQL injection and other risks, however there are a number of situations where it may be the best solution.

    The most obvious is a SQL UPDATE statement which updates a large number of rows.

    await context.Database.ExecuteSqlInterpolatedAsync($"UPDATE Employees SET Active = {activeState}", ct);

    Good example - Updating a large number of rows quickly with SQL

  15. Do you know to use code migrations

    TODO: Byrden

            

    Bad example - Don't modify database directly

    Good example - modify entities and generate a migration

We open source. Powered by GitHub