
Rules to Better SQL Databases - Performance - 16 Rules

Here are some of the typical things that all SQL Server DBAs and Database developers should know about performance and how to identify the causes of poor performance. These should get you out of trouble the majority of the time. Also included is a list of the most common mistakes developers make that cause performance issues.

Want to develop your SQL Server Database with SSW? Check SSW's Databases consulting page.

  1. Identifying the cause of SQL Server performance problems

    Identifying CPU Pressure

    When looking at SQL Server, you often get performance issues, but how can you figure out what might be the cause?

    To figure out whether the SQL Server itself is experiencing CPU pressure, fire up Task manager and take a look at the CPU usage. If the CPU is high and SQL Server is not the primary consumer, evaluate whether you can separate the CPU consuming workload from your SQL Server by moving one onto another server.

    A good way to identify whether SQL Server is experiencing CPU pressure internally is to run the following SQL. It counts the number of SQL batches, compilations and recompilations per second, averaged over a minute. Note: A batch is a group of one or more SQL statements sent to the server at the same time. The batch separator, GO, is used to separate batches.

    DECLARE @BatchRequests bigint;
    DECLARE @Compilations bigint;
    DECLARE @Recompiles bigint;

    SELECT @BatchRequests = [Batch Requests/sec],
           @Compilations = [SQL Compilations/sec],
           @Recompiles = [SQL Re-Compilations/sec]
    FROM (SELECT cntr_value, counter_name
          FROM sys.dm_os_performance_counters
          WHERE counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec', 'SQL Re-Compilations/sec')) AS SourceTable
    PIVOT (MAX(cntr_value) FOR counter_name IN ([Batch Requests/sec], [SQL Compilations/sec], [SQL Re-Compilations/sec])) AS PivotTable;

    WAITFOR DELAY '00:01:00';

    SELECT ([Batch Requests/sec] - @BatchRequests) / 60 AS BatchesPerSec,
           ([SQL Compilations/sec] - @Compilations) / 60 AS CompilationsPerSec,
           ([SQL Re-Compilations/sec] - @Recompiles) / 60 AS RecompilesPerSec
    FROM (SELECT cntr_value, counter_name
          FROM sys.dm_os_performance_counters
          WHERE counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec', 'SQL Re-Compilations/sec')) AS SourceTable
    PIVOT (MAX(cntr_value) FOR counter_name IN ([Batch Requests/sec], [SQL Compilations/sec], [SQL Re-Compilations/sec])) AS PivotTable;

    The BatchesPerSec should be under 1000. CompilationsPerSec should be less than 10% of BatchesPerSec, and RecompilesPerSec should be less than 10% of CompilationsPerSec.

    If SQL Server itself is the main CPU consumer, see What to do about SQL Server CPU Pressure? below.

    Identifying Memory Pressure

    Open Task manager, select Performance > Memory to check whether all the memory is being used.

    Use Perfmon and monitor these counters:

    • Process\Working Set - to check individual apps' memory usage.
    • Memory\Available MBytes - to check overall memory usage.

    If these show that all of the memory is being used, then SQL Server may be experiencing memory pressure. If the memory is not being used by SQL Server, then evaluate whether SQL Server and the other workload should be on separate servers. Otherwise, see What to do about SQL Server Memory Pressure? below.
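    To confirm how much memory the SQL Server process itself is holding, here is a minimal sketch using the sys.dm_os_process_memory DMV:

    -- Physical memory held by the SQL Server process, and the percentage of
    -- its committed memory currently in the working set
    SELECT physical_memory_in_use_kb / 1024 AS sql_server_memory_mb,
           memory_utilization_percentage
    FROM sys.dm_os_process_memory;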

    Identifying IO Pressure

    Open Task Manager, select Performance > Disk (*) to check for disks being pushed to their limits.

    Monitor the following using Perfmon:

    • LogicalDisk\Disk Bytes/sec
    • LogicalDisk\Avg. Disk sec/Transfer

    If any disks are being pushed to their limits, you need to evaluate whether SQL Server is performing the IO. If it is not, then the easiest solution is to move the application doing all the disk access onto a separate server from your SQL Server.

    If SQL Server is the one doing the IO, see What to do about SQL Server IO Pressure? below.

  2. Identifying the cause of Azure SQL Database performance problems

    When looking at Azure SQL Database, you often get performance issues, but how can you figure out what might be the cause?

    The first step when working with Azure SQL is to identify whether the problem is a single poorly optimised query, or whether you are reaching a limit of the server. In Azure SQL Database you choose a level of performance expressed in DTUs (Database Transaction Units). These are just a way of expressing a fixed amount of available CPU, Disk IO and memory. If you find that you are hitting the limit of only one of those resources, you can tune your queries to use more of the other resources while reducing the one that is being throttled. For example, you can often choose a technique that uses less CPU, but requires more memory.

    To identify where the bottleneck lies, try the following SQL query. Note the historical data is only retained for about an hour, so you do need to execute the query shortly after the issue occurs.

    SELECT * FROM sys.dm_db_resource_stats ORDER BY end_time DESC;

    The results are expressed as percentages of the maximum allowed for the service tier you are running on. So it's very easy to pick out which limit is being reached.

    There is a roughly equivalent call available for Azure Managed Instance. Try:

    SELECT * FROM sys.server_resource_stats ORDER BY end_time DESC;

    From the returned statistics you should be able to determine whether SQL Server is under CPU, IO, Network or memory pressure.

  3. What to do about SQL Server CPU Pressure?

    So you've identified that your SQL Server is under CPU pressure. What can you do about it?

    Here's an in-depth presentation covering things that can help reduce CPU pressure:

    1. Throw hardware at the problem
    2. Update the index statistics
    3. Identify high CPU queries
    4. Identify poor queries (reading too much data either columns or rows)
    5. Identify missing indexes

    Add more CPUs

    In many situations, the problem is a poorly specified hardware configuration for your SQL Server. It's worth thinking about whether increasing the server specifications is the easiest solution, or whether optimising the applications is a better choice.

    Update index statistics

    This can be achieved by running:

    EXEC sp_updatestats;

    Be aware that this can be expensive and as such using it all the time to avoid problems is not a great solution. But as a one off to verify whether your problem is a statistics issue, this is perfect.

    Identifying high CPU queries

    The following SQL identifies high-CPU queries running right now.

    SELECT TOP 10 s.session_id,
               r.status,
               r.cpu_time,
               r.logical_reads,
               r.reads,
               r.writes,
               r.total_elapsed_time / (1000 * 60) 'Elaps M',
               SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1,
               ((CASE r.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.TEXT)
                    ELSE r.statement_end_offset
                END - r.statement_start_offset) / 2) + 1) AS statement_text,
               COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid))
               + N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
               r.command,
               s.login_name,
               s.host_name,
               s.program_name,
               s.last_request_end_time,
               s.login_time,
               r.open_transaction_count
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
    WHERE r.session_id != @@SPID
    ORDER BY r.cpu_time DESC

    If no queries are driving the CPU right now, the following query looks at historical query stats to find the queries that have consumed the most CPU.

    SELECT TOP 10 st.text AS batch_text,
        SUBSTRING(st.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(st.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text,
        (qs.total_worker_time / 1000) / qs.execution_count AS avg_cpu_time_ms,
        (qs.total_elapsed_time / 1000) / qs.execution_count AS avg_elapsed_time_ms,
        (qs.total_logical_reads / qs.execution_count) AS avg_logical_reads,
        (qs.total_worker_time / 1000) AS cumulative_cpu_time_all_executions_ms,
        (qs.total_elapsed_time / 1000) AS cumulative_elapsed_time_all_executions_ms
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
    ORDER BY (qs.total_worker_time / qs.execution_count) DESC

    Identify missing indexes

    Indexes can dramatically improve query performance. SQL Server has inbuilt mechanisms to try and identify indexes that would aid a particular query. Running the following SQL identifies the 50 queries consuming the most CPU where SQL Server has identified that there is potentially a missing index.

    SELECT
        qs_cpu.total_worker_time / 1000 AS total_cpu_time_ms,
        q.[text],
        p.query_plan,
        qs_cpu.execution_count,
        q.dbid,
        q.objectid,
        q.encrypted AS text_encrypted
    FROM
        (SELECT TOP 50 qs.plan_handle,
         qs.total_worker_time,
         qs.execution_count FROM sys.dm_exec_query_stats qs ORDER BY qs.total_worker_time DESC) AS qs_cpu
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS q
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
    WHERE p.query_plan.exist('declare namespace
            qplan = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            //qplan:MissingIndexes')=1

    This is super useful for giving suggestions. Otherwise, you may need to manually identify potential indexes and test those.

    Identifying parameter-sensitive problems

    Try running

    DBCC FREEPROCCACHE

    This will empty the plan cache. If this resolves the issue, then it's probably a parameter-sensitive problem.

    Note: DBCC is an acronym for Database Console Command. DBCC statements are maintenance and informational commands rather than regular structured queries.
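    If clearing the entire plan cache on a busy server is too disruptive, DBCC FREEPROCCACHE also accepts a single plan handle. Here's a sketch, assuming you want to evict the plan of the top CPU-consuming cached query found via sys.dm_exec_query_stats:

    -- Find the plan handle of the most CPU-hungry cached query
    DECLARE @plan_handle varbinary(64);
    SELECT TOP 1 @plan_handle = plan_handle
    FROM sys.dm_exec_query_stats
    ORDER BY total_worker_time DESC;

    -- Evict just that plan from the cache
    DBCC FREEPROCCACHE (@plan_handle);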

  4. What to do about SQL Server Memory Pressure?

    So you've identified that your SQL Server is under memory pressure. What can you do about it?

    If SQL Server is the primary consumer, then read up on reducing SQL Server memory usage, or consider whether more memory is appropriate for your workload.

    Run

    SELECT * FROM sys.dm_os_memory_clerks WHERE type = 'MEMORYCLERK_HOST'

    This isolates a few of the SQL processes that aren't part of the SQL Server engine. Look for high memory usage by OLE DB providers (MSOLEDBSQL), SQL Native Client (SQLNCLI*) and so on. This may indicate the use of some non-core features, and you should evaluate whether these are necessary. Non-core features are things like running .NET CLR code, translating queries to things like OLE DB, and other things that aren't strictly database operations.

    Try running the following query. It categorises the various memory allocations SQL Server has made.

    SELECT pages_kb, type, name, virtual_memory_committed_kb, awe_allocated_kb
    FROM sys.dm_os_memory_clerks
    ORDER BY pages_kb DESC

    This should allow you to identify what is consuming the most memory.

    MEMORYCLERK_SQLQERESERVATIONS

    If the memory clerk MEMORYCLERK_SQLQERESERVATIONS is consuming memory, identify queries that are using huge memory grants and optimize them via indexes, or rewrite them (remove ORDER BY, for example).

    OBJECTSTORELOCKMANAGER

    The most common example is OBJECTSTORELOCKMANAGER consuming large amounts of memory. This is indicative of a large number of locks being obtained by the server. Often this is due to poor indexing meaning there are locks on far more objects than are required. Another option is shortening the length of any transactions.

    CACHESTORE_SQLCP

    This indicates a large number of ad-hoc query plans are cached. Identify non-parameterized queries whose query plans can't be reused and parameterize them by converting to stored procedures, using sp_executesql, or by using FORCED parameterization. If you have enabled trace flag 174, you may disable it to see if this resolves the problem.

    You can use the sys.dm_exec_cached_plans dynamic management view to identify non-parameterized queries. This view returns a row for each query plan that is cached by SQL Server. You can filter the results to show only non-parameterized queries by checking the usecounts column value. If the usecounts column value is 1, the query is likely non-parameterized. Alternatively, look for the objtype column containing "Adhoc".
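    As a rough sketch, the following query lists the largest single-use ad-hoc plans, which are the usual suspects for plan cache bloat:

    -- Single-use ad-hoc plans, largest first
    SELECT TOP 20 cp.usecounts, cp.objtype, cp.size_in_bytes, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE cp.objtype = 'Adhoc'
      AND cp.usecounts = 1
    ORDER BY cp.size_in_bytes DESC;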

    Once you have identified non-parameterized queries whose query plans can't be reused, parameterize them using one of the approaches above: parameterized SQL, stored procedures, sp_executesql or forced parameterization.

    CACHESTORE_OBJCP

    If the object plan cache store CACHESTORE_OBJCP is consuming too much memory, identify which stored procedures, functions, or triggers are using large amounts of memory and possibly redesign the application to eliminate the majority of them. Commonly, this happens with a large number of databases or database schemas, each with hundreds of procedures, functions or triggers in them.

    Release memory inside SQL Server

    You can run one or more of the following DBCC commands to free several SQL Server memory caches:

    DBCC FREESYSTEMCACHE
    DBCC FREESESSIONCACHE
    DBCC FREEPROCCACHE

    Restart SQL Server service

    In some cases, if you need to deal with critical exhaustion of memory and SQL Server isn't able to process queries, you can consider restarting the service.

    Add more RAM on the physical or virtual server

    If the problem continues, you need to investigate further and possibly increase server resources (RAM).

  5. What to do about SQL Server IO Pressure?

    So you've identified that your SQL Server is under IO pressure. What can you do about it?

    If disk is being used by SQL Server, try the following:

    1. Identify any high IO queries and optimize
    2. Check whether more memory might allow caching to dramatically reduce disk IO
    3. Identify whether database files under high load can be placed on separate disks, maybe splitting each file up into several, each on a separate disk.

    Identify any high IO queries and optimize

    Use the Query Store view Top Resource Consuming Queries in SSMS. Look for high values of Logical Reads, Logical Writes and Physical Reads. These indicate IO intensive queries.

    Here's an in-depth presentation on techniques to optimize SQL queries and reduce the IO required.

    Try adding more memory

    Adding memory is often a quick and easy fix that may be less expensive than more extensive optimization. If the cost is not overly high, it often provides a better return on investment than spending large amounts of effort on optimization.

    Identify the database files under pressure

    Use the following query from the Microsoft Learn article Troubleshoot slow SQL Server performance caused by I/O issues to identify which database files are under pressure.

       SELECT   LEFT(mf.physical_name,100),
             ReadLatency = CASE WHEN num_of_reads = 0 THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,
             WriteLatency = CASE WHEN num_of_writes = 0 THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,
             AvgLatency =  CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
                            ELSE (io_stall / (num_of_reads + num_of_writes)) END,
             LatencyAssessment = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 'No data' ELSE
                   CASE WHEN (io_stall / (num_of_reads + num_of_writes)) < 2 THEN 'Excellent'
                        WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 2 AND 5 THEN 'Very good'
                        WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 6 AND 15 THEN 'Good'
                        WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 16 AND 100 THEN 'Poor'
                        WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 100 AND 500 THEN  'Bad'
                        ELSE 'Deplorable' END  END,
             [Avg KBs/Transfer] =  CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
                        ELSE ((([num_of_bytes_read] + [num_of_bytes_written]) / (num_of_reads + num_of_writes)) / 1024) END,
             LEFT (mf.physical_name, 2) AS Volume,
             LEFT(DB_NAME (vfs.database_id),32) AS [Database Name]
           FROM sys.dm_io_virtual_file_stats (NULL,NULL) AS vfs
           JOIN sys.master_files AS mf ON vfs.database_id = mf.database_id
             AND vfs.file_id = mf.file_id
           ORDER BY AvgLatency DESC

    This then gives somewhere to investigate.

    You can find out what types of IO waits are occurring in SQL Server with the following query.

    SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms
    FROM sys.dm_exec_requests r
    JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
    WHERE wait_type IN (
        'PAGEIOLATCH_EX',
        'PAGEIOLATCH_SH',
        'PAGEIOLATCH_UP',
        'WRITELOG',
        'ASYNC_IO_COMPLETION',
        'IO_COMPLETION',
        'BACKUPIO')
    AND is_user_process = 1

    If these waits exceed 10-15 milliseconds consistently, I/O is considered a bottleneck.

    PAGEIOLATCH_EX

    Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in the Exclusive mode. An Exclusive mode is used when the buffer is being written to disk. Long waits may indicate problems with the disk subsystem.

    PAGEIOLATCH_SH

    Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in the Shared mode. The Shared mode is used when the buffer is being read from the disk. Long waits may indicate problems with the disk subsystem.

    PAGEIOLATCH_UP

    Occurs when a task is waiting on a latch for a buffer in an I/O request. The latch request is in the Update mode. Long waits may indicate problems with the disk subsystem.

    WRITELOG

    Occurs when a task is waiting for a transaction log flush to complete. A flush occurs when the Log Manager writes its temporary contents to disk. Common operations that cause log flushes are transaction commits and checkpoints.

    Common reasons for long waits on WRITELOG are:

    Transaction log disk latency

    This is the most common cause of WRITELOG waits. Generally, the recommendation is to keep the data and log files on separate volumes. Transaction log writes are sequential writes while reading or writing data from a data file is random. Mixing data and log files on one drive volume (especially conventional spinning disk drives) will cause excessive disk head movement.

    Too many VLFs

    Too many virtual log files (VLFs) can cause WRITELOG waits. Too many VLFs can cause other types of issues, such as long recovery.

    Too many small transactions

    While large transactions can lead to blocking, too many small transactions can lead to another set of issues. If you don't explicitly begin a transaction, any insert, delete, or update will result in its own transaction (an autocommit transaction). If you do 1,000 inserts in a loop, 1,000 transactions are generated. Each transaction in this example needs to commit, and each commit results in a transaction log flush, so there are 1,000 log flushes. When possible, group individual update, delete, or insert statements into a bigger transaction to reduce transaction log flushes and increase performance. This leads to fewer WRITELOG waits.
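    As an illustration only (dbo.SampleTable and its Value column are made-up names), wrapping a loop of inserts in one explicit transaction means the log is flushed at COMMIT rather than once per statement:

    -- One explicit transaction around many small inserts = far fewer log flushes
    BEGIN TRANSACTION;

    DECLARE @i int = 0;
    WHILE @i < 1000
    BEGIN
        INSERT INTO dbo.SampleTable (Value) VALUES (@i); -- hypothetical example table
        SET @i += 1;
    END

    COMMIT TRANSACTION;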

    Scheduling issues cause Log Writer threads to not get scheduled fast enough

    Prior to SQL Server 2016, a single Log Writer thread performed all log writes. If there were issues with thread scheduling (for example, high CPU), both the Log Writer thread and log flushes could get delayed. In SQL Server 2016, up to four Log Writer threads were added to increase the log-writing throughput. See SQL 2016 - It Just Runs Faster: Multiple Log Writer Workers. In SQL Server 2019, up to eight Log Writer threads were added, which improves throughput even more. Also, in SQL Server 2019, each regular worker thread can do log writes directly instead of posting to the Log writer thread. With these improvements, WRITELOG waits would rarely be triggered by scheduling issues.

    ASYNC_IO_COMPLETION

    Occurs when some of the following I/O activities happen:

    • The Bulk Insert Provider ("Insert Bulk") uses this wait type when performing I/O.
    • Reading Undo file in LogShipping and directing Async I/O for Log Shipping.
    • Reading the actual data from the data files during a data backup.

    IO_COMPLETION

    Occurs while waiting for I/O operations to complete. This wait type generally involves I/Os not related to data pages (buffers). Examples include:

    • Reading and writing of sort/hash results from/to disk during a spill (check performance of tempdb storage).
    • Reading and writing eager spools to disk (check tempdb storage).
    • Reading log blocks from the transaction log (during any operation that causes the log to be read from disk for example, recovery).
    • Reading a page from disk when database isn't set up yet.
    • Copying pages to a database snapshot (Copy-on-Write).
    • Closing database file and file uncompression.

    BACKUPIO

    Occurs when a backup task is waiting for data, or is waiting for a buffer to store data. This type isn't typical, except when a task is waiting for a tape mount.

  6. Do you keep database tables small?

    Reading data from smaller tables is much faster. How can you keep the amount of data stored down?

    There are a number of reasons to keep table sizes small.

    • Large tables take longer to read and update
    • They also take up more space in the database's buffer cache
    • They take up extra disk space

    There are many solutions to avoid tables getting too big. If only a small subset of the data is ever queried, you can archive unused data out to separate tables. This solution often works for sales-based systems where transactions more than a month old may never be read again.
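    A minimal sketch of the archiving approach, assuming hypothetical dbo.Sales and dbo.SalesArchive tables with a SaleDate column:

    -- Copy sales older than a month into the archive table...
    INSERT INTO dbo.SalesArchive (Id, SaleDate, Amount)
    SELECT Id, SaleDate, Amount
    FROM dbo.Sales
    WHERE SaleDate < DATEADD(MONTH, -1, GETDATE());

    -- ...then remove them from the main table so it stays small
    DELETE FROM dbo.Sales
    WHERE SaleDate < DATEADD(MONTH, -1, GETDATE());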

    Another way to reduce table size is to look carefully at the data types of each of the columns. For instance, using nvarchar(500) for telephone number storage is overdoing it; the 500 could be reduced to somewhere around 25, saving storage on every row in the table. Also look at the size and accuracy of numeric columns; often integer percentages are stored as 64-bit integers when a single byte (tinyint) would do.

    Table partitioning is another strategy that can be used to achieve similar improvements. But it is much harder to setup and maintain and is probably best only used if there is a professional Database Administrator available to manage it.

  7. Do you only SELECT the columns you require?

    It can be expensive retrieving all the columns of a table. Find out why.

    Retrieving all the columns in a table is a common pattern.

    SELECT * FROM Users

    This is not an unusual statement to see. However, statements of this type should not be employed inside an application. It's important to minimise the number of columns returned by queries (see the example after the list below) because:

    • It takes more network bandwidth to transmit the results
    • It takes extra CPU at the database and the caller to encode/decode the response
    • In cases where all the columns required are available in an index, retrieving all columns can cause reading an index and then the main table to retrieve the remaining columns
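    A minimal sketch of the narrower alternative, assuming the Users table has Id, Location and DownVotes columns (as used elsewhere in these rules):

    -- Return only the columns the caller actually needs
    SELECT Id, Location, DownVotes
    FROM Users
    WHERE Location = 'USA';
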
  8. Do you know how to verify SQL indexes are being used?

    So you've created some indexes on your database tables. How can you tell if they are being used by your queries?

    You'd be surprised to find out how often the indexes that have been defined on a table are not being used. As such it's worth checking that your indexes are being used.

    Ways to verify

    If you know the exact query being run, you can simply run the query in SQL Server Management Studio or Azure Data Studio with "Include Actual Execution Plan" turned on. Then check the execution plan to verify what indexes were used.

    The second option is to use the query below. It retrieves the index usage stats, and the WHERE clause can be adjusted to ensure you only get the indexes you are interested in. Run this query, then perform the operations in your application that you think should use the index, and then rerun the query. If the index was used, you should see UserSeeks, UserScans or UserLookups increase between the 2 runs.

    SELECT 
        object_name(s.[object_id]) AS TableName,
        i.name AS IndexName,
        i.index_id AS IndexID,
        us.user_seeks AS UserSeeks,
        us.user_scans AS UserScans,
        us.user_lookups AS UserLookups,
        us.user_updates AS UserUpdates
    FROM 
        sys.dm_db_index_usage_stats us
    INNER JOIN 
        sys.indexes i ON (us.object_id = i.object_id AND us.index_id = i.index_id)
    INNER JOIN 
        sys.objects s ON us.object_id = s.object_id
    WHERE 
        s.type = 'U' -- Filter for user tables only
    --    AND i.type_desc = 'NONCLUSTERED' -- Filter for non-clustered indexes
    --    AND (us.user_seeks = 0 AND us.user_scans = 0) -- Filter for unused indexes
    ORDER BY 
        TableName, IndexName;

    Be aware the statistics in this query are reset when the server is restarted or a user explicitly resets the statistics.

  9. Do you avoid implicit data type conversions in SQL queries?

    Specifying the wrong data types in SQL queries can make the server scan your whole table. That can take ages.

    It doesn't seem right that getting the data type wrong can result in the database having to walk an entire table, but consider the following 2 queries.

    SELECT * FROM dbo.VotesString WHERE PostId = '9999997'
    SELECT * FROM dbo.VotesString WHERE PostId = 9999997

    Note: In the VotesString table, PostId is a string column, and there is an index on it.

    So the first query can seek straight to the correct row, and return it straight back. Almost no effort.

    But in the second case, where we compare PostId to an integer, there are multiple string representations of the same integer value, for instance '009999997' or a value in exponential form. Hence the server has to walk through every row of that column, cast each string to an integer and then perform the comparison.

    So the rule of thumb is: always make sure both sides of a comparison have the same data type.
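    As a sketch (the varchar(20) length is an assumption about how PostId might be declared), casting the parameter to the column's type once keeps the index on PostId usable for a seek:

    -- Cast the integer parameter once, instead of converting every row of the column
    DECLARE @PostId int = 9999997;

    SELECT *
    FROM dbo.VotesString
    WHERE PostId = CAST(@PostId AS varchar(20)); -- varchar(20) is an assumed column type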

  10. Do you avoid looping in database queries?

    It's very inefficient to loop in database queries. You should avoid it wherever possible.

    Looping in SQL causes the server to execute each iteration of the loop one after another. If you can get the server to do the work as a single set-based operation (which it can parallelise), it will be much faster, and often GROUP BY or aggregation is a better choice.

    There are still times that looping is necessary in a query, but it is something to think long and hard about whether there is a better alternative.

    The most common reason for looping is to aggregate data, so be aware of the built-in aggregation functions available, as they often negate the need for looping.
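    For example, instead of looping over users and counting rows one user at a time, a single set-based query does it in one pass (dbo.Posts with an OwnerUserId column is an assumed example table):

    -- Set-based aggregation replaces a row-by-row loop
    SELECT OwnerUserId, COUNT(*) AS PostCount
    FROM dbo.Posts
    GROUP BY OwnerUserId;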

  11. Do you know that AND is much more efficient than OR in a WHERE clause?

    Understanding why this is the case takes a little bit of thought.

    When doing an OR, the server needs to get all the rows that match the first clause, then get all the rows that match the second clause, and then combine the 2 results together. If you think about it, this is just doing a union, and you can generally rewrite a statement containing an OR using a UNION instead and the performance will improve. It will however be a bit less readable.

    SELECT Id, Location, DownVotes FROM dbo.Users WHERE location = 'USA' OR DownVotes > 5;
    
    SELECT Id, Location, DownVotes FROM dbo.Users WHERE location = 'USA'
    UNION 
    SELECT Id, Location, DownVotes FROM dbo.Users WHERE DownVotes > 5

    Now if you consider an AND, it's the same as an intersection. You can find all the rows that match the first clause, and then immediately check that each row also matches the second clause before adding it to the results.

  12. Do you avoid large database writes?

    It can be very expensive to write large blocks of data into databases. What can you do instead?

    When data is written into a database table, any records being modified have to be locked. The more records being modified, the more locks are required. If there are enough locks, the server may choose to escalate the row locks into page or even table locks. This means that other queries are prevented from touching those records/pages/tables until the write transaction is complete.

    Most databases have custom Bulk Update libraries which optimise these operations as much as possible, so using these will alleviate these issues somewhat.
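    A minimal sketch using T-SQL's BULK INSERT (the file path, table name and options are illustrative assumptions):

    -- Load a large file in batches rather than issuing row-by-row INSERTs
    BULK INSERT dbo.SalesStaging
    FROM 'C:\Data\sales.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        BATCHSIZE = 10000  -- each batch commits separately, keeping lock counts down
    );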

    The other thing to be aware of is that any indexes or foreign keys that contain the columns being updated by the writes will cause extra locking. This is one of the reasons that indexing is a balancing act between creating too few and too many indexes. Often you want a small number of indexes that improve query performance enough, rather than providing perfect coverage for all expected queries.

    Much of the pain here is caused by the amount of locking that goes on. Typically, using bulk insert libraries helps you avoid it. Be aware that the more foreign keys referencing or referenced by your table, the worse this will get. Indexes also have a significant effect here.

  13. Do you avoid using wildcards at the start of a string?

    Using a wildcard at the start of a string match can cause performance problems with queries. Find out more.

    If you have a table that is indexed on a string column and then you use a query like:

    SELECT * FROM Users WHERE Name LIKE '% Oliver'

    Then have a think about whether the index can be used efficiently to find the result. Unfortunately not, because the index is ordered by the first letter, then the second letter and so on of the column value, but in this case that's the part of the value that can be anything.

    But there is a simple solution if this is a pattern that your application requires regularly. Basically, create a new column (for instance ReverseName) that stores the reversed text. Then you can write a query like:

    SELECT * FROM Users WHERE ReverseName LIKE Reverse('% Oliver')

    As long as there is an index on the ReverseName column then suddenly this query can use the index to do a seek, dramatically reducing the amount of IO required by the query.
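    One way to set this up, sketched here assuming the Users table has a Name column, is a persisted computed column with its own index:

    -- Persisted computed column holding the reversed name, plus an index so
    -- reversed LIKE patterns can seek instead of scan
    ALTER TABLE Users ADD ReverseName AS REVERSE(Name) PERSISTED;
    CREATE INDEX IX_Users_ReverseName ON Users (ReverseName);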

  14. Do you use JOIN instead of WHERE?

    It's better to join tables together using a JOIN clause instead of a WHERE clause.

    There are several good reasons to use JOINs rather than WHERE clauses to join tables. The first is that it keeps the joining of tables separate from the filtering, so when you see a WHERE clause you know it is being used to further filter data.

    The second reason is that modern SQL optimisation engines optimise JOINs better than WHERE clauses. So your query will generally run faster with a JOIN.

    Finally it's easier to express the various types of joins better with a JOIN as you can state INNER, OUTER and so on depending on how you'd like to join. This achieves the equivalent of intersections, unions and so on very clearly.
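    As an illustration (the Posts table and its OwnerUserId and Title columns are assumed example names), here is the same query written both ways:

    -- Join expressed in the WHERE clause: joining and filtering are mixed together
    SELECT u.Id, p.Title
    FROM Users u, Posts p
    WHERE u.Id = p.OwnerUserId AND u.Location = 'USA';

    -- Explicit JOIN: the join condition and the filter are clearly separated
    SELECT u.Id, p.Title
    FROM Users u
    INNER JOIN Posts p ON u.Id = p.OwnerUserId
    WHERE u.Location = 'USA';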

  15. Do you avoid joining too many tables?

    Joining too many tables in a single query can cause issues. Find out why.

    When a query is sent to a database server, the server will try to optimise that query into the best set of steps to retrieve the required results. However, the optimisers begin to struggle somewhere around 4-7 tables being joined in a single query. When they struggle, they tend to abandon optimisation and just brute force the query without using any of the available indexes.

  16. Do you use TOP for sampling in your queries?

    TOP is an efficient way of identifying what a table's structure and data look like.

    Retrieving all the data in a table is a common pattern especially when someone is looking into a production incident. If the table in question is a large table, grabbing all the rows back may make the problem worse.

    SELECT * FROM Users

    Using TOP to reduce the number of rows being returned makes this a much less expensive operation.

    SELECT TOP 20 * FROM Users

    TOP works with filtering and joining, so you can use it to get a sample of only the rows you are interested in.
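    For example, combining TOP with a filter (again assuming the Users columns used earlier in these rules):

    -- A small sample of just the rows of interest
    SELECT TOP 20 Id, Location, DownVotes
    FROM Users
    WHERE Location = 'USA';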
