-
When presenting, make sure you use the AdventureWorks database and
OLAP cube
For years we have been using the Northwind (for SQL Server) and
FoodMart (for OLAP) samples. Well, no more... SQL Reporting
Services includes a new sample database: AdventureWorks2012.
Download the
AdventureWorksDW zip file
(7.41 MB), which contains the AdventureWorks2012DW database (a
backup of a flattened/de-normalised version of AdventureWorks2012)
and the AdventureWorks2012 Analysis Server database that includes
the Reseller Sales cube.
View the
Instructions
on setting up the above resources.
-
Make a separate database for your Data Warehouse
You should always base your cubes on a separate data warehouse
database. For larger organizations this will often also reside on
a separate machine. It will be used as the store for all the data
collected as time rolls on. Why would you do this?
The good things:
- Only keep the data you want
- Allows you to purge old data from the OLTP database
- Allows you to record Time Dependent info (Slowly Changing
Dimensions)
- Performance - queries don't hit the OLTP database
The bad things:
- Redundancy - extra disk space
- Schema changes must be mirrored
People often ask: Why make a new database and not just make
Views/StoredProcs? The answer is that when using multiple
databases, cleaning data gets harder and messier without the data
warehouse in the middle (where a DTS package may exist for each
data source into the warehouse).
-
-
Figure: A good naming convention for your Data Warehouse is to
tack on DW to the end of the name
An Exception:
If you are not worried about performance issues and your data
is clean, then you don't need to make another database.
Just add the business logic using views (e.g. if you have a
bit field but want to add meaning, add a view with a CASE
statement).
Note: In SQL Server 2005 you would add this business logic
in a new layer, Data Source Views, within the UDM (Unified
Dimensional Model)
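Such a view might look like the following sketch (the table and column names are purely illustrative):

```sql
-- Hypothetical example: add meaning to a bit field via a view
CREATE VIEW dbo.vwCustomerStatus
AS
SELECT CustomerID,
       CASE IsActive
           WHEN 1 THEN 'Active'
           ELSE 'Inactive'
       END AS CustomerStatus
FROM dbo.Customer;
```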
-
Do you follow a naming convention for your Data Warehouse?
Use Dim<Name> for the dimension tables (e.g. DimAccount).
(Tip: think of dimensions as your filters/WHERE/GROUP BY)
Use Fact<Name> for the fact tables (e.g. FactFinance).
(Tip: fact tables contain the numbers/measures - the transaction
tables/COUNT()/SUM())
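As a sketch, the convention looks like this in DDL (the columns are illustrative only - the point is the Dim/Fact prefixes):

```sql
-- Dimension: attributes you filter and group by
CREATE TABLE dbo.DimAccount (
    AccountKey int IDENTITY(1,1) PRIMARY KEY,
    AccountDescription nvarchar(50)
);

-- Fact: the numbers/measures you SUM() and COUNT()
CREATE TABLE dbo.FactFinance (
    FinanceKey int IDENTITY(1,1) PRIMARY KEY,
    AccountKey int REFERENCES dbo.DimAccount (AccountKey),
    Amount money
);
```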
-
-
Figure - Bad Example - no standard naming conventions, so it is
unclear what some tables are used for
-
-
Figure - Good Example - We can clearly see which tables are
dimensions and which are facts
-
Do you have a DimTimeDay table in your Data Warehouse?
Your data warehouse must have a table DimTimeDay with a record
for every day. This way when you want to make reports based on
time (which you inevitably will), you can view data for
periods other than standard calendar periods (e.g. the fiscal
quarter).
How do you create this?
Well, there are two ways that we recommend:
-
Using the Dimension Wizard in SQL Server 2005 (see
Creating a Time Dimension in 10 Easy Steps
for more information - Preferred Method).
-
Using
BI Accelerator
.
While we prefer using the Dimension Wizard in SQL Server 2005,
you may find BI Accelerator more customizable; however, be aware
that BI Accelerator is an Excel spreadsheet that creates an entire
data warehouse (including the database, cubes and DTS packages to
populate it), not just the Time Dimension table.
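If you would rather script the table by hand, a minimal sketch looks like this (the column names and the July fiscal-year start are assumptions):

```sql
-- One row per calendar day, with a fiscal quarter for an assumed July year-start
CREATE TABLE dbo.DimTimeDay (
    TimeKey int PRIMARY KEY,   -- e.g. 20040701
    [Date] datetime,
    CalendarYear int,
    FiscalQuarter int
);

DECLARE @d datetime;
SET @d = '20040101';
WHILE @d < '20060101'
BEGIN
    INSERT dbo.DimTimeDay (TimeKey, [Date], CalendarYear, FiscalQuarter)
    VALUES (YEAR(@d) * 10000 + MONTH(@d) * 100 + DAY(@d),
            @d,
            YEAR(@d),
            ((MONTH(@d) + 5) % 12) / 3 + 1);   -- Jul-Sep = Q1, Oct-Dec = Q2, ...
    SET @d = DATEADD(day, 1, @d);
END;
```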
Figure: Time Dimension table after completing the
Dimension Wizard in SQL Server 2005
Figure: This is what your Time Dimension will look
like after using BI Accelerator.
-
Do you know which Performance you care about?
There is a
SQL Server 2008 White Paper: Analysis Services Performance
Guide
.
Sometimes it is a trade-off: for example, setting lots of
aggregates makes user queries faster, but makes the cube slower
to build, and larger.
-
Always make the Lowest Level of a Dimension the Key
You should always have a unique key for every element of data.
This means that the lowest level of a given dimension should be
the key as this will be unique and based on some unique key
already.
For example:
Figure: What is wrong with this? ...
Figure: ... Well the bottom Level (called the Leaf
Level) should always be the key.
Note: Remember to set
Member Keys Unique
= True and
Member Names Unique
= True
The first reason is performance: in the Cube Editor you can then
successfully run Tools > Optimize Schema.
Note: This improves Cube Building Performance (always) and
Querying Performance (for some cases).
The second reason is a smaller cube size, but you need to apply
the next rule for that.
-
Always use Virtual Dimensions instead of Real Dimensions where
possible
Figure: What is wrong with this? ...
Well, we are using Real Dimensions for "Product Color" and
"Product Size", when they should be Virtual Dimensions of the
Product Dimension.
There are 3 steps to doing that:
1) Add the Member Properties (e.g. Color and Size)
2) Delete the original Shared Dimension
3) Add the new Virtual Dimension using the wizard
Note: This improves Cube Building Performance (always) and
Querying Performance (for most cases as the cube is physically
smaller).
-
When you are using Real Dimensions, when applicable specify the
"Depends On Dimension" property
A common property which is often overlooked is "Depends on
Dimension". This will indicate to the aggregate engine that
these dimensions are related, therefore more efficient
aggregates will be built.
-
-
Figure: In this example the Depends on Dimension property
should be set to "Product"
Note: This improves Cube Building Performance (sometimes) and
Querying Performance (for most cases because you have more
efficient aggregates in the Cube).
-
BI - Always make sure the dimensions All Captions = All -
*FIXED IN SQL 2008*
When you are reporting (using Reporting Services) based on an
OLAP cube, you should change the "All Caption" property of each
dimension to have the value "All". By default the caption for the
dimension will be "All <dimension_name>" (e.g. All Products).
This default is distracting when used in reporting solutions and
doesn't particularly clarify anything anyway.
-
-
Figure: Set the "All Caption" to read "All" instead of the
distracting default
When you use this dimension in the parameter toolbar of
Reporting Services this default value will look horrible and it
should be changed to some consistent value in order to
facilitate fast scanning of the parameter values.
-
-
Figure: Having set the All caption nicely, you can more easily
see which products aren't using the defaults.
Read our rule on
Controls - Do you include '-All-' option in your ComboBoxes?
-
Do you always use Shared Dimensions?
When you're building an Analysis Server database you should
always use Shared Dimensions. This is because:
-
all of your cubes can re-use them, making future cubes faster
to develop
- they are only processed once
-
-
Figure: This is the only place you should be creating
dimensions. Stay away from Private dimensions hidden in the
cube.
-
Do you always use spaces in names?
Always use spaces in Dimension Names, Level Names and Measures -
this is a business tool and you want users to see friendly
names.
-
Always make time scales clear, if using "financial year" then
make it obvious
2004
...
- Month 06
2005
- Month 07
- Month 08
- Month 09
- Month 10
Figure: What is wrong with this?
3 things:
-
If you are at the year level, the user doesn't know if it is
financial or calendar
-
When you are at the month level, the user doesn't know what
year they are in
-
The full month names should be used - there are no sorting
issues, because when you create the dimension you specify the
Order By to be Key
Therefore it should look like this:
FY 2004
- Jun 2004
FY 2005
- Jul 2004
- Aug 2004
- Sep 2004
- Oct 2004
Figure: This is the type of data we want. Make names
clear and clarify that you are referring to the financial
year.
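In the data warehouse, these member names and sort keys can come straight from the time dimension table. A sketch (assuming a DimTimeDay table with a Date column, and a July fiscal-year start where the FY is named for its ending calendar year):

```sql
-- Month members named 'Jun 2004' etc., with an integer key to Order By,
-- and a fiscal-year label: Jun 2004 -> FY 2004, Jul 2004 -> FY 2005
SELECT MonthKey   = YEAR([Date]) * 100 + MONTH([Date]),
       MonthName  = LEFT(DATENAME(month, [Date]), 3) + ' '
                    + CAST(YEAR([Date]) AS char(4)),
       FiscalYear = 'FY ' + CAST(YEAR(DATEADD(month, 6, [Date])) AS char(4))
FROM dbo.DimTimeDay;
```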
-
Move your OLAP MetaData from Access to SQL Server
Analysis Server will by default use an Access database to store
the metadata for each of the databases. Presumably this is
because you don't necessarily need to have SQL Server on the
same machine as Analysis Server. You should change this
repository to point to your SQL Server because it is more
secure and robust, and allows you to back up your metadata with
your normal SQL Server backups.
To migrate the repository of meta data:
-
In Enterprise Manager, create a new database to use for the
metadata. Call it "OLAPMetaData".
-
In Analysis Manager, right-click the server, select "Migrate
Repository" and point it to the new database.
Figure: Move the MetaData repository away from the
default Access database into SQL Server
At the end you can also double check that it worked successfully
by checking the Repository Connection String in the context menu
shown above.
-
When designing don't compute cube aggregations
When prompted about whether to process the aggregations for the
cube when you are designing, you should click No in order to
save time. However, when deploying or testing query performance
you should click Yes.
-
-
Figure: Click No while designing, click Yes when deploying or
testing performance.
-
Do you hide zero values when you are doing financial reports?
Financial reports can be very complicated, with a large number
of figures showing, and we only want to focus on the significant
figures. Zeros can be ignored, so hide them to make the other
figures stand out.
-
-
Figure: Bad Example - a lot of zeros are showing in the grid,
so we cannot see the significant figures.
-
-
Figure: Good Example - zeros are hidden, easy to focus on the
values
To do this in Excel, simply select the cells and format them as
"Accounting".
-
- Figure: Excel - choose "Accounting" as the cell format
-
Do you remove decimal places from charts?
Graphs and charts are used to show the big picture or a trend in
the data. Decimal places in such reports are not useful at all -
just a waste of space. So you should always remove decimal
places from charts.
-
-
Figure: Bad Example - decimal places on a chart are a waste of
space
-
-
Figure: Good Example - decimal places are removed from the
chart
-
Do you use conditional formatting to visualize your data?
If you are using Excel 2007, there is a very exciting new
feature that allows you to visualize your data in place, making
it very easy to see the differences between values.
-
-
Figure: Bad Example - no visualization, so you cannot easily
compare the data
-
-
Figure: Good Example - visualized data, very easy to read and
understand the differences
You can simply choose "Conditional Formatting" from the toolbar
in Excel 2007 to do this.
-
-
Figure: Choose "Conditional formatting - Data Bars" to enable
data visualization
-
Can you use the super cool Microsoft Data Mining Add-in?
Microsoft SQL Server 2005 Data Mining Add-ins for Microsoft
Office 2007 (Data Mining Add-ins) allow you to take advantage of
SQL Server 2005 predictive analytics in Office Excel 2007 and
Office Visio 2007.
-
-
Figure: Simply click "Highlight Exceptions" to see data that
looks abnormal
-
- Figure: Click "Forecast" on this one
-
Do you avoid defaulting data?
In Business Intelligence it is not preferable to default data,
e.g. Country=USA. So many web pages do this, and it influences
the data - and ruins your Business Intelligence reports - you
think all your users are from the USA, when really they have
just left the default.
Check the rule
Do you avoid "Data Junk"?
in Rules to Better Interfaces.
-
Do you always sort dates in Excel in descending order from right
to left?
When you're working with spreadsheets in Excel, be sure to sort
dates in descending order from right to left.
-
-
Figure: Bad Example - Dates are sorted from left to right.
-
-
Figure: Good Example - Dates are sorted from right to left.
-
Do you check the cube has been processed?
Always check that the cube has been processed.
-
- Figure: proczsValidateSSWData2005Cube_ver.1-01.
-
Do you know where to put your KPIs? (Cube or PerformancePoint
Server)
PerformancePoint Services 2012 gives you a nice GUI to build
KPIs if you are not familiar with MDX. The bad news is that
these KPIs can't be used by any other cube browsers (e.g. Excel).
So the question is... Do you put the KPI in the cube (and link
it in PerformancePoint Services 2012) or directly into
PerformancePoint Services 2012?
-
-
Figure: Bad Example - PerformancePoint Services - Dashboard
Designer - Creating a KPI - Very easy to use
The cube gives you the flexibility to write whatever KPIs you
want (including trends) - but you need to know MDX
-
-
Figure: Good Example - BIDS - Creating a KPI with MDX
statements - ultimate flexibility, steep learning curve
Ultimately, having the KPIs in the cube makes more sense: keep
them centralized, link them in PerformancePoint Services, and
potentially allow other cube browsers (like Excel) to access
them.
- Figure: The reason to put your KPIs in the cube
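As a sketch, the three core expressions of a cube KPI in BIDS might look like this (the measure names are assumptions based on an AdventureWorks-style cube):

```mdx
// Value expression
[Measures].[Reseller Sales Amount]

// Goal expression
[Measures].[Sales Amount Quota]

// Status expression: 1 = on target, 0 = warning, -1 = off target
Case
    When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 1    Then 1
    When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 0.85 Then 0
    Else -1
End
```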
-
Do you build a layer of views in the data warehouse that has the
business rules and calculations?
Having the cube work from views is the first step in ensuring
the stability of the cube. As the cube matures and the business
refines what it needs to report on, the cube will change and
have facts and dimensions added or removed from it.
By using views developers can hide the change in schemas and
always present a consistent interface for the cube to work with.
For example, you can put the calculation of the GST component as
a calculated field in a view. However, our preference is to
create a denormalized field to store this data. (See our rule
Do you use triggers for denormalized fields?
)
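As a sketch, such a view might look like this (the 10% GST rate and the table/column names are assumptions):

```sql
-- Keep the GST business rule in a view so the cube never needs to know it
CREATE VIEW dbo.vwFactSales
AS
SELECT SalesKey,
       AmountIncGST,
       AmountIncGST / 11      AS GSTComponent,  -- at 10% GST, tax is 1/11 of the total
       AmountIncGST * 10 / 11 AS AmountExGST
FROM dbo.FactSales;
```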
-
Do you use logging in your ETL and SSIS packages?
The data warehouse is the one source of truth (see
Do you fix data integrity problems before it gets to the data
warehouse?
), so to prevent bad data from getting in there, we can log
exceptions using SSIS.
You can log any exceptions to a table (see
http://msdn.microsoft.com/en-us/library/ms140246.aspx
). This way you can allow good data to still be processed while
filtering out the bad data.
You can then report on, and have metrics on the errors.
This will help improve the quality of the data in the warehouse
as the cube matures.
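As a sketch, the table an SSIS error output redirects bad rows into might look like this (the names are illustrative):

```sql
-- Error rows land here instead of the warehouse, so they can be
-- reported on and fixed at the source
CREATE TABLE dbo.ETLErrorLog (
    ErrorLogID  int IDENTITY(1,1) PRIMARY KEY,
    PackageName nvarchar(100),
    ErrorCode   int,            -- from the SSIS error output columns
    ErrorColumn int,
    RawRow      nvarchar(max),  -- the offending row, kept for diagnosis
    LoggedAt    datetime DEFAULT GETDATE()
);
```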
-
Do you fix data integrity problems before it gets to the data
warehouse?
The data warehouse is the one source of truth. Data should have
integrity and be verified to exist in the warehouse.
Your first line of defence is verifying that the data in your
source databases is correct. This can be achieved using a procValidate
to check the data. See our rule Do you validate each
"Denormalized Field" with procValidate?
Your next line of defence is in the ETL process where you can
check for data integrity issues (e.g. missing keys, values). At
this point you can choose to fail the process or proceed with
the valid records (logging the invalid ones). See our rule
Do you use logging in your ETL and SSIS packages?
The last thing you can do, and at this point the data is in the
cube already, is to hide these errors with "unknown members".
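A procValidate-style check might look like this sketch (the table and column names are assumptions):

```sql
-- Report fact rows whose keys are missing from the dimension, so the
-- problem is fixed at the source rather than hidden behind unknown members
CREATE PROCEDURE dbo.procValidateFactFinance
AS
SELECT f.FinanceKey
FROM dbo.FactFinance AS f
LEFT JOIN dbo.DimAccount AS d ON d.AccountKey = f.AccountKey
WHERE d.AccountKey IS NULL;
```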
-
-
Do you add a diagram for each fact table?
Adding a diagram helps users clearly see which dimensions are
related to which fact tables.
-
-
Figure: Bad Example - There are no diagrams at all to help the
user see how facts are related to dimensions
-
-
Figure: Good Example - There is a diagram for each fact table