Category Archives: Database

The Snowflake Schema

The MicroStrategy platform is designed to run on a data warehouse architected using a snowflaked data model.  For those unfamiliar with this term, snowflaked schemas are similar to the star schema concept except that they are allowed to have additional dimension tables joining directly off of other dimensional tables.  When one looks at the dimensions sprouting off of one another,  one might compare the structure to that of an actual snowflake’s crystals.

 fig11

So are there any advantages of using a snowflaking over the straight star schema model?

First off, lookup tables are a simple way of immediately taking advantage of snowflaking.   Say you have a dimension table called D_PRODUCTS and a field called PRODUCT TYPE which has only 4 distinct values.  Let’s say you require a list of PRODUCT TYPE’s like when you are creating a drop down for a filter on a report.   Without snowflaking, MicroStrategy will be forced to do a “select distinct PRODUCT TYPE from D_PRODUCTS” in order to pull the distinct product type values to display to the user.  With snowflaking, we can instead create a table called V_LOOKUP_PRODUCT_TYPE that snowflakes off of D_PRODUCTS, and whenever we need to pull a product listing, we instead only need to query a “SELECT * from V_LOOKUP_PRODUCT_TYPE” which will only contain the 4 values we want to display.   Although the solution without snowflaking probably is not that bad, it is still more expensive than with snowflaking, but can be even more expensive if D_PRODUCTS contains a large amount of data.

Secondly, snowflaking allows for more logical and efficient ways of storing data into hierarchies.   Using a star schema model without snowflaking requires that all dimension tables be flattened out and joined directly to our fact table(s).   So, let’s say we have a hierarchy which consists of various manufacturers and the products that they manufacture.   Here is an example of what a model without snowflaking might look like. 

 fig12b

And, with snowflaking:

 fig12c

Notice that with snowflaking, we can now store manufacturers off as it’s own entity which can help organize data more logically.  Also, we reduce the size of D_PRODUCT, and only have to go to D_MANUFACTURER when we need to pull data about the manufacturer. 

Without snowflaking, we have two options for adding new attributes.  Either 1. add the attributes to existing dimension tables, or 2.  add new dimension tables and place the attributes there.  The first option eventually leads to unnecessarily long dimension tables which can be costly in terms of loading and query performance.   The second option might reduce the size of your dims, but also separates the dimension tables apart, so they can no longer be queried together unless joined to a fact table.   For example, if we split out D_PRODUCT and D_MANUFACTURER, each joined separately to our fact, then a report listing all products and their manufacturers, now must be queried by joining also to the fact table.    With a snowflaked schema, we can logically join dimension tables together to make up our hierarchies, and also limit the size of our dimensions, all without having to join to a fact table.  

I’m sure there are other advantages of using a snow flaked schema over the traditional star schema, such as the possibilities of simplifying ETL and EDW designs.   What are your thoughts on using the Snowflake Schema data model for analytics?