Category Archives: Database

MicroStrategy Best Practices – part 1

First installment of best practices for MicroStrategy, focusing on the data warehouse. A rule of thumb for any good BI implementation, not just MicroStrategy, is to set up the data warehouse correctly. Here are a few from me, and a few from MicroStrategy tech notes buried deep in the knowledge base. I shall post the MicroStrategy-specific best practices soon.

1. Data warehouse design should be a snowflake schema.

2. Don’t give direct access to tables. Create views and provide access to those instead. Don’t create custom views; keep them simple mirrors of the tables rather than adding custom logic.

3. The login/password used to create the DSN that configures MicroStrategy should be kept secret. The DBA must not reveal it to anyone and should be present to enter it when the MSTR Admin configures the I-Server.

4. The unique key of a table should not be functionally dependent on any other column of the same table, irrespective of whether that column is part of the unique key or not.

5. Everyone on the MicroStrategy team must have individual access to an interface for running SQL queries against the DEV warehouse, with CREATE privileges.

6. Everyone on the MicroStrategy team must know the PK, FK, and composite keys of each table in the OLTP (source) system.

7. All leading and trailing white space in table cells must be cleansed. Using SQL in MSTR for this is not a great idea: white space slows down SQL, and the impact is visible on large warehouses.

8. Only fact tables should be partitioned.

9. To keep the ETL simple, you might want to build your denormalized structure first, and afterward “generate” the higher-level lookup tables from the lowest-level lookup table, as sketched below.
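To make point 9 concrete, here is a minimal sketch, assuming a hypothetical lowest-level lookup table LU_ITEM that carries the keys and descriptions of its parent levels (all table and column names are invented for illustration; CREATE TABLE AS syntax varies by database, e.g. Teradata wants WITH DATA):

-- Lowest-level lookup, loaded by ETL, carrying the parent-level columns:
-- LU_ITEM(item_id, item_desc, subcat_id, subcat_desc, category_id, category_desc)

-- "Generate" the higher-level lookups from it instead of loading them separately.
CREATE TABLE LU_SUBCATEGORY AS
SELECT DISTINCT subcat_id, subcat_desc, category_id
FROM LU_ITEM;

CREATE TABLE LU_CATEGORY AS
SELECT DISTINCT category_id, category_desc
FROM LU_ITEM;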

I look forward to additions to, and criticism of, the above.

Late edit: every statement above has a blog post to justify it. :)

Selecting a BI Package – Proof of Concept

The selection of a BI software package can be a difficult process to go through. There are a lot of things to consider when making such a decision, such as what your specific needs are as an organization, what skill sets your organization has available for support, hardware considerations, and cost. Here are a few ideas that might be useful when selecting your next Business Intelligence solution.

  1. Determine what hardware and operating systems the package is compatible with. Have the sales team document and lay this out up front, so you know if you have the skillset, software licenses, and hardware to use the package in the first place.
  2. Further lay out what the additional costs will be for procuring software/hardware, purchasing support, and hiring or training a proper skill set. This will help you get to a total cost of ownership. Understand what the annual support fee structure is like.
  3. Make sure to walk through the development process. Oftentimes, proofs of concept revolve around software capabilities, but it is just as important to look under the hood. A complicated development process can lead to many problems down the road for development and maintenance. Not all development processes are created equal.
  4. Every BI tool seems to come with a license for each function of the software. Licensing can get pretty complicated, so make sure you understand what each license covers. It really helps to lay it out in some form of spreadsheet, so you can total up how much licensing will cost based on what you select.
  5. Understand the capabilities of each package, simplify them into common terms across all packages, and attempt to compare these capabilities, apples to apples. Such capabilities might include OLAP, caching, scheduling, delivery, pixel perfect reporting, dashboarding capabilities, report development interface, user interface, architecting process, SQL generation, etc.
  6. Understand the technology behind the OLAP that is being delivered and determine if it will serve your organization adequately. Is the technology multidimensional OLAP or relational OLAP? Multidimensional OLAP performs faster, but requires that data be stored physically on the platform. Relational OLAP may be slower, but sits directly on your EDW and is usually more flexible when expanding the scope of the data to be reported. HOLAP, a hybrid OLAP, utilizes relational OLAP but also allows for caching of data, which can improve performance. Beware that caching may be sold as a multidimensional OLAP alternative, but there are limitations to caching’s effectiveness. In the past this was mainly limited by the RAM size of your BI server, but as technology advances, caching becomes more and more useful.
  7. Is it important to develop a proof of concept on your own data? It is definitely nice to see your data within a sales presentation, but building a POC directly on an EDW will most likely not be possible. Most vendors will ask for extracts of data that they will then store locally on their laptops for demo purposes. Because this is normally the case, the only real benefit of doing this is that the business can see a presentation in terms they understand. Don’t expect to gain any other type of insight about a product just because they are using your data in the POC… unless they actually decide to develop on your EDW.

Well, that’s all I can think of for now. Cheers.

Calling a Stored Procedure from MicroStrategy

I always wondered how one could do this. It is something I had seen during a previous engagement, but since I was not involved, I could not recall how it was done. I finally found out. The MicroStrategy knowledge base search is pretty pathetic.

How to use it: open the VLDB Properties of the report and change the Pre/Post Statements setting based on your requirement.

For example: {call GenerateDate([Age Interval])}

[Screenshot: calling a stored procedure from a MicroStrategy report]

The {} curly brackets are required, as per MicroStrategy.
MicroStrategy recommends using the CALL statement instead of EXEC to execute/call a stored procedure.
[Age Interval] is a MicroStrategy numeric value prompt whose answer is passed as a parameter to the stored procedure. The parameter is optional.
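The tech note does not show what such a procedure looks like, but for context, here is a minimal hypothetical sketch in Oracle PL/SQL (the procedure name matches the example above; the body, table, and columns are entirely invented):

-- Hypothetical body for GenerateDate: populate a staging table with one row
-- per age bucket, where p_interval is supplied by the [Age Interval] prompt.
CREATE OR REPLACE PROCEDURE GenerateDate (p_interval IN NUMBER) AS
BEGIN
  DELETE FROM stg_age_bucket;  -- clear out the previous run
  FOR i IN 0 .. TRUNC(120 / p_interval) LOOP
    INSERT INTO stg_age_bucket (bucket_start, bucket_end)
    VALUES (i * p_interval, (i + 1) * p_interval - 1);
  END LOOP;
  COMMIT;
END GenerateDate;
/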

Other things to do, though not for all ODBC drivers:

The MicroStrategy ODBC driver must be configured to accept results from stored procedures. As per the tech note I found, only Oracle needs the following.

UNIX: the following line must be added manually to the Data Source Name (DSN) definition in the ODBC.INI file:
ProcedureRetResults=1
Windows: open the DSN configuration, look for a Procedure Results-type setting, and enable it.

This can also be used to pass parameters to a stored procedure from MicroStrategy. In cases where multiple values need to be passed from the same prompt, an element list prompt should be used.

Spool space error

Okay, so this is an old one, but I still find it useful.

If you can’t run a report because of a spool space error, and you can’t convince the DBA to give you a couple of gigabytes more, then you might want to go to the Report Editor, Data -> VLDB Properties -> Tables -> Intermediate Table Type, and select True temporary table.

What this does is modify the SQL so that instead of one huge chunk of code (which would use up a lot of spool space), it breaks the code into CREATE – INSERT bits that are far easier to handle. It also makes the SQL more readable by humans (this includes programmers).
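As an illustration (this is not actual MicroStrategy-generated SQL; the table and column names are invented), the one-pass style looks like this:

SELECT r.region_name, t.total_rev
FROM (SELECT region_id, SUM(revenue) AS total_rev
      FROM fact_sales
      GROUP BY region_id) t
JOIN lu_region r ON r.region_id = t.region_id;

while the True temporary table setting produces CREATE – INSERT bits along these lines (Teradata volatile-table flavor):

CREATE VOLATILE TABLE ztmp1 AS (
  SELECT region_id, SUM(revenue) AS total_rev
  FROM fact_sales
  GROUP BY region_id
) WITH DATA ON COMMIT PRESERVE ROWS;

SELECT r.region_name, t.total_rev
FROM ztmp1 t
JOIN lu_region r ON r.region_id = t.region_id;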

By the way, this setting also comes to the rescue when you get the “database is terrified by the script and can’t even begin to analyze it” error. I don’t have the specific error message at hand, but I know for sure that Teradata throws it when the SQL code is all in one pass and is so large that it can be seen from the Moon.

Historical Master Data Attributes

Historical master data attributes are often specially handled when designing an Enterprise Data Warehouse (EDW). On most BI projects I’ve been on, there is an exercise of sitting with the business to determine which attributes need to be stored historically. Once this is complete, the requirements are sent off to the Data Warehouse team for design and development. How this solution gets architected may be one of the determining factors in whether your project is successful or not, and as a BI team member, sometimes getting your two cents in can make your life that much easier.

In the past, I’ve come across three different designs for handling historical master data attributes, with one of them standing out above the rest. The first design involves placing historical attribute fields directly onto fact tables. Please remember we are talking about historical attributes that are master data related, not historical attributes related directly to transactions. The major issue with this comes about during validation and warehouse maintenance if data problems exist. In order to validate that ETL processes are working properly, one now has to validate every row in the fact table to make sure that the historical attribute is correctly derived. Also, because this attribute is stored at the fact level, we now have to add it to any new fact tables when needed. If, for any reason, problems are found in the historical attribute derivation, then the fact tables have to be dropped, reloaded, and re-validated. Simply put, this solution should not be used if it can be avoided.

Another design, which is quite popular, is the use of time-based, type 2 dimensions. This solution involves storing attribute values along with the date range for which those values are valid. For example:

[fig21a: sample type 2 dimension tracking Price Type and Placement Type with effective start and end dates]
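The original figure is not available; an illustrative reconstruction consistent with the discussion below (all values invented) might look like this:

Item   | Price Type | Placement Type | Start Date | End Date
Widget | Regular    | End Cap        | 1/1/2001   | 5/31/2001
Widget | Regular    | Aisle 5        | 6/1/2001   | 6/30/2001
Widget | Sale       | Aisle 5        | 7/1/2001   | 12/31/9999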

This solution works just fine in conjunction with a fact table. To pull the current value of an attribute, we only need to pull the latest record, usually marked with an end date of ‘12/31/9999’. There are two issues with this approach. One: if you store more than one historic attribute on a dimension, then reporting the history of those attributes without joining to a fact table cannot be accomplished. From our example above, notice that the Price Type field is accurately showing history, but Placement Type contains an extra record showing Aisle 5 starting on both 6/1/2001 and 7/1/2001. One solution for avoiding this is to store the previous value of every historic attribute, and then only pull historic data for an attribute where the previous value <> the current value. Two: issues often come about when ETL is used to create history in the warehouse. This may be the only possible solution, but allowing it can ‘open up a can of worms’ and makes validation a much more difficult process.
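A minimal SQL sketch of both reads, assuming a hypothetical dimension table dim_product_hist that also stores a prev_placement_type column (all names invented):

-- Current values: the open-ended record in a type 2 dimension.
SELECT product_id, price_type, placement_type
FROM dim_product_hist
WHERE end_date = DATE '9999-12-31';

-- Placement Type history only: keep rows where the value actually changed.
SELECT product_id, placement_type, start_date, end_date
FROM dim_product_hist
WHERE prev_placement_type IS NULL            -- first record has no previous value
   OR prev_placement_type <> placement_type;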

The recommended solution here actually involves setting rules for the type 2 dimension just discussed. First off, only track the history of one historic attribute per dimension table.

[fig21d: separate type 2 dimension tables, each tracking a single historic attribute]

This simplifies the ETL and data storage designs, and also eliminates the confusion that arises when you have more than one historical attribute on a table. Storing the data this way not only seems more natural, but also simplifies the validation and maintenance of the historical data. If a problem is found, then quite simply, fix that problem in one place rather than dealing with the more complex ETL procedures used to track multiple historic fields on one table. Second, don’t use ETL to create history if it can be helped. ETL-created history only starts at the time the ETL script is implemented, so you never get a full view of history unless it was implemented from the beginning. Also, ETL-created history is difficult to validate, and if problems arise, fixing that history can also be a problem. Of course, every implementation is different, and you may not have a choice. All of the information in this article has been tested thoroughly through implementation on MicroStrategy 8.1.
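Much of that validation can be a single query. For instance, using the same hypothetical dim_product_hist table as above, overlapping effective ranges (a sure sign of a broken type 2 load) can be found with:

-- Pairs of records for the same product whose date ranges overlap.
SELECT DISTINCT a.product_id
FROM dim_product_hist a
JOIN dim_product_hist b
  ON a.product_id = b.product_id
 AND a.start_date < b.start_date
 AND b.start_date <= a.end_date;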