SSAS Sub Select Vs Where Clause

I have created various reports with MDX, and mostly I have used sub selects. Recently, in one of the report, for some reason, I swapped “Sub Select” with “Where Clause” and post the testing I sent the report to production. I felt about blogging on it, and here it goes….

Please note that, I will be using Sub Select and Sub Cube interchangeably.

Let’s see a simple MDX for each:

MDX with Sub-Select:

SELECT {[Measures].[Order Count]} on 0 ,[Product].[Category].Members
on 1

FROM     (select

{[Product].[Product Categories].[Subcategory].&[26]

,[Product].[Product Categories].[Subcategory].&[22]} on 0

from [Adventure Works] );

If you notice the numbers, the total for accessories and clothing does not match the number shown for “All Products”. It is not a rocket science and I leave it to readers to find that out. Also, here another concept of Attribute Relationship comes into play, which I am going to write a separate blog.

Sub Cube how it works:

  • All members from all dimensions are eligible for selection.
  • Cube context is defined by the default members of each dimension. However, you can refer to members not within that Data Space.
  • In case of sub selects, the cube is sliced based on the dimensions specified in sub selects, which actually limits the cube space. Thus the members on axis by default contain only related data that is now specified by the sub select. Thus the Categories belonging to sub categories in sub select are shown. The order count, also is for those sub categories.
  • Sub selects are intended to provide “visual totals” type functionality, where the value of the ‘All Member’ on a hierarchy changes to reflect the members actually selected in a query. What that means is the totals would be the totals of the numbers in the cube space restricted by the sub select.
  • Another point of interest is that calculated members have the ability to look outside the subcube. For example:

 with member measures.test as

[Product].[Category].currentmember.prevmember.name

select [Measures].test on 0, [Product].[Category].[Category].members on 1

from ( select ( {[Product].[Category].[Category].&[4],

[Product].[Category].[Category].&[3]} ) onfrom [Adventure Works])

  • Sub cubes cannot contain with clause, non empty clause and a having clause.
  • Sub cubes do not restrict levels as seen below:

 select

descendants([TIME].[yqmd].year.members, time.yqmd.month,self_and_before) on 0

from ( select [TIME].[YQMD].[Quarter].&[2001-10-01T00:00:00] on 0

from [Sales])

  • Tuple specifications for sub cube: When you specify the tuples in the sub cube, the data cells not part of the sub cube will not contribute to aggregates. Contrast this with autoexists.
SELECT

{[Date].[Calendar].[Calendar Quarter].members, [Date].[Calendar].[Calendar year].members} on 0

,[Product].[Category].Members on 1

FROM (select

{([Product].[Product Categories].[Subcategory].&[26]

,[Date].[Calendar].[Calendar Quarter].&[2004]&[1])

,([Product].[Product Categories].[Subcategory].&[22],[Date].[Calendar].[Calendar Quarter].&[2004]&[2])}

on 0

from [Adventure Works] )

where [Measures].[Internet Sales Amount];

  • Where clause specified in the sub cubes is not passed on to outer MDX query:

Let’s find out the total sales for Australia for year 2002:

select [Customer].[Customer Geography].[Country].&[Australia] on 0,

[Measures].[Internet Sales Amount] on 1

from [Adventure Works]

where ([Date].[Calendar].[Calendar Year].&[2002]);

Now let’s use this in the sub cube clause:

select [Customer].[Customer Geography].[Country].&[Australia] on 0,

[Measures].[Internet Sales Amount] on 1

from

( select [Customer].[Customer Geography].[Country].&[Australia] on 0,

[Measures].[Internet Sales Amount] on 1

from [Adventure Works]

where ([Date].[Calendar].[Calendar Year].&[2002]))

Notice that for Australia in 2002, the sales was 2.15 million. But, when the same query was used as sub cube, the slicer in the inner query was ignored by the outer query. It did not restrict the time dimension in any way. That implies that the where clause specified in the sub cubes are not passed on to outer query context.

  • Aggregate function does not restrict its values to the sub cube:
select [Customer].[Customer Geography].[Country].&[Australia] on 0,

[Measures].[Internet Sales Amount] on 1

from [Adventure Works]

where ([Date].[Calendar].[Calendar Year].&[2002]);

The Australia sales is:

The total sales for all countries is:

select

[Customer].[Customer Geography].[All Customers] on 0,

[Measures].[Internet Sales Amount] on 1

from [Adventure Works]

where ([Date].[Calendar].[Calendar Year].&[2002]);

Now when we use aggregate function:

with
member measures.m as

aggregate([Customer].[Customer Geography].[Country].members, [Measures].[Internet Sales Amount])

select [Customer].[Customer Geography].[Country].&[Australia] on 0,

measures.m on 1

from [Adventure Works] where ([Date].[Calendar].[Calendar Year].&[2002]);

  • Performance: When a query contains a sub select it means that the results of any calculations can only be cached for the lifetime of the query. When you get to multiple nested sub selects they do have a slight performance advantage.

MDX with Where Clause:

SELECT {[Measures].[Order Count]} on 0,

[Product].[Category].Members on 1

FROM [Adventure Works]

WHERE ({[Product].[Product Categories].[Subcategory].&[26],

[Product].[Product Categories].[Subcategory].&[22]} );

Before we go further, notice that the order count returned is for the product categories which contain the sub categories specified in the where clause. In other words, the MDX states that give me the sum of order count for categories that contain specified sub categories.

To validate it, see the following MDX:

SELECT {[Measures].[Order Count]}on 0

,{[Product].[Product Categories].[Subcategory].&[26]

,[Product].[Product Categories].[Subcategory].&[22]}

on 1 FROM [Adventure Works]

Compare these numbers with the MDX using Sub-Select above.

Where Clause how it works:

  • All members from all dimensions except the dimensions whose members are specified in the where clause are eligible for selection.
  • You cannot refer to members outside the Data Space hence you cannot use the <where member>’s dimension in the <columns set> or <rows set>.
  • No Visual total: It is important to note that the total (or ALL) member is the total sum of the whole dimension to which the <where member> belongs. Notice the effect of having a where clause here. Categories are pulled based on the members specified in the where clause. The measures are still rolled up by members selected on the axis. In other words, order count is for the product categories (including all sub categories).
  • Now to get the measures rolled up by the members specified in the where clause, we need to use the “existing” condition.

Before we look at EXISTING keyword, let’s first look at the plain MDX:

WITH MEMBER Measures.OrderCount as

SUM( {[Product].[Product Categories].[Subcategory].&[26]

    ,[Product].[Product Categories].[Subcategory].&[22]}

, [Measures].[Order Count])

SELECT {Measures.[OrderCount]}        on 0

        ,[Product].[Category].Members    on 1

FROM [Adventure Works]

WHERE{[Product].[Product Categories].[Subcategory].&[26]

,[Product].[Product Categories].[Subcategory].&[22]} );


Notice that 2573 is the sum of these two categories accessories and clothing as we saw in the previous MDX. We do get the same number for “All Products” and these two categories. This is because the context of where clause is not being used in evaluating calculated member. Also, while evaluating row axis, for each product category, sum returned same value.

Now, let’s put EXISTING keyword, and notice that the correct context is passed as is seen in the MDX below:

WITH MEMBER Measures.OrderCount as

SUM(existing {[Product].[Product Categories].[Subcategory].&[26]

            ,[Product].[Product Categories].[Subcategory].&[22]}

, [Measures].[Order Count])

SELECT {Measures.[OrderCount]}        on 0

        ,[Product].[Category].Members    on 1

FROM [Adventure Works]

WHERE ( {[Product].[Product Categories].[Subcategory].&[26]

,[Product].[Product Categories].[Subcategory].&[22]} );


  • If the WHERE clause does not contain one member but a <where set>, the “CurrentMember” will not work properly. You will need the help of the keyword “EXISTING” on the dimension/set used by the <where set> to ensure you stay within the context of the members defined by this <where set>.
  • May produce better performance
  • It is more readable
  • You might want to use the WHERE clause because it sets the context that is, it defines the “CurrentMember” and some specific MDX functions rely on this capacity (YTD, PP…).

Note:
When there are multiple members are specified in the where clause, it presents a popular MULTI-SELECT issue. I am going to blog about it very soon. Stay tuned…

Wrapping this up with a quick summary here:

SUB SELECT

WHERE CLAUSE

All members from all dimensions are eligible for selection.

<where member> context hence that specified member plus all dimensions EXCEPT the one to which the <where member> belongs are eligible for selection.

Limits the cube space

It does not limits the cube space.

This context is defined by the default members of each dimension.However, you can refer to member not within that Data Space with calculated members.

You cannot refer to members outside the Data Space hence you cannot use the <where member>’s dimension in the <columns set> or <rows set>.

Visual total: It is important to note that the total (or ALL) member will present you a total sum of the values for the members defined by the <subselect set>. This is a “Visual Total”.

No Visual total: It is important to note that the total (or ALL) member will present you the total sum of the whole dimension to which the <where member> belongs.

Aggregate function does not restricts its values to the subcube.

Now to get the measures rolled up by the members specified in the where clause, we need to use the “existing” condition.

When you get to multiple nested subselects they do have a slight performance advantage. Which is why you will see SSRS use both sub-selects and the WHERE clause. This way they get both the performance gain and get the context set, but it makes the query a real mess aesthetically.

You might want to use the WHERE clause because it sets the context that is, it defines the “CurrentMember” and some specific MDX functions rely on this capacity (YTD, PP…).

Sub cubes cannot contain with clauses, non empty clause or having clause.

If you are dealing with “raw” measures, they can often be used interchangeably

If you are dealing with “raw” measures, they can often be used interchangeably

Data cells not part of the subcube will not contribute to aggregates

More readable

Sub cubes do not restrict levels.
Where clause context specified in subcube is not passed on to outer query.
Comparing performance with these two is not considered here.

Please provide your feedback, if you have any. I can be reached at jagdish_malani@hotmail.com

Advertisements

SSAS Server based Time Dimension

For any SSAS implementation, time dimension plays an important role. Different architects take different approach on this. Some of them decide to create their own physical time dimension table and populate it accordingly. Some architects use the SSAS Server based time dimension. SSAS again provides two options namely:

  • A table for time dimension is created in the data source
  • A table for time dimension is created on the server

Let’s see the impact of using server based time dimension and then you would be able to make a call about which way to go. Following are the points that need consideration:

  1. Relational database contains all fact tables and dimension tables, except a time dimension table, which is implemented in an OLAP layer. So the data model is split between data source and OLAP layers. If relational data warehouse database needs to be exposed to other applications for some reporting, custom rollup by time periods would be difficult to implement.
  2. With server based time dimension, the keys at different levels of a hierarchy are not unique, which present different problems while querying the AS database. Let’s quickly look at this scenario.

    This is the fact table, where reservation date is the data the booking is made and check_in_date is the date when the quest arrived. The column sales is used as a measure.


    The requirement is to create a report which displays the sales for all months, where the month of reservation date and month of check in date are same.
    I have populated the table as follows:


    If we were to use the SQL, then following SQL would give us the required results:


    With MDX, this was again a simple query. In my cube, I have used server based time dimension. Let’s see the results:

    WITH

    MEMBER [MEASURES].[SALES FOR SAME MONTH RESERVATION] AS

    (LINKMEMBER([TIME].[YQMD].CURRENTMEMBER,[CHECK IN].[YQMD]),[MEASURES].[SALES])

    // Displaying the check in date member name to ensure, correct linking

    MEMBER [MEASURES].[CHECK IN DATE] AS

    LINKMEMBER([TIME].[YQMD].CURRENTMEMBER,[CHECK IN].[YQMD]).MEMBER_NAME

    SELECT

    {MEASURES.[CHECK IN DATE],[MEASURES].[SALES FOR SAME MONTH RESERVATION]}

    ON
    COLUMNS,

    YTD([TIME].[YQMD].[MONTH].&[2011-04-01T00:00:00])

    ON
    ROWS

    FROM [TEST CUBE]


    Notice that for January, 2011 the check in date member is at returned as Calendar Year 2011. Also for April, it returned member at Quarter level. Why is this so?

    The reason for this is server based time dimension, uses the same key at different levels as shown below. Thus link member links it to wrong member.

Member

Key

Calendar 2011

1-Jan-2011

Quarter 1, 2011

1-Jan-2011

January 2011

1-Jan-2011

1-Jan-2011

1-Jan-2011

To fix it, we need to change the MDX a little as follows:

WITH

MEMBER [MEASURES].[SALES FOR SAME MONTH RESERVATION] AS

(iif (

    linkmember([TIME].[YQMD].CURRENTMEMBER

    ,[CHECK IN].[YQMD]).level.ordinal = 1,

        linkmember([TIME].[YQMD].CURRENTMEMBER

        ,[CHECK IN].[YQMD]).firstchild.firstchild,

        iif (

            linkmember([TIME].[YQMD].CURRENTMEMBER

            ,[CHECK IN].[YQMD]).level.ordinal = 2,

                linkmember([TIME].[YQMD].CURRENTMEMBER

                ,[CHECK IN].[YQMD]).firstchild,

                linkmember([TIME].[YQMD].CURRENTMEMBER

                ,[CHECK IN].[YQMD])

            )

    ),[MEASURES].[SALES])

MEMBER [MEASURES].[CHECK IN DATE] AS

iif (

    linkmember([TIME].[YQMD].CURRENTMEMBER

    ,[CHECK IN].[YQMD]).level.ordinal = 1,

        linkmember([TIME].[YQMD].CURRENTMEMBER

        ,[CHECK IN].[YQMD]).firstchild.firstchild,

        iif (

            linkmember([TIME].[YQMD].CURRENTMEMBER

            ,[CHECK IN].[YQMD]).level.ordinal = 2,

                linkmember([TIME].[YQMD].CURRENTMEMBER

                ,[CHECK IN].[YQMD]).firstchild,

                linkmember([TIME].[YQMD].CURRENTMEMBER

                ,[CHECK IN].[YQMD])

            )

    ).MEMBER_NAME

SELECT

{MEASURES.[CHECK IN DATE],[MEASURES].[SALES FOR SAME MONTH RESERVATION]} ON
COLUMNS,

YTD([TIME].[YQMD].[MONTH].&[2011-04-01T00:00:00]) ON
ROWS

      FROM [TEST CUBE]

3. If you notice, the month names are being displayed as “January 2011”. Usually the month names are displayed as Jan-11 in reports. If that              is required, you have to change the label expression in all those reports probably as follows:

      =Left(Fields!Month.Value,3)+”-“+Right(Fields!Month.Value,2)

The bottom line is, we do not have much control on server based time dimension and we cannot customize it.

With all these issues (if you may call it), I would think, server based time dimension in SSAS can be used in following scenarios:

  • We have no control on the data sources at all.
  • We are already in production, and we cannot do much about it.
  • Or by the virtue of UDM, your data source is some OLTP system. For our data warehouse database, we would not be allowed to change the OLTP system in anyway.

Please feel free to provide feedback.


Slowly Changing Dimension Type 2 in SSAS Dimension

In the world of BI, everybody must be familiar with slowly changing dimension. There are three types of changing dimensions namely Type 1 (where the attributes are overwritten), Type 2 (history is preserved) and Type 3(limited history is preserved in additional columns).

And when it comes to creating a SSAS dimension, we need to take care of few things like

  • attribute relationships and
  • how the hierarchies would behave if there are some
  • Performance

I will only discuss about SCD – II here in this blog.

Let’s look at the data in customer dimension. We have a SCD – II here.

We are tracking history by marital status. And for the sake of brevity, we will allow change in value only once.

Thus for any change in the value of “Married” column, would require a new record to be inserted with new value in ScdStartDate and ScdState would be marked as “C” for current. The previous record would be marked as “H” in the column ScdState column and end date would be stamped in ScdEndDate. See the following screenshot. Notice that Customer 6 has two records


Now, first we will see, how to set up the attribute relationships in this case. Following is the screenshot of how we setup the attribute relationship:


Here are the attribute which may change, are made related attributes of Dim Customer (Key Attribute). All other attributes which would not change are made related attributes of Customer business key. We are assuming, birth place of a customer would not change.

Let’s create a hierarchy State – City – Customer:


All is well so far. Let’s browse this dimension. We see following:


We see that customer 6 is repeating under the city MUMBAI. This is not what we want. As you have guessed, the solution is pretty simple and change the hierarchy level from Dim Customers (which would repeat for a record with change in marital status) to COD Customer (business key).

Here we go. We change the


Let’s browse once to see if every thing is perfect:


Yes everything is perfect. We are almost done with our dimension design. No wait !!!.

If you have noticed, in this design, we have create a un-natural user hierarchy.


This would affect the querying performance. If you consider real world examples, some dimensions are very huge. For instance, customer dimension for a store like Walmart.

Now let’s see how to fix this.

As a first step, we need to create an intermediate attribute between city and COD Customer. This can be done in DSV with a named calculation as follows:


After that bring in this attribute in the dimension and create a level in the hierarchy with this attribute between City and COD customer:


Let’s make this attribute invisible in the dimension. We do not want this level to show up in the hierarchy as well while browsing. So we need to make Customer – City level invisible. How do we do that, as there is no direct property available for that. To do that, change the name column of this attribute to “City” and use “HideMemberIf” property to “ParentName” as follows:


Let’s see how this comes up while browsing:


Now we can say that our job is done.