Contents

Overview

Back to Contents
Parent/Child dimensions, wherein each member references the ID of its parent member rather than being assigned a specific level of a hierarchy, can cause poor performance in Analysis Services databases. One way to deal with this problem is to convert a poorly performing P/C dimension table to a natural one. Instead of a structure wherein each member ID column corresponds to its parent ID column, which must be traversed recursively until the root of the hierarchy tree is reached in order to determine the full tree structure, a converted table contains one column for each level of the hierarchy in addition to the member ID, and each ancestor ID of a member is specified in the column associated with its level in the hierarchy. This prevents the requirement for traversal of the hierarchy tree in order to identify each member's location in the hierarchy.

NOTE: As of Aug 11 the tool now supports converting CustomRollupColumn and UnaryOperatorColumn from the original PC dimension (although in some cases, these features can degrade performance themselves, and it is preferable to use SCOPE calculations to implement similar functionality in 2005).

Example Hierarchy

Back to Contents
Here is a simple example of a hierarchy structure:

  • President (Employee ID = 1)
    • Vice President (Employee ID = 2)
      • Manager (Employee ID = 3)
        • Salesperson 1 (Employee ID = 4)
        • Salesperson 2 (Employee ID = 5)

Parent Child Representation of Hierarchy

Back to Contents
Here is the associated P/C dimension table that might represent this hierarchy, where each member has an ID column and a ParentID column that references its parent member:

Employee ID Manager ID MemberData
1 NULL President
2 1 Vice President
3 2 Manager
4 3 Salesperson 1
5 3 Salesperson 2

Natural Representation of Hierarchy

Back to Contents
Here is an example of a natural dimension table for the same hierarchy (where each level of the hierarchy is represented explicitly in the table rather than simply referencing its parent member):

Employee ID Level1 ID Level2 ID Level3 ID Level4 ID Level1 Data Level2 Data Level3 Data Level4 Data
1 1 NULL NULL NULL President NULL NULL NULL
2 1 2 NULL NULL President Vice President NULL NULL
3 1 2 3 NULL President Vice President Manager NULL
4 1 2 3 4 President Vice President Manager Salesperson 1
5 1 2 3 5 President Vice President Manager Salesperson 2

Why this tool can be useful

Back to Contents
The P/C table structure is very common in relational databases since it is more efficient for most transactional queries and is very compact. But due to the nature of OLAP data, it frequently does not translate to good performance in an Analysis Services dimension, especially if it is a large dimension with many members, or is used in conjunction with many other P/C dimensions in the AS database, or in many calculations. Level based aggregations are not possible using the P/C structure since the table does not contain level specific data. The level for each member must instead be derived by successively requesting the parent id of each of a member’s ancestors until the root of the table is reached. This causes additional overhead each time a parent child dimension member is accessed, and precludes one of the primary performance advantages of OLAP data, prebuilt aggregations. So in some cases, “naturalizing” the P/C table in the relational data warehouse for use by Analysis Services can significantly improve performance. However, this can be much more easily said than done in practice, since it requires recursive traversal of the P/C hierarchy in the relational source to build the level structure, significant subsequent revision to the organization of member data, and a complete redesign of any existing Analysis Services parent child dimension.

How to use it

Back to Contents
The PC Dimension Naturalizer tool can simplify the conversion process from a PC dimension to its natural equivalent. It can be used with a user interface or automated, either from the command line or as an assembly from a .NET application (or SSIS package).

Command Line Options

Back to Contents
The PC Dimension Naturalizer now supports complete control through command line options. Every option from the user interface can be specified, and a log file option is also provided. Here is a summary of the options.

/?
Show help.

/LogFile=<file path and name>
Create or append to existing log file at specified location.

/ShowUI
Show the user interface. This is the default if no other options are specified. If specified with other options, the UI will be displayed, with the relevant options preset rather than the defaults, but action will not be taken until the Naturalize button is clicked in the UI. If other options are specified without the UI, the following minimum options are required:

For AS dimension source: /Server, /Database and /Dimension
For SQL table source: /Server, /Database, /SQLTable, /SQLIDColumn and /SQLParentIDColumn

/MinLevels
A minimum number of levels to be created, whether or not the existing parent child data extends this far. This can be useful to allow room for future growth of a dimension beyond the number of levels that initially exist, so subsequent processing jobs after initial processing can use Process Update on the dimension only.

/Server=<server name>
The name of the server source of the parent child data, in the form <server\instance>, if a named instance exists, or <server> otherwise. If the /SQLSource option is specified, this is a SQL Server instance. Otherwise, it is an Analysis Services instance.

/Database=<database name>
The name of the source database for the parent child data. If the /SQLSource option is specified, this is a SQL Server database. Otherwise, it is an Analysis Services database.

/Dimension=<dimension name>
The name of the source dimension for the parent child data. This option does not apply and is ignored if the /SQLSource option is specified.

/PCAttributes=<attribute list>
A comma delimited list of attribute names from the source data that should be imported into the naturalized parent child dimension. The attribute will be replicated at each level of the natural hierarchy. The default if this option is not specified is to import all attributes related to the parent child key either directly or indirectly. This option does not apply and is ignored if the /SQLSource option is specified.

/NonPCHierarchies=<hierarchy list>
A comma delimited list of user hierarchy names from the source data that should be imported into the newly created dimension. Attributes of these hierarchies are not specific to the parent child hierarchy, so they are not replicated for each level of the naturalized dimension and appear only once for each member in the resulting view and dimension. The default if this option is not specified is to import attributes of all user hierarchies. This option does not apply and is ignored if the /SQLSource option is specified.

/ActionLevel=<1-5>
Level of action to be taken for conversion of the parent child Analysis Services dimension to a natural dimension. This is a number, 1-5. Each value is inclusive of all the prior action levels. The default is 4 if the option is not specified. Following are the meanings of each level:
  1. Create a naturalized view in the dimension’s SQL Server source table. The view has the same name of the table preceded by “DimNaturalized_”.
  2. Add the naturalized view to the Analysis Services Data Source View and replicate any applicable relationships in the DSV.
  3. Replicate the parent child dimension in a new natural dimension in the Analysis Services database. The new dimension has the same name as the original dimension preceded by “DimNaturalized_”.
  4. Add the new dimension to all the cubes of which the original dimension was a member and replicate measure group dimension usage.
  5. Process the new dimension and all cubes to which it was added.

This option does not apply and is ignored if the /SQLSource option is specified.

/SQLSource
Indicates the natural parent child data should be extracted directly from a SQL Server table rather than the default of an existing Analysis Services dimension. If the /ShowUI option is also selected, a UI specific to the SQL Server source is displayed instead of the default UI. When a SQL Server table is used directly as the source, the ActionLevel is always 1, since no Analysis Services dimension exists from which to replicate a new dimension. The new view can subsequently be added to an Analysis Services Data Source View and used to build a new Analysis Services dimension using the normal approach with natural dimensions.

/SQLTable
If /SQLSource is specified, this identifies the source parent child table in SQL Server. Otherwise this option is ignored.

/SQLIDColumn
If /SQLSource is specified, this identifies the member ID column from the source table in SQL Server. Otherwise this option is ignored.

/SQLParentIDColumn
If /SQLSource is specified, this identifies the parent ID column from the source table in SQL Server. Otherwise this option is ignored.

/SQLColumnsAsPCAttributes
If /SQLSource is specified, this identifies a comma delimited list of column names from the source table in SQL Server that should be imported for use as parent child attributes in the generated view in SQL Server on which a natural dimension may later be based. Otherwise this option is ignored. The default if this option is not specified is to import all columns other than the member and parent ID columns for use as parent child attributes.

/SQLColumnsAsNonPCAttributes
If /SQLSource is specified, this identifies a comma delimited list of column names from the source table in SQL Server that should be imported for use as natural attributes in the generated view in SQL Server on which a natural dimension may later be based. Otherwise this option is ignored. These columns are not specific to the parent child hierarchy, so they are not replicated for each level of the naturalized dimension and appear only once for each member in the resulting view. The default if this option is not specified is not to import any columns for use as natural attributes.

Default User Interface

Back to Contents
The default user interface (when the /ShowUI or no command line options at all are specified, and /SQLSource is not specified), prompts for a server, database and dimension, with defaults set to the Adventure Works database’s Employee dimension if not otherwise specified in the command line options:

Default UI

The Change Settings button for the default user interface allows configuration of the remaining settings available from the command line (except those specific to the /SQLSource option, and the /LogFile option):

Default Options

User interface configurable options include the action level, the minimum number of levels to be generated in the new natural dimension, any user hierarchies for which member attributes should be imported outside of the parent child level structure, and specification of parent child attributes that should be imported for use with the new natural parent child hierarchy. For details about these settings, see the command line options reference. Any command line options included along with /ShowUI are set automatically, but can be changed from the user interface.

If an attribute is unselected from the Attributes pane of the Change Settings dialog, any attributes indirectly related to the parent child key by way of the unselected attribute are also removed, and an informational message is displayed:

image006.jpg

Similarly, if an attribute is selected that is related to the parent child key indirectly, the attributes by way of which it is related to the key are also selected:

image008.jpg

A progress dialog is displayed when the Naturalize button is clicked on the primary user interface:

image010.jpg

When action is complete, the progress dialog reports success:

image012.jpg

Any errors occurring during action are also reported in the progress dialog, and logged to the log file along with each successful step of the action if a log file was specified in the command line options.

SQL Source User Interface

Back to Contents
In some cases, such as when a new Analysis Services database is under development, a parent child dimension may not yet exist when the PC Dimension Naturalizer is used. A parent child dimension may be created first, and then the default interface for the tool can be used to convert it to a natural dimension. Alternatively, the SQL Source option of the tool can be used to build a naturalized view of the parent child data in SQL Server, after which a dimension can be built in the usual way using the dimension wizard in Analysis Services. If possible, conversion from an existing parent child dimension is preferred, since the number of attributes to be configured grows as a product of the number of levels in the hierarchy, and their relationships and other settings are automated when a dimension is converted directly using the default approach. The new view in SQL Server, just as when generated using the default approach, is “DimNaturalized_” followed by the source table name. It is added to the same schema as the source table.

If the /SQLSource command line option is specified with the /ShowUI option, or with no other options, an alternate user interface is displayed allowing selection of a SQL Server source instance, database, parent child table, and ID and Parent ID columns:

SQL Source User Interface

As with the default user interface, these settings default to the Adventure Works DW SQL database’s DimEmployee table, upon which the Adventure Works Analysis Services database’s Employee dimension is based, if not otherwise specified with command line options. The Change Settings button for the SQL Source user interface allows configuration of the remaining settings available from the command line (except those options specific to the default Analysis Services source, and the /LogFile option):

SQL Source Options

SQL Source user interface configurable options include the minimum number of levels to be generated in the new naturalized SQL view, and columns that should be imported for use as parent child attributes, and natural attributes outside of the parent child level structure (e.g., for an Employee Department or Geography hierarchy not dependant upon the parent child relationship between members). By default, all relational columns other than the member and parent ID columns are imported for use as parent child attributes for the SQL Source conversion, and no columns are imported for use as natural attributes. For details about these settings, see the command line options reference. Any command line options included along with /ShowUI are set automatically, but can be changed from the user interface.

A progress dialog is displayed when the Naturalize button is clicked, and any errors, or successful completion of the creation of the new view in SQL Server, are reported.

.NET Assembly Usage

Back to Contents
PCDimNaturalizer.exe can be added to a .NET project (including SSIS projects) as a referenced assembly, and its methods can be called in code directly. It exposes three objects:

1. PCDimNaturalizer.PCDimNaturalizer
The class upon which the worker classes for the default (AS) conversion and SQL Source conversion are based. This class cannot be directly instantiated from code, but serves to provide common members between both of the inherited classes that derive from it. A variable declared as this type of class can be assigned to an instance of one of its derived classes and then used for generic functions until it is necessary to call a function specific to the derived class. It can be cast to the derived cast to call functions specific to that.

The public properties exposed in this class are as follows:

MinimumLevelCount
The minimum level count to create in the dimension or SQL table, depending on which derived class is used to perform the action.

SourceWindowHandle
The source window handle of the caller (progress dialog in the UI uses this). The WndProc in the source window will be called with WMUSER and WMSETTEXT Windows messages to send progress update text and bitmaps to update a user interface during the progression of the action. WM_SETTEXT must be trapped in the caller’s WndProc function (using override in the class definition), and then action taken on the contents of the messages. Here is an simplified form of the progress dialog implementation in the user interface included with the tool:
         protected override void WndProc(ref Message m)
         {
            switch (m.Msg)
            {
                case Program.WM_SETTEXT:
                    txtStatus.Text = Marshal.PtrToStringAnsi(m.LParam);
                    break;
                case Program.WM_USER:
                    pictureBox1.Paint -= pictureBox1_Paint;
                    pictureBox1.Image = Bitmap.FromHbitmap(m.LParam);
                    pictureBox1.Refresh();
                    pictureBox1.Invalidate();
                    break;
                default:
                    base.WndProc(ref m);
                    break;
            }
        }


2. PCDimNaturalizer.ASPCDimNaturalizer()

The derived class for use in automating conversion of dimensions from Analysis Services databases to natural equivalents. Its members are as follows:

ASNaturalizationActionLevel
Specifies the action level to be taken, 1-5. Details of this are described in the command line options reference.

MinimumLevelCount
Minmium number of levels to create in the dimension, to allow room for growth when new members are added in the future, potentially at lower levels than would otherwise be created here. Allows up to 32 levels to be prebuilt regardless of the actual number of levels in the parent child data structure.

NonPCHierarchiesToInclude
List<string>: List of string values representing the Non-PC Hierarchies whose attributes should be recreated in the new dimension and SQL view.

PCHierarchiesToInclude
List<string>: List of string values representing the PC Attributes that should b recreated in the new dimension in the SQL view and the resultant DSV, Dimension, and/or cubes (depending on ActionLevel).

SourceWindowHandle
The source window handle of the caller (progress dialog in the UI uses this). The WndProc in the source window will be called with WMUSER and WMSETTEXT Windows messages to send progress update text and bitmaps to update a user interface during the progression of the action. WM_SETTEXT must be trapped in the caller’s WndProc function (using override in the class definition), and then action taken on the contents of the messages.

Naturalize()
Overloads two function forms:

void Naturalize(object MinLevels);
- and -
void Naturalize();

Not passing a MinLevels parameter is the equivilent of running Naturalize() with zero minimum levels, which will be garaunteed to generate the smallest possible view for only existing data, but without room for growth of any new levels in the future without completely regenerating the view, DSV references and dimension, and reprocessing Full the dimension and any cubes it is in.

3. PCDimNaturalizer.SQLPCDimNaturalizer()

The derived class for use in automating conversion of SQL Server parent child tables to their natural equivalents. Its members are as follows:

MinimumLevelCount
Minmium number of levels to create in the dimension, to allow room for growth when new members are added in the future, potentially at lower levels than would otherwise be created here.

SourceWindowHandle
The source window handle of the caller (progress dialog in the UI uses this). The WndProc in the source window will be called with WMUSER and WMSETTEXT Windows messages to send progress update text and bitmaps to update a user interface during the progression of the action. WM_SETTEXT must be trapped in the caller’s WndProc function (using override in the class definition), and then action taken on the contents of the messages.

SQLColsAsPCAttributes
List<string>: List of strings represenenting the columns of the dimension other than the member ID and parent Key that are available to be included in the new dimension. If this isn’t specified, by default, all attributes are included in the PC dimension.

SQLColAsNonPCAttributes
List<string>: List of strings represenenting the columns of the dimension other than the member ID and parent Key that are available to be included in the new dimension for natural use (not members of the parent child dimensiont). If this isn’t specified, by default, no attributes are included in the PC dimension.

Naturalize()
Overloads two function forms:

void Naturalize(object MinLevels);
- and -
void Naturalize();

Not passing a MinLevels parameter is the equivilent of running Naturalize() with zero minimum levels, which will be garaunteed to generate the smallest possible view for only existing data, but without room for growth of any new levels in the future without completely regenerating the view, DSV references and dimension, and reprocessing Full the dimension and any cubes it is in. The other settings apply. A new SQL view and AS Dimension will be created, stored in the DSV and relationahips created, and processed fully, depending on what options were checked. Calling code to refesh your dimension data overnight is simple and can be incorporated into SSIS easily and scheduled. Using MinLevels you can even allow significant room for future growth that need not require a complete reworking of your dimension’s settings.

What the naturalizer does

Back to Contents

Creation of natural SQL view

Back to Contents
When run in default mode, the PC Dimension Naturalizer examines a parent child dimension and builds a view of that dimension’s source table in SQL Server. The view uses a SQL CTE to recursively traverse the level structure and generate one column for each level’s key, and populates those columns with the corresponding keys from the original table. This is joined to the original dimension table by these key values to import data members.

For those data members that are related directly or indirectly to the parent child key, one column per level in the resulting view is generated, with the prefix “Levelx_”, where x indicates the level. The level numbering starts at 1 and goes as high as the minimum level count specified in the tool’s options, or as high as the number of actual levels in the parent child structure, whichever is greater.

For those data members that are members of a natural user hierarchy, such as the Department Name hierarchy in the Adventure Works sample database’s Employee dimension, or an Employee Geography dimension to allow browsing employees by location, attributes will be created only once for each member of the dimension, without the multiple instances and level prefixes required for parent child attributes. This allows user hierarchies to be reconstructed in the new dimension to map directly to the previous user hierarchies independently of the parent child hierarchy.

Any given attribute may be have a representation as a parent child attribute, with one version for each level, as well as a natural attribute expressed only once for use by natural user hierarchies.

When run in SQL Source mode, the naturalizer only creates a natural SQL View, since there is no existing dimension structure against which to reconstruct a new one. The view does not interpret attributes or hierarchies, but only columns from the original table. The member and parent id columns are specified before the tool runs, and by default, all other columns are treated as natural attribute columns, and one copy per level in the parent child structure is generated. By default, no columns are created as natural attributes, but any or all columns may also be selected.

The generated view is dynamic and based on the original table except for the level count, which is static. Members added to the table at existing levels will appear in the resultant table. Members added to the original table will still be present in the view, but data for their level structure below the maximum number of levels specified in the view will not be displayed. If new levels are added to the source data, the view should be rebuilt to accommodate them.

Adding the view to the Analysis Services DSV

Back to Contents
After the view is generated, if the action level is > 1 and the SQL Source option was not used, it is added to the same Data Source View from Analysis Services on which the original parent child dimension was based. Relationships in the DSV are recreated for the new view, with one relationship for each level of the natural hierarchy mapping to the same relationships the original member ID column has. Natural attribute columns (which do not correspond to the level structure of the parent child hierarchy) are related exactly as they were in the original parent child table.

Building the new dimension

Back to Contents
Once the SQL view is added to the DSV, if the action level is > 2 and the SQL Source option was not used, a new dimension is created using the new view. The dimension is structured identically to the original dimension, but instead of using a parent child key, it generates a new user hierarchy with the same name the previous parent child hierarchy had, materializing each level of the hierarchy physically, using the same naming convention (if any) specified in the original parent child hierarchy’s naming template. Attribute relationships are configured between levels of the hierarchy to optimize performance, and attribute key columns for the hierarchy levels are constructed to ensure correct results when browsing. Each attribute that was related directly to the key previously is related to its corresponding level in the new hierarchy, so Level1EmployeeName is related to the Level1 key, Level2EmployeeName is related to the Level2 key, and so on. Each attribute that was related indirectly to the dimension key originally is related to its corresponding parent’s version at its corresponding level, so Level1BirthDate might be related to Level1BirthYear, which is in turn related directly to the Level1 key. Other than building the level specific attribute relationships, attribute relationships are preserved exactly as they originally existed, so if BirthDate was related indirectly to the dimension key by way of BirthYear in the original dimension, this indirect relationship is preserved in the new level structure. Natural attributes that are not related to the level structure are related to one another just as they were in the original dimension. Natural user hierarchies are also migrated exactly as they existed in the original dimension, including attribute relationships.

Adding the new dimension to cubes

Back to Contents
Once the new dimension is created, if the action level is > 3 and the SQL Source option was not used, the dimension is added to all the same cubes where the original dimension existed, and its measure group dimension usage settings are configured identically.

Processing the new dimension and related cubes

Back to Contents
If the action level is > 4 and the SQL Source option was not used, the new dimension is finally processed and all cubes of which it is a member are also processed.

Permissions Requirements

Back to Contents
The user running the tool must have administrative rights to the SQL Server instance where the relational data on which the parent child dimension is based is stored in order to successfully create a naturalized SQL view (either when using the default interface or the SQL Source option). Administrative rights are also required in the Analysis Services instance to successfully create the naturalized dimension when using the default interface.

Summary

Back to Contents
Naturalizing parent/child dimensions is a potentially cumbersome process and is not appropriate for implementations with satisfactory performance (or if performance problems exist, but are due to other causes). But in some cases the gain in query performance can be well worth the investment. The flexibility of the parent child hierarchy is sacrificed in order to attain greater performance in the resulting data. This tool intends to help simplify the tedious process of conversion so in those cases where it is appropriate, natural dimensions can replace their parent child equivalents with greater ease and lower cost.

I am Jon Burchel

Back to Contents
Jon Burchel
Please email me with any questions, suggestions, bug reports or comments.

Last edited Aug 30, 2008 at 4:11 AM by jburchel, version 31