header image
Bottoms Up or Top Down? The Eternal Business Intelligence Question
November 24th, 2006 under Methods. [ Comments: none ]

There are two kinds of client prospects that a business intelligence systems consultant is likely to encounter: those who already know something about business intelligence systems and data warehousing, and those who don’t.  Of the former, practically all of them have read at least one book by Ralph Kimball, or Bill Inmon, or both, and invariably ask: which methodology do you, the consultant, recommend?

Well, this consultant is likely to say both, or neither, or it depends.  Neither approach is perfect for all clients, but both contain elements that are useful in almost any context.  The issues have been discussed in many other blogs and forums, and I’m sure I have nothing revolutionary to say about it.  

But this is my blog, and I want to communicate with my prospective clients, so I’m going to lay out my take on it.

The majority of projects I have been involved in started with a scope limited to a particular business process, and I suspect this is true for the bulk of enterprises tackling their first business intelligence project.  They are set up as “pilot” projects or “proofs of concept”, sponsored by an executive or department head with a specific objective in mind.  The limitation of scope (and of time and resources) forces the adoption of a focused methodology that will produce a useful result in the shortest possible time.  That limits the options to the “bottom up” dimensional modeling approach promoted by Ralph Kimball.

Dimensional modeling is a bottom-up approach to data warehousing because what it produces is not a data warehouse but a data mart.  When applied to multiple business processes, it results in a collection of data marts that is referred to as the data warehouse

Occasionally an enterprise looks to overhaul the way it does business entirely.  It does one of two things: a complete business process re-engineering, or the implementation of an enterprise resource planning system (ERP).  These amount to the same thing, but in the latter, the ERP forces change on the business processes according to its own model, while in the former the enterprise remakes itself in response to an examination of its goals, objectives, and the requirements of the market place.  The business then drives the selection or development of the information systems, rather than the other way around.

In these cases, the development of a data warehouse can, and should, proceed from a top-down perspective, in parallel with the rest of the process.  The resulting data warehouse is a truly integrated operational data store, based on the reengineered business model, from which application data marts can be extracted virtually on demand.

The big drawback to the top-down approach is the protracted lead time from project inception to the first usable result, though subsequent applications follow very quickly.  The drawback to the bottom-up approach is that application data marts may not integrate with each other very well. 

Consider, for example, in a hypothetical apparel business, a data mart designed for advertising versus one designed for inventory management.  Inventory management needs to know every unique item at every location at every point in time.  That data mart will use a product dimension based on product UPC, which encompasses style, material, color and size.  Advertising, on the other hand, really only cares about the style, maybe with pictures in different colors, but not unique identifying codes.  And time is not generally a consideration, though locations might be.

If the inventory management data mart had been created first, the more summary data needed by Advertising could be extracted from that.  But if the advertising data mart is created first, the inventory management data mart has to be built from scratch because the advertising data is not sufficiently granular.  Subsequently, the processes that produce the advertising data mart should be reengineered to work with the inventory management data.

For this reason, even when a first business intelligence project is for a specific department or business process, a top-down analysis should be done at the outset in anticipation of future requirements.  The goal of the analysis is to produce what Kimball refers to as the data warehouse bus architecture matrix, in which the business processes of the organization are listed, and related to the dimensions of their information.  The granularity of each dimension needs to be carefully considered during this process.

(Educational sidebar: All data is made up of “attributes”.  Attributes used for organizing and finding data are “key” attributes, and can be used as “dimensions” of the data.  Non-key numerical attributes on which you can operate mathematically to analyze the business are called “facts”.)

The dimensions in the matrix then should be related to primary data sources.  Every data mart must get its data from somewhere – the files and/or databases associated with the enterprise’s legacy applications.  Some of this data is “primary”, meaning it is original data, such as transaction inputs.  Other data is secondary – derived from primary inputs by some other application.  The data warehouse should be populated from primary data sources, to guarantee data consistency across all data marts.  It should be able to reproduce secondary data, to demonstrate that the rules of derivation have been accounted for.

The architecture of a data warehouse as defined by the top-down method, is highly integrated.  It looks quite different from the architecture of a bottom-up design, which is not.  This type of data warehouse is not organized dimensionally, as is a data mart (technically called a “star schema”).  Rather, it is rigorously structured over many tables, with a broad selection of data attributes from which to choose the dimensions and facts of a particular application.  Such a database is not easily queried by non-expert users, and even available query-and-report tools may not be much help, as they generally work best with a star schema.  If the database is large, as it would be for an enterprise data warehouse, the performance (in terms of response time) may be very slow.

Readers familiar with the SAP Business Warehouse product know that this is an exemplary implementation of Inmon’s top-down concepts.  It relies heavily on extensive metadata and a dense layer of middleware to make it work.  (Not that there’s anything wrong with that!).  Thousands of development hours, and millions of investment dollars (or Deutschmarks), made this possible, and if your company buys the SAP Enterprise Resource Planning solution, you get it all for free.  Really. 

But if you haven’t gone that route, a true top-down enterprise data warehouse is probably not a cost-effective solution.  Even if you don’t limit the scope of the project to a single business process, you will get results more quickly by following a methodology closer to Kimball’s.

The Worcester Group, Inc. approach is to create the data warehouse bus architecture matrix first, then identify and prioritize data mart projects.  We then design a dimensioned operational data store, based on the bus matrix, that will deliver the data to satisfy the highest priority projects.  The data marts for individual applications can then easily be extracted from this.  An additional benefit is that many projects that were not high priority will be accommodated by this approach, their development time will be significantly shortened, and therefore their return on investment will be improved.

As the business grows, there may be new business processes added to the matrix.  Most of these will involve the same dimensions previously encountered.  If new dimensions need to be added, this will have no effect on the previously identified processes.  The bus matrix thus insures the data warehouse will be scalable and robust.  It serves as the cornerstone of the data warehouse.