Basics of a Data Warehouse

Summary

Why should you build a Data Warehouse? What is it good for, copying the source system into another database other than having the data twice? We just report against the ERP system and hence we always have current data! What is a Data Warehouse actually?

At the beginning these are very common questions one should be able to answer at any time. Otherwise we are building a solution nobody needs. So let us go straight by the book and list the pros and cons.

What are the advantages of Reporting against the ERP system without a DWH?

  • Data is current. No matter when somebody executes a report, you will get the data as it is right now. Just imagine somebody ordered a product a second before now he would look at the open orders report - and not seeing his recent order.
  • No different system the user has to be trained on, has to be administrated etc just to view reports. It is all inside the ERP system's application.

And the disadvantages? For this we should have a look at the list of possible reasons to build a Data Warehouse

Business reasons for building a Data Warehouse

To enable end users to build their own reports

A user needs a new kind of report, so what does he have to do in the ERP system? He needs to find somebody who knows the involved database objects (tables, views, procedures) and their relationship with others to get masterdata. As the ERP system is designed to represent the processes of the company, the data model is very complex, too complex for average users to understand. In a DWH the data model is simplified so much that end users do not need help to build the report for himself.

There might be multiple ERP systems

If the user needs to combine data from multiple systems, e.g. US ERP system plus Europe ERP system data or Sales system with Finance system data, bringing all the sources together into one DWH database and do the integration work for the end user already has its merits.

The ERP system does not contain all the data

Remember, the users should be enabled to build the reports themselves. Quite often they will ask for more data than available in an ERP system, e.g. market share research to compare the own performance with the market average. Public data like weather, am I selling more if the weather is nice?

Not reports but insight is looked for

With the term "report" usually simple operational reports are meant. But those can be built on the ERP system directly or even exist already. The idea behind a DWH is to gain insight, not to run a report. And insight is very often related to external data.

Historical data

As said, the ERP system is a replica of the processes found in the reality. A customer moved, so the customer address is updated. The ERP system does not care about the old address anymore. For analysis the old address might be key, e.g. are people living in that area good customers? You would need to compare last years sales numbers with last years address. Other typical example are sales orders. The order value might get updated, the order entry amount has to show the changes over the time, e.g. +100 initially, a week later it got changed by -10.

Stable data

One of the key questions is if the DWH should contain realtime data or not. There are pros and cons for both. Obviously if you want the analysis to include a booking made a second ago, it better is a realtime feed. But imagine you two people run the same query, they will show different data if bookings have been created in the meantime. In a pure play definition of the term "Data Warehouse" it is implied the data to remain static over the day. In reality for certain areas or use cases you might want to enable realtime feeds. Still considering that there are Operational reports better executed on the source system, analysis is what the DWH is built for.

IT reasons for building a DWH

IT is swamped with request to build more reports

As the ERP data model is too complex for users to understand, IT has to build the reports. All reports. Whenever the information requirement changed for an enduser. And quickly, the user needs to make decisions based on the data now. With a DWH the user can most of the work themselves.

Reports should not effect operational performance

Somehow it has to be guaranteed that the reports running do not effect the OLTP performance, entering sales orders is more important than reports. A simple way to guarantee that is having two servers, two databases, one for OLTP and one for analysis.

And what is a Data Warehouse actually? Try this statement:

DWH Definition
A Data Warehouse is an Information System for the unexperienced End User.

Impact of above

From this one sentence other facts can be derived

  • Simple to use for none-IT people. On the other hand different users have different skills and needs. One might be able to use his application only and will want to view standard reports only, another one knows more about Excel than anybody else and will have lots of requirements.
  • Complexity has to be hidden, the data has to be converted from database fields of the ERP system to Business Terms using certain rules. Example: To get Gross Revenue (Business Term) sum up all the rows of the table ORDERS, the column AMOUNT, but take the AMOUNT negative in case the ORDERS.IS_RETURN flag is set as this is a order return.
    Either all this knowledge is known to the report developer - or the data gets transformed while loading the Data Warehouse.
  • Response times are short. An Information System is like having a discussion about business. What was yesterdays revenue? A million only! Why that - show me per region! It dropped in one region only? Ohh, yes, all offices have been closed at this day...
    What would you think about a discussion like above if your counterparts looks at you absently for 1sec..2sec..3sec..4sec..5sec..........20sec and then gives you his response? Sooner or later you would not be motivated to discuss at all and thus not using the system, although your business would have benefit from. So responding in a timely manner is key!
    That does not necessarily mean, responses times have to be in the (sub) second area. It also depends on the expectation. If you ask "What will be my tax return for this year?" you do not expect an immediate answer rather than digging through all you bills etc. For other questions like "What was the amount on last paycheck" you expect an immediate answer. So it is twofold, setting expectations and responding within that timeframe.
  • Can be extended easily. Remember, a DWH is an Information System. So people expect Information from it. But the Information demand is changing over the time. If it cannot be extended with new information, this system will just be yet another application, not THE Information System.
  • Can answer typical business questions. Although an Information System should be very flexible, most of the time standard questions will be asked. These questions the users should get to very easily with shortest response times possible.
  • Measures can be combined to get new insights, seeing relationships between different measures is key. A trend of revenue figure and a trend with head counts is nice, putting both together in one graph provides new information. We can also refer to this as Data versus Information.
  • Every user query returns correct results. People will start to build their own reports. If these reports are not looking good, taking long time etc, it very likely will be the report creators or the tools fault. But if a simple query returns wrong data, e.g. because everybody has to know a business rule for correct calculation, the DWH data model is to be blamed for. We have to make sure that all queries return correct data always, by defining a useful data model, by using the right tools (semantic layer).

And a Data Warehouse for sure is not...

  • A large database....as the bigger it is, the more complex and slower it is
  • A copy of the source database....as there everybody has to know the business rules whereas in the Data Warehouse the numbers are already converted to Business Terms.
  • A Product....it is more a philosophy of sharing information and thus make all employees know the facts. This is a huge change for an organization!



Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.