Before you build a data warehouse, you must define its goals and requirements by conducting a comprehensive business investigation. This process helps you understand the core business needs and the problems the data warehouse must solve.
Business investigation
A thorough business investigation and requirements analysis is the foundation for building a successful data warehouse. Before you start the data warehouse project, ask relevant business personnel to describe their specific operations. This process helps clarify the needs of analysts and operations staff in each team and results in the creation of related documents.
Use surveys and interviews to gather the following information:
The organizational structure and division of responsibilities of the users.
For example, users can include staff from the data analysis, operations, and maintenance departments. Each department has different needs for the data warehouse. You should investigate each department separately.
The overall business architecture, the connections between business segments, and the information flow.
You need to define the framework for your business data.
The main features of and data collected by each existing business segment.
This tutorial uses the e-commerce business of Company A as an example. The following figure shows its business data frame. The e-commerce business of Company A is divided into four modules: investment, supply chain, marketing, and service. The needs and data applications are different for each module. Before building the data warehouse, you must first clarify which business segments it will serve and the specific business needs it must meet.
In addition, you must understand the existing data feature modules within each business segment. Data feature modules are usually tightly coupled with business segments. They correspond to one or more tables and can serve as data sources for the data warehouse. The following table shows the data feature modules for a marketing business segment.
Data feature module | Company A E-commerce Marketing Management |
Product Management | Y |
User Management | Y |
Purchase Flow | Y |
Transaction Orders | Y |
User Feedback | Y |
Y indicates that the data feature module is included. N indicates that it is not included.
In this tutorial, assume the user is a marketing data analyst for an e-commerce business. The data requirements include the total sales of a specific category, such as kitchenware, by province for the previous day, the names of the top 10 products by sales in that category, and the distribution of customer purchasing power (average spending per person) by province. This data is used for marketing analysis. The ultimate business goal is to use marketing analysis to achieve precision marketing for the category and increase total sales. Based on the business investigation, this tutorial focuses on analyzing the Transaction Orders data feature module of the marketing business segment.
Requirements analysis
A data warehouse built solely on the results of a business investigation, without considering the data needs of analysts and business operations staff, will have poor availability. After you complete the business investigation, you must gather the requirements of data users. Then, you need to perform an in-depth analysis of these requirements.
There are two ways to analyze requirements:
Gather requirements by communicating with analysts and business operations staff.
Analyze existing reports in the reporting system.
During the requirements analysis phase, you must identify the metrics from business analyses or reports, along with their definitions and granularity. Granularity can be used as an input for dimensions. Answering the following questions is helpful for subsequent data modeling:
What dimensions and granularity are used to aggregate business data, and what are the measures? For example, transaction volume is a dimension, and the number of orders is a measure of transaction volume.
How should the data warehouse detail layer and data warehouse summary layer be designed? How should the common dimension layer be designed? Are there any common metrics that can be used?
Does data need to be made redundant or stored in the data warehouse summary layer?
For example, a data analyst needs to know the total transaction amount for the kitchenware category in Company A's e-commerce business. When you receive this requirement, you must analyze what to aggregate (measure), by what criteria (dimension), and over what scope (granularity). For example, category is the dimension, amount is the measure, and the scope is the entire table. In addition, you should consider factors such as how to design the detail and aggregate data, whether it is a report for a common layer, and whether the data needs to be stored in an aggregate table.
The output of a requirements analysis is typically a document that records atomic and derived metrics.