Analysis and Discussion of the Basic Architecture of a Website Data Warehouse

The purpose of a data warehouse is to build an integrated, analysis-oriented data environment that provides decision support for the enterprise. The data warehouse itself neither produces nor consumes data: its data comes from external sources and is made available to external applications, which is why it is called a "warehouse" and not a "factory". The basic architecture of a data warehouse therefore centers on how data flows in and out, and it can be divided into three layers: source data, the data warehouse, and data applications.


As the diagram shows, the data in the data warehouse comes from a variety of data sources and serves a variety of data applications. Data flows into the warehouse and is then opened up to the applications in the layer above, while the data warehouse itself is simply the integrated data management platform in the middle.

The process by which the data warehouse obtains data from the sources, converts it, and moves it through the warehouse can be regarded as ETL (Extract, Transform, Load). ETL is the pipeline of the data warehouse, or its bloodstream: it sustains the metabolism of the data in the warehouse, continually replacing the old with the new. Most of the day-to-day management and maintenance effort for a data warehouse goes into keeping ETL running normally and stably.
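The three ETL stages can be sketched roughly as follows. This is only a minimal illustration, not the pipeline of any particular warehouse: the in-memory `RAW_ROWS` list, the `page_views` table, and the field names are all hypothetical, and SQLite stands in for the warehouse store.

```python
import sqlite3

# Hypothetical raw records standing in for a source system.
RAW_ROWS = [
    {"user_id": " Alice ", "url": "/home", "ts": "2024-01-01 10:00:00"},
    {"user_id": "bob", "url": "/cart", "ts": "2024-01-01 10:05:00"},
    {"user_id": "", "url": "/home", "ts": "2024-01-01 10:06:00"},  # no user
]


def extract():
    """Extract: read records from the source (here, a static list)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: normalize values and drop records that fail validation."""
    cleaned = []
    for row in rows:
        user_id = row["user_id"].strip().lower()
        if not user_id:
            continue  # skip records without a usable user id
        cleaned.append((user_id, row["url"], row["ts"]))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (user_id TEXT, url TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run one ETL pass, for example as a scheduled daily task."""
    load(conn, transform(extract()))


conn = sqlite3.connect(":memory:")
run_etl(conn)
loaded = conn.execute(
    "SELECT user_id, url FROM page_views ORDER BY ts"
).fetchall()
```

Keeping the three stages as separate functions makes it easier to monitor and rerun each one independently, which matters when most maintenance effort goes into keeping the ETL job stable.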

The sections below introduce each module of the data warehouse architecture. The data warehouse discussed here is mainly the website data warehouse.

Data sources for the data warehouse

A previous article, "The source data types of data warehouses", has already introduced the various kinds of source data, so they are not covered in detail again here.

For a website data warehouse, the clickstream log is one of the main data sources and the foundation of website analysis. The data in the website's database is equally indispensable: it records the site's operational data and the results of user operations, so analysis of outcomes is more accurate when it is based on this kind of data. A third source is documents and other data that departments outside the website may produce and that are useful for company decision-making.
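To make the clickstream source concrete, here is a minimal sketch of turning one raw log line into a structured record. It assumes the web server writes the Common Log Format, which the article does not specify; the sample line and the `parse_line` helper are hypothetical.

```python
import re
from datetime import datetime

# Pattern for the Common Log Format (an assumption about the log layout).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


# Hypothetical sample line for illustration.
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products/42 HTTP/1.1" 200 2326'
)
record = parse_line(sample)
```

Returning `None` for lines that do not match lets a loading job count and skip malformed entries without aborting the whole run.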

Data storage in the data warehouse

Source data is exported by the daily ETL task and, after transformation, is stored in the data warehouse in a specific form. This step has long been a matter of debate: does the data warehouse need to store detail data at all? One view is that the data warehouse exists for analysis, so it only needs to store the multi-dimensional models built for specific analysis requirements. The other view is that the data warehouse should first establish and maintain detail data, and then aggregate and process that detail data on demand to generate specific analysis models. I prefer the latter view: the data warehouse does not need to store all of the raw data, but it does need to store detail data, and the imported data must first go through subject-oriented reorganization and transformation. Simply >