The meaning of a data warehouse, and the difference between a data warehouse and a database
At present, there is no single agreed definition of the term data warehouse. W. H. Inmon, a well-known data warehouse expert, gave the following description in his book Building the Data Warehouse: a data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data used to support management decisions. The concept can be understood on two levels. First, a data warehouse supports decision-making and analysis-oriented data processing, which sets it apart from an enterprise's existing operational databases. Second, a data warehouse is an effective integration of multiple heterogeneous data sources. After integration, the data is reorganized by subject and includes historical data, and data stored in the data warehouse is generally not modified.
A database is a place where data (the raw material of information) is stored.
A data warehouse is a kind of system, and it too uses a database to store things.
The difference between a data warehouse system and the basic business systems (such as the financial, sales and human resources systems, which also store their data in databases) is as follows:
Each basic business system manages its own data. For example, if the financial system produces cabbage, it keeps it in its own database; the human resources system produces pork and keeps it in another database. If I want to cook a dish, I have to fetch ingredients from each of these databases, which is troublesome (in reality, most of the time the farmer who grows the vegetables delivers them to me, but what he sends is not necessarily what I want, and what I want changes over time, which often leaves both sides unhappy). On top of that, what sits in each database is raw produce. Before I can cook with it, I have to go through a tedious cleaning process, and if I'm not careful there may be a big caterpillar hidden inside.
A data warehouse system, then, is like building a big supermarket: it collects what farmers everywhere produce, cleans it, and shelves it by category. Whatever ingredient you want, you simply take it from the supermarket.
In the early days, I didn't understand what a data warehouse was.
From a macro point of view, a data warehouse is the place where all of a company's data is gathered. The reason for putting all the data together is to find something valuable in it.
A data warehouse is more of a concept. Don't assume it is a software product named "data warehouse".
A data warehouse is actually a database. The databases of the related business systems are called OLTP (online transaction processing) databases and serve business processing, while this database is called an OLAP (online analytical processing) database and serves business analysis.
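As a minimal sketch of that distinction, the snippet below uses Python's standard-library sqlite3 module and a hypothetical `orders` table (the table, its columns and its rows are invented for illustration, not taken from any real system). It contrasts an OLTP-style query, which touches one business record, with an OLAP-style query, which summarizes many records. For brevity both run against the same table here; in practice the two workloads would normally live in separate databases.

```python
import sqlite3

# In-memory database; the `orders` table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2022, 100.0),
        (2, "North", 2023, 150.0),
        (3, "South", 2022, 80.0),
        (4, "South", 2023, 120.0),
    ],
)

# OLTP-style access: look up a single business record by its key.
oltp_row = conn.execute(
    "SELECT order_id, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style access: aggregate many historical records for analysis.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query is typical of what a business system does thousands of times a day; the second is the kind of question an analyst asks of the warehouse.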
The concept of a data warehouse rests on the following basic requirements:
A company has many business systems, and querying their historical data is inconvenient. Different business systems often belong to different departments and different regions. Can all of this data be collected in one place to look for meaningful business patterns?
The database behind a data warehouse is often very large, because the more of a company's data is gathered together, the more valuable findings it can yield. Sizes above 100 GB are common.
The contents of a data warehouse are complex: historical data from the business systems, personnel and financial data, and reference data such as holidays, geographic information and country information.
A data warehouse includes the programs that collect data from the production business systems, and these must not interfere with the operation of those systems (this is part of the so-called "ETL" process: extract, transform, load).
A data warehouse holds long-term historical data from the business systems, for example five years' worth, for analysis (the so-called "ODS" data).
A data warehouse holds transaction data reorganized around a business measure, such as sales (the so-called "fact tables" and "dimension tables").
A data warehouse may also include report generation tools (so-called "BI" tools). These tools deliver what was known a few years earlier as DSS (decision support systems).
Analysis of historical customer data in the data warehouse may tie in with the CRM system.
In short, a company wants to make full use of its existing historical business data, so it undertakes a data warehouse project. The intimidating acronyms are merely the technology used to reach that goal.
Keep the basic requirements of a data warehouse in mind, and don't let vendors intimidate you.
A data warehouse can be described as a decision support system that helps the boss see the whole picture of the enterprise. Looking at the data it provides, the boss can draw on management experience to spot the enterprise's problems, difficulties or success factors, and then drill down into the data until the specific details are pinned down. This steadily improves both the skill of the boss or management team and the management of the enterprise as a whole. The best-known example is the story of beer and diapers at a large American supermarket chain.
A Wal-Mart store manager in the United States once noticed that sales of beer and diapers rose together every week, but could not tell why. Later, Wal-Mart used business intelligence (BI) technology and found that almost all the customers who bought these two products together were men aged 25 to 35 with babies at home, and that they bought them on weekends. After analyzing the relevant data, Wal-Mart learned that these customers liked to watch football and drink beer in the evening while looking after their children, and used disposable diapers to save trouble. With this result in hand, Wal-Mart decided to place the two products together, and sales of both increased significantly.
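The kind of analysis behind such a story is usually called market-basket analysis. The sketch below is only an illustration in Python: the baskets are invented and are not Wal-Mart data, and the helper `support` is a hypothetical name. It computes three common measures: support (how often items appear together), confidence (how often diaper buyers also buy beer) and lift (whether the two are bought together more often than chance would suggest).

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


s_beer = support({"beer"}, baskets)
s_diapers = support({"diapers"}, baskets)
s_both = support({"beer", "diapers"}, baskets)

# Confidence of the rule "diapers -> beer": share of diaper baskets that also hold beer.
confidence = s_both / s_diapers

# Lift above 1 means the pair occurs together more often than independence would predict.
lift = s_both / (s_beer * s_diapers)

print(f"support={s_both:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 on real data would be the signal that prompts a closer look at who buys the pair and when.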
The database is the foundation of the data warehouse. A data warehouse is in fact made up of many tables in a database. The databases that store large volumes of operational business data have to be filtered, extracted, summarized and aggregated, and the result loaded into a new database. The data is then presented, and it is the presented results that the boss cares about.
Another important concept of the data warehouse or data mart is that data is moved from different databases, cleaned, validated, integrated and shaped into a dimensional model by ETL tools (such as PowerCenter, DecisionStream, SQL Server 2000 DTS and SQL Server 2005 SSIS). Ensuring that the data is correct, accurate and complete is very important.
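The extract, clean and load steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the named tools work: the two source extracts, their field names, the `warehouse_sales` table and the functions `extract`, `transform` and `load` are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent formats.
finance_rows = [
    {"cust": " Alice ", "amount": "100.50"},
    {"cust": "BOB", "amount": "n/a"},  # dirty value, rejected during cleaning
]
sales_rows = [
    {"customer_name": "alice", "total": 20},
    {"customer_name": "Carol", "total": 35},
]


def extract():
    """Extract: map each source's field names onto one common shape."""
    for row in finance_rows:
        yield {"customer": row["cust"], "amount": row["amount"], "source": "finance"}
    for row in sales_rows:
        yield {"customer": row["customer_name"], "amount": row["total"], "source": "sales"}


def transform(rows):
    """Transform: normalize names, convert amounts, drop rows that fail validation."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows whose amount is not numeric
        yield (row["customer"].strip().title(), amount, row["source"])


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_sales "
        "(customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM warehouse_sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Note that the row with the non-numeric amount is silently dropped here; a real pipeline would normally log or quarantine rejected rows so that data quality problems stay visible.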
Our current project, which we developed in-house, has been running stably for more than six years. Recently we have gradually started to use DataStage. Many large projects use tools because they speed up development and run acceptably fast, which frees you to spend more energy on the business, database optimization and data testing. The tools themselves have little bearing on data quality.
Data quality is closely tied to a series of project engineering processes, such as design (architecture, model and so on), understanding of business relationships, and project management (including communication with customers and compliance with development and testing procedures). This is the main reason why many projects adopt ETL tools yet see no great improvement in data quality.
The function of a data warehouse lies in the centralized management of data, and the ultimate goal of that centralized management is analysis and prediction.
The so-called ETL, however, is a necessary process in building a data warehouse. Extracting, transforming and loading data is the basic work of centralized management, and these data and actions are described by the corresponding metadata.
In data warehouse modeling we usually adopt multidimensional models, such as the star and snowflake models. These are characterized by high efficiency and low data redundancy. For that reason, I think equating OLAP with the data warehouse is a one-sided view.
We can also build a data warehouse on a business logic model, as was done long ago. It is less efficient and carries high data redundancy, but it can capture business logic that is otherwise very difficult to express.
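To make the star model mentioned above concrete, here is a minimal sketch in Python with the standard-library sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_date`), their columns and their rows are assumptions chosen for illustration. One central fact table holds the measures and points to the dimension tables, and an analytical query joins them to summarize sales by a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER
    );
    -- Fact table: one row per sale, with measures and keys into the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Cabbage', 'Vegetables'), (2, 'Pork', 'Meat');
    INSERT INTO dim_date VALUES (202301, 2023, 1), (202302, 2023, 2);
    INSERT INTO fact_sales VALUES
        (1, 202301, 10, 20.0),
        (1, 202302, 5, 10.0),
        (2, 202301, 3, 45.0);
    """
)

# Typical analytical query: total sales amount per product category.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

In a snowflake model the dimension tables would themselves be split further, for example by moving `category` into its own table referenced from `dim_product`.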
With a data warehouse in place, the most important thing is analysis and prediction. In my opinion, using history to understand the present and the future is the essence of a data warehouse.
Data mining and OLAP built on a data warehouse both serve analysis and prediction: they let users better grasp the present and anticipate the future. The most apt description, in my opinion, is that the data warehouse is the basis on which decision makers and managers analyze and predict.
In addition, a data warehouse serves to classify and archive historical data (much like a library), so that historical information can easily be retrieved by search conditions. In the OLTP system, by contrast, such information has already been overwritten by updates.
As for its analytical function, think of paleoclimate research: the climate record of each era is preserved in glacier ice at different depths. Without it, what could be used to predict climate trends?
However, all this requires considerable management and technical groundwork and strong support from management. Once the demand and the necessary conditions are in place, you can get started; otherwise your data warehouse will be not a supermarket but a garbage dump: "garbage in, garbage out"!
Therefore, I think it is the advance of enterprise informatization and of scientific management that makes the emergence of the data warehouse inevitable. Don't follow the trend or chase the buzzword. The key is to assess calmly whether your enterprise has actually reached the stage where deploying a data warehouse makes sense!
As for how to convince managers, that takes effort on your part. Don't explain the problem from a technician's standpoint; the CEO is not interested in technical issues. Think from their point of view and answer questions such as "We are investing so much money and manpower, and taking on the considerable risk of a system upgrade. What is it all for?" Remember, CEOs and CFOs (even CIOs) prefer to talk in numbers. By analyzing the company's management decision-making process, you can provide them with valuable decision support reports, and department managers (or similar staff) no longer have to prepare the corresponding analysis reports every quarter. The energy saved can go to more valuable work. How much money that better use of human resources saves the enterprise is something the CEO will hardly need you to point out!