How can enterprises better build a data warehouse?
As computer applications have matured, large volumes of data are now stored in computer systems, and the storage, management, use, and maintenance of that information have become increasingly important. Traditional database management systems struggle to meet these requirements. Data warehouse technology addresses the problems of large data volumes, heterogeneous data integration, and slow data access, and gives end users an effective way to obtain the information they need for decision making.
1 Data Warehouse
A data warehouse is a subject-oriented, integrated, non-volatile, and time-varying collection of data that provides support to managers for decision making. A data warehouse is a structured data environment that serves as a data source for decision support systems and online analytical applications.
In current practice, data may be stored in many different types of databases, and a data warehouse stores data from these heterogeneous sources at a single site, organized under a unified model, to support management decisions. Data warehouse technologies include data cleansing, data integration, online analytical processing (OLAP), and data mining (DM). OLAP is a multidimensional query and analysis tool that lets decision makers analyze data from multiple perspectives and at multiple levels around a decision topic. OLAP emphasizes interactivity, fast response times, and multidimensional views of the data, while DM emphasizes the automated discovery of patterns and useful information hidden in the data. DM is a set of methods for obtaining knowledge by analyzing data in databases and data warehouses, that is, by building models that reveal patterns and relationships hidden in an organization's data. The two are complementary: OLAP results can give DM analytical information to use as a basis for mining, and DM can extend the depth of OLAP analysis by discovering more complex and detailed information that OLAP cannot. Together they meet an enterprise's requirements for data organization and information extraction and help senior management make decisions. In the developed countries of Europe and North America, OLAP and data mining applications based on data warehouses first succeeded in traditionally data-intensive industries such as finance, insurance, securities, and telecommunications. Major vendors such as IBM, Oracle, Teradata, Microsoft, Netezza, and SAS have all launched data warehouse solutions.
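The idea of bringing heterogeneous sources under a unified model can be sketched in a few lines of Python. This is only an illustration: the two source layouts (`crm` and `billing`), their field names, and the `SaleFact` schema are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleFact:
    """Unified warehouse record (hypothetical schema)."""
    customer_id: str
    sale_date: date
    amount: float  # a single currency unit is assumed


def from_crm(row: dict) -> SaleFact:
    # Hypothetical source A: ISO dates, amount stored as a padded string.
    return SaleFact(
        customer_id=row["cust"].strip().upper(),
        sale_date=date.fromisoformat(row["date"]),
        amount=float(row["amount"].strip()),
    )


def from_billing(row: dict) -> SaleFact:
    # Hypothetical source B: day/month/year dates, amount stored in cents.
    day, month, year = (int(part) for part in row["dt"].split("/"))
    return SaleFact(
        customer_id=row["customer"].strip().upper(),
        sale_date=date(year, month, day),
        amount=row["cents"] / 100,
    )


def integrate(crm_rows, billing_rows):
    """Map both sources onto the unified model, then cleanse duplicates."""
    facts = [from_crm(r) for r in crm_rows]
    facts += [from_billing(r) for r in billing_rows]
    # Drop exact duplicates while keeping first-seen order.
    return list(dict.fromkeys(facts))
```

Each source gets its own small adapter, so adding a third source means writing one more function rather than changing the unified schema.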
In recent years, "distributed data warehousing" has grown in popularity. It applies a single global logical model across multiple physical locations. The data is logically divided into multiple domains, and no data is duplicated across locations. This distributed approach makes it possible to create secure zones for data held in different physical locations, or to provide 24/7 service to users in different time zones around the globe. In addition, there are hosted data warehouse services, initiated by Kognitio, in which the DBMS vendor develops and runs the data warehouse for the customer. This model first took hold in business units, which purchased hosted services instead of using data warehouses provided by their organization's own IT department.
2 Data Mining Technologies
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of extracting implicit, previously unknown, non-trivial, potentially valuable, and ultimately understandable patterns from large databases or data warehouses. It is a relatively new area of database research with great practical value, and it applies techniques such as artificial intelligence, machine learning, mathematical statistics, and neural networks to the specific field of data warehousing. The core techniques of data mining, which include mathematical statistics, artificial intelligence, and machine learning, have been developed over several decades. From a technical point of view, data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world data. From a business point of view, data mining is a new kind of business information processing technology whose main feature is to extract, transform, analyze, and model the large amounts of business data held in commercial databases, and to derive from them the key knowledge that supports business decisions.
From a technical point of view, data mining can be applied to the following aspects:
(1) Association rule discovery finds association rules that satisfy given conditions in a given set of transactions. Put simply, it uncovers the interrelationships hidden in the data in order to provide guidance on a business topic.
(2) Sequential pattern analysis is similar to association rule discovery, but it focuses on relationships between data items that are ordered in time. Sequential pattern discovery finds all ordered sequences in a database of time-stamped transactions that satisfy a user-specified minimum support threshold.
(3) Classification analysis and clustering analysis. Classification rule mining is the process of discovering common characteristics among data objects and assigning the objects to different classes according to a classification model. Clustering divides n data objects in d-dimensional space into k classes so that data objects within a class are more similar to one another than to data objects in other classes. Cluster analysis can discover the characteristics of a group of data objects that carry no category labels and summarize the characteristics of each category.
(4) Automatic trend prediction. Data mining can automatically search large databases for potentially predictive information. A typical example of using data mining for prediction is target marketing: data mining tools can identify the customers most likely to respond to future mailings based on large amounts of data from past mailings.
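To make the first item in the list above more concrete, here is a minimal Python sketch of association rule discovery restricted to rules with one item on each side. It uses the usual support and confidence measures as the "given conditions"; the function names, the thresholds, and the sample baskets are illustrative assumptions, and a practical system would use a full algorithm such as Apriori rather than this brute-force pairing.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples
    for single-item rules that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for x, y in ((a, b), (b, a)):
            # confidence(x -> y) = support(x and y) / support(x)
            confidence = pair_support / support({x}, transactions)
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.7)
print(rules)  # [('butter', 'bread', 0.5, 1.0)]
```

The one surviving rule reads "baskets that contain butter also contain bread": the pair appears in half of all baskets, and every basket with butter also has bread.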
3 Online Analytical Processing (OLAP) Technology
Online analytical processing (OLAP) is an important tool through which a data warehouse supports decision making. It is a fast software technology for online access to, and analysis of, shared multidimensional information in response to a specific problem. According to the OLAP Council's definition, it is a category of software technology that enables analysts, managers, and executives to gain deeper insight into data through fast, consistent, interactive access to a wide variety of views of information that has been transformed from raw data into a form users can truly understand and that truly reflects the dimensions of the enterprise. The characteristics of OLAP include: ① speed: the system should respond to most of a user's analysis requests within 5 seconds; ② analyzability: the system can handle any logical and statistical analysis relevant to the application; ③ multidimensionality: this is the key attribute of OLAP, and the system must provide multidimensional views and analysis of the data, including full support for hierarchies and multiple levels within dimensions; ④ information: the system should be able to obtain information in a timely manner and manage large volumes of it.
The data structure of OLAP is multidimensional, and it currently takes the following forms: ① the hypercube structure, which describes an object using three or more dimensions, each perpendicular to the others. Measures of the data sit at the intersections of the dimensions, and all parts of the data space share the same dimensional properties. (A contracted hypercube is a variant of this structure that has higher data density and fewer dimensions, and to which additional analytic dimensions can be added.) ② the multicube structure, in which the hypercube is divided into subcubes. Because the dimensions are partitioned for particular applications, this structure is highly flexible and improves the efficiency of data analysis, especially for sparse data. Analysis methods include slicing, dicing, rotating (pivoting), and drilling.
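The analysis methods named above can be illustrated with a small Python sketch. It models a cube as a dictionary keyed by dimension coordinates. The dimension names, the sales figures, and the helper functions are all hypothetical, and a real OLAP engine would use far more efficient storage.

```python
from collections import defaultdict

# Hypothetical three-dimensional cube: (region, product, quarter) -> sales.
DIMS = ("region", "product", "quarter")
cube = {
    ("East", "Laptop", "Q1"): 100,
    ("East", "Phone", "Q1"): 80,
    ("East", "Laptop", "Q2"): 120,
    ("West", "Laptop", "Q1"): 90,
    ("West", "Phone", "Q2"): 70,
}


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose coordinates fall inside the allowed sets."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def roll_up(cube, keep_dims):
    """Drill up: sum the measure over every dimension not in keep_dims."""
    idx = [DIMS.index(d) for d in keep_dims]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)


def pivot(table):
    """Rotate: swap the two axes of a two-dimensional result."""
    return {(b, a): v for (a, b), v in table.items()}


by_region = roll_up(cube, ("region",))
by_region_quarter = roll_up(cube, ("region", "quarter"))
```

Moving from `by_region` to `by_region_quarter` is a drill-down, since it reintroduces a dimension that the coarser view had aggregated away.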
OLAP is also known as Fast Analysis of Shared Multidimensional Information (FASMI). It is applied in data-intensive settings such as marketing and sales analysis, e-commerce analysis, marketing based on historical data, budgeting, financial reporting and consolidation, management reporting, profitability analysis, and quality analysis.
4 Summary
Decision support systems built on a data warehouse with data mining and online analytical processing are an effective way to compensate for the limited capabilities of traditional decision support systems, and they are of great practical significance.