What are the main things to master in BI?

Business Intelligence is commonly abbreviated as BI. The concept was first introduced in 1996. At that time, BI was defined as a class of technologies and applications, made up of data warehouses (or data marts), query and reporting, data analysis, data mining, data backup and recovery, and so on, whose purpose is to help enterprises make decisions. Today, business intelligence is usually understood as a tool that transforms an organization's existing data into knowledge and so helps the organization make informed business decisions. The data in question includes orders, inventory, transaction accounts, and customer and supplier records from the enterprise's business systems, data about the enterprise's industry and competitors, and various data from the rest of the enterprise's external environment. The business decisions that BI can assist with may be operational, tactical, or strategic. Transforming data into knowledge requires technologies such as data warehousing, online analytical processing (OLAP), and data mining. From a technical point of view, therefore, business intelligence is not a new technology; it is the combined use of data warehousing, OLAP, and data mining.

BI is a factory:

>> BI's raw material is a huge amount of data;

>> BI's product is the information and knowledge processed from the data;

>> BI pushes these products to business decision makers;

>> Business decision makers utilize the products from the BI factory to make the right decisions and promote the growth of the enterprise;

This is Business Intelligence, or BI - connecting data with decision makers, turning data into value.

BI applications fall into two main categories, information applications and knowledge applications, with the following characteristics:

Information BI applications:

These are applications that process raw data into queries, reports and charts, multidimensional analysis, data visualization, and the like. Their common characteristic is that they convert data into information that decision makers can take in, and then show it to them.

An example is the processing of bank transaction data into bank financial statements.

They are only responsible for providing information; they do not actively analyze the data.

For example, bank financial statement tools cannot actively analyze the relationship between customer churn and bank interest rates; they rely on the decision maker to combine the information with human thought to arrive at knowledge.

Knowledge-based BI applications:

These are applications that use data mining techniques and tools to uncover the relationships implicit in the data, with the computer directly processing the data into knowledge that can be presented to decision makers.

They actively explore the associations in the data, discover implicit knowledge that the decision maker could not quickly find unaided, and present it to the decision maker in a comprehensible form.

(3) Overview of BI Primary Application Patterns - Querying

Querying is the simplest BI application. It is a legacy of MIS systems, and despite these old-fashioned origins it is still the most direct way for decision makers to get information.

Today's data query interfaces have left the traditional SQL command line behind. Drop-down menus, input boxes, list boxes, and even drag-and-drop operations wrap the SQL statements that do the hard work in the background into a polished data access front end. The essence of a data query, however, still comes down to the same few elements:

>> What to look up

>> Where to look up

>> Filtering conditions

>> Presentation methods
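
The four elements above map naturally onto a parameterized SQL statement. The following is a minimal sketch, assuming a hypothetical `sales` table and a hypothetical `build_query` helper (neither comes from any BI product); a real query front end would also validate the column and table names against the schema.

```python
import sqlite3


def build_query(columns, table, filters):
    """Assemble a SELECT from three query elements: what, where from, filters.

    Hypothetical helper for illustration. Identifiers are assumed to come
    from a trusted list; filter values are passed as bound parameters.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    where = " AND ".join(f"{col} = ?" for col in filters)
    if where:
        sql += f" WHERE {where}"
    return sql, list(filters.values())


# Hypothetical in-memory table standing in for a business database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, city TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2004-11-01", "Beijing", "Soap", 342.0),
        ("2004-11-06", "Guangzhou", "Oranges", 123.0),
    ],
)

# What to look up, where to look it up, and the filtering conditions.
sql, params = build_query(["product", "amount"], "sales", {"city": "Beijing"})
rows = conn.execute(sql, params).fetchall()

# Presentation method: here, a plain text listing.
for product, amount in rows:
    print(f"{product:<10}{amount:>10.2f}")
```

A graphical query tool does essentially the same assembly, with menus and drag-and-drop supplying the arguments instead of code.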

The more popular foreign data query applications have fully unleashed the flexibility of data querying. One example is Cognos ReportNet Query Studio, which lets users define the elements of a data query with drag-and-drop mouse operations in a pure browser interface and present the data in a variety of ways, including reports and charts.

(4) BI primary application model overview - reporting (Reporting)

Reporting is one of the most enthusiastically adopted BI applications in China, which is inseparable from the historical status of reports in China's enterprises and institutions. Chinese reports are known for their unusual formatting, densely packed data, and odd rules, which have driven countless foreign reporting and BI tools to despair.

The two main elements of a report are data and format. Without format, a reporting application is almost equivalent to a data query application. A report can be described as queried data presented in a specified format.
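
To make the "data plus format" point concrete, here is a minimal sketch in which already-queried rows are laid out by a hypothetical `render_report` function. The rows, title, column widths, and total row are illustrative assumptions, not a feature of any reporting tool.

```python
def render_report(title, headers, rows):
    """Lay out already-queried rows as a fixed-width text report with a total."""
    lines = [title, "=" * len(title)]
    lines.append(f"{headers[0]:<12}{headers[1]:>10}")
    for name, amount in rows:
        lines.append(f"{name:<12}{amount:>10.2f}")
    total = sum(amount for _, amount in rows)
    lines.append(f"{'Total':<12}{total:>10.2f}")
    return "\n".join(lines)


# The data part: illustrative rows as they might come back from a query.
rows = [("Soap", 342.0), ("Oranges", 312.0), ("Banana", 12.0)]

# The format part: title, column layout, and a summary row.
report = render_report("2004 sales by product", ["Product", "Amount"], rows)
print(report)
```

Dropping `render_report` and printing `rows` directly would leave only the query result, which is the sense in which a report without format is just a query.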

A reporting application includes two modules: report presentation and report creation. Report presentation lets decision makers view a report and select its data by setting conditions such as year, department, or organization. Report creation is aimed at report developers; the flexibility of format definition, the flexibility of data mapping, and the richness of calculations all affect the quality of a BI reporting application.

It should be clarified that Microsoft Excel is not considered a BI reporting tool here, because on its own it lacks the ability to connect to a BI data source; it is at best a spreadsheet. Excel's powerful formatting capabilities, however, have won over report creators to such a degree that almost all BI vendors later provided add-ins for it. Through these add-ins, Excel can connect to a BI data source and become a BI reporting tool: the ugly duckling turns into a swan.

(5) BI advanced application model overview - online analysis (OnLine Analytical Processing, OLAP)

OLAP, or Online Analytical Processing, is a new way of looking at data introduced by BI and one of BI's core technologies.

As we know, a database stores data in tables. For example, the sales data of a store might be stored in a table like the following:

| Sale time | Sale location | Product | Quantity of sales | Sales amount |
|---|---|---|---|---|
| 2004-11-1 | Beijing | Soap | 10 | 342.00 |
| 2004-11-6 | Guangzhou | Oranges | 30 | 123.00 |
| 2004-12-3 | Beijing | Banana | 20 | 12.00 |
| 2004-12-13 | Shanghai | Oranges | 50 | 189.00 |
| 2005-1-8 | Beijing | Soap | 10 | 342.00 |
| 2005-1-23 | Shanghai | Toothbrush | 30 | 150.00 |
| 2005-2-4 | Guangzhou | Toothbrush | 20 | 100.00 |

Decision makers often want to know about distributions, percentages, and growth, for example:

>> Which product had the largest increase in sales in 2005 over 2004?

>> What was the percentage distribution of sales by product in 2004? ......

Faced with such needs, we have to run a large number of SUM operations in SQL, and every new question means another SQL SUM. With the 7 records above the result comes back easily, but with millions or even billions of records, such as a mobile operator's call data, each SQL SUM takes a long time to compute. The decision maker often raises an analysis request one day and has to wait until the next day for the result. This is "offline analysis", and it is very inefficient.
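
The SQL SUM approach can be sketched on the 7 sample records with Python's built-in `sqlite3` module. The table and column names are assumptions made for this illustration, the dates are rewritten in zero-padded form so that text comparison works, and "increase in sales" is read here as an increase in sales amount.

```python
import sqlite3

# The 7 sample records, with dates zero-padded (an adjustment for this sketch).
SALES = [
    ("2004-11-01", "Beijing", "Soap", 10, 342.00),
    ("2004-11-06", "Guangzhou", "Oranges", 30, 123.00),
    ("2004-12-03", "Beijing", "Banana", 20, 12.00),
    ("2004-12-13", "Shanghai", "Oranges", 50, 189.00),
    ("2005-01-08", "Beijing", "Soap", 10, 342.00),
    ("2005-01-23", "Shanghai", "Toothbrush", 30, 150.00),
    ("2005-02-04", "Guangzhou", "Toothbrush", 20, 100.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(sale_date TEXT, city TEXT, product TEXT, qty INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# Question 1: amount per product and year. Each run scans the whole table.
by_year = {}
for product, year, total in conn.execute(
    "SELECT product, substr(sale_date, 1, 4), SUM(amount) "
    "FROM sales GROUP BY product, substr(sale_date, 1, 4)"
):
    by_year.setdefault(product, {})[year] = total

growth = {
    product: totals.get("2005", 0.0) - totals.get("2004", 0.0)
    for product, totals in by_year.items()
}
top_product = max(growth, key=growth.get)

# Question 2: each product's share of 2004 sales, again a full scan.
rows_2004 = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "WHERE sale_date LIKE '2004%' GROUP BY product"
).fetchall()
total_2004 = sum(amount for _, amount in rows_2004)
share_2004 = {p: round(100 * amount / total_2004, 1) for p, amount in rows_2004}

print(top_product)  # Toothbrush
print(share_2004)
```

Every new question needs another aggregate query over all rows, which is harmless for 7 records and is exactly the cost that grows with the size of the table.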

To improve the efficiency of data analysis, OLAP abandons the record as the unit of data browsing and separates the data into "dimensions" (Dimension) and "measures" (Measure):

>> A dimension is a perspective from which the data is viewed, e.g., "sale time", "sale location", and "product" in the example above;

>> A measure is a quantitative value being examined, e.g., "quantity of sales" and "sales amount" in the example above;

In this way, we can transform the flat list of data above into a data cube (Cube) with three dimensions.

Exploring the data then means locating a point in the cube and looking at the measures at that point.

Of course, a data cube is not limited to three dimensions; three are used here simply because that is the limit of what can be represented graphically.
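
The idea of a cube as "coordinates to measures" can be sketched with a plain dictionary. This is only an illustration of the concept, not how any OLAP engine stores its data; the choice of year-month as the time coordinate and the `lookup` helper are assumptions of this sketch.

```python
from collections import defaultdict

# (sale time, sale location, product, quantity, amount) from the sample records,
# with dates zero-padded for this sketch.
RECORDS = [
    ("2004-11-01", "Beijing", "Soap", 10, 342.00),
    ("2004-11-06", "Guangzhou", "Oranges", 30, 123.00),
    ("2004-12-03", "Beijing", "Banana", 20, 12.00),
    ("2004-12-13", "Shanghai", "Oranges", 50, 189.00),
    ("2005-01-08", "Beijing", "Soap", 10, 342.00),
    ("2005-01-23", "Shanghai", "Toothbrush", 30, 150.00),
    ("2005-02-04", "Guangzhou", "Toothbrush", 20, 100.00),
]

# The cube maps a coordinate (one member per dimension) to its measures.
cube = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
for date, city, product, qty, amount in RECORDS:
    month = date[:7]  # use year-month as the time coordinate
    cell = cube[(month, city, product)]
    cell["quantity"] += qty
    cell["amount"] += amount


def lookup(month, city, product):
    """Return the measures stored at one point of the cube, or None if empty."""
    return cube.get((month, city, product))


print(lookup("2004-12", "Shanghai", "Oranges"))  # {'quantity': 50, 'amount': 189.0}
print(lookup("2004-12", "Beijing", "Soap"))      # None
```

Adding a fourth dimension only means adding a fourth element to the key, which is why the cube is not limited to what can be drawn.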

Dimensions can be hierarchical. For example, time can be rolled up from day to month and year, products can be rolled up into food and daily necessities, and locations can be rolled up into North China and South China. The user can drill down (Drill Down) and roll up (Roll Up) freely along each dimension's hierarchy.
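
Rolling up and drilling down amount to choosing a level in each dimension's hierarchy before summing. A minimal sketch over the sample records follows; the assignment of products to "Food" and "Daily necessities" and the `summarize` helper are assumptions of this example.

```python
from collections import defaultdict

# (sale time, product, amount) from the sample records, dates zero-padded.
RECORDS = [
    ("2004-11-01", "Soap", 342.00),
    ("2004-11-06", "Oranges", 123.00),
    ("2004-12-03", "Banana", 12.00),
    ("2004-12-13", "Oranges", 189.00),
    ("2005-01-08", "Soap", 342.00),
    ("2005-01-23", "Toothbrush", 150.00),
    ("2005-02-04", "Toothbrush", 100.00),
]

# Assumed product hierarchy: product -> category.
CATEGORY = {
    "Soap": "Daily necessities",
    "Toothbrush": "Daily necessities",
    "Oranges": "Food",
    "Banana": "Food",
}

# Time hierarchy expressed as prefix lengths of a YYYY-MM-DD date string.
TIME_LEVELS = {"year": 4, "month": 7, "day": 10}


def summarize(time_level, by_category):
    """Sum sales amount at the chosen level of the time and product hierarchies."""
    totals = defaultdict(float)
    for date, product, amount in RECORDS:
        time_key = date[: TIME_LEVELS[time_level]]
        product_key = CATEGORY[product] if by_category else product
        totals[(time_key, product_key)] += amount
    return dict(totals)


rolled_up = summarize("year", by_category=True)       # roll up: year x category
drilled_down = summarize("month", by_category=False)  # drill down: month x product

print(rolled_up[("2004", "Food")])             # 324.0
print(drilled_down[("2004-12", "Oranges")])    # 189.0
```

The same records answer both questions; only the level chosen in each hierarchy changes.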

In this way, we escape the speed constraints of SQL SUM, quickly locate the detail data that meets different conditions, and quickly obtain summary data at any level. OLAP gives decision makers a multi-angle, multi-level, and efficient way to explore data. Decision makers are no longer bound by fixed drop-down menus and query conditions; instead, their own line of thought drives the data retrieval, with any combination of analysis angles and analysis targets. This break with the traditional way of interacting with data, together with its high efficiency, makes OLAP the core application of a BI system.

(*) Part four: BI advanced application models -- data visualization and data mining

(6) BI application mode overview -- data visualization (Visualization)

Data visualization applications aim to present information in as many forms as possible, so that decision makers can quickly grasp the knowledge contained in the information, such as trends, distribution, and density, through intuitive graphical representation. It is worth mentioning that GIS software vendors, represented by MapInfo, are also working to combine their products with BI applications. MapInfo pioneered the concept of Location Intelligence, which uses geographic information systems to show attribute values for each region, such as population density, industrial output value, or the number of hospitals per capita. This kind of application partly overlaps with BI data visualization and strongly complements it, and the two sometimes work together in the same project.

The Cognos Visualizer product presents data and information in a rich, almost flashy way, with nearly fifty kinds of graphical representation, including maps, pie charts, waterfall charts, and more, in both 2D and 3D. All of the graphical elements are interactive: for example, users can click on a province on the map to drill down to information about the cities in that province. This interactivity is a significant difference between BI and ordinary image generation software.

(7) Overview of BI application models - Data Mining

Data mining is the most advanced BI application, because it can replace part of the function of the human brain.

Data mining is the special case of knowledge discovery (Knowledge Discovery) applied to structured data.

The purpose of data mining is to have a computer analyze a large amount of data, find the hidden patterns and knowledge in it, and present them to the user in an understandable way.

The three main elements of data mining are:

>> Techniques and algorithms: commonly used data mining techniques currently include automatic cluster detection (Auto Cluster Detection), decision trees (Decision Trees), and neural networks (Neural Networks);

>> Data: because data mining is a process of discovering the unknown from the known, it needs a large accumulation of data as its source; the more data is accumulated, the more reference points the data mining tool has;

>> Predictive modeling: the business logic that calls for data mining is simulated by the computer, which is also the main task of data mining.

Compared with information BI applications, knowledge BI applications, with data mining as their representative, are still immature. Seen from another angle, however, data mining still has a great deal of room to grow and is a key direction for the future development of BI. Knowledge BI vendors such as SAS and SPSS are steadily gaining stature and quietly occupying a new source of profit growth.

A well-known example is IBM Intelligent Miner applied to the analysis of customer consumption behavior. It can analyze a large amount of customer data, automatically divide customers into several groups (automatic cluster detection), and display the consumption characteristics of each group, so that decision makers can see the consumption habits of different customers at a glance and plan promotions or advertising accordingly.

If the functions above were implemented with information BI applications alone, decision makers would have to do a great deal of OLAP analysis and data querying based on their own experience, and they still might not find the hidden patterns in the data. Take the customer segmentation above: for a bank with 4 million customers, doing it without a data mining tool would be utterly exhausting.
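
The automatic grouping of customers described above can be sketched with k-means, one common clustering algorithm. This is a substitute chosen for illustration; the article does not say which algorithm IBM Intelligent Miner uses. The customer figures and the starting centroids are made up for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            groups[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups


# Hypothetical customers: (average monthly spend, transactions per month).
customers = [(20, 2), (25, 3), (30, 2), (200, 15), (220, 18), (210, 16)]

# Two groups, seeded with two of the customers (an arbitrary choice).
centroids, groups = kmeans(customers, centroids=[customers[0], customers[3]])

for centroid, group in zip(centroids, groups):
    print(f"group of {len(group)} customers, average monthly spend {centroid[0]:.0f}")
```

No one told the program which customers are low or high spenders; the split emerges from the data, which is the sense in which knowledge applications go beyond presenting information.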

(8) BI base - data warehouse technology (Data Warehouse)

Before we dive into this topic, let's take a look at the official definition of a data warehouse:

A data warehouse (Data Warehouse) is a subject-oriented (Subject Oriented), integrated (Integrated), relatively stable (Non-Volatile) collection of data that reflects historical changes (Time Variant) and is used to support management decisions.

An "operational database" is something like the database of a bank's bookkeeping system. Every business operation (for example, you deposit 5 dollars) is recorded in it immediately, and over time it fills up with fragmentary data. A database that does this kind of tireless grunt work is called an "operational database" and is oriented toward business operations.

A "data warehouse", by contrast, is oriented toward decision support and analytical data processing. In addition, a data warehouse effectively integrates multiple heterogeneous data sources, reorganizes the integrated data by subject, and contains historical data; data stored in the data warehouse is generally not modified afterwards.

The relationship between operational databases, data warehouses, and databases is like the relationship between C:, D:, and the hard disk: the database is the hard disk, the operational database is C:, and the data warehouse is D:. Operational databases and data warehouses are both stored in databases; only their table structures are designed for different modes and uses.

So why add a "data warehouse" layer between the operational database and BI?

The first reason is that the operational database is busy day and night, with fast response to the business as its main goal, and has no energy to spare for BI's data needs. Those needs are usually aggregations, and a single select sum(xx) group by xx can make the operational database consume so many resources that business processing falls behind, which means big trouble. For example, suppose you deposit 5000 yuan and find that ten minutes later the money still has not been credited. How would you feel? The bank's management must be looking at pie charts!

The second reason is that an enterprise generally has more than one application, and therefore several operational databases, such as a human resources database, a finance database, a sales documents database, and an inventory database. To provide a panoramic view of the data, BI has to integrate these dispersed sources. For example, an OLAP analysis that combines sales and inventory information requires the BI tool to fetch data from both databases efficiently. The most efficient way to do this is to integrate the data into the data warehouse first and have the BI application fetch everything from the data warehouse.

Consolidating data from scattered operational databases into a data warehouse is a major undertaking, and it has spawned a market for data integration software. The integration is not a matter of simply stacking tables on top of each other. The dimensions of each operational database have to be extracted, and the dimensions they have in common are set up as shared dimensions. The database tables that contain the specific measures are then unified by subject into a few large tables (known as "fact tables", Fact Tables), the data warehouse table structure is built according to the dimension-measure model, and the data is extracted and transformed into it. Subsequent extractions are typically performed incrementally on new data when the operational database load is low (e.g., in the early morning), so that the data in the data warehouse builds up over time.
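
The steps described above (a shared dimension, fact tables per subject, transformation, and incremental extraction) can be sketched with `sqlite3`. Everything named here is an assumption made for illustration: the table names, the `product_id` and `load_sales` helpers, the inventory figures, and the use of a date as the "already loaded" marker. Commercial data integration tools are far more elaborate.

```python
import sqlite3

# Two hypothetical operational sources, each with its own product naming.
sales_source = [("2005-01-08", "Soap", 342.0), ("2005-01-23", "Toothbrush", 150.0)]
inventory_source = [("2005-01-31", "soap", 120), ("2005-01-31", "toothbrush", 80)]

dw = sqlite3.connect(":memory:")  # stands in for the data warehouse
dw.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, amount REAL);
    CREATE TABLE fact_inventory (stock_date TEXT, product_id INTEGER, on_hand INTEGER);
""")


def product_id(name):
    """Map a source-specific product name onto the shared product dimension."""
    key = name.strip().title()  # transformation step: unify the naming
    dw.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (key,))
    row = dw.execute(
        "SELECT product_id FROM dim_product WHERE name = ?", (key,)
    ).fetchone()
    return row[0]


def load_sales(rows, since):
    """Incremental extraction: load only rows newer than the last loaded date."""
    new_rows = [r for r in rows if r[0] > since]
    for sale_date, name, amount in new_rows:
        dw.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (sale_date, product_id(name), amount),
        )
    return len(new_rows)


loaded = load_sales(sales_source, since="2005-01-10")
for stock_date, name, on_hand in inventory_source:
    dw.execute(
        "INSERT INTO fact_inventory VALUES (?, ?, ?)",
        (stock_date, product_id(name), on_hand),
    )

# Both fact tables can now be analyzed together through the shared dimension.
combined = dw.execute("""
    SELECT p.name, s.amount, i.on_hand
    FROM fact_sales s
    JOIN dim_product p ON p.product_id = s.product_id
    JOIN fact_inventory i ON i.product_id = s.product_id
""").fetchall()

print(loaded, combined)  # 1 [('Toothbrush', 150.0, 80)]
```

Because both sources resolve to the same `dim_product` rows, the final query can combine sales and inventory without either operational system being touched at analysis time.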

Most BI applications do not need real-time data. A decision maker, for example, only needs to see last week's report every Monday. About 95% of BI applications have no real-time requirement and tolerate a data lag of anywhere from 1 hour to 1 month, which is characteristic of decision support systems; this lag is the interval at which the data extraction tool runs. A BI application will of course usually include a small number of real-time requirements. For these special cases only, the BI query software can connect directly to the operational database, but the load must be limited and complex queries prohibited.

Current database products are optimized for data warehousing. For example, when installing a recent version of MySQL, the installer asks whether the database instance should be Transaction-Oriented or Decision Support. The former is an operational database and the latter is a data warehouse (decision support, again). The database then applies optimizations targeted at the chosen form.

(9) BI odds and ends

That is roughly all there is to BI basics; here are a few odds and ends by way of conclusion.

A key limitation of BI: BI cannot deal with unstructured data and can only handle numeric information. Yet enterprises hold a great deal of unstructured data, such as text, streaming media, and pictures, that also contains a lot of value, and current BI tools are of no help with it. The more dependable option is IBM Intelligent Miner for Text, but it seems to be very weak at handling Chinese.

BI vendors and products:

First of all, let's get to know the big foreign names. In data warehousing, there are IBM DB2, Oracle, Sybase IQ, NCR Teradata, and so on; in BI applications, there are Cognos, Business Objects, MicroStrategy, Hyperion, IBM, and so on; in data mining, there are IBM, SAS, SPSS, and so on. The giant Microsoft has also stepped into the BI field, launching SQL Server Analysis Services, Reporting Services, and other BI-related products to stake its claim.

We tend to look only at the big foreign BI names and overlook the domestic BI newcomers that are gradually emerging. The better-known domestic BI products today include Aowei Zhidong Power-BI, BlueQuery, and Runqian reports. It is worth mentioning that Aowei Zhidong's Power-BI is a standardized BI product that already has a certain market share in China.

China's BI market development:

| Time period | Domestic BI applications |
|---|---|
| Before 2002 | Most BI software was seen as a reporting tool that could extract data from multiple data sources, and reports were everywhere. At first, salespeople pitched the product with "we're the strongest in BI ......", which didn't work well. Then they finally got the hang of it and switched to "we can do all kinds of reports!", and the orders kept coming. |
| 2002-2003 | The value of OLAP was finally discovered by some discerning users. Some enterprises under competitive pressure urgently needed to mine the value of their historical data to become more competitive, and they quickly recognized the advantages of OLAP. Salespeople finally no longer had to say "we can do any report". State organs and monopoly enterprises, however, still stuck to reports and thought that BI was reports. |
| 2004 | As more and more BI projects were implemented successfully, OLAP finally came into its own, and a reasonable BI application structure of data query + report display + OLAP analysis took shape. Users raised data visualization requirements from time to time, and data mining applications appeared in some competitive enterprises with large data volumes. |
| 2005 | Providing information alone could no longer meet the requirements of many enterprises. In highly competitive, risk-intensive industries in particular, such as banking, communications, and securities, demand for data mining emerged on a large scale, and BI applications finally formed a whole of information + knowledge. |

The problems encountered by BI tools in China:

* Complex report formats: China has the most complex reports in the world. Chinese report design follows a different philosophy from the West. Western reports tend to use one report to illustrate one problem, while Chinese reports tend to pack as many problems as possible into a single report, which leads directly to their complex formats and unusual styles.

* Large data volumes: China is the most populous country in the world. Taking China Mobile as an example, the number of users in just one Chinese province is equivalent to the population of a medium-sized European country, which is truly massive data. Foreign databases, data warehouses, and BI application software all have their capacity for large data volumes put to the test in China. In the U.S., a customer analytics application may produce results in two seconds; with data volumes like China's, it is no longer a matter of two seconds.

* Data writeback: China has the most peculiar BI system requirements in the world. In principle, a BI system should faithfully reproduce the source data, but this principle has run into difficulties in China, where many leaders ask to be able to modify data. "When the figures in a report don't look good, we need to be able to change them, and sometimes they need adjusting so that they look good to the higher-ups," as one leader put it. At present, the only two BI vendors whose products can meet this requirement are Microsoft and MicroStrategy. Microsoft in particular is considered to have thoroughly figured out the Chinese market.