RFM model analysis and customer segmentation

According to research by Arthur Hughes of the American Database Marketing Institute, there are three magical elements in a customer database that together form the best indicators for data analysis: recency, frequency and monetary value.

RFM model: R (Recency) indicates how recently the customer last purchased, F (Frequency) indicates how many times the customer purchased in the recent period, and M (Monetary) indicates how much the customer spent in the recent period. The raw data usually consists of three fields: customer ID, purchase time (in date format) and purchase amount. It is processed with data mining software and weighted to obtain an RFM score, after which customer segmentation, customer grade classification and ranking by customer value score can be carried out to support database marketing!
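As a rough sketch of that processing step (not the Modeler stream used later in this article), here is how R, F and M can be derived in Python/pandas from such a three-field transaction table; the column names and sample values are hypothetical:

```python
import pandas as pd

# Hypothetical raw transactions: customer ID, purchase date, purchase amount
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "purchase_date": pd.to_datetime([
        "2012-05-03", "2012-05-20", "2012-05-01",
        "2012-05-10", "2012-05-28", "2012-05-15",
    ]),
    "amount": [30.0, 50.0, 20.0, 20.0, 100.0, 10.0],
})

# Reference date for recency: the day after the last transaction in the data
snapshot = tx["purchase_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("purchase_date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    frequency=("purchase_date", "count"),                            # number of purchases
    monetary=("amount", "sum"),                                      # total purchase amount
)
print(rfm)
```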

Here we again borrow the RFM customer classification diagram from @ data mining and data analysis.

The software tools used in this analysis are IBM SPSS Statistics 19, IBM SPSS Modeler 14.1, Tableau 7.0, Excel and PowerPoint.

Although RFM analysis is only a small part of the project, it still has to cope with massive data, which places real demands on the computer's memory and hard disk capacity.

Here are a few lessons learned from mining and processing massive data (PC platforms only):

The data we receive are usually compressed text files that need to be decompressed, and they run to gigabytes each. It is best to store them on an external hard drive with its own power supply. If the customer doesn't tell you, you may not even know how many records and fields they contain;

A default Modeler installation exchanges temporary data with drive C, so reserve at least 100 GB there; otherwise you will run out of space while reading the data.

Be patient when processing massive data. Waiting more than 30 minutes for results is common, especially during sampling, merging data, data restructuring and neural network modeling. Interrupting a run, even for a minute, would be a tragedy, hehe;

The preparation and data preprocessing stages of a data mining project normally account for 70% of the total time; for a very large data set it can be more than 90%. On the one hand the processing itself is time-consuming; on the other, it may only be possible on this one computer, with no way to spread the work across several machines;

Massive data always throws up more differences and surprises than you expect, which is the experience I have always emphasized. So use sampling to inspect the data and rehearse the processing first. But remember: even when the sample looks fine, the full data set may still have problems. I recommend using '|' as the field separator when storing the data;
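As a small illustration of the sampling and '|' separator advice, here is a hedged pandas sketch that reads a large pipe-delimited text file in chunks and keeps a random sample for inspection; the file name and the 1% sampling rate are made up for the example:

```python
import pandas as pd

# Read the '|'-delimited file in chunks so it never has to fit in memory at once
chunks = pd.read_csv("recharge_201205.txt", sep="|", chunksize=1_000_000)

# Keep a 1% random sample from each chunk for a first look at the data
sample = pd.concat(chunk.sample(frac=0.01, random_state=0) for chunk in chunks)
print(sample.head())
print(sample.dtypes)
```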

The importance of the mining engineer's understanding of the industry and insight into the business cannot be overemphasized in a data mining project. Good data mining must be market-oriented, and of course IT staff also need a good communication mechanism with the marketing staff.

Data mining also means coming to grips with the data dictionary and the semantic layer; good management and understanding of metadata gives you twice the result for half the effort. Otherwise, once the data restructuring is finished, the problems get pushed back at you all over again, which is a tragedy;

Every time I mine massive data, the site I visit most is Weibo. The machine really isn't as fast as I am, so I wait it out on Weibo, haha!

The main idea: transform traditional RFM analysis into an RFM analysis of telecom services.

The RFM model and customer segmentation here are only a small part of the data mining project. Suppose we have a data set of one month of customer recharge behavior (in fact there are six months of data). Let's first build an analysis stream with IBM Modeler:

The data structure fully meets the requirements of RFM analysis, with 30 million transaction records in a month!

First, we use the mining tool's RFM Aggregate node and RFM Analysis node to generate R (recency), F (frequency) and M (monetary).

Then the RFM Analysis node is used to complete the restructuring and organization of the basic RFM data;

We now have the recency score, frequency score, monetary score and the overall RFM score. Here each of R, F and M is split into five equal-frequency bins, and combining the scores with weights of 100, 10 and 1 yields the 125 cells of the RFM "Rubik's cube".
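For readers without Modeler, here is a hedged Python sketch of what this scoring step amounts to: each of R, F and M is binned into quintiles and the scores are combined with the 100/10/1 weighting. The data below are synthetic and the column names are assumptions, not the project's actual fields:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the per-customer R/F/M table
rng = np.random.default_rng(0)
n = 10_000
rfm = pd.DataFrame({
    "recency": rng.integers(1, 31, n),      # days since last recharge
    "frequency": rng.integers(1, 21, n),    # recharges in the month
    "monetary": rng.gamma(2.0, 50.0, n),    # total recharge amount
})

def quintile_score(s: pd.Series, higher_is_better: bool) -> pd.Series:
    """Equal-frequency score from 1 (worst) to 5 (best)."""
    labels = [1, 2, 3, 4, 5] if higher_is_better else [5, 4, 3, 2, 1]
    return pd.qcut(s.rank(method="first"), 5, labels=labels).astype(int)

rfm["R_score"] = quintile_score(rfm["recency"], higher_is_better=False)  # more recent = better
rfm["F_score"] = quintile_score(rfm["frequency"], higher_is_better=True)
rfm["M_score"] = quintile_score(rfm["monetary"], higher_is_better=True)

# Weight R by 100, F by 10, M by 1: up to 5 x 5 x 5 = 125 distinct cells
rfm["RFM_score"] = rfm["R_score"] * 100 + rfm["F_score"] * 10 + rfm["M_score"]
print(rfm["RFM_score"].nunique())  # at most 125
```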

The traditional RFM model ends here, but 125 segments are far too many for targeted marketing. We still need to identify customer characteristics and behaviors, so we need to go further and segment the customer base into a manageable number of groups.

Note also that the RFM model is really just a data-processing method and could equally be implemented with data restructuring techniques. Using the packaged RFM nodes is simpler and more direct, but we can also use them purely to restructure the data rather than to produce the RFM scores.

We can import the resulting data into Tableau for descriptive analysis (data mining software is quite clumsy at descriptive and tabular output, haha).

We can also compare and analyze the different cells: means analysis, cell category analysis and so on.

At this point we can see the convenience of the Tableau visualization tool.

Next, we continue with the mining tool and cluster the R, F and M fields. Kohonen, K-means and TwoStep are the main algorithms used for the cluster analysis:

At this point we have to decide whether to use R (recency), F (frequency) and M (monetary) directly or to transform them first. Because R, F and M are measured on different scales, it is best to standardize the three variables, for example with Z scores (other options include linear interpolation, comparison against a reference, benchmarking, etc.)! Another consideration is how to weight R, F and M, which clearly carry different importance in actual marketing!

The literature differs on this point: Arthur Hughes holds that the three RFM variables are equally important and therefore assigns them equal weights, while Bob Stone, based on empirical analysis of credit card customers, argues that the indicators should carry different weights, with frequency weighted highest, recency next and monetary value lowest.

Here we use a simple weighting scheme: WR=2, WF=3, WM=5 (in practice the weights should be set by experts or the marketing staff). Which clustering algorithm to use and how many clusters to form has to be tested and evaluated repeatedly, and we should also compare which of the three methods gives the more satisfactory result!
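As a hedged illustration of this step outside Modeler, the sketch below standardizes R, F and M, applies the WR=2/WF=3/WM=5 weights, and runs K-means; the data are synthetic and the choice of three clusters is only an example:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic stand-in for the per-customer R/F/M table
rng = np.random.default_rng(42)
n = 10_000
rfm = pd.DataFrame({
    "recency": rng.integers(1, 31, n),      # days since last recharge
    "frequency": rng.integers(1, 21, n),    # recharges in the month
    "monetary": rng.gamma(2.0, 50.0, n),    # total recharge amount
})

# Standardize so R, F and M are on a comparable scale (Z scores)
z = StandardScaler().fit_transform(rfm)

# Simple weighting: WR=2, WF=3, WM=5 (in practice set by experts/marketers)
X = z * np.array([2.0, 3.0, 5.0])

# Try several cluster counts and algorithms; K-means with k=3 is shown here
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
rfm["cluster"] = km.labels_
print(rfm.groupby("cluster")[["recency", "frequency", "monetary"]].mean())
```

The Kohonen and TwoStep runs in Modeler play the same role here; only the algorithm applied to the weighted inputs changes.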

The following figure shows the results of K-means (fast clustering):

And the clustering results of the Kohonen neural network algorithm:

Next, we need to assess the significance of the clustering results and profile the classes. Here we can use C5.0 rules to determine the characteristics of the different clusters:

Feature map of the TwoStep (two-stage) clustering:

Use an Evaluation node to judge how well the C5.0 rules discriminate between the clusters;
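Since C5.0 and the Evaluation node are Modeler components, the self-contained sketch below uses a scikit-learn decision tree as a stand-in: it re-creates a weighted K-means result on synthetic data, extracts readable rules that characterize each cluster, and reports how well those rules reproduce the cluster labels:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Synthetic R/F/M table (stand-in for the real recharge data)
rng = np.random.default_rng(1)
n = 10_000
rfm = pd.DataFrame({
    "recency": rng.integers(1, 31, n),
    "frequency": rng.integers(1, 21, n),
    "monetary": rng.gamma(2.0, 50.0, n),
})

# Cluster on weighted z-scores (WR=2, WF=3, WM=5), as in the earlier sketch
X = StandardScaler().fit_transform(rfm) * np.array([2.0, 3.0, 5.0])
rfm["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Shallow tree: readable rules describing each cluster in R/F/M terms
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(rfm[["recency", "frequency", "monetary"]], rfm["cluster"])
print(export_text(tree, feature_names=["recency", "frequency", "monetary"]))

# Rough analogue of the Evaluation node: how well the rules reproduce the clusters
pred = tree.predict(rfm[["recency", "frequency", "monetary"]])
print("rule accuracy:", accuracy_score(rfm["cluster"], pred))
```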

The results are decent. We can keep all three clustering solutions, or pick the one that is easiest to explain. Here we select the Kohonen clustering result, write the cluster field back into the data set, and then import the data into SPSS for a means analysis and export that to Excel, which is convenient to work with!

After exporting the results, import the data into Excel, compare the R, F and M means of the three clusters with the overall mean of each field, and use Excel's conditional formatting to show whether each cluster sits above or below average! Then, following the RFM Rubik's-cube classification, identify the customer types: the RFM analysis divides the customer base into six levels, namely important value customers, important retention customers, important win-back customers, generally important customers, general customers and low-value customers (the first level may turn out not to exist);

Another option is to calculate standardized scores for R, F and M from the clustering results and then rank the clusters by their weighted composite score to identify the customer value level of each class;
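Here is a hedged, self-contained sketch of those last two comparisons in pandas: cluster means versus the overall mean (the role played by Excel's conditional formatting above), and a weighted composite of standardized R/F/M scores used to rank the clusters by value. The data and weights follow the earlier synthetic example:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic R/F/M table clustered on weighted z-scores (WR=2, WF=3, WM=5)
rng = np.random.default_rng(2)
n = 10_000
rfm = pd.DataFrame({
    "recency": rng.integers(1, 31, n),
    "frequency": rng.integers(1, 21, n),
    "monetary": rng.gamma(2.0, 50.0, n),
})
z = StandardScaler().fit_transform(rfm)
rfm["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    z * np.array([2.0, 3.0, 5.0])
)

# 1) Cluster means vs. overall means (what the Excel conditional format shows)
means = rfm.groupby("cluster")[["recency", "frequency", "monetary"]].mean()
overall = rfm[["recency", "frequency", "monetary"]].mean()
flags = pd.DataFrame(np.where(means > overall, "above avg", "below avg"),
                     index=means.index, columns=means.columns)
print(means.round(1))
print(flags)

# 2) Weighted composite of standardized scores, ranked by cluster
#    (recency is negated so that "more recent" counts as more valuable)
zf = pd.DataFrame(z, columns=["R_z", "F_z", "M_z"], index=rfm.index)
composite = -2.0 * zf["R_z"] + 3.0 * zf["F_z"] + 5.0 * zf["M_z"]
print(composite.groupby(rfm["cluster"]).mean().sort_values(ascending=False))
```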

At this point, if we are satisfied with the RFM model analysis and the customer segmentation, the analysis can stop here! If we also have a customer profile database, we can use the clustering results and RFM scores as independent variables for further data mining and modeling work!
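As a final hedged sketch of that idea, the code below joins a hypothetical customer profile table (column names invented for the example) with the RFM score and cluster label and fits a simple classifier, just to show how the segmentation outputs can feed further modeling:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical RFM output per customer: combined score and cluster label
rfm_out = pd.DataFrame({
    "customer_id": np.arange(n),
    "RFM_score": rng.integers(111, 556, n),
    "cluster": rng.integers(0, 3, n),
})

# Hypothetical customer profile table with a target flag to predict (e.g. churn)
profile = pd.DataFrame({
    "customer_id": np.arange(n),
    "tenure_months": rng.integers(1, 60, n),
    "churned": rng.integers(0, 2, n),
})

# Join segmentation outputs onto the profile and use them as predictors
data = profile.merge(rfm_out, on="customer_id")
X = pd.get_dummies(data[["RFM_score", "tenure_months", "cluster"]],
                   columns=["cluster"])
y = data["churned"]

model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))
```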