Data preprocessing steps

Data preprocessing consists of four steps: data cleaning, data integration, data transformation, and data reduction.

1. Data cleaning: Data is "cleaned up" by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. The main goals are format standardization, removal of abnormal data, error correction, and removal of duplicate data.

2. Data integration: Data integration routines combine data from multiple sources and store it in a unified form. Building a data warehouse is, in essence, a data integration process.

3. Data transformation: Data is converted into a form suitable for data mining through smoothing, aggregation, data generalization, and normalization.

4. Data reduction: In data mining, the volume of data is often very large, and analyzing all of it can take a long time. Data reduction techniques produce a reduced representation of the data set that is much smaller yet still closely preserves the integrity of the original data, so that mining the reduced data gives the same or almost the same results as before reduction.
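
The four steps above can be sketched end to end on a toy data set. This is a minimal illustration using only the Python standard library, not a definitive implementation. The records, the field names (id, amount, region), the function names, and the outlier rule (an amount more than ten times the median counts as abnormal) are all assumptions made for this example. Min-max scaling stands in for normalization, and per-region averaging stands in for reduction; other techniques would fit each step equally well.

```python
from statistics import mean, median

# Hypothetical toy records from two sources; all names and values are illustrative.
orders = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": None},      # missing value
    {"id": 3, "amount": 22.0},
    {"id": 3, "amount": 22.0},      # duplicate record
    {"id": 4, "amount": 5000.0},    # abnormal value
    {"id": 5, "amount": 18.0},
]
customers = {1: "north", 2: "south", 3: "north", 4: "south", 5: "north"}


def clean(rows, max_ratio=10.0):
    """Step 1: drop duplicates, fill missing amounts, remove abnormal values."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))  # copy so the input is not modified
    med = median(r["amount"] for r in unique if r["amount"] is not None)
    for row in unique:
        if row["amount"] is None:
            row["amount"] = med  # fill the missing value with the median
    # Assumed rule: an amount above max_ratio times the median is abnormal.
    return [r for r in unique if r["amount"] <= max_ratio * med]


def integrate(rows, region_by_id):
    """Step 2: combine the order rows with the customer source."""
    return [{**r, "region": region_by_id[r["id"]]} for r in rows]


def transform(rows):
    """Step 3: min-max scale the amount into the range [0, 1]."""
    lo = min(r["amount"] for r in rows)
    hi = max(r["amount"] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all amounts are equal
    return [{**r, "scaled": (r["amount"] - lo) / span} for r in rows]


def reduce_by_region(rows):
    """Step 4: replace the individual rows with one average per region."""
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["amount"])
    return {region: mean(values) for region, values in groups.items()}


cleaned = clean(orders)
merged = integrate(cleaned, customers)
scaled = transform(merged)
summary = reduce_by_region(scaled)
print(summary)  # {'north': 20.0, 'south': 21.0}
```

In this sketch the six input rows shrink to four after cleaning and to two summary values after reduction, which shows the trade-off described in step 4: a much smaller representation that still reflects the original data.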