What is data collection?

Data collection, also known as data acquisition, is essential now that computers are so widely used. It serves as the bridge between the computer and the external physical world.

Data collection should generally follow these principles:

1. A data collection task must not interfere with the operation of the business system. Core business systems are usually busiest during the day and cannot easily absorb the extra load of data extraction, so extraction should in principle be scheduled outside working hours. The task scheduler must also support assigning priorities to data collection tasks.

2. Different business systems generate data on different cycles, which affects how often data should be collected. The collection schedule should be set according to each business system's data generation cycle and the periodic requirements of data exchange.

3. In principle, the execution time of a data collection task should be proportional to its collection cycle: a task with a short cycle should finish quickly, while a task with a long cycle may take longer. For daily data collection, extraction, cleaning, loading, and processing should complete within 3-5 hours; for monthly data collection, this can be relaxed to 48 hours.

4. For tasks with large data volumes and complicated data transformations, ETL tools consume a lot of resources and time. In such cases, it is advisable to write a dedicated data collection interface program to complete the task and improve collection efficiency.

5. For full-scope collection tasks organized by data source, data can be initialized one data source at a time. When collection from one data source fails, only that data source needs to be fully re-collected and recovered; collection from the other data sources is not affected.
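
The scheduling principles above can be illustrated with a small Python sketch. It is only one possible arrangement, not a prescribed implementation: the CollectionTask class, the source names, and the 22:00-06:00 off-peak window are all assumptions made for illustration, and the daily budget uses the upper end (5 hours) of the 3-5 hour range given in principle 3.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class CollectionTask:
    source: str    # hypothetical data source name
    cycle: str     # "daily" or "monthly"
    priority: int  # lower number means higher priority


# Assumed off-peak window; a real system would take this from its own
# business calendar.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)

# Time budgets from principle 3 (upper bound of 3-5 hours for daily tasks).
TIME_BUDGET = {"daily": timedelta(hours=5), "monthly": timedelta(hours=48)}


def in_off_peak(now: datetime) -> bool:
    """True when `now` falls inside the overnight off-peak window."""
    t = now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def plan_run(tasks, now):
    """Principle 1: start nothing during working hours; otherwise
    return the tasks ordered by priority."""
    if not in_off_peak(now):
        return []
    return sorted(tasks, key=lambda task: task.priority)


def deadline(task, started):
    """Principle 3: the allowed execution time depends on the cycle."""
    return started + TIME_BUDGET[task.cycle]


def collect_all(tasks, collect):
    """Principle 5: run each data source independently, so a failure in
    one source does not stop the others. Returns the failed sources,
    which can then be fully re-collected on their own."""
    failed = []
    for task in tasks:
        try:
            collect(task)
        except Exception:
            failed.append(task.source)
    return failed
```

For example, with a daily task and a monthly task, plan_run returns an empty list at 14:00 and both tasks in priority order at 23:00. The function passed to collect_all stands in for whatever extraction routine a real system would use.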

At present, 101 heterogeneous data acquisition technology can collect heterogeneous data directly, without the cooperation of software vendors. This kind of data collection requires no coordination with individual manufacturers, avoids expensive interface fees, and keeps the construction period short. It is the first choice for the data collection needs of large enterprises in many fields.