
What are the disadvantages of traditional data warehouses

Traditional databases store data row by row in fixed-size blocks. Simply put, the more fields a table has, the more space each row occupies, and queries are then likely to span many blocks. In a large system a table may have hundreds of fields and hundreds of millions of rows, which creates a query bottleneck: the number of records in a table has a very large impact on query performance. The usual remedy is to split tables or databases (sharding) to spread the load, which in turn introduces new problems such as distributed transactions, generation of globally unique IDs, and cross-database queries.
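As a rough illustration of what table splitting means, the sketch below routes each record to one of several physical tables by hashing its key; the shard count and the "orders_N" naming scheme are illustrative assumptions, not taken from any particular system. Once data is spread out this way, any query that does not filter on the shard key has to touch every shard, which is exactly the cross-table/cross-database problem mentioned above.

    // Minimal sketch of hash-based table splitting (sharding).
    // ORDER_SHARDS and the "orders_N" naming scheme are illustrative assumptions.
    public class ShardRouter {
        private static final int ORDER_SHARDS = 16;

        // Route a record to a physical table by hashing its key.
        public static String tableFor(long orderId) {
            int shard = (int) Math.floorMod(orderId, (long) ORDER_SHARDS);
            return "orders_" + shard;
        }

        public static void main(String[] args) {
            // Records with different keys land in different physical tables, so a
            // query filtering on any other field must be fanned out to all shards.
            System.out.println(tableFor(10_000_001L));
            System.out.println(tableFor(10_000_002L));
        }
    }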

With a column-oriented storage model, the database is effectively indexed by design, because the selection conditions of a query are defined over columns. The values of each field are stored together by column, columns can be added dynamically, and nothing is stored for empty columns, which saves space. Because each field's data is clustered in this way, a query reads far less data and its hit rate improves, so lookups are more direct: there is no need to split databases or tables, and IO and related bottlenecks are reduced.
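A toy example may make the contrast concrete. The sketch below (plain Java, with made-up column and row names) keeps each column's values together, so empty cells cost nothing and a filter on one column never reads the others; it is only meant to illustrate the storage layout, not any real column store.

    import java.util.HashMap;
    import java.util.Map;

    // Toy column-oriented store: each column's values are kept together,
    // keyed by row id. Absent values occupy no space at all.
    public class ColumnStore {
        private final Map<String, Map<Long, String>> columns = new HashMap<>();

        // Columns can be added dynamically; nothing is stored for empty cells.
        public void put(long rowId, String column, String value) {
            columns.computeIfAbsent(column, c -> new HashMap<>()).put(rowId, value);
        }

        // A filter on one column touches only that column's data, instead of
        // reading every field of every row as a row-oriented store would.
        public long countEquals(String column, String value) {
            return columns.getOrDefault(column, Map.of()).values().stream()
                    .filter(value::equals).count();
        }

        public static void main(String[] args) {
            ColumnStore store = new ColumnStore();
            store.put(1L, "city", "Beijing");
            store.put(2L, "city", "Shanghai");
            store.put(2L, "vip_level", "gold"); // row 1 has no vip_level: nothing stored
            System.out.println(store.countEquals("city", "Beijing")); // 1
        }
    }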

The HBase database automatically partitions and distributes data across the cluster and supports highly concurrent read and write operations, so storage for massive data sets scales out largely by itself.
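For orientation, writing and reading a row through the HBase Java client looks roughly like the sketch below. The table name "user_profile", the "info" column family and the row key are assumptions for illustration; the automatic partitioning (region splitting) happens inside HBase itself rather than in client code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    // Minimal HBase client sketch; table and column names are illustrative.
    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("user_profile"))) {

                // Write one row; HBase decides which region (partition) it lands in.
                Put put = new Put(Bytes.toBytes("user-0001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Beijing"));
                table.put(put);

                // Read back a single column of that row.
                Result result = table.get(new Get(Bytes.toBytes("user-0001")));
                byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
                System.out.println(Bytes.toString(city));
            }
        }
    }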

Hadoop itself supports extracting data from a database via JDBC, and most database systems provide batch export and import functions, so in either case it is straightforward to import an entire database into Hadoop on a recurring or incremental basis. Because the database system then stores less data, its software licensing cost also falls. Figure 1 shows an application scenario in which Hadoop works alongside a relational database to share computational tasks. In this setup the relational database handles real-time data and guarantees transactional consistency. If the same database system is also required to generate complex analytical reports over large volumes of data, that computation is extremely resource intensive and degrades both the overall performance of the system and its ability to handle real-time work.
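To make the import path concrete, the following is a minimal sketch of an incremental export over JDBC; the connection URL, credentials, table, and watermark column are all assumptions, and in practice a dedicated tool such as Apache Sqoop or a scheduled batch export would typically perform the same job.

    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.*;

    // Hedged sketch of an incremental JDBC export; names and URL are illustrative.
    public class IncrementalExport {
        public static void main(String[] args) throws Exception {
            long lastExportedId = Long.parseLong(args[0]); // watermark from the previous run

            String url = "jdbc:mysql://db-host:3306/sales"; // assumed source database
            try (Connection conn = DriverManager.getConnection(url, "reader", "secret");
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT id, customer_id, amount, created_at FROM orders WHERE id > ?");
                 PrintWriter out = new PrintWriter(
                         Files.newBufferedWriter(Paths.get("orders_delta.csv")))) {

                stmt.setLong(1, lastExportedId);
                try (ResultSet rs = stmt.executeQuery()) {
                    // Write new rows as CSV; the file can then be copied into HDFS
                    // (e.g. with `hdfs dfs -put`) for offline analysis in Hadoop.
                    while (rs.next()) {
                        out.printf("%d,%d,%s,%s%n",
                                rs.getLong("id"), rs.getLong("customer_id"),
                                rs.getBigDecimal("amount"), rs.getTimestamp("created_at"));
                    }
                }
            }
        }
    }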

Hadoop is designed to store massive amounts of data, process it in arbitrary ways, and deliver results to any system on demand. Data can be routinely exported from relational database systems into Hadoop, the relational systems can then specialize in interactive, transactional work, and complex analytics can be handled offline by Hadoop without affecting the real-time systems.