The difference between HBase and traditional databases

What is the difference between HBase and traditional relational databases?

Answer: The differences are mainly reflected in the following aspects:

1. Data type.

Relational databases adopt the relational model and offer rich data types and storage methods.

HBase adopts a simpler data model, which stores data as uninterpreted strings (byte arrays). Users can serialize structured and unstructured data in different formats into strings and save them to HBase, and they need to write their own programs to parse the strings back into different data types.
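To make the "uninterpreted strings" point concrete, here is a minimal sketch in Python. It does not use the HBase client; the dictionary cell_store and the helper names are hypothetical stand-ins. It only shows that the store sees bytes and that the application has to know how to decode each cell.

```python
import struct

def encode_int(value: int) -> bytes:
    # Serialize as a 4-byte big-endian signed integer.
    return struct.pack(">i", value)

def decode_int(raw: bytes) -> int:
    return struct.unpack(">i", raw)[0]

def encode_str(value: str) -> bytes:
    return value.encode("utf-8")

def decode_str(raw: bytes) -> str:
    return raw.decode("utf-8")

# Hypothetical stand-in for a table: it maps (row key, column) to raw bytes
# and keeps no type information at all.
cell_store = {}
cell_store[(b"user1", b"info:age")] = encode_int(30)
cell_store[(b"user1", b"info:name")] = encode_str("Alice")

# The application must remember which decoder belongs to which column.
age = decode_int(cell_store[(b"user1", b"info:age")])
name = decode_str(cell_store[(b"user1", b"info:name")])
```

If the wrong decoder is applied to a cell, the store will not complain; the error only shows up in the application.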

2. Data operations.

Relational databases provide a rich set of operations, such as insert, delete, update, and query. Queries can involve complex multi-table joins, which are usually implemented through primary-key and foreign-key associations between tables.

HBase operations do not involve complex relationships between tables; there are only simple operations such as insert, query, delete, and clear. Because HBase avoids complex relationships between tables by design and usually only serves single-table queries by primary key (the row key), it cannot implement join operations between tables as relational databases do.
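The operations described above can be sketched with a toy in-memory table. The class SimpleTable and its method names are hypothetical and are not the HBase client API. The sketch also shows what replaces a join: the application itself looks up the second table by row key.

```python
class SimpleTable:
    """Toy single-table store addressed only by row key (illustration only)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def delete(self, row_key):
        self._rows.pop(row_key, None)

    def scan(self, start_key, stop_key):
        # Yield rows with start_key <= key < stop_key in sorted key order.
        for key in sorted(self._rows):
            if start_key <= key < stop_key:
                yield key, dict(self._rows[key])

orders = SimpleTable()
orders.put("order#001", "info:customer", "cust#7")
orders.put("order#001", "info:total", "19.90")
orders.put("order#002", "info:customer", "cust#9")

customers = SimpleTable()
customers.put("cust#7", "info:name", "Alice")

# No JOIN is available: the application reads one table, then uses the
# value it found as the row key for a second lookup.
order = orders.get("order#001")
customer = customers.get(order["info:customer"])

scanned_keys = [key for key, _ in orders.scan("order#001", "order#003")]
orders.delete("order#002")
```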

3. Storage mode.

Relational databases use row-based storage, where tuples (rows) are stored contiguously in disk pages.

When reading data, each tuple has to be scanned sequentially, and the attributes required by the query are filtered out afterwards.

If only a few attribute values of each tuple are useful for a query, row-based storage wastes a lot of disk I/O and memory bandwidth.

HBase is based on column storage. Each column family is stored in several files, and the files of different column families are kept separate. The advantages are lower I/O overhead and support for a large number of concurrent user queries, because only the columns needed to answer a query have to be processed, rather than a large number of rows that are not relevant to it. In addition, data in the same column family is compressed together, and a higher compression ratio can be achieved because data within the same column family is highly similar.
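The I/O argument can be illustrated with a small sketch. This is a simplified comparison of a row layout and a column layout in plain Python lists, not HBase's actual file format; the sample data and function names are made up for the example.

```python
rows = [
    {"id": 1, "name": "Alice", "city": "Paris", "age": 30},
    {"id": 2, "name": "Bob", "city": "Paris", "age": 41},
    {"id": 3, "name": "Carol", "city": "Paris", "age": 25},
]

# Row-oriented layout: all values of one row are stored together.
row_layout = [list(r.values()) for r in rows]

# Column-oriented layout: all values of one column are stored together.
column_layout = {name: [r[name] for r in rows] for name in rows[0]}

def values_touched_row_layout(layout):
    # Whole rows are read even if the query needs a single attribute.
    return sum(len(row) for row in layout)

def values_touched_column_layout(layout, wanted):
    # Only the requested columns are read.
    return sum(len(layout[name]) for name in wanted)

# Query: average age.
row_cost = values_touched_row_layout(row_layout)
col_cost = values_touched_column_layout(column_layout, ["age"])
average_age = sum(column_layout["age"]) / len(column_layout["age"])

# Values in one column tend to be similar, which is what helps compression.
distinct_cities = len(set(column_layout["city"]))
```

Here the row layout touches every stored value to answer the query, while the column layout touches only the age column.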

4. Data index.

Relational databases can usually build multiple, complex indexes on different columns to improve data access performance.

Unlike relational databases, HBase has only one index: the row key. By design, every access in HBase is either a lookup by row key or a scan over a range of row keys, so the system as a whole does not slow down.

Since HBase sits on top of the Hadoop framework, Hadoop MapReduce can be used to generate index tables quickly and efficiently.
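A rough sketch of both ideas follows: scanning by row-key prefix over sorted keys, and deriving an index table from the main table. The index build is a plain loop that only imitates the map (emit value and row key) and reduce (group by value) steps; it is not a real MapReduce job, and all names and data are hypothetical.

```python
import bisect

table = {
    "user#001": {"info:city": "Paris"},
    "user#002": {"info:city": "Berlin"},
    "user#003": {"info:city": "Paris"},
    "visit#001": {"info:page": "/home"},
}

# Row keys are kept sorted, so a scan is a binary search plus a
# walk over neighbouring keys.
sorted_keys = sorted(table)

def scan_prefix(prefix):
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result

def build_index(column, prefix):
    # "Map": emit (value, row key); "reduce": group row keys by value.
    index = {}
    for key in scan_prefix(prefix):
        value = table[key].get(column)
        if value is not None:
            index.setdefault(value, []).append(key)
    return index

user_keys = scan_prefix("user#")
city_index = build_index("info:city", "user#")
```

The resulting city_index plays the role of an index table: its keys are column values, and its entries are the row keys to look up in the main table.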

5. Data maintenance.

In a relational database, an update replaces the old value in the record with the latest value. Once overwritten, the old value no longer exists.

When an update is performed in HBase, the old version of the data is not deleted. Instead, a new version is generated, and the old version is still retained.
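The difference can be sketched with a toy versioned cell. The class VersionedCell is hypothetical and much simpler than what HBase does internally; it only shows that a write adds a timestamped version instead of overwriting.

```python
class VersionedCell:
    """Toy cell that keeps every written version (illustration only)."""

    def __init__(self):
        self._versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        # An "update" adds a new version; nothing is overwritten.
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, timestamp=None):
        # Return the newest version, or the newest one at or before timestamp.
        for ts, value in self._versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def version_count(self):
        return len(self._versions)

cell = VersionedCell()
cell.put(100, "Paris")
cell.put(200, "Berlin")

latest = cell.get()
as_of_150 = cell.get(150)
```

A relational update would leave only "Berlin"; here the earlier value is still readable by asking for an older timestamp.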

6. Scalability.

Relational databases are difficult to scale horizontally, and the room for vertical scaling is also relatively limited.

In contrast, distributed databases such as HBase and BigTable were developed for flexible horizontal scaling, so capacity and performance can be adjusted easily by adding or removing hardware in the cluster.

However, compared with relational databases, HBase also has its own limitations. For example, HBase does not support multi-row transactions: operations are atomic within a single row, but cross-row atomicity cannot be achieved.

Note: I originally wanted to look this question up and copy the answer, but I couldn't find it, so I had to type it out by hand. If you can copy it and use it, please give it a thumbs up.