Traditional Culture Encyclopedia - Traditional culture - Why hive is a hadoop data warehouse? Understand from all aspects.

Why hive is a hadoop data warehouse? Understand from all aspects.

Hive is a data warehouse tool based on Hadoop, which can map structured data files into a database table, provide simple sql query function, and convert sql statements into MapReduce tasks to run. Its advantages are low learning cost, no need to develop special MapReduce applications, and simple MapReduce statistics can be realized quickly through SQL-like statements, which is very suitable for statistical analysis of data warehouses. It provides a series of tools for data extraction and transformation loading (ETL), which is a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. ?

(1).hive is used by FaceBook to solve the statistics of massive structured logs. ?

(2).hive is a data warehouse tool based on hadoop, which can map structured data files into a table and provide query functions similar to SQL. ?

(3).hive is a data warehouse based on hadoop:

Use HQL statement as query interface

Use HDFS for storage

Use mapreduce for calculation. ?

(4) the essence. Hive is a program that converts HQL into MapReduce. ?

(5) Good flexibility and expansibility: support UDF and customize the storage format. ?

(6) Suitable for off-line processing. ?

(7) Query and manage large data sets stored in distributed storage (database: add and delete to search, hive does not support adding and deleting this). Management is mainly the management of forms.