BIG DATA ANALYSIS WITH HIVE
What is HIVE?
Hive is a datawarehouseing infrastructure
for Hadoop. The primary responsibility is to provide data
summarization, query and analysis. It supports analysis of large
datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem.
The best part of HIVE is that it supports SQL-Like access to structured
data which is known as HiveQL (or HQL) as well as big data analysis
with the help of MapReduce. Hive is not built to get a quick response to
queries but it it is built for data mining applications. Data mining
applications can take from several minutes to several hours to analysis
the data and HIVE is primarily used there.
HIVE Organization
The data are organized in three different formats in HIVE.
Tables: They are very
similar to RDBMS tables and contains rows and tables. Hive is just
layered over the Hadoop File System (HDFS), hence tables are directly
mapped to directories of the filesystems. It also supports tables stored in other native file systems.
Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well.
Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system.
Hive also has metastore which
stores all the metadata. It is a relational database containing various
information related to Hive Schema (column types, owners, key-value
data, statistics etc.). We can use MySQL database over here.
What is HiveSQL (HQL)?
Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily.
- Create and manage tables and partitions
- Support various Relational, Arithmetic and Logical Operators
- Evaluate functions
- Download the contents of a table to a local directory or result of queries to HDFS directory
Here is the example of the HQL Query:
SELECT upper(name), salesprice
FROM sales;
SELECT category, count(1)
FROM products
GROUP BY category;
When you look at the above query, you can see they are very similar to SQL like queries
No comments:
Post a Comment