3.2 Hive: Hive is a data warehouse tool used to process and analyze structured data in the form of tables and databases. Hive works on the top layer of the Hadoop ecosystem. Hive has three basic functions:

a) Data summarization
b) Query
c) Analysis of data
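For example, data summarization is typically performed with aggregate queries written in HiveQL, Hive's query language introduced below; the table and column names here are illustrative assumptions:

hive> SELECT dept, AVG(salary) FROM emp GROUP BY dept;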

Hive supports a query language called HiveQL, which translates SQL-like queries into MapReduce jobs. Once we execute a query, it is passed to the JobTracker; the JobTracker passes the job to the TaskTrackers, and finally the Map and Reduce tasks execute and fetch the data from HDFS.
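A quick way to see this translation is HiveQL's EXPLAIN statement, which prints the plan (including the MapReduce stages) that a query compiles to; here it is applied to the emp table used in the example below:

hive> EXPLAIN SELECT COUNT(*) FROM emp;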

Figure 3.2.1 Working of Hive

To exhibit the working of MapReduce in Hive, we first take a database called “employee”; inside the employee database we have a table called “emp”. Our main objective is to find the total number of rows in the “emp” table. We will execute the following HiveQL queries:

 

hive> USE employee;
hive> SELECT COUNT(*) FROM emp;
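For readers who want to reproduce this example, a minimal setup might look as follows; the column definitions are an assumption, since the schema of emp is not given here:

hive> CREATE DATABASE IF NOT EXISTS employee;
hive> USE employee;
hive> CREATE TABLE IF NOT EXISTS emp (id INT, name STRING, salary DOUBLE);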

Figure 3.2.2 MapReduce job for HiveQL query

 

To execute this query, Hive performs the MapReduce job shown in Figure 3.2.2; the total time required was 1 minute and 22 seconds.

 

3.3 Apache Pig: Apache Pig is an abstraction over MapReduce. Pig is used to analyze large or distributed data sets without writing MapReduce programs. We can perform all the data manipulation operations in Hadoop using Apache Pig through a scripting language called Pig Latin. Every Pig program has three parts (a minimal sketch follows the list):

a) Loading
b) Transforming
c) Dumping of the data
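A minimal Pig Latin sketch of these three parts might look as follows; the input path, delimiter, and field names are illustrative assumptions:

-- loading: read the data set from HDFS (path and schema assumed)
emp_data = LOAD 'emp' USING PigStorage(',') AS (id:int, name:chararray, salary:double);
-- transforming: keep only the rows of interest
high_paid = FILTER emp_data BY salary > 50000;
-- dumping: print the result to the console
DUMP high_paid;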

Figure 3.3.1 Working of Apache Pig

We will execute the following commands to display the MapReduce execution in Apache Pig:
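A plausible set of commands, mirroring the Hive row-count example above (the load path and delimiter are assumptions), is:

grunt> emp_data = LOAD 'emp' USING PigStorage(',');
grunt> emp_group = GROUP emp_data ALL;
grunt> row_count = FOREACH emp_group GENERATE COUNT(emp_data);
grunt> DUMP row_count;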