OPTIMIZATION OF MULTISET DATA ANALYSIS ON HADOOP USING MAP JOIN REDUCE

A PROJECT REPORT
Submitted by

SHENBAGA PRIYA.B
09ITR105

SILAMBARASAN.R
09ITR108

VIGNESWARI.A
09ITR125

in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY IN INFORMATION TECHNOLOGY
DEPARTMENT OF INFORMATION TECHNOLOGY
SCHOOL OF COMMUNICATION AND COMPUTER SCIENCES

KONGU ENGINEERING COLLEGE
(Autonomous)

PERUNDURAI ERODE – 638 052

APRIL 2013

ABSTRACT

Data analysis is the process of inspecting, cleaning, transforming and modeling data with the goal of highlighting useful information, suggesting conclusions and supporting decision making. It is especially significant in cloud computing, where very large volumes of data are processed over large clusters. MapReduce is widely used to handle data in cloud and distributed environments because of its excellent scalability and good fault tolerance. Compared to parallel databases, however, MapReduce is inefficient when it is adopted to perform complex data analysis that joins multiple data sets in order to compute certain aggregates. A system called Map Join Reduce is proposed, which performs such complex data analytical tasks more effectively than the existing system. A filtering-join-aggregation model, an extension of MapReduce's filtering-aggregation programming model, is introduced: filtering logic is first applied to the data sets, the filtered records are joined in a pipelined manner, and the joined output is then grouped to produce the final result. The significance of our proposal is that multiple data sets are joined and aggregated in one go, reducing the frequent checkpointing and shuffling of intermediate results performed by the existing system and thereby improving the efficiency of data processing in distributed applications.
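To make the filtering-join-aggregation idea concrete, the sketch below shows how the same join-plus-aggregate task would look as an ordinary Hadoop reduce-side join. It is only illustrative and is not the Map Join Reduce runtime proposed in the report: the two data sets (customers and orders), their field layouts, the filter predicate on country, and all class names are assumptions made for the example. Each mapper filters and tags its input, the shuffle brings matching keys together, and a single reducer pass joins the two sets and computes the aggregate.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical two-way join of "customers" and "orders" keyed by customer id,
// followed by a per-customer aggregate (total order amount). Names and file
// layouts are illustrative assumptions, not part of the report.
public class FilterJoinAggregateSketch {

    // Filtering phase: each mapper applies its filter and tags records with the
    // source data set so the reducer can tell them apart after the shuffle.
    public static class CustomerMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");      // custId,name,country
            if (f.length == 3 && "IN".equals(f[2])) {       // example filter predicate
                ctx.write(new Text(f[0]), new Text("C\t" + f[1]));
            }
        }
    }

    public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");      // orderId,custId,amount
            if (f.length == 3) {
                ctx.write(new Text(f[1]), new Text("O\t" + f[2]));
            }
        }
    }

    // Join + aggregation phase: one reducer call sees every record for a customer
    // id, joins the two sets, and emits the aggregate in the same pass.
    public static class JoinAggregateReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            String name = null;
            double total = 0.0;
            for (Text v : values) {
                String[] f = v.toString().split("\t", 2);
                if ("C".equals(f[0])) {
                    name = f[1];
                } else {
                    total += Double.parseDouble(f[1]);
                }
            }
            if (name != null) {                             // inner join: drop unmatched orders
                ctx.write(key, new Text(name + "\t" + total));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter-join-aggregate sketch");
        job.setJarByClass(FilterJoinAggregateSketch.class);
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, CustomerMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, OrderMapper.class);
        job.setReducerClass(JoinAggregateReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Joining more than two data sets this way requires chaining several such jobs, each of which checkpoints and shuffles its intermediate results; the one-pass filtering-join-aggregation model described above is intended to avoid exactly that overhead.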

INTRODUCTION

In information technology, big data refers to collections of data sets so large and complex that they become difficult to process using …



References:

1. Afrati, F.N. and Ullman, J.D. (2010) 'Optimizing Joins in a Map-Reduce Environment', Proc. 13th Int'l Conf. Extending Database Technology (EDBT '10).
2. Chuck Lam (2010) 'Hadoop in Action', Manning Publications.
3. Dawei Jiang, Anthony K.H. Tung, and Gang Chen (2011) 'Map-Join-Reduce: Toward Scalable and Efficient Data Analysis on Large Clusters', IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9.
4. Dean, J. and Ghemawat, S. (2004) 'MapReduce: Simplified Data Processing on Large Clusters', Proc. Operating Systems Design and Implementation (OSDI), pp. 137-150.
5. Yang, H.C., Dasdan, A., Hsiao, R.L., and Parker, D.S. (2007) 'Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters', Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '07).
