Nt1310 Unit 4 Exercise 1
3. Problem Formulation
As is evident from the related work discussed in Section 2, disk utilization is not the bottleneck when small files are stored on HDFS. Rather, the small file problem occurs when the NameNode's memory is heavily consumed by the metadata and BlockMap entries of a huge number of files. The NameNode stores file system metadata in main memory, and the metadata of one file takes about 250 bytes. By default, three replicas are created for each block, and the metadata of a block takes about 368 bytes [9]. Let α denote the number of bytes of memory that the NameNode consumes by itself, let β denote the number of bytes of memory consumed by the BlockMap, and let S denote the size of an HDFS block. Further assume that there are N
…show more content…
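Under these definitions, the memory consumed by the NameNode when N small files are stored directly can be estimated as follows; this is a sketch consistent with the per-file and per-block costs above (Lᵢ, an introduced symbol, is the length of the i-th small file), not necessarily the exact formula of the original derivation:

\[
M_{\text{small}} \;\approx\; \alpha + 250N + \beta,
\qquad
\beta \;=\; 368 \sum_{i=1}^{N} \Big\lceil \frac{L_i}{S} \Big\rceil \;=\; 368N \quad \text{when every } L_i \le S .
\]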
FIndex: the local index file for a set of merged small files.
Phase 4: Uploading files to HDFS: Both the local index file and the merged file are written to HDFS, which avoids the overhead of tracking every small file at the NameNode; the NameNode keeps metadata for the merged file and the index file only. File correlations are considered when storing the files to improve access efficiency, as sketched below.
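A minimal sketch of this upload step, using the HdfsCLI (hdfs) Python package over WebHDFS; the merge layout, the JSON index format, and all paths and names here are illustrative assumptions rather than the technique's exact on-disk format:

```python
import json
from hdfs import InsecureClient  # pip install hdfs (HdfsCLI, a WebHDFS client)

def upload_merged(client, small_files, merged_path, index_path):
    """Concatenate small files, record (offset, length) for each in a local
    index, then write only the merged file and the index file to HDFS."""
    index, chunks, offset = {}, [], 0
    for name, data in small_files.items():  # data: bytes of one small file
        index[name] = {"offset": offset, "length": len(data)}
        chunks.append(data)
        offset += len(data)
    # The NameNode now tracks two files instead of one entry per small file.
    client.write(merged_path, data=b"".join(chunks), overwrite=True)
    client.write(index_path, data=json.dumps(index).encode(), overwrite=True)
    return index

# Hypothetical usage:
# client = InsecureClient("http://namenode:9870", user="hdfs")
# upload_merged(client, {"a.txt": b"...", "b.txt": b"..."},
#               "/merged/part-0000.bin", "/merged/part-0000.findex")
```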
Phase 5: File caching strategy: A caching strategy is used to cache the local index file and correlated files. With this strategy, communications with HDFS are drastically reduced when downloading files, which improves access efficiency. When a requested file misses in the cache, the client queries the NameNode for the file's metadata and, based on it, connects to the appropriate DataNodes where the blocks are located. The local index file is read first; then, using the recorded offset and length, the requested file is split out of the block and returned to the client. A sketch of this read path follows.
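A minimal sketch of the read path with a client-side cache, again using HdfsCLI; the unbounded dictionary cache and the JSON index format are illustrative assumptions (a real client would bound the cache and prefetch correlated files):

```python
import json
from hdfs import InsecureClient

class MergedFileReader:
    """Reads individual small files out of a merged HDFS file via its index."""

    def __init__(self, client, merged_path, index_path):
        self.client = client
        self.merged_path = merged_path
        self.index_path = index_path
        self.index = None   # cached local index file
        self.cache = {}     # cached contents of (correlated) files

    def read_file(self, name):
        if name in self.cache:        # cache hit: no round trip to HDFS
            return self.cache[name]
        if self.index is None:        # first miss: fetch the local index file
            with self.client.read(self.index_path) as reader:
                self.index = json.loads(reader.read())
        entry = self.index[name]      # offset and length of the requested file
        with self.client.read(self.merged_path,
                              offset=entry["offset"],
                              length=entry["length"]) as reader:
            data = reader.read()
        self.cache[name] = data       # retain for later correlated accesses
        return data
```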
5. Theoretical Validation of the Proposed Technique
Suppose there are N small files, which are merged into K merged files whose lengths are denoted LM1, LM2, …, and LMK. The computational formula for the memory consumed by the NameNode under the file merging and caching technique is given in terms of these quantities.
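As a rough, hedged reconstruction of that computation from the constants in Section 3 (250 bytes of metadata per file and 368 bytes per block [9]); pairing each merged file with one single-block index file is an assumption here, so the technique's exact formula may differ:

```python
import math

FILE_META = 250    # bytes of NameNode memory per file's metadata
BLOCK_META = 368   # bytes of NameNode memory per block (three replicas)

def memory_small_files(alpha, lengths, block_size):
    """NameNode memory when the small files are stored directly."""
    blocks = sum(math.ceil(l / block_size) for l in lengths)
    return alpha + FILE_META * len(lengths) + BLOCK_META * blocks

def memory_merged(alpha, merged_lengths, block_size):
    """NameNode memory for K merged files LM1..LMK, assuming one
    single-block index file accompanies each merged file."""
    k = len(merged_lengths)
    blocks = sum(math.ceil(lm / block_size) for lm in merged_lengths)
    return alpha + FILE_META * 2 * k + BLOCK_META * (blocks + k)
```

For example, with α = 0, a 128 MB block size, and 100,000 files of 1 MB each merged into 100 files of 1 GB each, the direct layout costs about 100,000 × (250 + 368) ≈ 61.8 MB of NameNode memory, while the merged layout needs metadata for only 200 files and roughly 900 blocks, well under 1 MB.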
