Nt1310 Unit 4 Exercise 1
3. Problem Formulation
As is evident from the related work discussed in Section 2, disk utilization is not the bottleneck when small files are stored on HDFS. Rather, the small file problem occurs when the NameNode's memory is heavily consumed by the metadata and BlockMap entries of a huge number of files. The NameNode stores file system metadata in main memory, and the metadata of one file takes about 250 bytes. By default, three replicas are created for each block, and the metadata of a block takes about 368 bytes [9]. Let α denote the number of bytes of memory that the NameNode consumes by itself, let β denote the number of bytes of memory consumed by the BlockMap, and let S denote the size of an HDFS block. Further assume that there are N
…show more content…
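Under these definitions, the memory consumed by the NameNode when N small files are stored directly can be estimated as follows; this is a sketch consistent with the per-file and per-block costs above (Lᵢ, an introduced symbol, is the length of the i-th small file), not necessarily the exact formula of the original derivation:

\[
M_{\text{small}} \;\approx\; \alpha + 250N + \beta,
\qquad
\beta \;=\; 368 \sum_{i=1}^{N} \Big\lceil \frac{L_i}{S} \Big\rceil \;=\; 368N \quad \text{when every } L_i \le S .
\]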
FIndex: the local index file for a set of merged small files.
Phase 4: Uploading files to HDFS: Both the local index file and the merged file are written to HDFS, which avoids the overhead of tracking every small file at the NameNode; the NameNode keeps metadata for the merged file and the index file only. File correlations are considered when storing the files to improve access efficiency, as sketched below.
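A minimal sketch of this upload step, using the HdfsCLI (hdfs) Python package over WebHDFS; the merge layout, the JSON index format, and all paths and names here are illustrative assumptions rather than the technique's exact on-disk format:

```python
import json
from hdfs import InsecureClient  # pip install hdfs (HdfsCLI, a WebHDFS client)

def upload_merged(client, small_files, merged_path, index_path):
    """Concatenate small files, record (offset, length) for each in a local
    index, then write only the merged file and the index file to HDFS."""
    index, chunks, offset = {}, [], 0
    for name, data in small_files.items():  # data: bytes of one small file
        index[name] = {"offset": offset, "length": len(data)}
        chunks.append(data)
        offset += len(data)
    # The NameNode now tracks two files instead of one entry per small file.
    client.write(merged_path, data=b"".join(chunks), overwrite=True)
    client.write(index_path, data=json.dumps(index).encode(), overwrite=True)
    return index

# Hypothetical usage:
# client = InsecureClient("http://namenode:9870", user="hdfs")
# upload_merged(client, {"a.txt": b"...", "b.txt": b"..."},
#               "/merged/part-0000.bin", "/merged/part-0000.findex")
```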
Phase 5: File caching strategy: A caching strategy is used to cache the local index file and correlated files. With this strategy, communications with HDFS are drastically reduced when downloading files, which improves access efficiency. When a requested file misses in the cache, the client queries the NameNode for the file's metadata and, based on it, connects to the appropriate DataNodes where the blocks are located. The local index file is read first; then, using the recorded offset and length, the requested file is split out of the block and returned to the client. A sketch of this read path follows.
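A minimal sketch of the read path with a client-side cache, again using HdfsCLI; the unbounded dictionary cache and the JSON index format are illustrative assumptions (a real client would bound the cache and prefetch correlated files):

```python
import json
from hdfs import InsecureClient

class MergedFileReader:
    """Reads individual small files out of a merged HDFS file via its index."""

    def __init__(self, client, merged_path, index_path):
        self.client = client
        self.merged_path = merged_path
        self.index_path = index_path
        self.index = None   # cached local index file
        self.cache = {}     # cached contents of (correlated) files

    def read_file(self, name):
        if name in self.cache:        # cache hit: no round trip to HDFS
            return self.cache[name]
        if self.index is None:        # first miss: fetch the local index file
            with self.client.read(self.index_path) as reader:
                self.index = json.loads(reader.read())
        entry = self.index[name]      # offset and length of the requested file
        with self.client.read(self.merged_path,
                              offset=entry["offset"],
                              length=entry["length"]) as reader:
            data = reader.read()
        self.cache[name] = data       # retain for later correlated accesses
        return data
```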
5. Theoretical Validation of the Proposed Technique
Suppose there are N small files, which are merged into K merged files whose lengths are denoted LM1, LM2, …, and LMK. The computational formula for the memory consumed by the NameNode under the file merging and caching technique is given in terms of these quantities.
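As a rough, hedged reconstruction of that computation from the constants in Section 3 (250 bytes of metadata per file and 368 bytes per block [9]); pairing each merged file with one single-block index file is an assumption here, so the technique's exact formula may differ:

```python
import math

FILE_META = 250    # bytes of NameNode memory per file's metadata
BLOCK_META = 368   # bytes of NameNode memory per block (three replicas)

def memory_small_files(alpha, lengths, block_size):
    """NameNode memory when the small files are stored directly."""
    blocks = sum(math.ceil(l / block_size) for l in lengths)
    return alpha + FILE_META * len(lengths) + BLOCK_META * blocks

def memory_merged(alpha, merged_lengths, block_size):
    """NameNode memory for K merged files LM1..LMK, assuming one
    single-block index file accompanies each merged file."""
    k = len(merged_lengths)
    blocks = sum(math.ceil(lm / block_size) for lm in merged_lengths)
    return alpha + FILE_META * 2 * k + BLOCK_META * (blocks + k)
```

For example, with α = 0, a 128 MB block size, and 100,000 files of 1 MB each merged into 100 files of 1 GB each, the direct layout costs about 100,000 × (250 + 368) ≈ 61.8 MB of NameNode memory, while the merged layout needs metadata for only 200 files and roughly 900 blocks, well under 1 MB.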
