The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google)

Niek Linnenbank
Faculty of Science, Vrije Universiteit
nlk800@few.vu.nl

March 17, 2010

The Google File System Outline

1 Introduction
2 Architecture
3 Measurements
4 Latest Work
5 Conclusion

The Google File System Introduction

Size of the Internet

6,767,805,208 people on earth
1,733,993,741 people on the internet
5,000,000 terabytes of data (Eric Schmidt, 2005)


Top 10 Search Providers in the US, January 2010
RANK  PROVIDER              SEARCHES (000)  SHARE (%)
      ALL SEARCH                10,272,099      100.0
1     GOOGLE SEARCH              6,805,424       66.3
2     YAHOO SEARCH               1,488,476       14.5
3     MSN SEARCH                 1,116,546       10.9
4     AOL SEARCH                   251,762        2.5
5     ASK.COM SEARCH               194,161        1.9
6     MY WEB SEARCH SEARCH         112,356        1.1
7     COMCAST SEARCH                59,608        0.6
8     YELLOW PAGES SEARCH           35,101        0.3
9     NEXTAG SEARCH                 34,736        0.3
10    BIZRATE SEARCH                20,123        0.2


The Google Way

Google does web indexing (and more)
Cheap commodity hardware
Patented PageRank(tm) technology


Google Filesystem

Scalable distributed filesystem
Designed for cheap clusters
Capable of storing hundreds of terabytes

The Google File System Architecture

Assumptions

Component failures are the norm
Inexpensive commodity hardware
Large files
Files are mutated mostly by appends
Workload is typically large streaming reads and appends


Design

One master process keeps file metadata.
Files are split into chunks.
Multiple chunkservers store the chunks.
Multiple clients may access files concurrently.
POSIX-like API (create, read, write, append, delete)
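
The "files are split into chunks" point can be made concrete with a minimal sketch of how a byte range of a file might map onto fixed-size chunks. The 64 MB chunk size is the value reported in the GFS paper, not something stated on these slides, and the function name `chunk_spans` is hypothetical.

```python
# Assumption: fixed-size chunks of 64 MB, as reported in the GFS paper.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_spans(offset, length, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, offset_in_chunk, n_bytes) for a byte range of a file.

    Hypothetical helper: it only illustrates the arithmetic of splitting
    a read or write request across chunk boundaries.
    """
    end = offset + length
    while offset < end:
        index = offset // chunk_size          # which chunk the byte falls in
        start = offset % chunk_size           # position inside that chunk
        n = min(chunk_size - start, end - offset)
        yield index, start, n
        offset += n
```

For example, with a toy chunk size of 4 bytes, a 10-byte request starting at offset 0 touches three chunks: two full ones and 2 bytes of the third.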


Design

[Figure: architecture diagram showing a client, the master (chunk locations), and multiple chunk servers (chunk data).]
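
The labels in the design diagram (client, master, chunk locations, chunk server, chunk data) suggest a read path in which the client asks the master only for chunk locations and then fetches the chunk data directly from a chunk server. The following is a toy in-memory sketch of that separation. All class and function names are hypothetical, and always picking the first replica is a simplifying assumption.

```python
class ChunkServer:
    """Stores chunk data, keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle):
        return self.chunks[handle]


class Master:
    """Holds metadata only; no file data passes through it."""

    def __init__(self):
        self.files = {}      # file name -> list of chunk handles
        self.locations = {}  # chunk handle -> list of ChunkServer replicas

    def lookup(self, name, chunk_index):
        handle = self.files[name][chunk_index]
        return handle, self.locations[handle]


def client_read_chunk(master, name, chunk_index):
    # Step 1: metadata request to the master (chunk locations).
    handle, replicas = master.lookup(name, chunk_index)
    # Step 2: data request to a chunk server (chunk data).
    # Assumption: simply use the first replica listed.
    return replicas[0].read(handle)
```

Keeping the master off the data path is what lets a single master process serve many concurrent clients in this design: it answers small metadata queries while the chunk servers carry the bulk traffic.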
