
The Sudden Surge of Use of Big Data

1

The first two lectures were about big data. So why this surge?
It isn’t about the data at all
It is about systems being able to process the data
Tools exploit the data and derive valuable nuggets
NOT NEW: Walmart and Wall Street have done this for decades
Why didn’t it get out? Competition
Teradata: around for 20 years; contest over the MapReduce (MR) patent
Digital exhaust is increasing
Know your stats, e.g. sentiment analysis of Gaga tweets
Correlation is not causation
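
The last point can be shown with a small worked example. This is a minimal sketch using made-up numbers: two series that never influence each other still correlate strongly because both depend on a hidden third variable. The variable names and all values are assumptions for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily temperatures: the hidden common cause.
temperature = [10, 15, 20, 25, 30, 35]

# Both series are computed from temperature only, never from each other.
ice_cream_sales = [5 * t + 20 for t in temperature]
sunburn_cases = [2 * t - 10 for t in temperature]

# The coefficient is very close to 1 even though neither series causes the other.
r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation = {r:.3f}")
```

A strong coefficient here says nothing about ice cream causing sunburn; it only reflects the shared dependence on temperature.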

2

Old: centralized systems (data came from humans)
Sun hardware
Oracle software
Moore’s law -> data grew. Data exhaust keeps growing larger. Centralized systems can’t keep up.
RDBMS is not going away. It suits predictable queries on tabular data
Unstructured data doesn’t fit nicely in tables. Complicated data.
Even if it did:
Sentiment analysis of the natural language captured in all those tweets
MapReduce joke!
Data processing, not transaction processing
Complex data at volume
Terabytes are hard not to accumulate
Data is actively thrown away due to lack of storage
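
To make the MapReduce and tweet-sentiment points concrete, here is a toy, single-process sketch of the map / shuffle / reduce pattern. It is not Hadoop's API; the word lists, function names, and sample tweets are all hypothetical, and a real sentiment model would be far richer.

```python
from collections import defaultdict

# Hypothetical word lists, assumed for illustration only.
POSITIVE = {"love", "great", "amazing"}
NEGATIVE = {"hate", "awful", "boring"}

def map_tweet(tweet):
    """Map step: emit one (label, 1) pair per tweet."""
    words = set(tweet.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    yield (label, 1)

def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)

def run_job(tweets):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

tweets = [
    "I love the new album",
    "this show was boring",
    "great concert, amazing crowd",
    "waiting for the bus",
]
result = run_job(tweets)
print(result)
```

In a real cluster the map and reduce functions run in parallel on many machines; the structure of the job stays the same.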

3

DJ Patil is Data Scientist in Residence at Greylock
Made a quote based on his experience as a data scientist and manager at LinkedIn: a poor use of scarce labour; the very few people who can do the interesting parts of this job spend 80% of their time cleaning the boring bits

4

Survey by Ventana Research
Determined how important these metrics were
in evaluating their large-scale data tech projects

5

Economically finish
Moving data from storage to compute is costly and puts pressure on the network infrastructure
Move code to data
Archiving
Once data is old, it is sent to tapes or disks due to the economics of storage
Many scientists argue that stored data is never looked at again; it is costly to retrieve from archives
Justify the economics of storing: ROI, or return on bytes
Ask questions of the raw data, not of ETL tool output
Data is lost through aggregation in ETL
You want to ask new questions: build a schema for each one? At this scale of data?
No schema! Data is not transformed
It is copied as-is
Not pulling data but pushing work to the store
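
The point about data loss through ETL aggregation can be sketched with a few lines of Python. The events and field layout below are made up for illustration: an aggregate designed in advance answers the planned question, but a new question needs the raw records.

```python
from collections import Counter

# Hypothetical raw click events: (day, hour, user).
raw_events = [
    ("mon", 9, "ann"), ("mon", 9, "bob"), ("mon", 17, "ann"),
    ("tue", 9, "cat"), ("tue", 21, "ann"),
]

# ETL-style aggregate designed in advance: clicks per day.
clicks_per_day = Counter(day for day, _hour, _user in raw_events)

# A question nobody planned for: which hour is busiest?
# The daily aggregate has dropped the hour, so only the raw data can answer it.
clicks_per_hour = Counter(hour for _day, hour, _user in raw_events)
busiest_hour, busiest_count = clicks_per_hour.most_common(1)[0]

print(dict(clicks_per_day))
print(busiest_hour, busiest_count)
```

If only clicks_per_day had been kept, the hourly question could not be answered at all, which is the argument for storing untransformed data.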

Hadoop by Apache
Open source
Used to organize huge amounts of unstructured data
Used in
