This Course
[Figure: course map with axes tools vs. abstractions and desk vs. cloud]
Bill Howe, UW

What are the abstractions of data science?
“Data Jujitsu”
“Data Wrangling”
“Data Munging”
Translation: “We have no idea what this is all about”
4/28/13 Bill Howe, UW

What are the abstractions of data science?
• matrices and linear algebra?
• relations and relational algebra?
• objects and methods?
• files and scripts?
• data frames and functions?

Data Access Hitting a Wall
[Figure: desk vs. cloud]
Current practice is based on downloading data (FTP/GREP), which will not scale to the datasets of tomorrow.

GREP (searching):
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh, and 1 PB is ~5,000 disks

FTP (downloading):
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~$1)
• … 2 days and $1K
• … 3 years and $1M

• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
[slide src: Jim Gray]

[Figure: hackers vs. analysts]
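The GREP figures on the Data Access slide are back-of-envelope rates. A minimal Python sketch of that arithmetic follows, assuming a single sequential scan sustains about 10 MB/s (roughly the rate implied by "1 PB in 3 years"). The constant and the helper names `scan_seconds` and `human` are illustrative assumptions, not taken from the slide. The sketch shows why a full scan stops being practical at petabyte scale, and how splitting the data across the slide's ~5,000 disks and scanning them in parallel changes the picture.

```python
# Back-of-envelope scan times in the spirit of the GREP/FTP slide.
# ASSUMPTION: one sequential scan sustains RATE_BYTES_PER_S; ~10 MB/s is a
# hypothetical round number (close to "1 PB in 3 years"), not a measured rate.

RATE_BYTES_PER_S = 10 * 10**6  # assumed single-disk scan rate: 10 MB/s
DISKS_PER_PB = 5_000           # from the slide: 1 PB is ~5,000 disks

SIZES = {"1 MB": 10**6, "1 GB": 10**9, "1 TB": 10**12, "1 PB": 10**15}


def scan_seconds(size_bytes: int, rate: float = RATE_BYTES_PER_S, disks: int = 1) -> float:
    """Seconds to scan size_bytes when the data is split evenly across
    `disks` disks that are all read in parallel at `rate` bytes/s each."""
    return size_bytes / (rate * disks)


def human(seconds: float) -> str:
    """Render a duration in the largest unit that is at least 1."""
    for unit, length in (("years", 365 * 86400), ("days", 86400),
                         ("hours", 3600), ("minutes", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    # Serial scans: one reader walks through all of the data.
    for label, size in SIZES.items():
        print(f"serial scan of {label}: {human(scan_seconds(size))}")
    # Parallel scan: 1 PB spread over 5,000 disks, all scanned at once.
    print("parallel scan of 1 PB:", human(scan_seconds(10**15, disks=DISKS_PER_PB)))
```

Under these assumptions a serial scan of 1 PB comes out at about 3.2 years, while the same petabyte split evenly across 5,000 disks and scanned in parallel takes about 5.6 hours. The sketch ignores coordination overhead and skew, and an index would avoid reading most of the data at all.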