This comment was posted to reddit on Jan 25, 2015 at 8:06 pm and was deleted within 14 hour(s) and 47 minutes.
Good books on big data? Introductory and advanced. (for a trained statistician/software engineer)
Here are some topics/URLs to read:
8 fallacies of distributed computing:
http://www.rgoarchitects.com/Files/fallacies.pdf
Gossip Protocols
Anti-entropy algorithms
Merkle Trees
2-phase commit
Paxos
http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
http://pine.cs.yale.edu/pinewiki/Paxos
http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
Inverted index vs. index
http://en.wikipedia.org/wiki/Inverted_index
Phi Accrual Failure Detection
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
Codd's "A Relational Model of Data for Large Shared Data Banks"
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Jim Gray's "The Transaction Concept"
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
Gregor Hohpe's "Starbucks Does Not Use Two-Phase Commit"
http://www.eaipatterns.com/ramblings/18_starbucks.html
Flixster sharding strategies
http://lsvp.wordpress.com/2008/06/20
Berkeley "The Case for Shared Nothing"
http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
IDC research paper "The Expanding Digital Universe"
http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
Amazon's Dynamo paper
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dwight Merriman (video on CAP spectrum)
http://bit.ly/7r6kRg
Facebook Cassandra paper
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Twitter Cassandra analytics
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html
Cloudkick using Cassandra for metrics
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra
Staged Event-Driven Architecture (SEDA)
http://www.eecs.harvard.edu/~mdw/proj/seda
BigTable
http://research.google.com/archive/bigtable.html
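On the "Inverted index vs. index" item above, a toy sketch of the difference may help before reading the link. The corpus, the `search` helper, and the naive whitespace tokenizer are all made up for illustration; real search engines do far more (stemming, ranking, compression).

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
docs = {
    1: "paxos is a consensus protocol",
    2: "gossip is an epidemic protocol",
    3: "merkle trees support anti-entropy",
}

# Forward index: document id -> list of terms it contains.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> set of document ids containing that term.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)


def search(term):
    """Return the sorted ids of documents containing `term`."""
    return sorted(inverted_index.get(term, set()))


print(search("protocol"))  # [1, 2]
```

The forward index answers "which terms are in document N?", while the inverted index answers "which documents contain term T?" without scanning every document.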
Hadoop-related tech:
mapreduce
whirr
flume
oozie
mahout
hue
hbase
hive
pig
spark - see the white paper spark is based on:
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
comparison of spark and stratosphere:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
yarn
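If the map reduce model in the list above is new to you, here is a rough single-process sketch of the classic word count. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data", "Big tables"]))
# {'big': 2, 'data': 1, 'tables': 1}
```

The point is that the mapper and reducer only ever see one line or one key at a time, which is what lets a framework spread them over many nodes.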
Books:
Hadoop: The Definitive Guide
Cassandra: The Definitive Guide
MongoDB Applied Design Patterns
Graph Databases (Robinson)
Learning Spark (Karau et al) - this one's not published yet (I have an early release copy)