This comment was posted to reddit on Jan 25, 2015 at 8:06 pm and was deleted within 14 hour(s) and 47 minutes.
Good books on big data? Introductory and advanced. (for a trained statistician/software engineer)
Here are some topics/URLs to read:
8 fallacies of distributed computing:
http://www.rgoarchitects.com/Files/fallacies.pdf
Gossip Protocols
Anti-entropy algorithms
Merkle Trees
2-phase commit
Paxos
http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
http://pine.cs.yale.edu/pinewiki/Paxos
http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
Inverted index vs. index
http://en.wikipedia.org/wiki/Inverted_index
Phi Accrual Failure Detection
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
Codd's "A Relational Model of Data for Large Shared Data Banks"
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Jim Gray's "The Transaction Concept"
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
Gregor Hohpe's "Starbucks Does Not Use Two-Phase Commit"
http://www.eaipatterns.com/ramblings/18_starbucks.html
Flixster sharding strategies
http://lsvp.wordpress.com/2008/06/20
Berkeley "The Case for Shared Nothing"
http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
IDC research paper "The Expanding Digital Universe"
http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
Amazon's Dynamo paper
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dwight Merriman (video on CAP spectrum)
http://bit.ly/7r6kRg
Facebook Cassandra paper
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Twitter Cassandra analytics
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html
Cloudkick using Cassandra for metrics
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra
Staged Event-Driven Architecture (SEDA)
http://www.eecs.harvard.edu/~mdw/proj/seda
BigTable
http://research.google.com/archive/bigtable.html
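Of the topics above, "inverted index vs. index" is probably the quickest to get a feel for in code. Here is a minimal sketch, assuming a tiny in-memory corpus. The sample documents and the names forward_index, inverted_index and search are made up for illustration and don't come from any of the linked papers.

```python
from collections import defaultdict

# Hypothetical sample documents, made up for illustration.
docs = {
    1: "paxos is a consensus protocol",
    2: "gossip is an epidemic protocol",
    3: "merkle trees support anti-entropy repair",
}

# Forward index: document id -> list of terms in that document.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> set of document ids containing that term.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)


def search(term):
    """Return the sorted ids of documents containing `term`."""
    return sorted(inverted_index.get(term, set()))


print(search("protocol"))  # documents 1 and 2 both contain "protocol"
```

The forward index answers "what terms are in document N?", while the inverted index answers "which documents contain term T?" without scanning every document. Real search engines add tokenization, ranking and on-disk posting lists on top of this idea.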
Hadoop-related tech:
map reduce
whirr
flume
oozie
mahout
hue
hbase
hive
pig
spark - see the white paper spark is based on:
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
comparison of spark and stratosphere:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
yarn
Books:
Hadoop: The Definitive Guide
Cassandra: The Definitive Guide
MongoDB Applied Design Patterns
Graph Databases (Robinson)
Learning Spark (Karau et al) - this one's not published yet (I have an early release copy)
/r/bigdata