This comment was posted to reddit (/r/bigdata) on Jan 25, 2015 at 8:06 pm and was deleted within 14 hours and 47 minutes.
Good books on big data? Introductory and advanced. (for a trained statistician/software engineer)
Here are some topics/URLs to read:
8 fallacies of distributed computing:
http://www.rgoarchitects.com/Files/fallacies.pdf
Gossip Protocols
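A minimal sketch of push-style gossip (rumor spreading). It assumes a fully connected cluster where no messages are lost and every informed node pushes the rumor to one uniformly random peer per round. The function name `gossip_rounds` is hypothetical. The point it illustrates is that the number of rounds needed to reach every node grows roughly logarithmically with cluster size.

```python
import random

def gossip_rounds(n_nodes: int, seed: int = 0) -> int:
    """Simulate push gossip and return the rounds until every node is informed."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the rumor
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        # Snapshot: only nodes informed before this round push during it.
        for node in list(informed):
            peer = rng.randrange(n_nodes - 1)
            if peer >= node:
                peer += 1  # shift past our own id so we never pick ourselves
            informed.add(peer)
    return rounds

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, gossip_rounds(n))
```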
Anti-entropy algorithms
Merkle Trees
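Dynamo-style stores use Merkle trees for anti-entropy. Two replicas can find divergent keys without shipping all their data: they compare root hashes, then descend only into subtrees whose hashes differ. The sketch below assumes both replicas hash the same number of leaves in the same key order (for example, one leaf per key-range bucket) and uses SHA-256. The helper names are hypothetical.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Hash levels of a Merkle tree: leaf hashes first, root last."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # On odd-sized levels, pair the last node with itself.
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels

def diff_leaves(a: list[bytes], b: list[bytes]) -> list[int]:
    """Indices of leaves that differ, descending only into mismatched subtrees."""
    if len(a) != len(b):
        raise ValueError("replicas must hash the same number of leaves")
    ta, tb = build_levels(a), build_levels(b)
    if ta[-1] == tb[-1]:
        return []  # roots match: replicas are in sync
    frontier = [0]  # indices of mismatched nodes at the current level
    for depth in range(len(ta) - 2, -1, -1):
        frontier = [c for i in frontier for c in (2 * i, 2 * i + 1)
                    if c < len(ta[depth]) and ta[depth][c] != tb[depth][c]]
    return frontier
```

Only the hashes along mismatched paths are compared. For a large key space, that is far less data than a full scan of both replicas.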
2-phase commit
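A minimal in-process sketch of two-phase commit. In phase 1, the coordinator collects a vote from every participant. In phase 2, it commits only on a unanimous yes and otherwise aborts everywhere. The `Participant` class and its `can_commit` flag are hypothetical. A real implementation also writes durable log records and must handle coordinator failure, which is where 2PC can block.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical: whether this participant's local work succeeded
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1 vote. A real participant would durably log its vote first.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (voting): ask every participant, without short-circuiting.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```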
Paxos
http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
http://pine.cs.yale.edu/pinewiki/Paxos
http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
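A minimal, synchronous sketch of single-decree Paxos, in the style of "Paxos Made Simple". It assumes no message loss, proposal numbers that are unique positive integers, and in-memory acceptors. The class and function names are hypothetical. It shows the key safety rule: a proposer that gets promises from a majority must reuse the highest-numbered value already accepted, so a later proposal cannot overwrite a chosen value.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Acceptor:
    promised: int = 0    # highest proposal number promised so far
    accepted_n: int = 0  # number of the accepted proposal (0 means none)
    accepted_v: Any = None

    def prepare(self, n: int) -> tuple:
        # Phase 1b: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n: int, v: Any) -> bool:
        # Phase 2b: accept unless a higher-numbered prepare was promised.
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False

def propose(acceptors: list[Acceptor], n: int, value: Any) -> Optional[Any]:
    """One synchronous Paxos round; returns the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promises = [r for r in (a.prepare(n) for a in acceptors) if r[0]]
    if len(promises) < majority:
        return None
    # Adopt the value of the highest-numbered proposal already accepted, if any.
    prior = max(promises, key=lambda r: r[1])
    if prior[1] > 0:
        value = prior[2]
    accepts = sum(a.accept(n, value) for a in acceptors)
    return value if accepts >= majority else None
```

With three fresh acceptors, `propose(accs, 1, "x")` chooses `"x"`. A later `propose(accs, 2, "y")` still returns `"x"`. A stale `propose(accs, 1, "z")` gets no promises and returns `None`.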
Inverted index vs. index
http://en.wikipedia.org/wiki/Inverted_index
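A forward index maps each document to the terms it contains. An inverted index reverses that mapping, so each term points to the documents (its postings) that contain it, which is what makes keyword search fast. This is a minimal sketch that assumes naive lowercase alphanumeric tokenization and uses in-memory sets as posting lists. The function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: ids of documents containing every query term."""
    postings = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```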
Phi Accrual Failure Detection
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
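The phi accrual detector (Hayashibara et al.) outputs a suspicion level instead of a binary up/down verdict: phi = -log10(P_later(t_now - T_last)). Here P_later is the probability that the next heartbeat arrives even later than it already has, under a distribution fitted to recent heartbeat inter-arrival times. This minimal sketch assumes a normal distribution, as in the paper. The window size and the `min_std` floor are hypothetical parameters; the floor keeps perfectly regular heartbeats from causing a division by zero.

```python
import math
from collections import deque

class PhiAccrualDetector:
    def __init__(self, window: int = 100, min_std: float = 0.1):
        self.intervals = deque(maxlen=window)  # recent heartbeat inter-arrival times
        self.last = None                       # arrival time of the latest heartbeat
        self.min_std = min_std                 # hypothetical floor on the std deviation

    def heartbeat(self, now: float) -> None:
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        if self.last is None or not self.intervals:
            return 0.0  # no history yet: no suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last
        # P_later = 1 - F(elapsed) for a normal distribution N(mean, std^2).
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # clamp to avoid log10(0)
```

An application would suspect the node once phi exceeds a threshold that it chooses. Each additional unit of phi corresponds to a tenfold drop in the probability that the heartbeat is merely late.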
Codd's "A Relational Model of Data for Large Shared Data Banks"
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Jim Gray's "The Transaction Concept"
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
Gregor Hohpe's "Starbucks Does Not Use Two-Phase Commit"
http://www.eaipatterns.com/ramblings/18_starbucks.html
Flixster sharding strategies
http://lsvp.wordpress.com/2008/06/20
Berkeley "The Case for Shared Nothing"
http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
IDC research paper "The Expanding Digital Universe"
http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
Amazon's Dynamo paper
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
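The Dynamo paper partitions keys with consistent hashing. Nodes and keys are hashed onto the same ring, and each physical node owns several virtual nodes. A key's preference list is the first N distinct nodes found by walking clockwise from the key's position. This minimal sketch uses MD5 for ring positions, as the paper does. The class name, the `vnodes=64` default, and the `node#i` naming of virtual nodes are hypothetical.

```python
import bisect
import hashlib

def _ring_pos(key: str) -> int:
    # 128-bit position on the ring derived from an MD5 digest.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64):
        # Sorted (position, node) pairs, one per virtual node.
        self._ring = sorted((_ring_pos(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        """First n distinct physical nodes clockwise from the key's position."""
        start = bisect.bisect(self._positions, _ring_pos(key))
        result: list[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

Adding or removing a node only moves the ring segments next to that node's virtual nodes. As a result, a key whose coordinator was some other node keeps the same coordinator.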
Dwight Merriman (video on CAP spectrum)
http://bit.ly/7r6kRg
Facebook Cassandra paper
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Twitter Cassandra analytics
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html
Cloudkick using Cassandra for metrics
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra
Staged Event-Driven Architecture (SEDA)
http://www.eecs.harvard.edu/~mdw/proj/seda
Bigtable
http://research.google.com/archive/bigtable.html
Hadoop-related tech:
MapReduce
Whirr
Flume
Oozie
Mahout
Hue
HBase
Hive
Pig
Spark: see the paper Spark is based on:
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
Comparison of Spark and Stratosphere:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
YARN
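The MapReduce programming model that Hadoop implements can be simulated in a single process with the classic word-count example. Mappers emit `(word, 1)` pairs, the framework groups the pairs by key (the shuffle), and reducers sum each group. This sketch uses plain Python generators and dictionaries and is not Hadoop's API; Hadoop jobs are usually written against its Java `Mapper`/`Reducer` classes. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc: str):
    # Mapper: emit (word, 1) for every word in one input split.
    for word in doc.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: combine all counts emitted for one word.
    return key, sum(values)

def word_count(docs):
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```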
Books:
Hadoop: The Definitive Guide
Cassandra: The Definitive Guide
MongoDB Applied Design Patterns
Graph Databases (Robinson)
Learning Spark (Karau et al.): not yet published (I have an early-release copy)