This comment was posted to reddit on Jan 25, 2015 at 8:06 pm and was deleted within 14 hour(s) and 47 minutes.
Good books on big data? Introductory and advanced. (for a trained statistician/software engineer)
Here are some topics/URLs to read:
8 fallacies of distributed computing:
http://www.rgoarchitects.com/Files/fallacies.pdf
Gossip Protocols
Anti-entropy algorithms
Merkle Trees
2-phase commit
Paxos
http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
http://pine.cs.yale.edu/pinewiki/Paxos
http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
Inverted index vs. index
http://en.wikipedia.org/wiki/Inverted_index
Phi Accrual Failure Detection
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
Codd's "A Relational Model of Data for Large Shared Data Banks"
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Jim Gray's "The Transaction Concept"
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
Gregor Hohpe's "Starbucks Does Not Use Two-Phase Commit"
http://www.eaipatterns.com/ramblings/18_starbucks.html
Flixster sharding strategies
http://lsvp.wordpress.com/2008/06/20
Berkeley "The Case for Shared Nothing"
http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
IDC research paper "The Expanding Digital Universe"
http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
Amazon's Dynamo paper
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dwight Merriman (video on CAP spectrum)
http://bit.ly/7r6kRg
Facebook Cassandra paper
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Twitter Cassandra analytics
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html
Cloudkick using Cassandra for metrics
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra
Staged Event-Driven Architecture (SEDA)
http://www.eecs.harvard.edu/~mdw/proj/seda
BigTable
http://research.google.com/archive/bigtable.html
Hadoop-related tech:
map reduce
whirr
flume
oozie
mahout
hue
hbase
hive
pig
spark - see the white paper spark is based on:
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
comparison of spark and stratosphere:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
yarn
Books:
Hadoop: The Definitive Guide
Cassandra: The Definitive Guide
MongoDB Applied Design Patterns
Graph Databases (Robinson)
Learning Spark (Karau et al) - this one's not published yet (I have an early release copy)
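On the "Inverted index vs. index" item in the topic list: a minimal sketch of the difference, assuming a toy three-document corpus and whitespace tokenization. The `docs` data and the `search` helper are made up for illustration and are not taken from any of the linked material. A forward index maps each document to its terms; an inverted index flips that to map each term to the documents containing it, which is what makes term lookups cheap.

```python
from collections import defaultdict

# Hypothetical toy corpus (assumption for illustration): doc id -> text.
docs = {
    1: "paxos is a consensus protocol",
    2: "gossip protocol spreads state",
    3: "merkle trees detect divergent state",
}

# Forward index: doc id -> list of terms in that document.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> set of doc ids that contain the term.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)


def search(term):
    """Return the sorted doc ids containing `term` (empty list if none)."""
    # .get avoids creating an empty entry for unknown terms.
    return sorted(inverted_index.get(term, set()))


print(search("protocol"))
print(search("state"))
```

Answering "which documents mention X" from the forward index means scanning every document; with the inverted index it is a single dictionary lookup.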