What is your process for caching Network data to SQLite?

We've doubled down on this sort of approach in the refactor we're just finishing. It's really a great way of working with your data, but I'll warn you that it's not all wine and roses. Assuming you're building something like the G+ or Twitter stream (an infinite list with new items constantly appearing at the "top"), some things you might want to have in the back of your head are:

  1. How and when do you "clean" the database? If the user loads 10 million Tweets, do you allow all those to stay in the db?
  2. How do you handle a "gap" in your data? Assume you've got tweets 1, 2, 3 in your database, and you ask the API for the newest tweets and it gives you back 7, 8, 9. You know that you've missed some in between: there's a gap in your data set.
  3. How do you handle loading more content at the "bottom" of the list? If your API is working with "pages" of data, what you consider page #3 may now be drastically different from what the API considers page #3.

I'd love to hear if anyone else has run into these sorts of concerns, and what their solutions look like. For us, I can briefly say our solutions are:

  1. We have a few "hero" models in our database. In the Twitter example this would include tweets. Once in a while we check how many records are in those tables, and if it's too many, we toss everything except the most recent few. We then clean up any models that got abandoned by this process... things like users, avatars, etc.
  2. When we load up a list we always ask the API for the "top most" page of results. If there's no intersection between that set and the set in our database, we remember how many results were "above that gap" and wait for the user to either scroll above that position, or quit the app. In either case, we toss everything below the gap. The user can always scroll down and load pages "at the bottom" to get back to where they were.
  3. A major change in our API was to query for "the page after record 7" as opposed to "page #3". We call these cursors. For our lists that have their data backed by the database, we also store the "top most" and "bottom most" cursor, so we can always resume pagination at either "end" of the data set.
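To make the cleanup idea (#1) concrete, here's a minimal sketch of the trim-then-collect-orphans logic using in-memory stand-ins for the tables. The class and method names, the cap of 2 rows in the usage below, and the `[tweetId, userId]` row shape are all my own invention, not anything from a real library:

```java
import java.util.*;

// Hypothetical sketch: trim a "hero" table to the most recent N rows,
// then drop any dependent models (users here) that no surviving row references.
public class CacheCleaner {

    // Keeps only the newest maxRows tweets (higher id = newer here) and
    // returns the set of user ids still referenced by a surviving tweet.
    public static Set<Long> trim(List<long[]> tweets, int maxRows) {
        // Each row is {tweetId, userId}.
        tweets.sort((a, b) -> Long.compare(b[0], a[0])); // newest first
        while (tweets.size() > maxRows) {
            tweets.remove(tweets.size() - 1);            // toss the oldest
        }
        Set<Long> liveUsers = new HashSet<>();
        for (long[] t : tweets) {
            liveUsers.add(t[1]);
        }
        return liveUsers;
    }

    // Delete every user the trim pass abandoned.
    public static void cleanOrphans(Map<Long, String> users, Set<Long> liveUsers) {
        users.keySet().retainAll(liveUsers);
    }
}
```

In a real app both steps would be a couple of SQL statements (`DELETE ... WHERE id NOT IN (SELECT id ... ORDER BY id DESC LIMIT n)` and a similar orphan delete), but the shape of the logic is the same.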
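The gap handling in #2 boils down to one question: does the freshly fetched "top" page intersect what's cached? A sketch, assuming ids are ordered newest-first and stand in for whole rows (the class name and the merge details are mine, not from the post):

```java
import java.util.*;

// Hypothetical sketch of gap detection for a newest-first timeline cache.
public class GapDetector {

    // True when the fetched top page shares no rows with the cache,
    // i.e. there's an unknown stretch of content between them.
    public static boolean hasGap(List<Long> cachedIds, List<Long> fetchedIds) {
        return Collections.disjoint(cachedIds, fetchedIds);
    }

    // If the sets overlap, merge them (newest first, no duplicates).
    // If there's a gap, keep only the content above it; the user can
    // page "at the bottom" to refill what was tossed.
    public static List<Long> resolve(List<Long> cachedIds, List<Long> fetchedIds) {
        if (hasGap(cachedIds, fetchedIds)) {
            return new ArrayList<>(fetchedIds);
        }
        LinkedHashSet<Long> merged = new LinkedHashSet<>(fetchedIds);
        merged.addAll(cachedIds);
        return new ArrayList<>(merged);
    }
}
```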
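And for the cursors in #3, the bookkeeping is just remembering the outermost cursor at each end of the cached window. Here ids double as cursors for simplicity; the class and method names are illustrative only:

```java
// Hypothetical sketch: track the "top most" and "bottom most" cursors
// so pagination can resume at either end of the cached data set.
public class CursorWindow {
    private Long top;    // cursor at the newest edge of the cache
    private Long bottom; // cursor at the oldest edge of the cache

    // Record a page of results, given its newest and oldest ids.
    public void onPage(long newestId, long oldestId) {
        if (top == null || newestId > top) top = newestId;
        if (bottom == null || oldestId < bottom) bottom = oldestId;
    }

    // "Give me the page after record N" at either end.
    public long nextTopRequest() { return top; }
    public long nextBottomRequest() { return bottom; }
}
```

The point of storing both ends is that after a restart you can ask for "newer than top" or "older than bottom" without caring what "page number" either of those would be.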

Having this set of tools has made it really nice and easy to add some visual sugar on top of our lists. For example, we can restore the user to the exact content they were looking at last time they left the app and if there happens to be new content "above the fold", we can generate a nice subtle call to action for them to view that content. As near as I can tell, Twitter seems to play these same tricks.

One other hint that's worked out really well is hiding as much as we can behind Repositories. There will be a method in the repository like "getRecentTweets". When this method gets called, a few things happen:

  1. We query the database and instantly return the best results we can (use a ContentProvider/Loader, RxJava, or just a callback)
  2. We have a "refresh strategy" that checks if those results are due for an update. Usually this is something like "is the oldest record more than 5 minutes old?"
  3. If we're due for a refresh, we call the API, write the results to the db, and notify the calling code again (we're using a Loader to handle this, but there are lots of options)
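The steps above can be sketched with a callback-based repository. Everything here is a stand-in (the `Supplier` for the network call, an injectable clock, strings for tweets); the real version would sit on SQLite and a Loader or RxJava, but the cache-first / refresh-if-stale flow is the same:

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical repository sketch: emit cached rows immediately, then
// refresh from the network when the refresh strategy says they're stale.
public class TweetRepository {
    static final long STALE_AFTER_MS = 5 * 60 * 1000; // "older than 5 minutes"

    private List<String> cache = new ArrayList<>();
    private long lastRefreshMs = 0;
    private final Supplier<List<String>> api; // stand-in for the network call
    private final Supplier<Long> clock;       // injectable so tests control time

    public TweetRepository(Supplier<List<String>> api, Supplier<Long> clock) {
        this.api = api;
        this.clock = clock;
    }

    public void getRecentTweets(Consumer<List<String>> callback) {
        // 1. Instantly hand back the best results we have.
        callback.accept(new ArrayList<>(cache));
        // 2. Refresh strategy: are the cached results due for an update?
        boolean stale = clock.get() - lastRefreshMs > STALE_AFTER_MS;
        if (stale) {
            // 3. Call the API, write the results to the cache, notify again.
            cache = new ArrayList<>(api.get());
            lastRefreshMs = clock.get();
            callback.accept(new ArrayList<>(cache));
        }
    }
}
```

The nice property is that the calling code never knows whether it got one emission (fresh cache) or two (stale cache, then network results).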