Pet Peeves on 16S rRNA gene sequencing - Noah Fierer

There are enough serious methodological shortcomings in 16S experiments that this would be an easy list to write, but this one misses the mark.

I'm happy to give credit where it's due; two and a half of the eight points made here are worth mentioning. The rest are like worrying about mold around the sink when the house is on fire.

6 - Contamination has been shown to be a serious issue - one whose scope and extent have not been well characterized - and it is frequently not given appropriate consideration.

8 - If your target material is DNA, you're bound to capture some dead stuff. Targeting RNA addresses this problem to some extent, but few investigators check how much of a difference it makes, because these experiments are harder to do with RNA and RNA has problems of its own. Like contamination (which it sort of is), it's clear that including dead organisms in your measurement of a microbial population is an issue, but we don't know how widespread it is or how much it affects results.

3 (meh...) - It's true that a variety of metrics intended for traditional ecology, and inappropriate for 16S-based microbial ecology, are used more often than they should be. More suitable metrics, devised with these experiments in mind, are increasingly widely used. I wouldn't let this one bother you; most people seem to be coming around. After all, it's easy and free to fix.

Here's what gets me:

Rarefaction is widely used, but it is a crappy normalization method. In fact, nobody is really sure what the best way to normalize 16S data is.
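For concreteness: rarefying just means randomly subsampling every sample down to a common read depth, discarding reads in the process (one reason it's a crappy normalization). A minimal sketch in pure Python, with made-up taxon names:

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample a taxon -> read-count table to a fixed depth
    without replacement, as rarefaction does. Reads above
    `depth` are simply thrown away."""
    # Expand the count table into one entry per read.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("requested depth exceeds sample size")
    rng = random.Random(seed)  # seeded for reproducibility
    return Counter(rng.sample(reads, depth))

# Illustrative sample: 1000 reads rarefied down to 100.
sample = {"Taxon_A": 500, "Taxon_B": 300, "Taxon_C": 200}
rarefied = rarefy(sample, 100)
```

Note that a sample sequenced to 100,000 reads keeps only 100 of them here; everything else is discarded, which is exactly the information loss people object to.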

It's not like that one really matters, though. There are much bigger problems.

The number of reads for a given taxon is a horrible estimator of abundance. Things like amplification bias, sequencing bias, variable operon counts (1-10 or more per genome), and growth rate biases (some cells divide rapidly after sample collection, others not so much) can easily combine to skew estimates by several orders of magnitude.

Compared to these factors, normalizing for read depth is a minor problem.
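To see how big just one of those factors is, here's a toy sketch of 16S copy-number correction. The taxon names and copy numbers are invented for illustration (in practice copy numbers would come from a database such as rrnDB, and are themselves uncertain):

```python
# Hypothetical per-genome 16S operon copy numbers (illustrative only).
COPY_NUMBER = {"Taxon_A": 7, "Taxon_B": 1, "Taxon_C": 4}

def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's reads by its 16S copy number, then
    renormalize to relative abundance."""
    corrected = {t: n / copy_numbers[t] for t, n in read_counts.items()}
    total = sum(corrected.values())
    return {t: v / total for t, v in corrected.items()}

# Raw reads make Taxon_A look dominant (70% of reads)...
reads = {"Taxon_A": 700, "Taxon_B": 100, "Taxon_C": 200}
corrected = correct_for_copy_number(reads, COPY_NUMBER)
# ...but after copy-number correction, Taxon_A and Taxon_B
# come out at equal relative abundance (0.4 each).
```

A 7x distortion from copy number alone, before amplification or growth-rate biases even enter, is why read depth is the least of the estimation problems.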

Diversity is generally supposed to indicate how many different organisms there are. Additionally, some metrics attempt to capture how different the organisms are from one another and/or how evenly distributed the population is. We know that sequencing more deeply correlates with diversity (as we capture more and more of the low-abundance taxa) up to a point. If you plot diversity against read depth, and the value has not converged by the time you reach your actual read depth, your diversity value is meaningless and at the very least does not indicate what it is intended to indicate. This is not something people check for very often.
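One way to run that check is to compute a diversity metric at increasing subsample depths and see whether the curve flattens before you hit your full read depth. A rough sketch using Shannon diversity (taxon names and counts are illustrative):

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon diversity index of a taxon -> count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n)

def diversity_curve(counts, depths, seed=0):
    """Shannon diversity at a series of subsample depths.
    If the curve is still climbing at the final depth, the
    diversity estimate is depth-limited, not biological."""
    rng = random.Random(seed)
    reads = [t for t, n in counts.items() for _ in range(n)]
    return [(d, shannon(Counter(rng.sample(reads, d))))
            for d in depths]

# Illustrative: check whether diversity has leveled off by 1000 reads.
sample = {"Taxon_A": 500, "Taxon_B": 300, "Taxon_C": 200}
curve = diversity_curve(sample, [100, 500, 1000])
```

If the last few points of `curve` are still rising, reporting the final value as "the" diversity of the sample is exactly the mistake described above.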

Also, it can be shown that read counts correlate with qPCR counts of total bacteria in a sample, even though read depth is supposed to be independent of bacterial load. This skews diversity metrics and reduces the resolution of abundance metrics.

Finally - and this is what really gets me - we're comparing and describing these samples in terms of the identities of the organisms, when obviously the important thing is their functional contributions to the system. There may be populations with dramatically different compositions in terms of identity, but with very similar functions. Likewise, because the same species can carry different genes and make significantly different functional contributions, populations which appear very similar at the level of identity can differ in important ways in their functional profiles.

It's like trying to describe an army using the proportions of people named Steve, Mike, and Bob as the basis of our descriptions. Two Steves and two Mikes give the same value (in terms of relative abundance) as 200 Steves and 200 Mikes. It also matters whether Steve has a saber or a .50 cal. It also matters if half the Steves are dead. And it matters whether we sample 100 soldiers or 10,000, because we're bound to miss the special forces and fighter pilots if we only sample 100.

All in all, 16S NGS microbiome experiments give us very inaccurate, potentially misleading data, which we summarize using metrics that often don't represent what we think they do, and then we compare these values in a way that misses important parts of the underlying biology. It is important to remember that we are knee deep in all sorts of BS in this field.

But, when we understand and appreciate the caveats, we can really learn things. I think the author ought to have been harder on the field in general, but I appreciate the sentiments nonetheless.

/r/Microbiome Thread Link - fiererlab.org