Multiple hypothesis correction and feature selection

First, given a full set of genes, you might want to start by setting some minimum expression cutoff.

Second, this method of selecting genes doesn't have any statistical significance associated with it, but it doesn't need any. Statistical significance is all about having enough evidence to reject a certain hypothesis - you don't care about doing that if you are just selecting a subset of genes.

Going back to the first point, filtering out genes with very low expression has no statistical significance associated with it either, but it is still generally a helpful way to cut down on the number of genes you are examining at a time.
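
If it helps, here is a minimal sketch of what a minimum expression cutoff could look like in plain Python. The function name, the thresholds (`min_count=10`, `min_samples=2`), and the toy counts are all made up for illustration; the right cutoff depends on your data and is a judgment call, not a statistical test.

```python
def filter_low_expression(expression, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` counts in at least `min_samples` samples.

    `expression` maps a gene name to a list of counts, one per sample.
    Both thresholds are arbitrary illustrative defaults, not recommendations.
    """
    kept = {}
    for gene, counts in expression.items():
        # Count the samples in which this gene reaches the cutoff.
        n_expressed = sum(1 for c in counts if c >= min_count)
        if n_expressed >= min_samples:
            kept[gene] = counts
    return kept


# Hypothetical toy data: gene -> counts across four samples.
expression = {
    "geneA": [0, 1, 0, 2],
    "geneB": [15, 30, 22, 8],
    "geneC": [100, 250, 180, 90],
    "geneD": [12, 0, 0, 1],
}

filtered = filter_low_expression(expression)
print(sorted(filtered))  # ['geneB', 'geneC']
```

Note that nothing here computes a p-value: genes are dropped purely because they fall below the threshold, which is the whole point of this kind of filter.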

/r/bioinformatics Thread