LASSO is outperformed by Forward Selection when n is small. Any reason for this?

Think about what they are doing; they are fundamentally different methods of model creation. LASSO takes every predictor and fits a coefficient to each, penalized by the L1 norm. If you have a lot of collinear features, that's a textbook recipe for instability, and the L1 penalty will not prevent it: LASSO tends to semi-arbitrarily keep one feature out of a correlated group, and which one it keeps (and its coefficient) can swing around a lot when n is small. Forward Selection, on the other hand, greedily adds one predictor at a time, keeping only the one that improves the fit the most, so once it's added the "true predictor" as you call it, there is no reduction in loss from adding other highly correlated features, avoiding the issue somewhat (there's still no guarantee it doesn't grab a correlated proxy first and then refuse to add the "true predictor" later, so it can suffer a similar problem).
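If you want to see this for yourself, here's a minimal sketch (all variable names and sizes are made up for illustration): one "true" predictor plus two nearly collinear copies, small n, then LASSO via cv.glmnet and forward selection via step().

```r
library(glmnet)

set.seed(1)
n  <- 40
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1
x3 <- x1 + rnorm(n, sd = 0.05)   # another collinear copy
X  <- cbind(x1, x2, x3)
y  <- 2 * x1 + rnorm(n)

# LASSO (alpha = 1): with collinear columns the nonzero coefficient can
# land on x1, x2, or x3 depending on the particular sample.
cvfit <- cv.glmnet(X, y, alpha = 1)
coef(cvfit, s = "lambda.min")

# Forward selection via step(): starts from the intercept-only model and
# greedily adds the predictor with the biggest improvement (AIC here).
dat  <- data.frame(y, x1, x2, x3)
null <- lm(y ~ 1, data = dat)
full <- lm(y ~ x1 + x2 + x3, data = dat)
fwd  <- step(null, scope = formula(full), direction = "forward", trace = 0)
coef(fwd)   # typically keeps just one of the collinear trio
```

Rerun it with different seeds and you'll see which feature LASSO keeps bounce around, which is the instability I mean.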

What is your alpha parameter set to in glmnet? If it's 1, you're using pure LASSO (L1 penalty). Try alpha = 0 to use pure ridge regression (L2 penalty). Instead of zeroing out all but one of a group of collinear features, ridge shrinks their coefficients toward each other, which stabilizes the estimates when they can be mostly explained by each other. Probably the optimal model is a mixture of the two penalties (elastic net, 0 < alpha < 1), with alpha and lambda picked by cross-validation.
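A rough way to compare the penalty mixes, assuming a design matrix X and response y like in the sketch above (glmnet tunes lambda by cross-validation but not alpha, so you loop over a small alpha grid yourself):

```r
library(glmnet)

# alpha = 1 is pure LASSO (L1), alpha = 0 is pure ridge (L2),
# anything in between is elastic net.
fits <- lapply(c(lasso = 1, elastic_net = 0.5, ridge = 0), function(a) {
  cv.glmnet(X, y, alpha = a)
})

# Minimum cross-validated error for each penalty mix.
sapply(fits, function(f) min(f$cvm))

# Coefficients at the best lambda: ridge keeps all the collinear features
# with shrunken, similar coefficients; LASSO tends to keep only one.
lapply(fits, function(f) as.matrix(coef(f, s = "lambda.min")))
```

Whichever alpha gives the lowest CV error on your data is a reasonable default; with lots of collinearity and small n it's often somewhere strictly between 0 and 1.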

/r/rstats Thread