This comment was posted to reddit on Apr 02, 2015 at 3:00 pm and was deleted within 1 day, 12 minutes.

Good answer. To expand, the nesting structure determines how you come up with the V matrix.

Observations of Y_i, Y_j that are assumed to be completely independent given X_i, X_j will have zeros plugged in for the corresponding off-diagonal entries of the V matrix. In the standard OLS case where everything is independent and has a common residual variance, your V is as you mention, just sigmahat^{2} * I, where you've pooled observations together to estimate the residual variance.
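To spell out why the pooled case is so simple, here's the algebra. I'm assuming the V being discussed is the middle of the usual sandwich form for the variance of betahat (that form isn't restated in this comment, so treat it as my reading of the parent answer):

```latex
\begin{align*}
\widehat{\operatorname{Var}}(\hat\beta \mid X)
  &= (X^{T}X)^{-1} X^{T} V X \,(X^{T}X)^{-1} \\
V = \hat\sigma^{2} I \;\Longrightarrow\;
\widehat{\operatorname{Var}}(\hat\beta \mid X)
  &= \hat\sigma^{2}\,(X^{T}X)^{-1} X^{T} X \,(X^{T}X)^{-1}
   = \hat\sigma^{2}\,(X^{T}X)^{-1}
\end{align*}
```

So with a pooled variance the sandwich collapses to the textbook OLS formula, and the other choices of V below are just different things to put in the middle.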

If you still have independence but you have heteroskedasticity, a simple thing to do is to keep the diagonal matrix structure for V, but for each diagonal entry, you don't use a pooled variance estimate. You instead plug in an empirical estimate of the residual variance for each observation, e.g. (y_i - x_i^{T} betahat)^{2}. This is admittedly not a great estimate of the actual residual variances because each entry is based on one observation, but it's roughly unbiased and works reasonably well.
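Here's a rough NumPy sketch of the pooled and unpooled diagonal V side by side. The data are simulated, all variable names are mine, and it assumes the sandwich form (X^T X)^{-1} X^T V X (X^T X)^{-1} for the variance of betahat, so take it as an illustration rather than a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
# Simulated heteroskedastic errors: noise sd grows with |x|
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
resid = y - X @ betahat

# Pooled version: V = sigmahat^2 * I, so the sandwich collapses
sigma2_hat = resid @ resid / (n - X.shape[1])
cov_pooled = sigma2_hat * XtX_inv

# Unpooled version: V = diag(resid_i^2)
# X^T diag(resid^2) X, computed without building the n x n matrix
meat = X.T @ (X * resid[:, None] ** 2)
cov_robust = XtX_inv @ meat @ XtX_inv

se_pooled = np.sqrt(np.diag(cov_pooled))
se_robust = np.sqrt(np.diag(cov_robust))
print("pooled SEs:", se_pooled)
print("robust SEs:", se_robust)
```

The only thing that changes between the two is what goes on the diagonal of V; the bread (X^T X)^{-1} is the same in both.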

For nested data, e.g. students working in groups, you might assume that there can be correlation in the error structure between students who worked in the same group, but not between students who worked in different groups. Then you use a block diagonal matrix for V, with the empirical covariances between students in the same group as blocks in the matrix, and zeros everywhere else corresponding to the assumption of independent errors when students aren't in the same group. Or you could make more assumptions about residuals, like a common correlation within each group and common student variances. You'd do some pooling to estimate these quantities and then plug those estimates into this block-diagonal matrix form for V, much like how you plugged in the common variance estimate for each diagonal entry in the simple OLS case.
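A minimal sketch of the block-diagonal version, again in NumPy with simulated data and made-up names (`groups`, `group_size`, etc. are just for illustration). It uses the unpooled option, where each block is the outer product of that group's residuals, and assumes the same sandwich form as before:

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, group_size = 30, 4
n = n_groups * group_size
groups = np.repeat(np.arange(n_groups), group_size)  # group label per student

X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Simulated data: a shared group effect makes errors correlated within a group
group_effect = rng.normal(size=n_groups)[groups]
y = X @ np.array([1.0, 2.0]) + group_effect + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
resid = y - X @ betahat

# Block-diagonal V: block g is the outer product of group g's residuals,
# and every entry linking two different groups stays at zero
V = np.zeros((n, n))
for g in range(n_groups):
    idx = np.flatnonzero(groups == g)
    V[np.ix_(idx, idx)] = np.outer(resid[idx], resid[idx])

cov_cluster = XtX_inv @ X.T @ V @ X @ XtX_inv
se_cluster = np.sqrt(np.diag(cov_cluster))
print("cluster SEs:", se_cluster)
```

If you went with the pooled alternative instead (common within-group correlation, common variance), you'd fill each block with those pooled estimates rather than the raw residual products.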

Can't get too wacky though: you certainly don't want to throw away all independence assumptions and estimate the variance in Y conditional on X with V=XX^{T} (the entire empirical covariance matrix). If you plug that in, everything will cancel out.
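You can check the cancellation numerically. This sketch assumes the sandwich form (X^T X)^{-1} X^T V X (X^T X)^{-1} and tries two readings of "no independence assumptions": V = XX^{T} taken literally, and the full outer product of the residuals (the second reading is my addition). Data and names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)  # simulated data

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
resid = y - X @ betahat

def sandwich(V):
    """Sandwich variance for betahat given an n x n matrix V."""
    return XtX_inv @ X.T @ V @ X @ XtX_inv

# Reading 1: V = X X^T. Bread and meat cancel to the identity,
# no matter what y looks like.
cov_xxt = sandwich(X @ X.T)

# Reading 2: V = resid resid^T with no zeros imposed. OLS residuals are
# orthogonal to the columns of X, so X^T resid = 0 and the result is zero.
cov_full = sandwich(np.outer(resid, resid))

print(np.round(cov_xxt, 8))
print(np.round(cov_full, 8))
```

Either way the answer carries no information about the noise in the data, which is why you need to impose some structure (zeros, pooling) on V.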