Are NN-based evaluation functions already at a superhuman level?

It wouldn't be that strong. I was experimenting with a chess engine called Sunfish (it's not really strong, around 2000 on lichess). I implemented both NNUE and Lc0's second-generation models (independently), and in both cases the result was not too different from the static eval. NNUE was something like 1800 and Lc0's eval was something like 2100.

Though you should note that raw NN output is not the primary way chess engines evaluate positions. It is combined with the static eval in some way, depending on the engine's eval function.

As for your question, raw NNUE eval is not really strong at all. It's like taking a look at the board and guessing who is winning at the moment. The same goes for Lc0's nets, but with a higher success rate (and IMO Lc0's net at depth 1, seldepth 1 can easily beat a 1200 on lichess).
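To make "depth 1, seldepth 1" concrete, here is a rough sketch of what playing on raw eval alone amounts to: try every legal move, score the resulting position once, pick the best, with no deeper search. Everything in it is a made-up toy for illustration (`pick_move_depth1`, the tuple "position", the `toy_*` helpers); it is not Sunfish, NNUE, or Lc0 code, and a real net would simply take the place of `evaluate`.

```python
from typing import Callable, Iterable, Tuple

# Toy position: (score from White's point of view, side to move: +1 White, -1 Black).
Position = Tuple[int, int]
Move = int  # hypothetical move: a change to the White-relative score


def pick_move_depth1(
    position: Position,
    legal_moves: Callable[[Position], Iterable[Move]],
    apply_move: Callable[[Position, Move], Position],
    evaluate: Callable[[Position], float],
) -> Tuple[Move, float]:
    """Pick the move with the best one-ply evaluation (no deeper search).

    `evaluate` scores a position for the side to move in that position,
    so the child's score is negated to get the mover's point of view.
    """
    best_move = None
    best_score = float("-inf")
    for move in legal_moves(position):
        child = apply_move(position, move)
        score = -evaluate(child)
        if score > best_score:
            best_move, best_score = move, score
    return best_move, best_score


# Toy stand-ins for move generation, move application, and the "net".
def toy_legal_moves(position: Position) -> Iterable[Move]:
    return [1, -2, 3]


def toy_apply_move(position: Position, move: Move) -> Position:
    value, side = position
    return (value + move, -side)


def toy_evaluate(position: Position) -> float:
    value, side = position
    return value * side  # score for the side to move


white_choice = pick_move_depth1((0, 1), toy_legal_moves, toy_apply_move, toy_evaluate)
black_choice = pick_move_depth1((0, -1), toy_legal_moves, toy_apply_move, toy_evaluate)
print(white_choice, black_choice)
```

The point is that the engine only ever sees positions one ply ahead, so its strength is exactly as good as the eval's "glance at the board" guess; any tactic deeper than one move is invisible to it.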

/r/chess Thread