Current Bell Curve

Okay, maybe "exponential" was wrong, but you omitted "without new techniques", btw.

I'm in no way an expert, but at first glance the paper explains that previous models' training was computationally suboptimal and that it doesn't make sense to create bigger models right now, which isn't contradictory with my statement.

As for the scaling, it's saying that the required data doesn't grow as fast as the compute or the model size you end up with, i.e. something like 10x the compute for a ~25x bigger model but only ~4x the data.
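If you want to play with the ratios yourself, here's a minimal sketch assuming simple Kaplan-style power-law exponents (model size ∝ C^0.73, data ∝ C^0.27); those exponents are my assumption for illustration, not numbers pulled from the linked DeepMind figure:

```python
# Rough sketch: how model-size and data multipliers fall out of a power law
# in training compute. Exponents are assumptions (Kaplan-style), not the
# DeepMind estimates linked below.

def scaled(compute_multiplier: float, model_exp: float = 0.73, data_exp: float = 0.27):
    """Return the implied (model size, data size) multipliers for a given
    increase in training compute, assuming power-law scaling."""
    return compute_multiplier ** model_exp, compute_multiplier ** data_exp

for c in (10, 100):
    model_x, data_x = scaled(c)
    print(f"{c}x compute -> ~{model_x:.0f}x model, ~{data_x:.0f}x data")

# Output:
# 10x compute -> ~5x model, ~2x data
# 100x compute -> ~29x model, ~3x data
```

The exact multipliers depend entirely on which exponents you plug in, so treat the printout as a toy illustration of "compute grows faster than data", not as the paper's actual estimates.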

Now, the training paradigm has changed between GPT and GPT-2, and GPT-3.5 was a confirmation of that, but roughly what I was saying is that we won't see much larger models in the near future because it's not efficient to do so and it won't make sense to simply throw money at the problem, hence the plateau.

Here are DeepMind's estimates if you like numbers: https://i.redd.it/83kb9w4ywyq81.png
