[D] What are current alternatives to gradient-based NN training?

I haven't seen much research into alternatives to gradient descent. The general opinion in the field is that if you can get a gradient, you should use it.
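To make "alternatives to gradient descent" concrete, here's a minimal sketch (my own illustration, not from the thread) of evolution strategies, one well-known gradient-free approach: it estimates a descent direction purely from random perturbations of the weights, so it never needs backprop. The objective and hyperparameters are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Toy objective: fit a fixed target vector (stand-in for a real NN loss).
    target = np.arange(5, dtype=float)
    return float(np.sum((w - target) ** 2))

def es_step(w, sigma=0.1, lr=0.02, n_samples=50):
    """One OpenAI-style ES update: estimate a search gradient from
    random weight perturbations instead of backprop."""
    eps = rng.standard_normal((n_samples, w.size))
    losses = np.array([loss(w + sigma * e) for e in eps])
    # Standardize the returns so the update scale is loss-invariant.
    adv = (losses - losses.mean()) / (losses.std() + 1e-8)
    grad_est = (adv[:, None] * eps).mean(axis=0) / sigma
    return w - lr * grad_est  # descend the estimated gradient

w = np.zeros(5)
for _ in range(300):
    w = es_step(w)
```

The trade-off is sample efficiency: each update here costs 50 forward evaluations instead of one forward/backward pass, which is part of why the field defaults to gradients when they're available.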

I've seen more research into new ways (other than backprop) to implement gradient descent. The idea is that backprop works very well but parallelizes poorly, since every weight update has to wait on a full forward and backward pass, so it would be nice if you could use local learning rules instead. Hinton's Forward-Forward algorithm is an interesting paper in this direction.
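As a rough illustration of what a local learning rule looks like, here's a single-layer sketch of the Forward-Forward idea. The "goodness" function (sum of squared activations) follows the paper; the data, threshold, and logistic objective here are my own simplified assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Layer "goodness": sum of squared activations per example.
    return np.sum(h ** 2, axis=1)

def ff_layer_update(W, x_pos, x_neg, theta=2.0, lr=0.05):
    """One purely local update for a single ReLU layer: raise goodness
    above theta on positive data, lower it below theta on negative data.
    No error signal from any other layer is used."""
    def local_grad(x, sign):
        h = np.maximum(x @ W, 0.0)                          # forward pass
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        coeff = sign * (1.0 - p)                            # per-example scale
        # Gradient of the layer's own logistic objective w.r.t. W,
        # computed from this layer's inputs and activations only.
        return x.T @ (2.0 * h * coeff[:, None]) / len(x)
    return W + lr * (local_grad(x_pos, +1.0) + local_grad(x_neg, -1.0))

dim, hidden = 8, 16
W = 0.1 * rng.standard_normal((dim, hidden))
x_pos = rng.standard_normal((64, dim)) + 1.0  # stand-in "real" data
x_neg = rng.standard_normal((64, dim)) - 1.0  # stand-in "negative" data

for _ in range(300):
    W = ff_layer_update(W, x_pos, x_neg)

g_pos = goodness(np.maximum(x_pos @ W, 0.0))
g_neg = goodness(np.maximum(x_neg @ W, 0.0))
```

Because each layer only needs its own inputs and activations, layers could in principle be trained without waiting for a backward pass through the rest of the network.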

There's also a bunch of research into learned optimizers. The idea is that instead of hand-crafting an update rule like SGD or Adam, you train another neural network to propose the weight updates. It could discover multiple optimization behaviors and choose the best one depending on the situation, or even come up with new ones we haven't thought of yet. But so far the biggest roadblock has been the compute cost of meta-training: every update to the optimizer requires running many inner training runs.
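Here's a toy sketch of that idea. Everything in it — the tiny three-parameter "optimizer network", the random quadratic tasks, the random-search meta-training — is an illustrative assumption, not any published setup, but it shows the two-level structure and why the compute cost multiplies.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_loss(w, A, b):
    # A random least-squares problem stands in for a real training task.
    r = A @ w - b
    return 0.5 * float(r @ r)

def inner_grad(w, A, b):
    return A.T @ (A @ w - b)

def learned_update(theta, grad, mom):
    """Per-parameter 'optimizer network': a learned step scale applied to
    a learned mix of gradient and momentum. theta = [log_lr, c1, c2]."""
    return -np.exp(theta[0]) * (theta[1] * grad + theta[2] * mom)

def meta_objective(theta, n_tasks=8, steps=20):
    """Average final loss after running the learned optimizer on fresh tasks."""
    total = 0.0
    for _ in range(n_tasks):
        A = rng.standard_normal((6, 4))
        b = rng.standard_normal(6)
        w = np.zeros(4)
        mom = np.zeros(4)
        for _ in range(steps):
            g = inner_grad(w, A, b)
            mom = 0.9 * mom + g
            w = w + learned_update(theta, g, mom)
        total += inner_loss(w, A, b)
    return total / n_tasks

# Meta-train theta by simple random search (evaluations are noisy; good
# enough for a sketch). Note the cost: every meta-step runs n_tasks full
# inner training loops, which is exactly the compute roadblock.
theta = np.array([-3.0, 1.0, 0.0])
best = meta_objective(theta)
for _ in range(60):
    cand = theta + 0.1 * rng.standard_normal(3)
    val = meta_objective(cand)
    if val < best:
        theta, best = cand, val
```

Real learned optimizers replace the three-parameter rule with an actual neural network and meta-train by backprop through unrolled inner runs, which scales that compute multiplier up dramatically.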

/r/MachineLearning Thread