What is the place of maths in basic/average neural net theory? (Interested, but wondering if it could make a good "research" subject.)

I would say neural networks fall into applied mathematics rather than pure mathematics; that is, the field uses math as a tool but doesn't offer much back in the way of theory (yet; maybe the field just isn't there right now). Whether that makes it a good research subject will depend on your project requirements.

The math behind neural networks is broad, but you don't have to understand it at a very deep level to be able to code one up from scratch. I'd say the mathematical foundations of neural nets are statistics, linear algebra, and calculus.

Statistics defines the framework for how to do machine learning, which has its roots in statistical classification and statistical regression. The task goes like this: you have some data that come as pairs of inputs and outputs, and you want to build a model that predicts the output from the input alone. So there are two operations you want to do with a model. One, train (fit) the model on a given dataset of inputs and outputs. Two, given only an input, use the trained model to predict the corresponding output. To compare how well one model fits versus another, you choose a statistically motivated measure of performance called a loss function. A loss function takes a model and the dataset and outputs a single number: how badly did the model do? If your model fits the data very well, the loss is low. If it sucks, the loss is high.
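As a concrete example, here's a minimal sketch (in Python/NumPy; the names and the toy data are my own, not from the thread) of one common statistically motivated loss, mean squared error:

```python
# Minimal sketch: mean squared error, a common loss for regression.
import numpy as np

def mse_loss(predictions, targets):
    """Average squared difference between model outputs and true outputs.
    Low when the model fits well, high when it doesn't."""
    return np.mean((predictions - targets) ** 2)

# Toy example: a "model" that predicts 2*x for input x.
x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.1, 3.9, 6.2])   # observed outputs
y_pred = 2.0 * x                     # model's predictions
print(mse_loss(y_pred, y_true))      # small number -> decent fit
```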

Linear algebra provides the basic data structures of neural nets. There are lots of possible models out there for machine learning; neural networks are just one class of them. This particular class says the inputs are vectors, and to get the output you multiply the input by a bunch of matrices (those matrices are the parameters of the network), interleaved with some other simple functions. So the data and parameters are organized into vectors and matrices, and understanding how those work lets you implement the predict operation (given an input, compute its output).
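A minimal sketch of that predict operation might look like this (Python/NumPy; the layer sizes and the tanh nonlinearity are arbitrary illustrative choices on my part):

```python
# Minimal sketch: the "predict" operation of a tiny two-layer network.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # parameters: matrix mapping 3 inputs -> 4 hidden units
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # parameters: matrix mapping 4 hidden units -> 1 output
b2 = np.zeros(1)

def predict(x):
    """Given an input vector, compute the output: matrix multiplies plus a nonlinearity."""
    h = np.tanh(W1 @ x + b1)   # first matrix multiplication, then "some other function"
    return W2 @ h + b2         # second matrix multiplication

print(predict(np.array([0.5, -1.0, 2.0])))
```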

But you still haven't trained the neural net, and this is where calculus comes in. Fitting a model to data is an optimization problem: you want to find the parameters that minimize the value of the loss function, and calculus provides the tools to find that minimum. Remember that the minima of a differentiable function occur where the derivative equals 0. If you're at a point where the function isn't at a minimum, you can take a small step downhill by moving against the derivative; repeating this is called gradient descent. It turns out (by no accident) that the loss of a neural net is differentiable with respect to the net's parameters, so you can do the same thing to find the parameters that minimize the loss. You just have to be comfortable doing calculus on vectors and matrices. That's how you get the parameters that let the network make good predictions.
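Here's a minimal sketch of gradient descent (my own toy example: a linear model with a mean squared error loss and hand-derived gradients; a real neural net does the same thing, just with backpropagation computing the derivatives):

```python
# Minimal sketch: gradient descent on a linear model y = w*x + b with MSE loss.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])   # roughly y = 2x + 1, with noise

w, b = 0.0, 0.0                      # parameters to fit
lr = 0.05                            # step size for each gradient step

for step in range(2000):
    y_pred = w * x + b
    # Gradients of the loss with respect to w and b, derived by hand:
    # loss = mean((y_pred - y)^2), so d(loss)/dw = mean(2*(y_pred - y)*x), etc.
    grad_w = np.mean(2 * (y_pred - y) * x)
    grad_b = np.mean(2 * (y_pred - y))
    # Step *against* the gradient, i.e. downhill on the loss surface.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # should end up near 2 and 1
```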

So this is pretty much what you need to know (without all the actual details) to implement a neural net. If you wanted to do research to drive the field forward, to build more effective neural nets that have better performance on some AI task, you would have to have a deeper understanding of statistics, linear algebra, or calculus to provide the mathematical motivation.

/r/AskComputerScience Thread