My paper: Tensorizing Neural Networks, NIPS 2015

Huge caveat that I have never published a paper, and I don't have a PhD. Here were my thoughts as I read the paper:

- Personally, I dislike the abuse of the word 'tensor' as a verb, to mean something different from my intuitive understanding of 'to make something into a tensor'. Fully connected layers are already tensorized: the input is a tensor, the output is a tensor, the weight tensor is a ... tensor :-P
- Similarly, I don't really like the name 'tensor-train' decomposition, but it seems that name was coined previously, by Oseledets, and you are just re-using the existing algorithm. The Oseledets paper has 215 citations, which sounds like a lot to me, so it seems reasonably mainstream. So, TT-decomposition it is :-)
- However, since the TT-decomposition already exists, it sounds to me like you are not proposing a new mathematical algorithm as such, so much as applying it in a new situation, one that sounds like a pretty good match for its original purpose?
- Therefore, it sounds to me like the main contribution is experimental rather than theoretical?
- Therefore, I would expect to see strong experimental results in the summary, in my opinion.
- Having said that, it looks like the back-propagation formulae may possibly be novel? But this is not pointed out in the summary. (Is it novel? I'm not sure.)
- As far as the experimental results go, I ignored CIFAR, since I vaguely remember it's not a very representative case, and looked directly at ImageNet/VGG.
- Returning briefly to the theoretical asymptotic results in Table 1:
- It seems that the forward pass might be quite a lot slower, though the memory used is reduced a bit.
- It seems that the backward pass is very slow, and the memory usage is increased a lot? I think it depends on what the parameters are, i.e. r etc. So, presumably with appropriate values for r you could get the passes not too horribly slow, and the memory usage not too high, but what is the effect on the validation accuracy? So, I'd expect to see a table that shows different values of r, the backward/forward pass times for those values, and the validation accuracy after a fixed number of iterations (the same for both). (There's a rough sketch after this list of how r enters the forward multiply.)
- Table 3 provides part of this: the forward pass times. It doesn't provide any information on what happened to the validation accuracy or prediction accuracy, and it ignores the backward pass. Looking at the actual forward-pass numbers:
- On CPU there is a speedup, but per my understanding CPU is not really used much for ImageNet/VGG currently.
- On GPU the difference is about 30%, which doesn't sound like a huge difference to me (given the unspecified effect on validation error and training time).
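To make the forward-pass point concrete, here is a rough numpy sketch of what a matrix-by-vector product looks like when the weight matrix is kept in TT-format (this is just my own illustration of the general TT-matrix idea from the Oseledets work; the 4×4×8×8 mode factorizations, the rank value 4, and the name `tt_matvec` are made-up choices for the example, not the paper's actual VGG settings):

```python
import numpy as np

def tt_matvec(cores, x, in_modes, out_modes):
    """y = W @ x, where W is stored as TT-cores and never formed explicitly.
    cores[k] has shape (r_k, out_modes[k], in_modes[k], r_{k+1})."""
    # View x as a tensor over the input modes, with a dummy rank-1 axis in front.
    res = x.reshape((1,) + tuple(in_modes))
    for k, core in enumerate(cores):
        # Contract the current rank axis and the k-th input mode against the core.
        res = np.tensordot(res, core, axes=([k, k + 1], [0, 2]))
        # Move the new output mode and rank axis back into position k, k+1.
        res = np.moveaxis(res, [-2, -1], [k, k + 1])
    return res.reshape(int(np.prod(out_modes)))

# Toy example: a 1024 x 1024 FC weight, both sides factored as 4*4*8*8.
in_modes, out_modes = [4, 4, 8, 8], [4, 4, 8, 8]
ranks = [1, 4, 4, 4, 1]   # TT-ranks; "r" in the paper's complexity table
cores = [np.random.randn(ranks[k], out_modes[k], in_modes[k], ranks[k + 1]) * 0.1
         for k in range(4)]

x = np.random.randn(int(np.prod(in_modes)))
y = tt_matvec(cores, x, in_modes, out_modes)   # y has length 1024

print(sum(c.size for c in cores), "TT parameters vs",
      int(np.prod(in_modes)) * int(np.prod(out_modes)), "dense parameters")
```

With these toy settings the four cores hold about 1.6k parameters instead of roughly a million for the dense matrix, which is the memory win; the flip side is that the multiply becomes a chain of small tensor contractions, and whether that ends up faster or slower than a single dense GEMM depends on the ranks, which is exactly the r trade-off I was asking about above.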

My impression from reading the paper is that you found a technique which sounds like it could be useful for improving the performance of the FC layers, and put a lot of time into doing experiments on various geometries and so on, but when you did the experiments, the numbers you were getting out weren't quite what you were hoping for? I do have the impression that it could be useful for prediction only, on low-end devices, even if it takes a lot of training power, but I couldn't clearly find the evidence I was looking for in the paper to find out at what cost, in terms of predictive accuracy etc.
