Questions thread #6 2016.05.23

A couple of days ago I found out my definition of an epoch differed from the standard, so I changed my code to reflect what I believe is the normal way of training with data:

for epoch in xrange(epochs):
    np.random.shuffle(feats)
    np.random.shuffle(targs)

    for start in xrange(0, len(feats), batch_size):
        end = start + batch_size
        batch_f, batch_t = feats[start:end], targs[start:end]

        self.sess.run(self.train_step_m, feed_dict={self.neurons_inp: batch_f, self.targets: batch_t,
                                                    self.eta: eta, self.kp_prob: kp_prob, self.lmbda: lmbda})

        if epoch % score_pt == 0:
            print("error: " + str(self.cost_sum(feats, targs)) + "  score: " + str(self.score(feats, targs)))
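One thing worth noting about the snippet above: calling np.random.shuffle separately on feats and targs draws two independent permutations, so row i of feats no longer lines up with element i of targs. A minimal sketch of shuffling both in unison through one shared permutation (toy arrays stand in for the real feats/targs; only the variable names come from the snippet):

```python
import numpy as np

# Toy data standing in for feats/targs: row i of feats pairs with targs[i].
feats = np.arange(10).reshape(5, 2)   # [[0,1],[2,3],[4,5],[6,7],[8,9]]
targs = np.arange(5)                  # [0, 1, 2, 3, 4]

# One shared permutation reorders both arrays identically,
# so the feature-target pairing survives the shuffle.
perm = np.random.permutation(len(feats))
feats, targs = feats[perm], targs[perm]

# Each feature row still matches its target after shuffling:
# by construction feats[i, 0] == 2 * targs[i] for every i.
assert all(feats[i, 0] == 2 * targs[i] for i in range(len(targs)))
```

Two separate shuffle calls would only stay aligned by coincidence; the shared-index version keeps the pairing exact regardless of the random draw.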

But I'm finding that the standard epoch approach performs quite badly. It gets stuck at saddle points and won't escape even after training for a while (it trains slower, too). This keeps happening even if I reduce the size of the data I'm working with significantly (down to 20 000 samples). Using my old technique I get the overfitting one would expect. I'm thinking it's possible my method is actually better, because the same samples are more likely to be reused and thus the same step taken repeatedly, which might help escape a saddle point? At the same time I'm considering that I made an error, but I can't think of anything... here's my original method:

batch_inds = np.random.randint(len(feats) - batch_size / 2, size=(epochs,))

for epoch in xrange(epochs):
    start = batch_inds[epoch]
    end = start + batch_size
    batch_f, batch_t = feats[start:end], targs[start:end]

    self.sess.run(self.train_step_m, feed_dict={self.neurons_inp: batch_f, self.targets: batch_t,
                                                self.eta: eta, self.kp_prob: kp_prob, self.lmbda: lmbda})

    if epoch % score_pt == 0:
        print("error: " + str(self.cost_sum(feats, targs)) + "  score: " + str(self.score(feats, targs)))

With my method I get close to a 0.95 score after 1 min. The 'standard' method reaches a saddle point (with a score of ~0.1) after 1 min 30 s and stays there...

/r/MachineLearning Thread