[D] Simple Questions Thread

I am training a model on Google Colab and have hit the runtime limit twice, so I started saving checkpoints. However, I have never resumed training from a checkpoint before.

Do I just restart the runtime and, without running anything else, call:

trainer.train("/content/chkpnt/checkpoint-1000")

or do I first need to re-run my training arguments and trainer setup, like so:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    eval_steps=100,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    fp16=False,
)

tokenizer.add_special_tokens({'pad_token': '[PAD]'})

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized_books,
    eval_dataset=eval_tokenized_books,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

and then call trainer.train() with the checkpoint path?
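For reference, `Trainer.train()` in Hugging Face transformers accepts a `resume_from_checkpoint` argument: either a checkpoint folder path, or `True` to load the most recent checkpoint in `args.output_dir`. Restarting the runtime wipes every Python object, so the trainer has to be rebuilt before that call. When the checkpoints sit outside `output_dir` (here `/content/chkpnt` versus `./results`), a small helper can locate the newest one. The sketch below is a hypothetical, stdlib-only helper (`find_latest_checkpoint` is an illustrative name, not a library function) and assumes the `checkpoint-<step>` folder naming the Trainer uses:

```python
import os
import re
from typing import Optional

# Assumption: checkpoints are sub-directories named "checkpoint-<step>",
# matching the naming the Hugging Face Trainer uses when saving.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Return the path of the checkpoint with the highest step, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(checkpoint_dir):
        path = os.path.join(checkpoint_dir, name)
        match = _CHECKPOINT_RE.match(name)
        # Only directories named exactly "checkpoint-<integer>" count.
        if match and os.path.isdir(path):
            # Compare steps as integers so 1000 beats 500.
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path
```

Assuming `training_args` and `trainer` have been re-created in the new session, the result would then be passed as `trainer.train(resume_from_checkpoint=find_latest_checkpoint("/content/chkpnt"))`. Comparing step numbers as integers matters because a plain string sort would rank `checkpoint-500` above `checkpoint-1000`. transformers also provides `get_last_checkpoint` in `transformers.trainer_utils` for the same purpose.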

/r/MachineLearning Thread