I am training a model on Google Colab and have hit the runtime limit twice, so I started saving checkpoints. However, I have never resumed training from a checkpoint before.
After restarting the runtime, can I skip everything else and just run:
trainer.train("/content/chkpnt/checkpoint-1000")
or do I first need to re-run my training arguments and trainer setup, like so:
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    eval_steps=100,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    fp16=False,
)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized_books,
    eval_dataset=eval_tokenized_books,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
and only then call trainer.train()?
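For context, here is a minimal sketch of one way to locate the newest checkpoint folder before resuming. It assumes checkpoints are saved as checkpoint-<step> subfolders (as in the path above); find_latest_checkpoint is a hypothetical helper name, not part of transformers, which as far as I know ships a similar get_last_checkpoint in transformers.trainer_utils. Since a restarted runtime loses all Python variables, the sketch also assumes the model, tokenizer, datasets, training_args and trainer have been re-created first, and that the checkpoint folder still exists on disk (files under /content may not survive a recycled Colab session).

```python
import re
from pathlib import Path
from typing import Optional

# Matches folder names of the form "checkpoint-<step>", e.g. "checkpoint-1000".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint folder, or None.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step = -1
    best_path = None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, child

    return str(best_path) if best_path is not None else None


# Assumed usage, after model, tokenizer, datasets, training_args and trainer
# have been re-created in the fresh runtime:
#
#   latest = find_latest_checkpoint("/content/chkpnt")
#   trainer.train(resume_from_checkpoint=latest)
```

Passing the path through the resume_from_checkpoint keyword makes the intent explicit; if no checkpoint folder is found, the helper returns None and training would start from scratch.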