I’m trying to figure out what batch size I can fit on the GPU while fine-tuning a pretrained model.
I loaded a T5-3b model and checked the GPU memory consumption (using nvidia-smi) BEFORE starting to train: it was 12GB.
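For reference, my real setup uses AllenNLP, but a minimal equivalent of just the loading step (using the Hugging Face transformers API purely as an illustration) looks roughly like this, and reproduces the ~12GB number:

```python
# Not my actual AllenNLP config, just a minimal equivalent of the loading step
# so the 12GB figure is reproducible (assumes fp32 weights).
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-3b").cuda()

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e9:.2f}B")
# ~3B params * 4 bytes (fp32) is roughly 11-12GB, matching what nvidia-smi reports
print(f"allocated on GPU: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
```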
Then I started training with a batch size of 1 and got an out-of-memory error.
I wonder what caused this behavior. Wasn’t the model already fully loaded before training started?
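To make the question concrete, here is my rough accounting of what the weights alone should take versus what training might add on top (this assumes fp32 storage and a plain Adam optimizer; part of my question is whether this accounting is right):

```python
# Back-of-the-envelope estimate, assuming fp32 weights and plain Adam
# (my assumption; the actual trainer configuration may differ).
n_params = 3e9              # T5-3b, roughly 3 billion parameters
fp32 = 4                    # bytes per float32

weights     = n_params * fp32        # ~12 GB -- matches nvidia-smi after loading
gradients   = n_params * fp32        # ~12 GB more once backward() runs
adam_states = 2 * n_params * fp32    # exp_avg + exp_avg_sq, ~24 GB more

total_gb = (weights + gradients + adam_states) / 1e9
print(f"~{total_gb:.0f} GB before even counting activations")  # ~48 GB
```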
I went through the logs and found the following:
INFO - allennlp.training.trainer - Worker 0 memory usage MB: 32792.304
INFO - allennlp.training.trainer - GPU 0 memory usage MB: 11958
I saw in the code that the “Worker” memory usage is the peak memory usage. I’m not sure I understand how this works with the GPU: are parameters being swapped between the CPU and the GPU during training?
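In case it is relevant, my reading of the two log lines (please correct me if this is wrong) is that the “Worker 0” number is the peak CPU RAM of the process, while the “GPU 0” number is memory allocated on the device. A small sketch of how I understand the two measurements (using the standard library and torch.cuda directly, not the trainer’s own code):

```python
# My understanding of what the two log numbers roughly correspond to
# (a sketch, not the trainer's actual implementation).
import resource
import torch

# Peak resident set size of this process, i.e. CPU RAM; on Linux ru_maxrss is in KB.
peak_cpu_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
print(f"peak CPU memory: {peak_cpu_mb:.1f} MB")

# Memory currently held by tensors on the GPU, and the peak since the last reset.
print(f"GPU allocated:      {torch.cuda.memory_allocated(0) / 1024**2:.0f} MB")
print(f"GPU peak allocated: {torch.cuda.max_memory_allocated(0) / 1024**2:.0f} MB")
```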