Pretrained model GPU memory consumption


I’m trying to understand what batch size I can fit on the GPU while fine-tuning a pretrained model.

I loaded a T5-3b model and checked the memory consumption on the GPU (using nvidia-smi) BEFORE starting to train. The memory consumption was 12GB.
Then I started the training process with a batch size of 1, and I got an out-of-memory error.

I wonder what caused this behavior. Wasn’t the model completely loaded before training started?

I went through the logs and found the following:
INFO - - Worker 0 memory usage MB: 32792.304
INFO - - GPU 0 memory usage MB: 11958

I saw in the code that the “Worker” memory usage is the “peak memory usage”. I’m not sure I understand how this works when training on a GPU - are the parameters being swapped between the CPU and the GPU during training?


When you load the model, all you have done is store the model weights on the GPU. To run inference or train the model, you additionally have to perform computations on the GPU, which takes extra GPU memory. If you’re training, the whole computation graph (including intermediate activations) needs to be stored on the GPU so you can perform backpropagation, and the optimizer keeps its own per-parameter state as well. The model weights themselves are only a fraction of the total GPU memory usage.
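As a rough back-of-the-envelope illustration (assuming fp32 weights and a plain Adam optimizer, neither of which is stated in the post), the static memory alone for a 3B-parameter model is about four times the weight size, before even counting activations:

```python
# Rough estimate of static training-time GPU memory for a 3B-parameter model.
# Assumptions (not from the original post): fp32 (4-byte) parameters and
# Adam, which stores two extra fp32 states (momentum, variance) per parameter.

def training_memory_gb(n_params, bytes_per_param=4):
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param           # one gradient per weight
    optimizer = 2 * n_params * bytes_per_param   # Adam: exp_avg + exp_avg_sq
    total = weights + grads + optimizer
    gib = 1024 ** 3
    return {"weights": weights / gib, "grads": grads / gib,
            "optimizer": optimizer / gib, "total": total / gib}

est = training_memory_gb(3_000_000_000)
print(est)  # weights alone are ~11 GiB, matching the ~12GB seen in nvidia-smi
```

The ~11 GiB for weights lines up with what nvidia-smi reported after loading, and the estimate shows why a batch size of 1 can still run out of memory: gradients and optimizer state roughly quadruple the footprint, and activations come on top of that.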

So, how can I reduce the GPU memory consumption so the model trains successfully?