How to Control GPU Placement of Inputs and Modules

Hi, I’m working on a neural IR model that can be summarized as:

  1. A transformer that encodes input A (e.g., BERT/RoBERTa)
  2. A transformer that encodes input B
  3. Compute the inner product of the two encodings and make a prediction based on it (rough sketch below)
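
Roughly, the model looks like the following sketch (placeholder names, and I'm assuming the Hugging Face `transformers` API here; the real model differs in the details):

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # assumption: Hugging Face transformers as the encoder library


class DualEncoder(nn.Module):
    """Two transformer encoders whose outputs are compared via an inner product."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.transformer_a = AutoModel.from_pretrained(model_name)
        self.transformer_b = AutoModel.from_pretrained(model_name)

    def forward(self, tokens_a: dict, tokens_b: dict) -> torch.Tensor:
        # Use each encoder's [CLS] vector as the sequence representation.
        enc_a = self.transformer_a(**tokens_a).last_hidden_state[:, 0]
        enc_b = self.transformer_b(**tokens_b).last_hidden_state[:, 0]
        # Batched inner product: one relevance score per (A, B) pair.
        return (enc_a * enc_b).sum(dim=-1)
```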

I’m having some issues with GPU memory usage (on 1080 Tis), so I thought I might:

  1. Put transformer_a on GPU0
  2. Put transformer_b on GPU1
  3. Modify the device placement for my two corresponding text fields to match (roughly as in the sketch below)
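
Concretely, the plan amounts to something like this variation on the sketch above (device ids hard-coded for illustration; a stock trainer that moves the whole model to a single GPU would undo this placement, which is what the overrides further down are about):

```python
import torch


class ModelParallelDualEncoder(DualEncoder):
    """Same dual encoder, but each transformer is pinned to its own 1080 Ti."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__(model_name)
        self.device_a = torch.device("cuda:0")
        self.device_b = torch.device("cuda:1")
        self.transformer_a.to(self.device_a)
        self.transformer_b.to(self.device_b)

    def forward(self, tokens_a: dict, tokens_b: dict) -> torch.Tensor:
        # Each batch of input tensors has to land on the same GPU as its encoder.
        tokens_a = {k: v.to(self.device_a) for k, v in tokens_a.items()}
        tokens_b = {k: v.to(self.device_b) for k, v in tokens_b.items()}
        enc_a = self.transformer_a(**tokens_a).last_hidden_state[:, 0]
        enc_b = self.transformer_b(**tokens_b).last_hidden_state[:, 0]
        # Bring B's encoding over to GPU 0 before taking the inner product.
        return (enc_a * enc_b.to(self.device_a)).sum(dim=-1)
```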

I originally tried distributed (data-parallel) training, but ran into similar memory issues, since each GPU still holds a full copy of both transformers. If I can’t get model parallelism to work, I’ll probably fall back to that with smaller batch sizes.

I poked around the source code, docs, and examples, and there doesn’t seem to be an “out of the box” way to do this. If there is, I’d love to hear about it; otherwise, do these steps seem reasonable?

  1. Override the TokenIndexer class to accept a device placement
  2. Override the trainer so that when it places tensors on a GPU, it uses the specified device placement, falling back to the current assignment otherwise
  3. Override the trainer so that when it moves the model to the GPU, it moves each sub-module to the right place (perhaps a method like move_to_preferred_device(); see the sketch after this list)
  4. Iterate on this a few times to find the places where device placement mismatches inevitably show up (I’m guessing saving the model will be one of them).
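
For step 3, a hypothetical helper along these lines is what I have in mind; an overridden trainer would call it instead of wherever it currently moves the whole model to a single GPU (the names here are mine, not AllenNLP’s):

```python
import torch
import torch.nn as nn


def move_to_preferred_device(model: nn.Module, device_map: dict, default: str = "cpu") -> None:
    """Move each named top-level sub-module to its preferred device.

    `device_map` is a hypothetical mapping like
    {"transformer_a": "cuda:0", "transformer_b": "cuda:1"}; anything
    not listed goes to `default`.
    """
    for name, module in model.named_children():
        module.to(torch.device(device_map.get(name, default)))
```

One caveat: whichever sub-module computes the final score/loss still needs its inputs gathered onto a single device first (as in the forward method sketched earlier), otherwise PyTorch will raise a device-mismatch error.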

Yeah, we unfortunately don’t have a good story around model parallelism at this point. This kind of hacking is probably your best bet. It’s possible that you could get away with just moving things yourself in your Model class. E.g., you could tell the trainer to just use the CPU, and in your model’s forward method move your inputs to the right devices. You’ll have to be careful about moving model parameters, though. The optimizer will expect them to be on the CPU, so you’ll have to reach in somewhere to fix that, somehow.
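
One way to sidestep the optimizer issue, if you can control when the optimizer is constructed, would look roughly like this. This is a minimal sketch assuming a dual-encoder module like the one sketched above, with placeholder hyperparameters, not something we’ve tested:

```python
import torch

# Build the model on CPU and pin each encoder before anything else touches it.
model = DualEncoder()
model.transformer_a.to("cuda:0")
model.transformer_b.to("cuda:1")

# Construct the optimizer only *after* the sub-modules are on their final
# devices, following the general PyTorch advice to move a model before
# building its optimizer; that way the optimizer's parameter references and
# its (lazily created) state match the actual parameter tensors.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# The model's forward() is then responsible for moving each input batch to
# its encoder's device and returning the loss on a single device, so that
# loss.backward() and optimizer.step() behave as usual.
```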

Thanks for the tip on checking the optimizer. I’m hoping that by controlling the initial device placement I can avoid that, although either way I’ll probably have to copy/paste/override the trainer. If I get something working that isn’t horribly hacky, I’ll update here 🙂
