coref-spanbert-large-2020.02.27 model is hard to use in production due to its high GPU memory requirement

Coreference resolution with the coref-spanbert-large-2020.02.27.tar.gz model is very restrictive in terms of GPU memory. Because of the memory requirement, the batch size passed to predict_batch_json generally has to be kept small. Combined with the model's inference speed, this small batch size makes the model difficult to use in a production scenario.
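For context, the current workaround is to split inputs into small batches before calling predict_batch_json. A minimal sketch of that pattern is below; the `chunked` helper and the batch size of 2 are illustrative choices, not part of AllenNLP, while `Predictor.from_path` and `predict_batch_json` are the AllenNLP APIs in question (the actual prediction calls are commented out since they require downloading the model):

```python
from typing import Any, Dict, List


def chunked(items: List[Dict[str, Any]], size: int) -> List[List[Dict[str, Any]]]:
    """Split a list of JSON inputs into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# With the large model, the batch size must stay small to fit in GPU memory.
docs = [{"document": "Paul told Mary he liked her."} for _ in range(10)]
batches = chunked(docs, size=2)  # 10 documents -> 5 batches of 2

# The actual prediction loop (requires downloading the model archive):
# from allennlp.predictors.predictor import Predictor
# predictor = Predictor.from_path("coref-spanbert-large-2020.02.27.tar.gz")
# results = [r for batch in batches for r in predictor.predict_batch_json(batch)]
```

Even with this chunking, throughput is bounded by how few documents fit on the GPU per call, which is the core of the problem described above.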
Is there any plan to port the older coref-bert-lstm-2020.02.12.tar.gz model to AllenNLP 1.1? Alternatively, is there any thinking on how to reduce the GPU memory requirement of coref-spanbert during prediction?