Language Model softmax predictions

I’m interested in getting the actual softmax predictions out of a model based heavily on the existing LanguageModel model. From what I can see, the softmax step is pushed out to the _SoftmaxLoss class: the probabilities are calculated inside its forward method, but that method only returns the final loss. I can imagine a few hacky workarounds, like adding a boolean ‘return_probs’ parameter to that forward method and using it from the language model’s decode method, but it seems like there ought to be a more direct way. Am I missing a more elegant or obvious approach?

I suppose I can just add a separate method that returns what I want. I was a little confused about the interface between AllenNLP and the underlying torch module, which made this seem more complicated than it is.
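Roughly, that separate method would just redo the projection that _SoftmaxLoss applies, but return the (log-)probabilities instead of the loss. A minimal sketch with plain torch, assuming the loss layer exposes its output projection as softmax_w and softmax_b (those attribute names come from my reading of the AllenNLP source and may differ in your version):

```python
import torch

def token_log_probs(embeddings: torch.Tensor,
                    softmax_w: torch.Tensor,
                    softmax_b: torch.Tensor) -> torch.Tensor:
    """Apply the same output projection the loss layer uses, but return
    per-token log-probabilities instead of a summed negative log-likelihood.

    embeddings: (num_tokens, embedding_dim) contextual vectors
    softmax_w:  (embedding_dim, vocab_size) projection weight (assumed name)
    softmax_b:  (vocab_size,) projection bias (assumed name)
    """
    return torch.log_softmax(embeddings @ softmax_w + softmax_b, dim=-1)
```

A method on a LanguageModel subclass could then call this with the corresponding parameters of its loss module (and exponentiate if plain probabilities are wanted).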

If you just want this for one token at a time, like we have in our language modeling demo on demo.allennlp.org, you should look at our NextTokenLm and MaskedLanguageModel classes, which actually output the probabilities. They can also load large pretrained language models from pytorch-transformers.
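For reference, the underlying next-token computation can also be done with pytorch-transformers directly. This is only a sketch of that calculation using GPT-2 as an example, not of how NextTokenLm itself is implemented:

```python
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = torch.tensor([tokenizer.encode("The quick brown fox")])
with torch.no_grad():
    logits = model(input_ids)[0]          # (1, seq_len, vocab_size)

# Softmax over the vocabulary for the position after the last input token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)
for prob, token_id in zip(top_probs.tolist(), top_ids.tolist()):
    print(repr(tokenizer.decode([token_id])), prob)
```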

I’d like to calculate the loss over the full batch and sequence for the sake of efficiency, but then recover top_k predictions and recall@N per predicted token during an evaluation step.
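Concretely, the evaluation-side computation I have in mind is something like the following (plain torch; the function name and masking convention are just for illustration):

```python
import torch

def recall_at_n(logits: torch.Tensor,
                targets: torch.Tensor,
                mask: torch.Tensor,
                n: int = 5):
    """Top-n predictions per position plus corpus-level recall@n.

    logits:  (batch, seq_len, vocab_size) pre-softmax scores
    targets: (batch, seq_len) gold next-token ids
    mask:    (batch, seq_len) 1 for real tokens, 0 for padding
    """
    top_n = logits.topk(n, dim=-1).indices               # (batch, seq_len, n)
    hits = (top_n == targets.unsqueeze(-1)).any(dim=-1)  # (batch, seq_len)
    recall = hits[mask.bool()].float().mean().item()
    return top_n, recall


# Example with random scores over a 50-token vocabulary.
logits = torch.randn(2, 7, 50)
targets = torch.randint(0, 50, (2, 7))
mask = torch.ones(2, 7)
top_n, recall = recall_at_n(logits, targets, mask)
print(top_n.shape, recall)   # torch.Size([2, 7, 5]) and a value in [0, 1]
```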

I can set it up easily enough to re-calculate these predictions and metrics in the decode step and return them per instance with a predictor. But I’d also like overall metrics, and I’m not quite clear on how I would do that extra metric calculation during evaluation without also running it during training.

Would I use some conditional logic in the forward step of the model to calculate the metric only during evaluation? Does the model have access to some variable that would indicate whether it’s in training or evaluation?

Yes, you can check this in forward by looking at self.training, e.g. if not self.training: do_eval_stuff().
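A toy sketch of that pattern, using a bare torch.nn.Module (an AllenNLP Model is also an nn.Module, so the self.training flag behaves the same way, and the trainer switches the model to eval mode for its validation loop); the metric bookkeeping here is purely illustrative:

```python
import torch

class LmWithEvalMetrics(torch.nn.Module):
    """Minimal module showing the self.training check."""

    def __init__(self, vocab_size: int = 100, dim: int = 16):
        super().__init__()
        self.proj = torch.nn.Linear(dim, vocab_size)
        self._hits = 0
        self._total = 0

    def forward(self, embeddings, targets):
        logits = self.proj(embeddings)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1))

        if not self.training:  # only pay for the extra metric during evaluation
            top5 = logits.topk(5, dim=-1).indices
            self._hits += (top5 == targets.unsqueeze(-1)).any(-1).sum().item()
            self._total += targets.numel()

        return {"loss": loss}


model = LmWithEvalMetrics()
emb = torch.randn(2, 7, 16)
tgt = torch.randint(0, 100, (2, 7))

model.train()
model(emb, tgt)        # counters untouched
model.eval()
model(emb, tgt)        # counters updated
print(model._hits, model._total)
```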


That’s just what I was looking for–thanks!