Using ELMo as a masked language model

Here is my scenario.

I want to use ELMo as a language model, but it seems the current ELMo model does not contain the final weight matrix projecting to the number of words (i.e., the word vocab size).

My problem:
I have the sentence “I take an umbre_lla from the shop” and I want to correct it to “I take an umbrella from the shop”. My idea is to use something like a masked language model to re-predict the word “umbrella” at the 4th position.
But it seems I cannot use BERT/RoBERTa for this, because their tokenizers will simply break the word down into subword pieces, as the quick check below shows.
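For illustration, here is a quick check with the Hugging Face `transformers` library (the choice of `bert-base-uncased` is just my assumption to demonstrate the point): the subword tokenizer splits the corrupted word into several pieces, so there is no single position I could mask and re-predict.

```python
# Quick demonstration of the subword problem. Assumes the `transformers`
# package and the `bert-base-uncased` checkpoint; both are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The corrupted word comes back as several wordpieces, not one token,
# so a single [MASK] slot does not map cleanly onto the whole word.
print(tokenizer.tokenize("I take an umbre_lla from the shop"))
```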

Thus, I want to use ELMo's final layer to predict the word.
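To make that concrete, here is a minimal sketch of what I have in mind with AllenNLP's `Elmo` module. The S3 URLs are the ones from the AllenNLP docs and may have moved; the `vocab_size` and the output head at the end are hypothetical, since ELMo ships without a softmax layer:

```python
# Sketch: get ELMo's contextual vectors, then show the missing piece.
# Assumes allennlp with the Elmo module; file URLs are an assumption.
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
                "elmo_2x4096_512_2048cnn_2xhighway_options.json")
weight_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
               "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5")

elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

sentences = [["I", "take", "an", "umbrella", "from", "the", "shop"]]
character_ids = batch_to_ids(sentences)                      # (batch, tokens, 50) char ids
top_layer = elmo(character_ids)["elmo_representations"][0]   # (batch, tokens, 1024)

# ELMo stops here: 1024-dim vectors, no softmax over words. To predict a
# word at the 4th position I would have to add and train my own output head:
vocab_size = 50_000                                 # hypothetical vocabulary size
to_vocab = torch.nn.Linear(1024, vocab_size)        # untrained, illustration only
logits = to_vocab(top_layer[:, 3, :])               # word scores for the 4th token
```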

We don’t keep around the softmax layer because it would add a whole lot of parameters to the model that aren’t used in normal usage. You’re probably better off trying to figure out how to use a masked language model for this, anyway, as those are more directly trained for what you want to do. We have a masked language model that does load softmax layer weights; see here.
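For reference, a minimal sketch of the masked-LM route (using the Hugging Face `fill-mask` pipeline with `bert-base-uncased` as a stand-in; this is not the AllenNLP model mentioned above): mask the corrupted position, take the whole-word candidates, and pick the one closest to the corrupted surface form.

```python
# Hedged sketch of the masked-LM approach; `bert-base-uncased` is a
# stand-in model choice, not the AllenNLP model referenced above.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Replace the corrupted word with the mask token and ask for candidates.
for candidate in unmasker("I take an [MASK] from the shop."):
    print(candidate["token_str"], round(candidate["score"], 3))

# The candidates could then be re-ranked against the corrupted surface
# form "umbre_lla" (e.g. by edit distance) to choose the correction.
```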