Here is my scenario.
I want to use ELMo as a language model, but it seems the currently released ELMo model does not include the final weight matrix that projects to the word vocabulary (i.e., an output layer of size word vocab size).
I have a sentence: “I take an umbre_lla from the shop”. I want to correct it to “I take an umbrella from the shop”. What I can think of is to use something like a masked language model to predict the word “umbrella” again at the 4th position, roughly as in the sketch below.
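Just to illustrate the kind of prediction I mean, here is a minimal sketch using the Hugging Face `transformers` fill-mask pipeline with the public `bert-base-uncased` checkpoint (only an illustration of the idea, not my actual setup):

```python
# Sketch of the masked-LM idea: mask the corrupted 4th word and ask the
# model to predict it from the surrounding context.
# Assumes the Hugging Face `transformers` library and `bert-base-uncased`.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "I take an [MASK] from the shop"
for prediction in fill_mask(sentence, top_k=5):
    print(prediction["token_str"], prediction["score"])
```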
But it seems I cannot use BERT/RoBERTa, because they simply break words down into subword pieces during tokenization.
Thus, I want to use the final ELMo layer to predict the word.
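To make the problem concrete, here is a minimal sketch of what I have in mind, assuming the AllenNLP `Elmo` module and the standard pretrained options/weights URLs from the AllenNLP docs (the URLs may need adjusting). Since the released weights stop at the contextual representations, the projection onto a word vocabulary (`vocab_projection` below) is a hypothetical layer that would still have to be trained separately:

```python
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# Standard pretrained ELMo files from the AllenNLP docs (assumed URLs).
options_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
                "elmo_2x4096_512_2048cnn_2xhighway_options.json")
weight_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
               "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5")

elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

# Tokenized sentence with the corrupted word left in place.
sentence = [["I", "take", "an", "umbre_lla", "from", "the", "shop"]]
character_ids = batch_to_ids(sentence)

output = elmo(character_ids)
# (batch, seq_len, 1024) contextual representations.
contextual = output["elmo_representations"][0]
target_vector = contextual[0, 3]  # vector at the 4th position

# Hypothetical word-level output layer: this is exactly the missing piece,
# since the released ELMo weights do not include a softmax over the vocab.
vocab_size = 50000  # assumed vocabulary size
vocab_projection = torch.nn.Linear(1024, vocab_size)
logits = vocab_projection(target_vector)
predicted_word_id = logits.argmax().item()
```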