Extending vocabulary of a PretrainedTransformerEmbedder before fine-tuning


I wish to fine-tune a model which uses PretrainedTransformerEmbedder as its embedding layer. The Huggingface APIs offer a way to extend the vocabulary of the tokenizer as well as the model (ref: https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer.add_tokens ), but I could not find a wrapper around this in AllenNLP. Is there a way for me to use this workflow of extending the vocabulary of the embedding layer (including but not limited to PretrainedTransformerEmbedder) directly from a config file?
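For context, a minimal sketch of what Huggingface's `model.resize_token_embeddings` effectively does after `tokenizer.add_tokens` — the old rows of the embedding matrix are preserved and the new tokens get freshly initialized rows. (This is an illustration of the mechanism, not AllenNLP code; the function name `extend_embedding` is my own.)

```python
import torch

def extend_embedding(embedding: torch.nn.Embedding,
                     num_new_tokens: int) -> torch.nn.Embedding:
    """Grow an embedding matrix by num_new_tokens rows, keeping the
    pretrained weights for the original vocabulary intact.

    New rows are left with the default random initialization, which
    is roughly what resize_token_embeddings does for added tokens.
    """
    old_num, dim = embedding.weight.shape
    new_embedding = torch.nn.Embedding(old_num + num_new_tokens, dim)
    with torch.no_grad():
        # Copy the pretrained rows; rows [old_num:] stay random.
        new_embedding.weight[:old_num] = embedding.weight
    return new_embedding
```

Without config-level support, one would presumably have to call `tokenizer.add_tokens(...)` and apply a resize like this to the underlying transformer model by hand, outside the AllenNLP config.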


This is something that we should support, and we currently don't have a good story here. Can you open a feature request issue on GitHub with more detail about what exactly you want to do?

Created an issue https://github.com/allenai/allennlp/issues/4397.