There are three different max_length parameters that you can set when developing a BERT model using AllenNLP:

tokenizer: If set to a number, will limit the total sequence returned so that it has a maximum length. If there are overflowing tokens, those will be added to the returned dictionary.

token_indexer: If not None, splits the document into segments of this many tokens (including special tokens) before feeding it into the embedder. The embedder embeds these segments independently and concatenates the results to recover the original document representation. Should be set to the same value as the max_length option on the PretrainedTransformerEmbedder.

embedder: The max_length option on the PretrainedTransformerEmbedder itself, which the indexer docs say should match the indexer's value.

From the docs, it seems like the token_indexer and embedder parameters are used in concert to split the document into segments of this many tokens, while the tokenizer parameter is used to truncate the sequence. Does anyone know whether these settings can be combined, and if so, how? What is the returned dictionary to which overflowing tokens are added (mentioned above)? And what is the default behavior when max_length on the token_indexer is None? Thanks!
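
For reference, here is a minimal sketch of where the three settings live in an AllenNLP Jsonnet config. The model name bert-base-uncased and the value 512 are illustrative assumptions, not something the docs prescribe:

```jsonnet
{
  "dataset_reader": {
    "tokenizer": {
      "type": "pretrained_transformer",
      "model_name": "bert-base-uncased",
      // (1) tokenizer max_length: truncates the returned token sequence
      "max_length": 512
    },
    "token_indexers": {
      "tokens": {
        "type": "pretrained_transformer",
        "model_name": "bert-base-uncased",
        // (2) indexer max_length: splits the document into segments of this size
        "max_length": 512
      }
    }
  },
  "model": {
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "pretrained_transformer",
          "model_name": "bert-base-uncased",
          // (3) embedder max_length: per the docs, must equal the indexer's value
          "max_length": 512
        }
      }
    }
  }
}
```

From the doc excerpts above, it appears that (2) and (3) always travel together, while (1) is an independent hard truncation applied earlier, at tokenization time; whether setting all three at once is meaningful is exactly what I am asking about.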