Attention Mask Dimension Error

I’m trying to use the PretrainedTransformerMismatched token indexer and token embedder to train an NER tagging model, using both character embeddings and pretrained BERT embeddings. I’m taking inspiration from this older example and this discussion.

I load the data and training seems to start fine. However, after training runs smoothly for a couple of minutes on other examples, I hit a RuntimeError raised from transformers/modeling_bert.py, stating that the sizes of my attention_scores and attention_mask tensors do not match. I tried to trace this back through the allennlp code in modules/token_embedders/pretrained_transformer_embedder.py but didn’t have much success.

Any help or guidance would be extremely appreciated!

Here’s the stack trace:

Traceback (most recent call last):
  File "src/run.py", line 26, in <module>
    main()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 135, in train_model_from_args
    train_model_from_file(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 195, in train_model_from_file
    return train_model(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 259, in train_model
    model = _train_worker(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 463, in _train_worker
    metrics = train_loop.run()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 525, in run
    return self.trainer.train()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 732, in train
    train_metrics = self._train_epoch(epoch)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 500, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 406, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/ling-575-clinical-nlp/src/models/ner_lstm_crf.py", line 60, in forward
    encoded = self._encoder(embedded, mask)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 88, in forward
    token_vectors = embedder(**tensors, **forward_params_values)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_mismatched_embedder.py", line 74, in forward
    embeddings = self._matched_embedder(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 124, in forward
    embeddings = self.transformer_model(**parameters)[0]
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 785, in forward
    encoder_outputs = self.encoder(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 406, in forward
    layer_outputs = layer_module(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 368, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 313, in forward
    self_outputs = self.self(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 238, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (162) must match the size of tensor b (160) at non-singleton dimension 3

To be clear, this error does not arise the first time this function is called; it executes successfully 30 times on different inputs before it errors.

Here’s my config file:

local bert_model = "bert-base-uncased";
local max_length = 128;
{
  dataset_reader: {
    type: 'drug_ner_reader',
    token_indexers: {
       tokens: {
          type: "pretrained_transformer_mismatched",
          model_name: bert_model,
          max_length: max_length
        },
      token_characters: {
        type: 'characters',
        min_padding_length: 1,
      },
    },
    lazy: false,
    only_drugs: true,
  },
  data_loader: {
    batch_sampler: {
      type: 'bucket',
      batch_size: 10
    }
  },
  train_data_path: 'data/train_split/',
  validation_data_path: 'data/validation_split/',
  model: {
    type: 'ner_lstm_crf',
    embedder: {
      token_embedders: {
        tokens: {
          type: "pretrained_transformer_mismatched",
          model_name: bert_model
        },
        token_characters: {
          type: 'character_encoding',
          embedding: {
              embedding_dim: 16,
              vocab_namespace: "token_characters",
          },
          encoder: {
              type: 'lstm',
              input_size: 16,
              hidden_size: 16
          }
        }
      },
    },
    encoder: {
      type: 'lstm',
      input_size: 784,
      hidden_size: 784,
      bidirectional: true
    }
  },
  trainer: {
    num_epochs: 2,
    patience: 2,
    grad_clipping: 5.0,
    validation_metric: '-loss',
    optimizer: {
      type: 'adam',
      lr: 0.003
    }
  }
}
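
One thing I noticed while putting this together: the indexer entry sets max_length: 128, but the tokens entry under token_embedders doesn’t set max_length at all. The AllenNLP docs say the embedder’s max_length should be set to the same value as the indexer’s so the folded segments get stitched back together, so I’m wondering whether that could be related (just a guess on my part, I haven’t confirmed it’s the cause), i.e. something like:

        tokens: {
          type: "pretrained_transformer_mismatched",
          model_name: bert_model,
          max_length: max_length
        },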

And here’s the forward method of my model:


    def forward(self,
                tokens: Dict[str, torch.Tensor],
                label: torch.Tensor = None,
                metadata: List[Dict[str, Any]] = None) -> Dict[str, torch.Tensor]:
        mask = get_text_field_mask(tokens)

        embedded = self._embedder(tokens)
        encoded = self._encoder(embedded, mask)
        classified = self._classifier(encoded)

        viterbi_tags = self._crf.viterbi_tags(classified, mask)
        viterbi_tags = [path for path, score in viterbi_tags]
        broadcasted = self._broadcast_tags(viterbi_tags, classified)

        output: Dict[str, torch.Tensor] = {}

        if label is not None:
            log_likelihood = self._crf(classified, label, mask)
            self._f1(broadcasted, label, mask)
            output['loss'] = -log_likelihood

        output['logits'] = classified

        if metadata:
            output['sentence'] = [instance_metadata['sentence'] for instance_metadata in metadata]

        return output

Please let me know if there’s any other information I can provide that would be useful.
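
In the meantime, I’m going to log the tensor shapes the embedder sees for each batch, to check whether the failing batches are the long ones that get folded at max_length. Roughly this, dropped into my forward right before the embedder call (just a quick sketch, assuming the nested TextFieldTensors dict layout):

    # Sketch: print the shape of every tensor each indexer produced for this
    # batch, so the batch that triggers the size mismatch stands out.
    for indexer_name, tensor_dict in tokens.items():
        for key, tensor in tensor_dict.items():
            print(indexer_name, key, tuple(tensor.shape))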

This part of your stack trace seems off:

  File "/Users/paigefink/ling-575-clinical-nlp/src/models/ner_lstm_crf.py", line 60, in forward
    encoded = self._encoder(embedded, mask)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 88, in forward
    token_vectors = embedder(**tensors, **forward_params_values)

According to your config, self._encoder should be an LSTM, but the stack trace shows that call going into a BasicTextFieldEmbedder. Is there a bug in your constructor, where you’re assigning the wrong things to self?

Thanks so much for the response! I really appreciate the help.

I see what you’re saying about the stack trace. I think something might have been wrong with the trace I sent before, but I can confirm that self._encoder is an LSTM and not a BasicTextFieldEmbedder, both in the constructor and where it gets called in forward. (I printed it out in both places during training to double-check.)

And here’s my constructor, just for reference:

class NerLstmCRF(Model):
    def __init__(self,
                 vocab: Vocabulary,
                 embedder: TextFieldEmbedder,
                 encoder: Seq2SeqEncoder) -> None:
        super().__init__(vocab)

        self._embedder = embedder
        self._encoder = encoder
        self._classifier = torch.nn.Linear(
            in_features=encoder.get_output_dim(),
            out_features=vocab.get_vocab_size('labels')
        )
        self._crf = ConditionalRandomField(
            vocab.get_vocab_size('labels')
        )

        self._f1 = SpanBasedF1Measure(vocab, 'labels')

I ran again, and here’s the current stack trace:

Traceback (most recent call last):
  File "src/run.py", line 26, in <module>
    main()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 135, in train_model_from_args
    train_model_from_file(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 195, in train_model_from_file
    return train_model(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 259, in train_model
    model = _train_worker(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 463, in _train_worker
    metrics = train_loop.run()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/commands/train.py", line 525, in run
    return self.trainer.train()
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 732, in train
    train_metrics = self._train_epoch(epoch)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 500, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/training/trainer.py", line 406, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/ling-575-clinical-nlp/src/models/ner_lstm_crf.py", line 59, in forward
    embedded = self._embedder(tokens)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 88, in forward
    token_vectors = embedder(**tensors, **forward_params_values)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_mismatched_embedder.py", line 74, in forward
    embeddings = self._matched_embedder(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 124, in forward
    embeddings = self.transformer_model(**parameters)[0]
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 784, in forward
    encoder_outputs = self.encoder(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 405, in forward
    layer_outputs = layer_module(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 367, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 312, in forward
    self_outputs = self.self(
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/paigefink/opt/anaconda3/envs/ling-575/lib/python3.8/site-packages/transformers/modeling_bert.py", line 238, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (144) must match the size of tensor b (142) at non-singleton dimension 3

Maybe I had edited the file mid-run or something, and that’s what threw off the earlier stack trace?

OK, this looks like a bug now. Can you open an issue on GitHub so we can keep track of it and make sure it gets fixed? What would be particularly helpful is knowing exactly which inputs it fails on. If you can narrow it down to one input that causes the error and give us a minimal reproduction, we’ll be able to diagnose and fix the problem much more easily.

Will do. I’ll try to collect as much information as possible for the issue description.
Thanks again for the prompt reply!
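
For the issue, my rough plan for narrowing it down to a single failing input is to wrap the existing embedder call in my forward and dump the offending batch (a sketch; saving the batch with torch.save and printing the sentence metadata are just my own debugging choices, not anything built into allennlp):

        try:
            embedded = self._embedder(tokens)
        except RuntimeError:
            # Save the batch that blows up and print its sentences, so the
            # failure can be reproduced on a single input for the GitHub issue.
            torch.save(tokens, 'failing_batch.pt')
            if metadata:
                for instance_metadata in metadata:
                    print(instance_metadata['sentence'])
            raise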