Multitask example for POS, NER

**The intent**: To use multitask learning for POS tagging and NER.

**The code**:

def text_to_instance(self, tokens: List[Token], labels: List[str] = None, labels1: List[str] = None):
    fields = {}
    text_field = TextField(tokens, self.token_indexers)
    fields['tokens'] = text_field
    if labels:
        fields['labels'] = SequenceLabelField(labels, text_field)
        fields['labels1'] = SequenceLabelField(labels1, text_field)
    return Instance(fields)

**The error**: ConfigurationError: 'A gold label passed to Categorical Accuracy contains an id >= 2, the number of classes.'

Debugging:

**The model**:

embedder:  BasicTextFieldEmbedder(
  (token_embedder_tokens): Embedding()
)
encoder:  PytorchSeq2SeqWrapper(
  (_module): LSTM(128, 128, batch_first=True, bidirectional=True)
)
hidden2labels:  Linear(in_features=256, out_features=57, bias=True)
hidden2labels1:  Linear(in_features=256, out_features=2, bias=True)  <--- This is not OK

**An example of an instance**:

Instance with fields:
 	 tokens: TextField of length 14 with text: 
 		[They, marched, from, the, Houses, of, Parliament, to, a, rally, in, Hyde, Park, .]
 		and TokenIndexers : {'tokens': 'SingleIdTokenIndexer'} 
 	 labels: SequenceLabelField of length 14 with labels:
 		['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-gpe', 'I-gpe', 'O']
 		in namespace: 'labels'. 
 	 labels1: SequenceLabelField of length 14 with labels:
 		('PRP', 'VBD', 'IN', 'DT', 'NNS', 'IN', 'NN', 'TO', 'DT', 'NN', 'IN', 'NNP', 'NNP', '.')
 		in namespace: 'labels'.  <-------- This is not OK

My notebook: https://colab.research.google.com/drive/1TqsgdG4kbdEzyNAfnLGI45PCatunCoW6
It is based on this one: https://github.com/mhagiwara/realworldnlp/blob/master/examples/ner/ner.ipynb

Can you help me with this?

As you point out in your instance example, the labels1 SequenceLabelField is also in the “labels” namespace. This is the default when no namespace is specified (https://github.com/allenai/allennlp/blob/master/allennlp/data/fields/sequence_label_field.py#L36). Could you change

fields['labels1'] = SequenceLabelField(labels1, text_field)

to

fields['labels1'] = SequenceLabelField(labels1, text_field, label_namespace="pos_labels")

and change

self.hidden2labels1 = torch.nn.Linear(in_features=encoder.get_output_dim(),
                                         out_features=vocab.get_vocab_size('labels1'))

to

self.hidden2labels1 = torch.nn.Linear(in_features=encoder.get_output_dim(),
                                         out_features=vocab.get_vocab_size('pos_labels'))

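There’s some magic in the “_labels” suffix: by default, the vocab does not automatically create the padding and unknown tokens for namespaces ending in “labels” or “tags”, so you want a namespace with that suffix. I think this also explains the mysterious 2 classes your error is referring to: the “labels1” namespace was never populated, so all it contains is those two automatically created tokens.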
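To make the namespace behavior concrete, here is a minimal sketch in plain Python. It is not AllenNLP's actual code: `ToyVocabulary`, `is_padded`, and `NON_PADDED_PATTERNS` are hypothetical names, and the patterns are assumed to match the library's defaults, so check them against your version.

```python
# Assumption: these mirror AllenNLP's default non-padded namespace patterns.
NON_PADDED_PATTERNS = ("*tags", "*labels")


def is_padded(namespace: str) -> bool:
    """Return True if the namespace should get padding/unknown entries."""
    for pattern in NON_PADDED_PATTERNS:
        # A leading "*" means "match any namespace ending with the rest".
        if pattern.startswith("*") and namespace.endswith(pattern[1:]):
            return False
        if pattern == namespace:
            return False
    return True


class ToyVocabulary:
    """Hypothetical, simplified stand-in for a namespaced vocabulary."""

    def __init__(self):
        self._tokens = {}

    def _namespace(self, namespace):
        # Namespaces are created lazily, even when only their size is queried.
        if namespace not in self._tokens:
            initial = ["@@PADDING@@", "@@UNKNOWN@@"] if is_padded(namespace) else []
            self._tokens[namespace] = initial
        return self._tokens[namespace]

    def add(self, token, namespace):
        tokens = self._namespace(namespace)
        if token not in tokens:
            tokens.append(token)

    def get_vocab_size(self, namespace):
        return len(self._namespace(namespace))


ner_tags = ["O", "B-gpe", "I-gpe"]
pos_tags = ["PRP", "VBD", "IN"]

# Original setup: both label fields fall into the default "labels" namespace.
merged = ToyVocabulary()
for tag in ner_tags + pos_tags:
    merged.add(tag, "labels")
merged_size = merged.get_vocab_size("labels")     # NER and POS tags mixed together
missing_size = merged.get_vocab_size("labels1")   # never populated: padding + unknown only

# Suggested setup: POS tags get their own namespace ending in "labels".
fixed = ToyVocabulary()
for tag in ner_tags:
    fixed.add(tag, "labels")
for tag in pos_tags:
    fixed.add(tag, "pos_labels")
ner_size = fixed.get_vocab_size("labels")
pos_size = fixed.get_vocab_size("pos_labels")

print(merged_size, missing_size, ner_size, pos_size)
```

In this toy version the merged namespace holds all six tags, the unpopulated “labels1” namespace has size 2, and the separate namespaces have three tags each, which matches the 57-vs-2 pattern in your model printout.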


Thank you, Kevin! Your answer is very insightful!