Sequence Labeling with Multiple Labels

Hi there,

I am trying to perform sequence-labeling whereby each item in the sequence may have multiple labels.

I have come across the MultiLabelField and have seen some examples of how to use it, e.g. for topic classification, where a single text input may have multiple labels.

In the corresponding text_to_instance function in the dataset_reader, the label field can be assigned like so:
fields["label"] = MultiLabelField(labels)

My problem is a little different: it is basically sequence labeling, but rather than assigning one label per token, a token may have multiple labels. Specifically, in enhanced UD dependency representations, a token may have multiple heads:

<word>, <POS> ==> <head_id>

# regular UD parse
The , DET ==> 2
team , NOUN ==> 7
who , PRON ==> 4
work , VERB ==> 2
there , ADV ==> 4
are , AUX ==> 7
helpfull , ADJ ==> 0

# enhanced UD parse
The , DET ==> [2]
team , NOUN ==> [4, 7, 9, 12, 15]
who , PRON ==> [2]
work , VERB ==> [2]
there , ADV ==> [4]
are , AUX ==> [7]
helpfull , ADJ ==> [0]

I’m working with a slightly modified universal_dependencies dataset reader and am unsure how to store my head_indices field. In the regular case, the head_indices for this sentence are just a list of integers, so a SequenceLabelField can be used:
[2, 7, 4, 2, 4, 7, 0]
But in the enhanced case, it is now a list-of-lists:
[[2], [4, 7, 9, 12, 15], [2], [2], [4], [7], [0]]

Currently, I cannot pass a list-of-lists to MultiLabelField, which expects a flat sequence of strings or integers: labels: Sequence[Union[str, int]].

Does anyone have advice on how to handle this data field, or can anyone point me to a similar implementation?
Many thanks!

I thought we had run into this before, but I don’t see a built-in Field for it. Maybe what I’m thinking of is in some user’s repository; I’m not sure.

But, basically, I think you just want to combine the functionality of a MultiLabelField and a SequenceLabelField. You could do this in two ways:

  1. Writing your own SequenceMultiLabelField, taking pieces of both classes, and getting exactly the data type you want.
  2. Using a ListField[ListField[LabelField]], which will give you a padded 2D array of labels for each input sequence.
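
To make option 2 a bit more concrete, here is a rough sketch in plain Python (not AllenNLP itself) of the padded 2D layout you would end up with. The helper name pad_nested_labels and the padding value of -1 are assumptions for illustration only; the exact padding value and mask that a nested ListField produces may differ. Note also that with integer head indices, each LabelField would need skip_indexing=True.

```python
from typing import List, Sequence, Tuple


def pad_nested_labels(
    labels: Sequence[Sequence[int]], padding_value: int = -1
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Pad per-token label lists to a rectangular layout and build a mask.

    Hypothetical helper: it only mimics the shape of a padded 2D array.
    """
    # The longest inner list decides the width of the second dimension.
    max_labels = max((len(token_labels) for token_labels in labels), default=0)
    padded, mask = [], []
    for token_labels in labels:
        num_pad = max_labels - len(token_labels)
        padded.append(list(token_labels) + [padding_value] * num_pad)
        # True marks a real label, False marks padding.
        mask.append([True] * len(token_labels) + [False] * num_pad)
    return padded, mask


# First seven tokens of the enhanced UD example from the question.
head_indices = [[2], [4, 7, 9, 12, 15], [2], [2], [4], [7], [0]]
padded, mask = pad_nested_labels(head_indices)
```

With this layout the model needs the mask (or a check against the padding value) to ignore the padded slots when computing the loss.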

I’ll stop there for now, in case it’s already clear, but if it’s not or you have more questions, feel free to keep asking.


Thanks for your help, Matt,

I went ahead and created a SequenceMultiLabelField which takes in labels: Sequence[Sequence[Union[str, int]]].

It’s a bit hacky at the moment, but it inherits much of the functionality of MultiLabelField while processing the labels as a list-of-lists. I ran the dry-run command, and it seems to be picking up the multiple heads and labels correctly:

Instance 0:
	Instance with fields:
 	 words: TextField of length 29 with text: 
 		[The, team, who, work, there, are, helpfull, ,, friendly, and, extremely, knowledgeable, and, will,
		help, you, as, much, as, they, can, with, thier, years, of, hands, on, practice, .]
 		and TokenIndexers : {'tokens': 'SingleIdTokenIndexer'} 
 	 pos_tags: SequenceLabelField of length 29 with labels:
 		['DET', 'NOUN', 'PRON', 'VERB', 'ADV', 'AUX', 'ADJ', 'PUNCT', 'ADJ', 'CCONJ', 'ADV', 'ADJ', 'CCONJ',
		'AUX', 'VERB', 'PRON', 'ADV', 'ADV', 'SCONJ', 'PRON', 'AUX', 'ADP', 'PRON', 'NOUN', 'ADP', 'NOUN',
		'ADV', 'NOUN', 'PUNCT']
 		in namespace: 'pos'. 
 	 head_tags: SequenceMultiLabelField with labels: [['det'], ['nsubj', 'nsubj', 'nsubj', 'nsubj', 'nsubj'], ['ref'], ['acl:relcl'], ['advmod'], ['cop'], ['root'], ['punct'], ['conj:and'], ['cc'], ['advmod'], ['conj:and'], ['cc'], ['aux'], ['conj:and'], ['obj'], ['advmod'], ['advmod'], ['mark'], ['nsubj'], ['advcl:as'], ['case'], ['nmod:poss'], ['obl:with'], ['case'], ['compound'], ['compound'], ['nmod:of'], ['punct']] in namespace: 'head_tags'.
 	 head_indices: SequenceMultiLabelField with labels: [[2], [4, 7, 9, 12, 15], [2], [2], [4], [7], [0], [9], [7], [12], [12], [7], [15], [15], [7], [15], [18], [15], [21], [21], [17], [24], [24], [15], [28], [28], [26], [24], [7]] in namespace: 'head_index_tags'.
 	 metadata: MetadataField (print field.metadata to see specific information).
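
For anyone reading along, here is a minimal sketch of the kind of per-token multi-hot encoding a field like this could produce. This is not the actual SequenceMultiLabelField code: the function name multi_hot_rows, the 30-slot size for head indices (29 tokens plus a root at index 0), and the tiny tag vocabulary built from the first seven tokens are all assumptions for illustration.

```python
from typing import List, Sequence


def multi_hot_rows(labels: Sequence[Sequence[int]], num_labels: int) -> List[List[int]]:
    """Turn per-token lists of label ids into one multi-hot row per token."""
    rows = []
    for token_labels in labels:
        row = [0] * num_labels
        for label in token_labels:
            # Repeated labels simply set the same slot to 1 again.
            row[label] = 1
        rows.append(row)
    return rows


# First seven tokens of the dry-run output above.
head_indices = [[2], [4, 7, 9, 12, 15], [2], [2], [4], [7], [0]]
head_tags = [["det"], ["nsubj"] * 5, ["ref"], ["acl:relcl"], ["advmod"], ["cop"], ["root"]]

# Assumed sizing: one slot per token in the 29-token sentence, plus the root.
head_rows = multi_hot_rows(head_indices, num_labels=30)

# Hypothetical vocabulary built from these seven tokens only.
unique_tags = sorted({tag for tags in head_tags for tag in tags})
tag_to_index = {tag: i for i, tag in enumerate(unique_tags)}
tag_rows = multi_hot_rows(
    [[tag_to_index[tag] for tag in tags] for tags in head_tags],
    num_labels=len(tag_to_index),
)
```

One thing the sketch makes visible: the five 'nsubj' tags on "team" collapse into a single 1 in tag_rows, so the pairing between each head and its relation cannot be recovered from the two multi-hot matrices alone. If that pairing matters, a padded layout that keeps the order of the inner lists may be the safer choice.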