I am trying to perform sequence-labeling whereby each item in the sequence may have multiple labels.
I have come across the MultiLabelField and have seen some examples of how to use it, e.g. for topic classification where some text input may have multiple labels:
<text>,<labels>:
text_1,[a,b,c]
text_2,[a]
...
In the corresponding text_to_instance function in the dataset_reader, the label field can be assigned like so:
fields["label"] = MultiLabelField(labels)
My problem is a little different: it is essentially sequence labeling, but rather than assigning one label per token, a token may have multiple labels. Specifically, in enhanced UD dependency representations, a token may have multiple heads:
<word>, <POS> ==> <head_id>

# regular UD parse
The , DET ==> 2
team , NOUN ==> 7
who , PRON ==> 4
work , VERB ==> 2
there , ADV ==> 4
are , AUX ==> 7
helpfull , ADJ ==> 0
...

# enhanced UD parse
The , DET ==>
team , NOUN ==> [4, 7, 9, 12, 15]
who , PRON ==>
work , VERB ==>
there , ADV ==>
are , AUX ==>
helpfull , ADJ ==>
...
I’m working with a slightly modified universal_dependencies dataset reader and am quite unsure about how to store my specific head_indices field. In the regular case, the head_indices for this sentence are just a list of integers, so a SequenceLabelField can be used:
[2, 7, 4, 2, 4, 7, 0]
But in the enhanced case, it is now a list-of-lists:
[, [4, 7, 9, 12, 15], , , , , ]
Currently, I cannot pass a list-of-lists to MultiLabelField, which expects a flat sequence of strings or integers:
labels: Sequence[Union[str, int]]
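To make the shape mismatch concrete, here is a minimal sketch in plain Python (no AllenNLP calls) of one encoding that might work: turning the per-token head lists into a token-by-head 0/1 matrix. The function name heads_to_multi_hot and the three-token example values are hypothetical and are not taken from my data; whether this is a sensible way to feed the field into AllenNLP is part of what I am asking.

```python
from typing import List


def heads_to_multi_hot(head_indices: List[List[int]], num_positions: int) -> List[List[int]]:
    """Encode per-token head lists as a token-by-head 0/1 matrix.

    Row i corresponds to the i-th token in the sentence; column j is 1
    when head id j (0 = root) is one of that token's heads.
    """
    matrix = []
    for heads in head_indices:
        row = [0] * num_positions
        for head in heads:
            if not 0 <= head < num_positions:
                raise ValueError(f"head id {head} is outside 0..{num_positions - 1}")
            row[head] = 1
        matrix.append(row)
    return matrix


# Hypothetical three-token sentence (made-up values, not the sentence above):
# token 1 has head 2, token 2 has heads 0 (root) and 3, token 3 has head 2.
example_heads = [[2], [0, 3], [2]]

# Columns cover head ids 0..3: the root plus the three tokens.
matrix = heads_to_multi_hot(example_heads, num_positions=4)
for row in matrix:
    print(row)
```

Each row here is a flat sequence of integers, which is the shape a single multi-label entry takes, so the remaining question is how to store one such entry per token.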
Does anyone have any advice on how to handle this data field, or can anyone point me to a similar implementation?