Single-token embedding

I’m using a set of pre-trained embeddings loaded from a file for some experiments.

In one case, my data consists of sequences of the tokens I want to embed, which maps naturally onto a TextField and a TextFieldEmbedder. I can specify a token embedder for the TextFieldEmbedder and point its pretrained_file parameter at my pre-trained embedding file.
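For reference, the relevant part of my config looks roughly like this (the file path, embedding_dim, and trainable values below are placeholders, not my actual settings):

```json
{
  "text_field_embedder": {
    "token_embedders": {
      "tokens": {
        "type": "embedding",
        "embedding_dim": 300,
        "pretrained_file": "/path/to/embeddings.txt.gz",
        "trainable": false
      }
    }
  }
}
```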

In the second case, however, I only want to embed a single token per instance, and it's less clear how that should be implemented. There isn't an obvious corresponding Field type, and since the TextFieldEmbedder handles the mapping between indexers and embedders, it's not obvious how to set those pieces up in a different context.

Should I just use a TextFieldEmbedder with a single token as input here, or is there a Field type and indexing/embedding approach that would be a better fit?

I was able to make this work with a LabelField for the token I wanted to embed, in addition to the LabelField for the actual label the model is trying to predict. The LabelField and the embedder just need matching vocabulary namespaces specified, like
'paper_id': LabelField(paper_id, label_namespace='paper_id_labels')
and in the corresponding embedder params in the config file:
"vocab_namespace": "paper_id_labels",
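Put together, the embedder entry in the config ends up looking roughly like this (the key name "paper_id_embedder", the embedding_dim, and the file path are placeholders for illustration; only vocab_namespace has to match the LabelField's namespace):

```json
"paper_id_embedder": {
  "type": "embedding",
  "embedding_dim": 100,
  "vocab_namespace": "paper_id_labels",
  "pretrained_file": "/path/to/paper_id_embeddings.txt"
}
```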

Using a TextFieldEmbedder with a single token as you suggest sounds totally reasonable to me. You could certainly write a custom SingleWordField and have custom embedding logic within it, but that sounds like extra work and complexity for no benefit.