For discussion: Multi-task Learning in AllenNLP v. 1+

Hi everyone! I am working on a project that requires multilingual multi-task learning (in this post I will touch only on the multi-task part). I thought it might be useful to discuss this functionality before the 1.0.0 release and in general.

The only piece of multi-task learning functionality that is left is InterleavingDatasetReader.
It was used in conjunction with HomogeneousBatchIterator, which is gone. Another approach was based on the callback trainer, which is also gone now.

Approaches
It might be nice to have a recommended way of doing multi-task learning in allennlp v.1 with some support from the core library.

Multi-task learning needs implementation on the data level and on the modeling level.

I see the following possible approaches to implementing it on the data level:

  1. MultitaskDatasetReader returns PyTorch’s ConcatDataset where all the datasets are stored. Then HomogeneousBatchSampler samples batches in round_robin / sequential / etc. fashion (ideally with the ability to provide custom data loading schedulers). This dataset and sampler go as arguments to a single train_dataloader. A FlagField indicating the batch’s task becomes a part of each Instance. GradientDescentTrainer can be used in this setup without modification. (See the sampler sketch after this list.)
  2. A separate allennlp BucketBatchSampler is created for each dataset individually. Then batches are sampled using a helper round_robin / sequential / etc. function inside a MultiTaskTrainer. A FlagField indicating the batch’s task becomes a part of each Instance.
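
To make data approach #1 concrete, here is a minimal sketch of a homogeneous batch sampler over a ConcatDataset. It relies only on PyTorch (ConcatDataset really does expose cumulative_sizes); the class name and the round-robin schedule are my assumptions, not existing allennlp API:

```python
from typing import Iterator, List

from torch.utils.data import ConcatDataset, Sampler


class HomogeneousBatchSampler(Sampler):
    """Hypothetical sampler: every batch is drawn from a single sub-dataset
    of a ConcatDataset, and sub-datasets are visited in round-robin order."""

    def __init__(self, dataset: ConcatDataset, batch_size: int) -> None:
        self.batch_size = batch_size
        # ConcatDataset keeps cumulative sizes, so we can recover the
        # index range that belongs to each underlying task dataset.
        starts = [0] + dataset.cumulative_sizes[:-1]
        self.index_ranges = [
            range(start, end)
            for start, end in zip(starts, dataset.cumulative_sizes)
        ]

    def _batches(self, indices: range) -> Iterator[List[int]]:
        # Chop one task's index range into fixed-size batches.
        for i in range(0, len(indices), self.batch_size):
            yield list(indices[i : i + self.batch_size])

    def __iter__(self) -> Iterator[List[int]]:
        # Round-robin over per-task batch iterators until all are exhausted.
        iterators = [self._batches(r) for r in self.index_ranges]
        while iterators:
            alive = []
            for it in iterators:
                batch = next(it, None)
                if batch is not None:
                    yield batch
                    alive.append(it)
            iterators = alive

    def __len__(self) -> int:
        # Total number of batches: ceil-divide each task's size.
        return sum(-(-len(r) // self.batch_size) for r in self.index_ranges)
```

A DataLoader would then take this object as its batch_sampler, and each Instance would carry a FlagField naming its task so the model can dispatch on it.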

And the following approaches to support it on the modeling level:

  1. MultiTaskModel's forward function determines which task’s _forward function to call (e.g. _forward_ner or _forward_nli) based on the FlagField value indicating the batch’s task. This model class also aggregates the metrics and losses for all tasks. GradientDescentTrainer can be used in this setup without modification. (See the model sketch after this list.)
  2. A MultipleModelsAggregate model accepts multiple model instances as input to its constructor, shares specified parameters between them, and picks the model to execute inside the forward function based on the incoming batch. The same goes for predictors. In this setup, existing models from allennlp-models, together with their predictors, can be reused in the multi-task setup without the code duplication of option 1.
  3. A dict of Models is read from the config file together with some indication of which modules are to be shared between the Models. Then a MultipleModelsTrainer decides which Model to train at the current iteration based on the incoming batch. Training a single model becomes a special case of this more general MultipleModelsTrainer.
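
For modeling approach #1, the dispatch could look roughly like the sketch below. The task names, per-task methods, and registered name are illustrative placeholders; the only real allennlp behavior relied on is that a FlagField must be constant across a batch and so arrives in forward() as a plain Python value rather than a tensor:

```python
from typing import Dict

import torch
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.modules import TextFieldEmbedder


@Model.register("multi_task")  # hypothetical name
class MultiTaskModel(Model):
    """A single Model whose forward() dispatches to a per-task _forward
    based on the batch's task flag (modeling approach #1)."""

    def __init__(self, vocab: Vocabulary, embedder: TextFieldEmbedder) -> None:
        super().__init__(vocab)
        self._embedder = embedder  # shared embedder/encoder across tasks
        # ... per-task heads (tagger, classifier, ...) would be built here ...

    def forward(self, task: str, **inputs) -> Dict[str, torch.Tensor]:
        # `task` comes from a FlagField, so it is the same for the whole
        # batch and shows up here as a plain string.
        dispatch = {"ner": self._forward_ner, "nli": self._forward_nli}
        return dispatch[task](**inputs)

    def _forward_ner(self, **inputs) -> Dict[str, torch.Tensor]:
        raise NotImplementedError  # NER-specific computation and loss

    def _forward_nli(self, **inputs) -> Dict[str, torch.Tensor]:
        raise NotImplementedError  # NLI-specific computation and loss
```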

Might there be other options worth considering? Maybe something with a lightweight Task class?

Distributed training, multiple optimizers, and other issues should not be forgotten either (e.g. a single data loader vs. multiple data loaders in distributed training).

In my project, I will probably follow the simplest data approach (#1) and the simplest modeling approach (#1), but I already see myself copying code over (from existing models into my MultiTaskModel) and wrestling with metrics and predictors. It’s not too bad, but with modeling approach #2 or #3 one would get everything for free and could easily plug and play with different models in a multi-task setup.

I looked around at permissions, and I think I fixed the settings so that you can edit your original post.

On your approaches, both of your option 1s look reasonable to me. I’m not sure which is better between data options 1 and 2; either one seems fine. Do you have a reason to prefer one over the other?

For the modeling approaches, the tricky thing if you want to use configuration files is sharing parameters. We don’t have a good way of doing this, and I don’t think we will add any special hacks to make it work. The best thing I can think of is to use Lazy with a particular model constructor, where your (specific, not generic) ModelAggregate specifies sub-model arguments and passes them through in lazy_model.construct(embedder=embedder). I can give more detail about how this might work if you need it; I think it doesn’t require any changes to allennlp.
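
Roughly, I am imagining something like this sketch (the aggregate class, task names, and shared embedder argument are all placeholders, not anything that exists in allennlp; Lazy and its construct() call are the real machinery):

```python
import torch
from allennlp.common.lazy import Lazy
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.modules import TextFieldEmbedder


@Model.register("model_aggregate")  # hypothetical name
class ModelAggregate(Model):
    """The aggregate owns the shared embedder and finishes constructing each
    sub-model with it, so the sub-models share those parameters."""

    def __init__(
        self,
        vocab: Vocabulary,
        embedder: TextFieldEmbedder,  # the module to share
        ner_model: Lazy[Model],       # partially specified in the config
        nli_model: Lazy[Model],
    ) -> None:
        super().__init__(vocab)
        self._embedder = embedder
        # construct() completes each sub-model, handing every one the *same*
        # embedder instance; ModuleDict registers their parameters properly.
        self._models = torch.nn.ModuleDict({
            "ner": ner_model.construct(vocab=vocab, embedder=embedder),
            "nli": nli_model.construct(vocab=vocab, embedder=embedder),
        })
```

This assumes each sub-model's constructor accepts an embedder argument; forward() would then dispatch into self._models based on the batch's task, as in your modeling approach #1.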
