A Seq2Seq training config example

Thanks a lot for creating one of the best NLP libraries. It is really easy to use and extend, and the code quality and standards are among the best I have seen in any library I have used.

I was hoping that you or one of the users might have a running Seq2Seq example for tasks like the WMT datasets or summarization. I am looking to build upon the encoder-decoder model from AllenNLP, and it would be great to have a competitive model for a seq2seq task to benchmark and debug my changes.

Once again, congratulations on a great job with the library.


Hi Kushal, most of our examples are in https://github.com/allenai/allennlp/tree/master/training_config. BiDAF is an example of a Seq2Seq encoder: https://github.com/allenai/allennlp/blob/master/allennlp/models/reading_comprehension/bidaf.py. I don’t think we have an example that’s specific to summarization.

There’s a bit of an unfortunate terminology overlap here, @michaels. We use the term Seq2SeqEncoder to define an abstraction over a particular operation involving tensors done inside a model (taking a sequence of vectors as input and returning a sequence of vectors as output). The more common use of the term “seq2seq” is when you take an input sequence of symbols and build a complete model that returns an output sequence of symbols. This is done in machine translation, summarization, and many other tasks. BiDAF is a model that uses a Seq2SeqEncoder, but it’s not a “seq2seq” model in the second sense.
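To make the two senses concrete, here is a toy sketch in plain Python. It is not AllenNLP code: the function names, the running-mean "encoder", and the threshold "decoder" are all made up for illustration. The only point is the difference in interface, vectors in and vectors out versus symbols in and symbols out.

```python
from typing import List, Sequence

Vector = List[float]


def toy_seq2seq_encoder(vectors: Sequence[Vector]) -> List[Vector]:
    """Sense 1: a sequence of vectors in, a sequence of vectors out (same length).

    Each output is the running mean of the inputs seen so far, a stand-in
    for what an LSTM or self-attention layer would compute.
    """
    if not vectors:
        return []
    running = [0.0] * len(vectors[0])
    outputs: List[Vector] = []
    for step, vec in enumerate(vectors, start=1):
        running = [r + v for r, v in zip(running, vec)]
        outputs.append([r / step for r in running])
    return outputs


def toy_seq2seq_model(source_tokens: Sequence[str]) -> List[str]:
    """Sense 2: a sequence of symbols in, a sequence of symbols out.

    The full model embeds the symbols, runs an encoder over the vectors,
    and decodes back to symbols. The output length may differ from the input.
    """
    # Hypothetical embedding: one feature per token (its length).
    embedded = [[float(len(token))] for token in source_tokens]
    encoded = toy_seq2seq_encoder(embedded)
    # Hypothetical decoder: one symbol per encoded state, plus an end marker.
    decoded = ["long" if state[0] >= 3.0 else "short" for state in encoded]
    return decoded + ["@end@"]


if __name__ == "__main__":
    print(toy_seq2seq_encoder([[1.0, 2.0], [3.0, 4.0]]))
    print(toy_seq2seq_model(["a", "longer", "word"]))
```

In this sketch, BiDAF would correspond to a model that calls something like `toy_seq2seq_encoder` internally but does not have the shape of `toy_seq2seq_model`.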

@Kushal_Arora, we don’t really have good examples of seq2seq models. It’s not something we’ve focused on for our research, so we haven’t spent time building them or making them efficient. There is some seq2seq code available in AllenNLP, but I can’t vouch for how good it is or how able you’d be to get a reasonably competitive baseline from it. This is an area where we’d really love some contributions to the library; it’s not likely that we’ll be investing in it ourselves any time soon.

Thank you, Michaels and Matt, for the responses. I saw that @saiprasanna did a lot of refactoring of the code, so I was hoping he might have an example config for something like a WMT en-fr setup.

Also, @mattg, if it is not too much trouble, could you point me to any user Seq2Seq repos that you are aware of? The reason I ask is that I tried setting up an IWSLT14 de-en model, but my BLEU score is 3 points lower than the reported results, so I want to see what I might be doing wrong.

I am trying to set up a couple of NMT baselines for my experiments. I can try to send out a PR once I am done.


I’m not aware of any user repos doing seq2seq stuff. We’re in the (long) process of splitting out our models into separate repositories, and the plan is to have one called allennlp-seq2seq, with the work that @saiprasanna did moving to that repo. We’ll also likely give maintainer privileges to anyone willing to take ownership of these sub-repos.

@Kushal_Arora You can check the test config for transformers for an example: https://github.com/allenai/allennlp/blob/master/allennlp/tests/fixtures/encoder_decoder/composed_seq2seq/experiment_transformer.json
Note that this is a very small transformer; ideally you would change the number of layers, attention heads, etc.
You would also want to remove the pos embedding and use your own tokenizer (subword?) instead of the spaCy tokenizer.
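
As a rough sketch of that kind of scaling-up, the snippet below applies dotted-path overrides to a nested config dict in plain Python. The config contents and key names here (`num_layers`, `num_attention_heads`, `my_subword_tokenizer`, and so on) are hypothetical placeholders, not copied from the linked fixture, so check them against the actual JSON before relying on them.

```python
import copy
import json
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with dotted-path overrides applied."""
    updated = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for key in parents:
            # Create intermediate dicts if the path does not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return updated


# Hypothetical stand-in for a tiny test config; key names are illustrative only.
tiny_config = {
    "dataset_reader": {"source_tokenizer": {"type": "spacy"}},
    "model": {"decoder": {"num_layers": 1, "num_attention_heads": 2}},
}

# Hypothetical larger settings and a made-up subword tokenizer name.
scaled_config = apply_overrides(
    tiny_config,
    {
        "model.decoder.num_layers": 6,
        "model.decoder.num_attention_heads": 8,
        "dataset_reader.source_tokenizer.type": "my_subword_tokenizer",
    },
)

if __name__ == "__main__":
    print(json.dumps(scaled_config, indent=2))
```

The original dict is left untouched, so the small test config and the scaled-up experiment config can be kept side by side.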