Dependency Parser

Where can I find information about the specific dependency relations implemented by the dependency parser in the AllenNLP suite? I see that the set implemented here is different from the set described at and would like to understand which ones I have to work with in this system.

The example config for the dependency parser points its training and dev data at the Penn Treebank, which I believe you need a license to obtain:

"train_data_path": std.extVar("PTB_DEPENDENCIES_TRAIN"),
"validation_data_path": std.extVar("PTB_DEPENDENCIES_VAL"),

Fortunately, it is pretty straightforward to switch to Universal Dependencies data by changing the values above to point towards a UD dataset. Below are some instructions to do just that.

Assuming you have an environment with AllenNLP set up and have cloned allennlp-models somewhere on your filesystem, cd to allennlp-models and copy the code below into a script, e.g. to download the UD v2.6 data:


mkdir -p data
cd data
echo "Downloading UD data..."$'\n'
curl --remote-name-all{/ud-treebanks-v2.6.tgz}
echo $'\n'
tar -xvzf ud-treebanks-v2.6.tgz
rm ud-treebanks-v2.6.tgz
echo $'\n'"Done"

Run the script:


Once you have the data downloaded, you can change the "train_data_path" and "validation_data_path" values in training_config/structured_prediction/dependency_parser.jsonnet to something like this:

"train_data_path": std.extVar("TRAIN_DATA_PATH"),
"validation_data_path": std.extVar("DEV_DATA_PATH"),

You can then set these variables to point to the UD data you've just downloaded:

export TRAIN_DATA_PATH=data/ud-treebanks-v2.6/UD_English-EWT/en_ewt-ud-train.conllu
export DEV_DATA_PATH=data/ud-treebanks-v2.6/UD_English-EWT/en_ewt-ud-dev.conllu
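
Since these are relative paths, they only resolve if you run the training command from the allennlp-models directory. A quick sanity check can catch a wrong path before training starts. This is just a sketch; check_data_path is a hypothetical helper name of my own, not something AllenNLP provides:

```shell
# Hypothetical helper (not part of AllenNLP): succeed only if the given
# path is an existing, non-empty file, and say which variable it came from.
check_data_path() {
    name=$1   # variable name, used only for the message
    path=$2   # value of that variable
    if [ -s "$path" ]; then
        echo "ok: $name -> $path"
    else
        echo "missing or empty: $name -> ${path:-<unset>}" >&2
        return 1
    fi
}
```

To use it, call check_data_path TRAIN_DATA_PATH "$TRAIN_DATA_PATH" and check_data_path DEV_DATA_PATH "$DEV_DATA_PATH" in the same shell where you exported the variables.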

And then you can train the dependency parser:

allennlp train training_config/structured_prediction/dependency_parser.jsonnet -s logs/dependency_parser_en_ewt --include-package allennlp_models
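
The relation labels a treebank uses can be counted straight from the .conllu file, without waiting for training to finish. Here is a minimal sketch, assuming the standard 10-column tab-separated CoNLL-U layout with the relation label (DEPREL) in column 8; list_deprels is a hypothetical helper name, not an AllenNLP command:

```shell
# Hypothetical helper: count the dependency relation labels in a CoNLL-U file.
# Assumes the standard CoNLL-U layout: 10 tab-separated columns per token
# line, with DEPREL in column 8.
list_deprels() {
    # Keep only ordinary token lines (integer ID in column 1). This skips
    # comment lines, blank sentence separators, multiword-token ranges
    # (e.g. 1-2) and empty nodes (e.g. 1.1), which carry no relation label.
    awk -F'\t' 'NF == 10 && $1 ~ /^[0-9]+$/ { print $8 }' "$1" \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Running list_deprels "$TRAIN_DATA_PATH" prints one line per label with its count, most frequent first. Language-specific subtypes such as nsubj:pass show up as separate labels.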

You can see the types of dependency relations by looking in the vocabulary directory, which was created in the location below: