Dependency Parser

Where can I find information about the specific dependency relations implemented by the dependency parser in the AllenNLP suite? I see that the set implemented here is different from the set described at universaldependencies.org and would like to understand which ones I have to work with in this system.

In the current dependency parser example config, the training and validation data paths point to the Penn Treebank, which I believe requires a license to obtain, e.g.:

"train_data_path": std.extVar("PTB_DEPENDENCIES_TRAIN"),
"validation_data_path": std.extVar("PTB_DEPENDENCIES_VAL"),

Fortunately, it is pretty straightforward to switch to Universal Dependencies data by changing the values above to point towards a UD dataset. Below are some instructions to do just that.

This assumes you have an environment with AllenNLP set up and have cloned allennlp-models somewhere on your filesystem. cd into allennlp-models and copy the code below into a script, e.g. download_ud_data.sh, to download the UD v2.6 data:

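#!/bin/bash
set -e  # stop at the first failing step (e.g. a failed download or extraction)

# Download the UD v2.6 release from LINDAT into ./data
mkdir -p data
cd data
echo "Downloading UD data..."$'\n'
curl --remote-name-all https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3226{/ud-treebanks-v2.6.tgz}
echo $'\n'
# Extract the treebanks (creates ud-treebanks-v2.6/) and delete the archive
tar -xvzf ud-treebanks-v2.6.tgz
rm ud-treebanks-v2.6.tgz
echo $'\n'"Done"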

Make the script executable and run it:

`chmod +x download_ud_data.sh && ./download_ud_data.sh`

Once the data is downloaded, change the "train_data_path" and "validation_data_path" values in training_config/structured_prediction/dependency_parser.jsonnet so they read the paths from environment variables:

"train_data_path": std.extVar("TRAIN_DATA_PATH"),
"validation_data_path": std.extVar("DEV_DATA_PATH"),

You can then set these variables to point to the UD data you've just downloaded, e.g. the English EWT treebank:

export TRAIN_DATA_PATH=data/ud-treebanks-v2.6/UD_English-EWT/en_ewt-ud-train.conllu
export DEV_DATA_PATH=data/ud-treebanks-v2.6/UD_English-EWT/en_ewt-ud-dev.conllu
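
The train/dev file names depend on the treebank (UD names them `<code>-ud-train.conllu` and `<code>-ud-dev.conllu`, e.g. `en_ewt`), and some treebanks ship only a test set. If you want to try other treebanks, a small helper can set both variables from a treebank directory and fail early when a split is missing. This is only a sketch, not part of AllenNLP; `use_ud_treebank` is a hypothetical name, and the file has to be `source`d so the exports reach your current shell:

```shell
# Hypothetical helper (not part of AllenNLP): export TRAIN_DATA_PATH and
# DEV_DATA_PATH for one UD treebank directory. Source this file, then call
# the function, so the exported variables persist in your current shell.
use_ud_treebank() {
  local dir="$1" train="" dev="" f
  # UD files are named <code>-ud-<split>.conllu; take the first match.
  # If nothing matches, the glob stays literal and the -f test fails.
  for f in "$dir"/*-ud-train.conllu; do [ -f "$f" ] && train="$f"; break; done
  for f in "$dir"/*-ud-dev.conllu; do [ -f "$f" ] && dev="$f"; break; done
  if [ -z "$train" ] || [ -z "$dev" ]; then
    echo "no train/dev split found in $dir" >&2
    return 1
  fi
  export TRAIN_DATA_PATH="$train"
  export DEV_DATA_PATH="$dev"
  echo "TRAIN_DATA_PATH=$TRAIN_DATA_PATH"
  echo "DEV_DATA_PATH=$DEV_DATA_PATH"
}
```

For example, after sourcing the file (any name you like), `use_ud_treebank data/ud-treebanks-v2.6/UD_English-EWT` sets the same two paths as the exports above.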

And then you can train the dependency parser:

allennlp train training_config/structured_prediction/dependency_parser.jsonnet -s logs/dependency_parser_en_ewt --include-package allennlp_models

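The dependency relation labels the parser predicts are taken from the training data, so when you train on UD data they are UD relations. You can see the full set in the vocabulary directory created in the serialization directory: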

logs/dependency_parser_en_ewt/vocabulary/head_tags.txt
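
You can also read the label set straight from the data, without training. In CoNLL-U the dependency relation is the eighth tab-separated column (DEPREL). Multiword-token lines (IDs like `3-4`) and empty nodes (IDs like `8.1`) carry no basic relation, so the awk sketch below counts only lines whose ID is a plain integer. The function name `count_deprels` is just for illustration:

```shell
# Illustrative helper: count the DEPREL values (column 8) in a CoNLL-U file,
# most frequent first. Comment lines, blank lines, multiword tokens and empty
# nodes are skipped because only lines with 10 fields and an integer ID match.
count_deprels() {
  awk -F'\t' 'NF == 10 && $1 ~ /^[0-9]+$/ { print $8 }' "$1" | sort | uniq -c | sort -rn
}
```

For example, `count_deprels "$TRAIN_DATA_PATH"`. Assuming the dataset reader keeps the DEPREL values unchanged, the labels listed should correspond to the entries in head_tags.txt.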