HELP - Add entity markers to dataset reader - Error

Hi!
I’m new to AllenNLP and want to add entity markers in the _read method of my dataset reader, but I get an error at items = json.loads(line).


This is how I try to do it.

Does anyone have any idea what I might be doing wrong, or suggestions for what I could try? Thanks in advance for your time!

Best,
Julia

I don’t see anything wrong in particular with how you’re inserting those entity markers, but that’s not where you say the code is failing. If it’s failing on items = json.loads(line) then it looks like the data is not formatted as the code expects it to be. I would double check your assumptions about your data file.

Thank you! So I should change “data_loader” in the configuration file, do I understand that correctly?
For now it looks like this:
data_loader

Well, that data loader config is a problem, but it’s not what’s causing the first error you posted. That error was on items = json.loads(line), which is where you load your data file inside the dataset reader. Your data file is apparently not valid JSON, or at least not one JSON object per line as the reader expects.
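One way to check that assumption is to test the file outside of AllenNLP. The sketch below is only an illustration: diagnose_json_file is a hypothetical helper, not part of AllenNLP, and it assumes the file is UTF-8 text. It reports whether the file parses as a single JSON document (in which case reading it line by line with json.loads(line) will fail) or as one JSON object per line, and otherwise names the first line that does not parse.

```python
import json
from pathlib import Path


def diagnose_json_file(path):
    """Hypothetical helper: classify a data file and locate the first parse error.

    Returns a tuple (kind, top_level_type, problem).
    """
    text = Path(path).read_text(encoding="utf-8")
    if not text.strip():
        return ("empty", None, None)

    # Case 1: the whole file is a single JSON document (for example one big list).
    # Such a file must be read with json.load(f), not line by line.
    try:
        data = json.loads(text)
        return ("single_document", type(data).__name__, None)
    except json.JSONDecodeError:
        pass

    # Case 2: one JSON object per line. Find the first line that does not parse.
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skipped here, but json.loads on a blank line would raise
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            return ("invalid", None, f"line {line_number}: {err.msg}")
    return ("json_lines", None, None)
```

If this returns "single_document", the dataset reader probably needs to call json.load on the open file once and then iterate over the resulting list, instead of calling json.loads on each line.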

You also have an issue with your data loader config. We’ll have an upgrade guide for 1.0 posted soon, but here’s the relevant part for the data loaders:

Iterators ➔ DataLoaders

AllenNLP now uses PyTorch’s API for data iteration, rather than our own custom one. This means that the train_data, validation_data, iterator and validation_iterator arguments to the Trainer have been removed and replaced with data_loader and validation_data_loader.

Previous config files which looked like:

{
  "iterator": {
    "type": "bucket",
    "sorting_keys": [["tokens"], ["num_tokens"]],
    "padding_noise": 0.1
    ...
  }
}

Now become:

{
  "data_loader": {
    "batch_sampler": {
      "type": "bucket",
      // sorting keys are no longer required! They can be inferred automatically.
      "padding_noise": 0.1
    }
  }
}

Yes, this data loader config works. Thanks!

I use the BioRelEx dataset (https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7). The training and development sets are provided as JSON files, but maybe something goes wrong when I read them. I’ll have a look at it. :slight_smile: