Hi, I am new to this group. Is there a particular forum for asking questions about the Constituency Parser?
Asking them here is fine.
Amazing, thanks. I have just begun working with the constituency parser, and I want to be able to parse blocks of text. It seems that the parser does not separate out sentences; instead, it puts two sentences into a single constituent. For example, if I ask it to parse “This is a sentence. This is another sentence.”, it puts “is a sentence. This is another sentence.” into a single VP. Is there a way to get the parser to treat sentences separately?
Clearly the problem isn’t that it fails to recognize the end of a sentence: it correctly tags the sentence-final period as punctuation (and doesn’t do so for the periods inside abbreviations like U.S.). So how can I get it to split the text into sentences before it analyzes constituent structure?
Thanks for any ideas about this!
Yeah, the model assumes that it gets a single sentence as input. If you want it to analyze multiple sentences, you’ll want to run some kind of sentence splitter on the text first, then pass the sentences separately to the parser (which can be done in a batch, for efficiency). spaCy has functionality for doing this sentence splitting, and we have a simple wrapper around it; whichever you find easier to use should work.
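A minimal sketch of that split-then-batch pipeline, assuming AllenNLP 1.x/2.x with the allennlp-models package installed and a constituency parser model archive that you supply as model_path (no specific version or path is given here). SpacySentenceSplitter, Predictor.from_path and predict_batch_json are the AllenNLP pieces being assumed; the helper names to_parser_inputs and parse_text are hypothetical.

```python
from typing import Dict, List


def to_parser_inputs(sentences: List[str]) -> List[Dict[str, str]]:
    """Build one JSON dict per sentence (the constituency predictor reads the
    "sentence" key), skipping empty or whitespace-only strings."""
    return [{"sentence": s} for s in sentences if s.strip()]


def parse_text(text: str, model_path: str, rule_based: bool = False) -> List[dict]:
    """Hypothetical helper: split `text` into sentences, then parse them in one batch."""
    # Imported lazily so to_parser_inputs works without AllenNLP installed.
    from allennlp.data.tokenizers.sentence_splitter import SpacySentenceSplitter
    from allennlp.predictors.predictor import Predictor
    import allennlp_models.structured_prediction  # noqa: F401  (registers the parser)

    # rule_based=False uses spaCy's statistical pipeline; True uses punctuation rules.
    splitter = SpacySentenceSplitter(rule_based=rule_based)
    sentences = splitter.split_sentences(text)

    predictor = Predictor.from_path(model_path)
    # One batched call instead of one predict() call per sentence.
    return predictor.predict_batch_json(to_parser_inputs(sentences))
```

Each element of the returned list is the predictor's output for one sentence, so “This is a sentence.” and “This is another sentence.” get separate trees instead of being merged into one VP.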
Thanks, this is very helpful. I see that your page offers two options for sentence splitting: one based on spaCy and the other enabled by the rule_based flag. Regarding the spaCy option, it says that it uses a dependency parse. Can you help me understand more about how that works? The system I want to build would use the AllenNLP constituency and dependency parsers to analyze texts, but having to call a separate dependency parser to split sentences before calling the AllenNLP dependency parser seems really inefficient.
Maybe it would be better for me to try the rule-based option your page suggests, which uses punctuation. It’s hard for me to imagine why this wouldn’t work just as well. Do you have any insight on this?
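To see why punctuation alone is not quite as simple as it looks, here is a hypothetical pure-Python comparison. Neither function is the implementation behind AllenNLP’s or spaCy’s rule-based splitter; they are toy splitters, one purely on punctuation and one consulting a small hand-written abbreviation list (an assumption for illustration).

```python
import re
from typing import List, Set

# Toy rule: a sentence ends after ., ! or ? when followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_split(text: str) -> List[str]:
    """Split on sentence-final punctuation only (hypothetical toy splitter)."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def abbreviation_aware_split(text: str, abbreviations: Set[str]) -> List[str]:
    """Like naive_split, but never ends a sentence on a known abbreviation."""
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


text = "She moved to the U.S. in 2010. She likes it there."
print(naive_split(text))
print(abbreviation_aware_split(text, {"U.S."}))
```

The naive version breaks after “U.S.” and produces three pieces. The abbreviation list fixes that case but now misses a real boundary when a sentence ends with an abbreviation (“She moved to the U.S. She likes it.” stays one sentence). That ambiguity is what a trained model tries to resolve, and it is why testing both options on your own text is worthwhile.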
Thanks so much for your help!
I would just recommend trying both and seeing which one does better for what you want. There might also be other libraries that do sentence splitting in a faster or better way; I’m not really sure.
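One way to run that comparison, sketched under assumptions: split a few paragraphs of your own text into sentences by hand, then score each candidate splitter by exact-match sentence F1. The function names and the choice of metric are hypothetical, not part of AllenNLP or spaCy; any callable that maps a string to a list of sentences (for example the split_sentences method of AllenNLP’s SpacySentenceSplitter) can be plugged in.

```python
from typing import Callable, Dict, List

Splitter = Callable[[str], List[str]]


def sentence_f1(gold: List[str], predicted: List[str]) -> float:
    """F1 over sentences that match a gold sentence exactly (each gold used once)."""
    remaining = list(gold)
    correct = 0
    for sent in predicted:
        if sent in remaining:
            remaining.remove(sent)
            correct += 1
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def compare_splitters(gold: List[str], splitters: Dict[str, Splitter]) -> Dict[str, float]:
    """Score each splitter on text rebuilt by joining the gold sentences with spaces
    (assumes the original sentences were separated by single spaces)."""
    text = " ".join(gold)
    return {name: sentence_f1(gold, split(text)) for name, split in splitters.items()}
```

Exact match is a strict metric; it is chosen here only because it is simple and directly reflects whether the parser would receive the right units.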