I’m curious if anybody has used AllenNLP with the Pytorch Geometric (https://github.com/rusty1s/pytorch_geometric) library, or implemented a similar message passing pattern for graph data in AllenNLP.
Hi. I don’t know exactly what you mean here. AllenNLP is built on PyTorch, so anything that works with PyTorch can work with AllenNLP.
This is true. I’m mostly anticipating getting confused around data loading, batching, and building/indexing the vocabulary, as AllenNLP has nice abstractions around these steps that might be a little at odds with those implemented in the PyTorch Geometric library. I figured I may as well ask to see if there might be related work I could refer to before diving in. I’ll keep plugging along and see what I can come up with!
IIRC, with pytorch_geometric people sometimes do batching by combining a batch of small graphs into one large graph. It seems to me that this is not directly supported by AllenNLP. If I were doing this, I would probably put the graphs into a MetaField and combine them the way I want in forward. This is not elegant, but it does work. I am also curious about the best practice for this. Could you please keep this post updated once you find the answer? Many thanks!
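For anyone following along, here is a minimal sketch of the "combine small graphs into one large graph" idea mentioned above. It is plain Python rather than AllenNLP or PyTorch Geometric code, and the function name merge_graphs and the (num_nodes, edges) input format are made up for illustration. The idea is just to shift each graph's node ids by a running offset and remember which graph each node came from, which is roughly what PyTorch Geometric's own batching does with its batch vector.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def merge_graphs(graphs: List[Tuple[int, List[Edge]]]) -> Tuple[List[Edge], List[int]]:
    """Combine small graphs into one large disconnected graph.

    Each input graph is (num_nodes, edges), with node ids local to that graph.
    Returns the merged edge list and, for every node in the merged graph,
    the index of the small graph it originally belonged to.
    """
    merged_edges: List[Edge] = []
    batch: List[int] = []
    offset = 0
    for graph_idx, (num_nodes, edges) in enumerate(graphs):
        # Shift local node ids so they do not collide with earlier graphs.
        merged_edges.extend((src + offset, dst + offset) for src, dst in edges)
        batch.extend([graph_idx] * num_nodes)
        offset += num_nodes
    return merged_edges, batch


# Two toy graphs: a 3-node path and a single 2-node edge.
graphs = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
edges, batch = merge_graphs(graphs)
```

In a model's forward, the same merge could be applied to whatever graph objects were stashed in the MetaField, with the batch list used afterwards to pool node outputs back to one vector per graph.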
Apparently for node-level mini-batches (which is what I think would be appropriate in my case), one can use a neighbor sampling method similar to that described in the GraphSAGE paper. There’s an implementation of this in pytorch-cluster, which is used in this example, but I’m currently stuck trying to resolve some compilation issues when installing this library. (I’m afraid all the duct-tape and black magic I used to get a weird specific branch of AllenNLP and its dependencies installed on conda and talking to CUDA is catching up to me.)
I might look at the GraphSAGE pytorch implementation and see if I can work from that.
Just out of curiosity, could you describe your project a bit?
I’ve thought about using allennlp and pytorch_geometric for graph-based stuff. I’d like to hear more about your experience!
Sure–it will probably be beneficial for me to describe my experience here. Unfortunately after thinking I had all the dependencies in order, I ran into a few more issues and got a bit discouraged, so I’ve been taking a bit of a break from the project for now.
The project’s overall goal is to investigate the impact of edge annotations taken from citation context (i.e. the text surrounding a citation anchor) in a scientific paper citation network. In a previous implementation, I was simply pre-calculating graph embeddings with Node2Vec and concatenating these to a BERT token from the paper abstract to get a hybrid graph/text representation. I’m interested in implementing a similar approach, but with an actual neural graph approach to let me train the whole hybrid model end-to-end.
It seems that AllenNLP’s main non-geometric assumptions are built into the dataset_reader and batching implementations. I still have some fuzzy spots in my understanding of how the dataset_reader prepares a vocabulary, but it seems that the same code is used both to read through your data to create the vocabulary and to yield your instances to be batched up at training time. This pushes any step that requires a node and edge index out of the dataset_reader and into your model’s forward method. I’m not sure if this is good or bad; maybe it’s preferable to implement it there anyway. It just took me some trial and error to figure this stuff out!
GraphSAGE and GAT both use a neighbor sampling method to generate subgraphs as minibatches. As noted above, there doesn’t seem to be a way to offer these up as batches from the dataset_reader and trainer components, so my solution was to set the batch size to 1 and handle this all in the model forward method. You can pass in paths to your graph data as model parameters, and build your node embeddings and graph index ‘manually’ in the model’s init, translating all the node ids to match those in your model’s vocabulary. In your forward method, you can pass the index of an instance’s node to the pytorch geometric neighbor sampler to get your sampled subgraph, which can then be passed on to the pre-implemented models from pytorch geometric.
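To make the sampling step a bit more concrete, here is a rough sketch of GraphSAGE-style neighbor sampling around a single seed node. This is not the PyTorch Geometric or pytorch-cluster sampler; sample_subgraph, the adjacency-dict format, and the parameter names are all hypothetical, and it uses only the standard library. It returns the sampled nodes (global ids) together with edges relabeled to local ids, which is the shape of input a graph layer would expect.

```python
import random
from typing import Dict, List, Tuple


def sample_subgraph(
    adj: Dict[int, List[int]],
    seed: int,
    num_neighbors: int,
    num_hops: int,
    rng: random.Random,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Sample up to num_neighbors neighbors per node for num_hops hops.

    Returns (nodes, edges): nodes lists global ids in local-id order (the seed
    is local id 0), and edges are (source, target) pairs in local ids, pointing
    from the sampled neighbor towards the node that requested it.
    """
    nodes = [seed]
    local = {seed: 0}  # global id -> local id
    edges: List[Tuple[int, int]] = []
    frontier = [seed]
    for _ in range(num_hops):
        next_frontier = []
        for target in frontier:
            neighbors = adj.get(target, [])
            picked = rng.sample(neighbors, min(num_neighbors, len(neighbors)))
            for source in picked:
                if source not in local:
                    local[source] = len(nodes)
                    nodes.append(source)
                    next_frontier.append(source)
                edges.append((local[source], local[target]))
        frontier = next_frontier
    return nodes, edges


# Toy citation graph as an adjacency dict (global node ids).
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1]}
nodes, edges = sample_subgraph(adj, seed=0, num_neighbors=2, num_hops=2, rng=random.Random(0))
```

With batch size 1, the seed would be the instance's node id after translating it into the model's vocabulary, and nodes would be used to look up the matching rows of the node embedding.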
I’m glad to expand on any of these points if you’re curious. I got to the point where I was able to run the model on the CPU and it seemed to work, but unfortunately my dependency nightmares have prevented me from getting the thing on CUDA.
Nice! I also worked on a project where I concatenated concept embeddings from a GCN to the BERT output for the corresponding text data. I did manage to get it to work, but I didn’t use Allennlp for this project, so I didn’t have to figure out some of the implementation for working with allennlp dataset readers.
I also used to mess around with nightly versions of allennlp and transformers, but didn’t really have problems running things on GPU.
I’m actually about to start a new project that uses allennlp and graph networks to do some multi-hop reasoning. I’ll have to see what issues I run into.
Since we might have to think about a lot of the same stuff, feel free to shoot me an email: firstname.lastname@example.org
I was able to resolve the dependency issues with a fresh conda install of cudatoolkit, but unfortunately the model isn’t learning anything.
I’m not sure about the interface between the AllenNLP Embedding and the pytorch geometric GATConv model. The GATConv forward method takes a node feature matrix and edge index as arguments, but the embedder interface is designed around the assumption that it will only yield a batch at a time. I’m currently passing the whole embedding weight matrix in as an argument. I’m curious if this seems like a reasonable approach, or if I might be missing something that could cause issues here.
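One possible alternative to passing the whole weight matrix, sketched here under some assumptions: look up only the embedding rows for the nodes in the sampled subgraph and hand those to the graph layer along with locally relabeled edges. In this sketch torch.nn.Embedding stands in for the AllenNLP Embedding, and a simple mean-aggregation step plus a linear layer stands in for GATConv so that it only needs PyTorch. SubgraphEncoder and its argument names are hypothetical.

```python
import torch
from torch import nn


class SubgraphEncoder(nn.Module):
    """Embeds only the sampled nodes, then applies one message-passing step."""

    def __init__(self, num_nodes: int, dim: int) -> None:
        super().__init__()
        self.embedding = nn.Embedding(num_nodes, dim)  # stand-in for the AllenNLP Embedding
        self.linear = nn.Linear(dim, dim)  # stand-in for a graph layer such as GATConv

    def forward(self, node_ids: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # node_ids: (n,) global ids of the subgraph nodes.
        # edge_index: (2, e) source/target pairs in local ids (0..n-1).
        x = self.embedding(node_ids)  # (n, dim), only the rows we need
        src, dst = edge_index
        # Mean of incoming neighbor features for each target node.
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])
        deg = x.new_zeros(x.size(0)).index_add_(0, dst, x.new_ones(dst.size(0)))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)
        return self.linear(x + agg)


torch.manual_seed(0)
model = SubgraphEncoder(num_nodes=10, dim=4)
node_ids = torch.tensor([7, 2, 5])            # seed node 7 plus two sampled neighbors
edge_index = torch.tensor([[1, 2], [0, 0]])   # local edges: 2 -> 7 and 5 -> 7
out = model(node_ids, edge_index)
out[0].sum().backward()                        # loss on the seed node only
grad = model.embedding.weight.grad
```

Gradients still reach the embedding table this way, but only the rows for nodes in the subgraph are touched on each step, which may make it easier to check whether the graph side of the model is receiving any training signal.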