In preparation for a post about doing graph analytics in Neo4j (paralleling SPARQLverse from this earlier post), I had to learn how to load text/CSV data into Neo4j. This post just shows the steps I took to load nodes and then establish edges/relationships in the database.
My head hurt trying to find a simple example of loading the data I had used in my earlier example, but that was mostly because I was new to the Cypher language. I also got really hung up on previewing the data in the Neo4j visualiser: all my nodes showed only ID numbers, which confused me. I thought it wasn’t loading my name properties, when it was really just a visualisation setting (more on that another time). Anyway, enough distractions…
Graph Data File – Simple Graph Relations Example
I took my earlier sample data and dumbed it down to fit the normal paradigm of Neo4j: separate load files for nodes and for edges. I appreciated working with triples before, as I didn’t have to pre-load all the nodes first, but that’s also a story for another day.
First, the nodes file looked like the following. Note that I thought I had to add the ID, though I didn’t end up using it after all:
id,name
1,Chong
2,Lashaun
3,Roberta
4,Elin
5,Tameka
6,Rosalie
7,Noella
8,Elim
9,Mae
10,Fernando
11,Alan
12,Katrina
13,Kaitlyn
14,Zackary
15,Nana
16,Lamonica
17,Meggan
18,Fermina
19,Genevieve
20,Manual
21,Jolie
The second file was simply a list of “source” and “target” names (the graph relations), where the first person has the second person as a friend. (We handle them as unidirectional in this example, so each relationship points from source to target.)
source,target
Chong,Lashaun
Chong,Roberta
Chong,Elin
Chong,Tameka
Chong,Rosalie
Chong,Noella
Lashaun,Roberta
Lashaun,Elin
Lashaun,Tameka
Roberta,Elin
Roberta,Tameka
Roberta,Rosalie
Roberta,Noella
Elim,Tameka
Elim,Rosalie
Elim,Noella
Elim,Mae
Elim,Fernando
Elim,Alan
Tameka,Katrina
Tameka,Kaitlyn
Rosalie,Alan
Noella,Katrina
Mae,Kaitlyn
Mae,Zackary
Mae,Nana
Mae,Lamonica
Fernando,Meggan
Fernando,Fermina
Fernando,Genevieve
Fernando,Manual
Fernando,Jolie
Fernando,Chong
Alan,Lashaun
Alan,Roberta
Alan,Elin
Katrina,Tameka
Katrina,Rosalie
Kaitlyn,Noella
Zackary,Fernando
Zackary,Alan
Nana,Katrina
Nana,Kaitlyn
Nana,Zackary
Lamonica,Zackary
Lamonica,Fernando
Meggan,Zackary
Meggan,Fernando
Meggan,Jolie
Fermina,Fernando
Fermina,Jolie
Fermina,Chong
Genevieve,Lashaun
Genevieve,Roberta
Manual,Elin
Manual,Tameka
Jolie,Rosalie
Loading CSV Relationships into Neo4j
To get the data into Neo4j I had to run two commands. But first I ran a sort of “delete all”, as I was doing lots of testing:
MATCH (n)
WITH n LIMIT 10000
OPTIONAL MATCH (n)-[r]->()
DELETE n,r;
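As an aside, on Neo4j 2.3 and later the same clean-up can be written with DETACH DELETE, which removes a node together with all of its relationships in either direction. A minimal sketch, assuming that version or newer and a test database small enough to wipe in a single transaction:

```cypher
// Delete every node along with any relationships attached to it.
// Assumes Neo4j 2.3+; this wipes the whole database, so use it on test data only.
MATCH (n)
DETACH DELETE n;
```

Unlike the LIMIT 10000 version, this has no batch cap, so it is only a reasonable choice for small graphs like this one.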
Then load all the nodes, giving each one the Person label and grabbing only the name property from the CSV:
LOAD CSV WITH HEADERS FROM "file:///Users/tyler/graph/friends_nodes.csv" AS nodes
CREATE (p:Person { name: nodes.name })
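If you would rather not wipe the database between test runs, one option is to swap CREATE for MERGE, which only creates a node when no matching one exists. This is a sketch of that variation, not what I actually ran, and it assumes names are unique enough to identify a person (true for this sample file):

```cypher
// Same file path as the load above; MERGE matches an existing Person
// with this name or creates one, so re-running does not duplicate nodes.
LOAD CSV WITH HEADERS FROM "file:///Users/tyler/graph/friends_nodes.csv" AS nodes
MERGE (p:Person { name: nodes.name });
```

With CREATE, running the load twice would give you two nodes per name, and the edge load would then match both copies.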
And finally, load the edges/relationships to link each person to their friends via a HAS_FRIEND relationship:
LOAD CSV WITH HEADERS FROM "file:///Users/tyler/graph/friends_edges.csv" AS edges
MATCH (a:Person { name: edges.source })
MATCH (b:Person { name: edges.target })
CREATE (a)-[:HAS_FRIEND]->(b);
The resulting load will say something like:
Created 57 relationships, returned 0 rows in 46 ms
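A quick sanity check after loading is to count what ended up in the database. This is a sketch, assuming both files were loaded exactly once into an empty database; the nodes file above has 21 data rows, so the first query should report 21, and the second should match the 57 relationships from the load message. Run the statements one at a time if your client only accepts a single statement:

```cypher
// Count the Person nodes created from friends_nodes.csv.
MATCH (p:Person)
RETURN count(p) AS people;

// Count the HAS_FRIEND relationships created from friends_edges.csv.
MATCH (:Person)-[r:HAS_FRIEND]->(:Person)
RETURN count(r) AS friendships;

// List the people Chong has as friends (outgoing direction only).
MATCH (:Person { name: "Chong" })-[:HAS_FRIEND]->(friend:Person)
RETURN friend.name AS friend
ORDER BY friend;
```

If the relationship count comes back lower than expected, a name in the edges file probably did not match any Person node, since the two MATCH clauses silently skip rows with no match.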
More on exploring and analysing this in a future post. Tweet it or comment if you are interested in more along this line. Thanks for reading!