Graph density calculation with Neo4j

Neo4j Cypher Query for Graph Density Analysis

Graph analysis is all about finding relationships. In this post I show how to compute graph density (a ratio of how well connected relationships in a graph are) using a Cypher query with Neo4j. This is a follow up to the earlier post: SPARQL Query for Graph Density Analysis.

Installing Neo4j Graph Database

In this example we launch Neo4j and enter Cypher commands into the web console…

  1. Neo4j is a java application and requires OracleJDK 7 or OpenJDK 7 to be on your system.
  2. Download the Community Edition of Neo4j.  In my case I grabbed the Unix .tar.gz file and unzipped the files.   Install on Windows may vary.
  3. From command line and within the newlycreatedneo4j-community folder, start the database:
    1. bin/neo4j start
  4. Use web console at: http://localhost:7474 – the top line of the page is a window for entering Cypher commands.
  5. Load sample CSV data using a Cypher command – I cover this in a separate post here.  Be sure the path to the file is the same on your system, matching where you saved the CSV files to.

Quick Visualization

Using the built-in Neo4j graph browser, you can easily see all your relationships as nodes and edges.  For a query, return results that include all the objects:

MATCH (p:Person)

Friends graph sample viewed in Neo4j

Compute Graph Density

Graph Density requires total number of nodes and total number of relationships/edges.  We do them both separately, then pull them together at the end.

Compute the number of unique nodes in the graph

MATCH (p:Person)
RETURN count(p)

Returned 1 row in 49 ms

This tells us that there are 21 people as Subjects in the graph.  (I’m not sure how this differed from the 20 I had in my other post – perhaps part of the header from the CSV came in?)

Therefore, a maximum number of edges between all people would be 21² (we’ll use only 21×20 as a person might not link to themselves in this example).

Compute the number of edges in the graph

Here we only select subjects that are “Person” and only where they have a relationship called “HAS_FRIEND”.

(The “p:”, “r:” and “f:” prefixes act like the “?…” variables references in SPARQL – you set them to whatever you want as a pointer to the data returned from the MATCH statement.)

MATCH (p:Person)-[r:HAS_FRIEND]-(f:Person)

count(DISTINCT r)
 When we defined the data from CSV, we set up a relationship that I thought would be just one-way.  But you can see, if you don’t provide the DISTINCT keyword, that you’ll get double the record counts, so I’m assuming it’s treating the relationships as bi-directional.

Total edges is 57.  Do the quick math and see that the ratio is then:

57 / (21 x 20)
= 57 / 420
= 0.136...

Rolling it all together

We can do all this in a single query and get some nice readable results, even though the query looks a bit long.  Note that I snuck in an extra “-1” to account for that stray record I didn’t account for.

MATCH (p:Person)-[r:HAS_FRIEND]-(f:Person) 
RETURN count(DISTINCT p) as nrNodes, 
 count(DISTINCT r) as nrEdges,
 count(DISTINCT r)/((count(DISTINCT p)-1) * (count(DISTINCT p) - 1.0)) AS graphDensity

nrNodes nrEdges graphDensity
21      57      0.1425

Challenge: Why did I get different results than in the SPARQL query example?

In the earlier post I had only 20 nodes, but in this one got 21.  Can you explain why?

Future Post

What’s your favourite graph analytic?  Let me know and I’ll try it out in the future post.

One of things I have planned is to do some further comparisons between graph analytics in SPARQLverse and other product like Neo4j but with large amounts of data instead of these tiny examples.  What different results would you expect?

Published by

Tyler Mitchell

Product and marketing leader, author and technology writer in NoSQL, big data, graph analytics, and geospatial. Follow me @1tylermitchell or get my book from

%d bloggers like this: