Tuesday, October 5, 2010

Inside Neo4j: Intro and roadmap

I love working with databases. They were my first research subject. I enjoy the difficulties inherent in transitioning from a bounded world of bits and bytes to a practically endless space of sectors and pages. ACID, now, this is (another place) where a programmer shows her worth. During an MSc class we were told that a real-world RDBMS is possibly the most complex engineering feat humans have realized. I tend to believe that, although I stand more in awe of any decent modern compiler. However, after finding out about the NoSQL movement and some fine examples of this philosophy, I decided to take a closer look at the source of a well-known representative of that paradigm. Enter Neo4j.

Introductions are in order

Neo4j is an open source, embeddable graph database written in Java. Embeddable because it can be added to your application and used just like any other library. Graph db because the data model it uses to express its data is a graph; that is, it stores vertices and the edges that connect them, supporting user-defined properties on both constructs. It is fully transactional, exposes a REST interface and has numerous other components that provide abstractions for pretty much any functionality you need from your persistent graph.
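To make the data model concrete, here is a minimal in-memory sketch of a property graph: vertices and edges that both carry user-defined properties. This is a toy of my own for illustration, not Neo4j's API; every class and method name here (PropertyContainer, Vertex, Edge, connectTo, buildExample) is an assumption made up for the example, and it does no persistence or transactions at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory property graph. NOT Neo4j's API: all names are hypothetical.

// Shared base: both vertices and edges can hold key/value properties.
class PropertyContainer {
    private final Map<String, Object> properties = new HashMap<>();

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    Object getProperty(String key) {
        return properties.get(key);
    }
}

// A vertex keeps track of every edge that touches it.
class Vertex extends PropertyContainer {
    final List<Edge> edges = new ArrayList<>();

    // Creates a directed, typed edge from this vertex to 'other'.
    Edge connectTo(Vertex other, String type) {
        Edge edge = new Edge(this, other, type);
        edges.add(edge);
        if (other != this) {
            // Avoid registering a self-loop twice on the same vertex.
            other.edges.add(edge);
        }
        return edge;
    }
}

// An edge connects a start vertex to an end vertex and has a type.
class Edge extends PropertyContainer {
    final Vertex start;
    final Vertex end;
    final String type;

    Edge(Vertex start, Vertex end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphSketch {
    // Builds two vertices joined by one edge, with properties on all three.
    static Edge buildExample() {
        Vertex alice = new Vertex();
        alice.setProperty("name", "Alice");
        Vertex bob = new Vertex();
        bob.setProperty("name", "Bob");

        Edge knows = alice.connectTo(bob, "KNOWS");
        knows.setProperty("since", 2010);
        return knows;
    }

    public static void main(String[] args) {
        Edge knows = buildExample();
        System.out.println(knows.start.getProperty("name") + " -[" + knows.type
                + "]-> " + knows.end.getProperty("name"));
    }
}
```

The point of the shared base class is only to show that properties hang off edges in exactly the same way as they hang off vertices, which is what the description above says about the model.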

There is always a but

My desire to study the internals of neo was hindered by the practically non-existent documentation on the architecture of the system. Granted, the kernel component is no unmanageable beast in terms of code size, but there are so many things going on in there that I would like some kind of a guidebook to take me through the core classes and provide an abstraction over the implementation details. So I decided that, since I am already going through the trouble of analyzing the workings of neo, I might as well provide them for others to see and at the same time have a cheat sheet I can run to while I am hacking away.
This series of posts will be uploaded as I work through the code, explaining every submodule of neo. The blueprint I have in mind is:
  • Provide an exposition of the flow during the startup of the database, explaining what the core classes do, when and how they are instantiated etc.
  • Create, store and retrieve a node and create a relationship to show how the store (persistence and caching) is managed.
  • Explain transactions.
  • Do a traversal.
As of this writing I am good to go on the first point and almost ready on the next two. The traversal framework is still a closed book (a collapsed branch in Eclipse) for me. Hopefully the next post will be up tomorrow.

As always, a disclaimer

These posts will not be a tutorial for Neo4j. I suggest you write some code with it first to get a feel for it and then come and see my work, if you are still interested.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
