Thursday, December 2, 2010

Towards a Neo4j Connector Implementation



As discussed, the setup/architecture is changing rapidly as I go through the specification and do things the Right Way (TM). As a result, the deployment instructions and code walkthrough below are almost obsolete. I suggest you keep an eye on this blog for new posts or refer to the README that will go up with my next commit.

One of the coolest things about working in a managed environment (mostly in the form of an application server) is that many things are taken care of for you automatically. It is completely natural to have an EJB acquire connections to different RDBMSs, do stuff with them and then expect things to commit or roll back consistently based on heuristics/exceptions/whatever, while you may not have the slightest idea that 2PC is involved somewhere. This transparency is essential for rapid development and, unfortunately, is mostly expected only of relational stores, mainly via JDBC. Since I have a thing for Neo4j and time on my hands, I thought it might be cool to provide such facilities for it. The way to go is the Java Connector Architecture (JCA), a way of integrating Enterprise Information Systems into an application server. JDBC is actually a domain-specific implementation of JCA, with a namespace of javax.sql instead of javax.resource. I would like to tell you what I have built, how it can be used and what I hope it will become.

What does a connector do, exactly?

If you want exactly, then read the spec. You have probably seen connectors, however, in your app server's administration interface, near the JDBC section. They provide a platform-independent way for application components to communicate with a (in general non-relational) storage manager. In addition, XA compliance and connection pooling are offered via the application server. The idea is that you get the .rar file provided by your data storage manager's vendor, much like a JDBC driver, you install it and hopefully you have access to the store from your application component, be it an EJB, a Servlet or whatever your poison is. You simply get a ConnectionFactory from JNDI, ask it for a Connection and perform the store's specific operations (yeah, yeah, there is also CCI but that is boring). So, let's see how we could use that in an EJB to create a Neo4j node.
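
Before the EJB example, here is roughly what the plain JNDI route looks like. Treat it as a sketch only: it assumes the resource has been bound under the "neo" name used below, and the ConnectionFactory/Connection types are the connector's own interfaces, not CCI.

// Manual lookup; the EJB below uses @Resource injection instead.
InitialContext ctx = new InitialContext();
ConnectionFactory cnf = (ConnectionFactory) ctx.lookup("java:comp/env/neo");
Connection conn = cnf.getConnection();
try {
    Node node = conn.createNode();   // plain Neo4j primitive manipulation
    node.setProperty("foo", "BAR");
} finally {
    conn.close();                    // hand the connection handle back to the container
}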

A usage example from an EJB

The reason for showing how to use it before explaining anything about it is simple - I do not care about the implementation that much yet. What I want is to show what is possible, what I consider the way to go and get feedback on all that. So, consider the following snippet.

@Stateless
public class GreeterBean implements Greeter {

    @Resource(name = "neo")
    private ConnectionFactory cnf;

    @Resource(name = "mysql")
    private DataSource sql;

    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public String getMessage() {
        Node node = null;
        Connection conn1 = null;
        Connection conn2 = null;
        Connection conn3 = null;
        java.sql.Connection sqlConn = null;
        StringBuffer message = new StringBuffer();
        long currentId = -1;
        try {
            // All three handles end up over the same XAResource, since they
            // are acquired in the same thread/transactional context.
            conn1 = cnf.getConnection();
            conn2 = cnf.getConnection();
            conn3 = cnf.getConnection();
            node = conn2.createNode();
            currentId = node.getId();
            node.setProperty("foo", "BAR " + currentId);
            for (Node n : conn3.getAllNodes()) {
                if (n.hasProperty("foo")) {
                    message.append(n.getProperty("foo")).append("<br/>");
                } else {
                    message.append("node ").append(n.getId())
                           .append(" did not have property foo.<br/>");
                }
            }
            // The relational store is enlisted in the same distributed transaction.
            sqlConn = sql.getConnection();
            PreparedStatement st = sqlConn
                    .prepareStatement("Insert into Sample values (?,?,?)");
            st.setInt(1, (int) node.getId());
            st.setString(2, "foo");
            st.setString(3, (String) node.getProperty("foo"));
            st.execute();
        } catch (ResourceException e) {
            e.printStackTrace();
            throw new Error(e);
        } catch (XAException e) {
            e.printStackTrace();
            throw new Error(e);
        } catch (SQLException e) {
            e.printStackTrace();
            throw new Error(e);
        } finally {
            try {
                if (conn1 != null) {
                    conn1.close();
                }
                if (conn2 != null) {
                    conn2.close();
                }
                if (conn3 != null) {
                    conn3.close();
                }
                if (sqlConn != null) {
                    sqlConn.close();
                }
            } catch (Exception e) {
                throw new Error(e);
            }
        }
        return message.toString();
    }
}


We ask the application server to inject a Resource for us, with a name that we have already configured (in a container-specific way). This is a ConnectionFactory that returns, well, Connections that allow us to do what a Neo EmbeddedGraphDatabase would allow us to do, minus the indexing, the remote shell and, actually, everything except primitive manipulation. This is because every ManagedConnection allows for management of one XAResource and, given that indexing is a different XAResource from the Neo store, things have to be made to look nice, something I have not had time to do yet. But that is not important right now, since this is not production ready.
I use three connections over the same XAResource, a guarantee that comes from the fact that they are created in the same thread, as expected. Calling close() on them simply invalidates them. The "program" creates a new Node, sets a property and then reads the same property from all Nodes currently stored in the database. Simple, but enough to prove my point.
The code also manipulates a JDBC connection (I used a MySQL instance). Note that nowhere do we ask a GraphDatabase to begin a transaction, nor do we commit. As with a JDBC connection, entering the EJB method begins the tx (depending on the annotation, of course), exiting normally commits it and exiting any other way rolls it back. Both resources participate in a 2PC protocol, orchestrated by the application server's transaction manager. This means that crash recovery should also work, by the way, but I have not tested it yet.
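If you need to abort without throwing, the usual container-managed transaction facilities still apply. A small sketch, inside the same bean; the method and its logic are illustrative and not part of the connector:

    @Resource
    private SessionContext ejbContext;

    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void storeOrDiscard(String value) throws ResourceException {
        Connection conn = cnf.getConnection();
        try {
            Node node = conn.createNode();
            node.setProperty("foo", value);
            if (value == null || value.isEmpty()) {
                // No exception needed; the container will roll back everything
                // enlisted in this transaction, Neo4j store included.
                ejbContext.setRollbackOnly();
            }
        } finally {
            conn.close();
        }
    }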

So there. To make this run, some legwork is needed, and that is what the rest of the post is about. But while this code is small, I think it demonstrates a powerful concept: mixing SQL and NoSQL sources in the same environment with a minimum of management overhead. Cats and dogs, living together, as others have put it.

A flash code walkthrough


So, you decided you want to play. Fair enough. The code currently comes in three components: The transaction manager proxy, the actual connector and the modified neo4j kernel. The first needs some explanation.
The TransactionManager is used in the kernel for more tasks than its interface suggests. The Transaction objects it returns are used by various classes to keep a map of resources specific to each running transaction, and it is also used for resource enlistment. The transaction status is also useful to know, and there is the need to set the transaction to rollbackOnly mode. In a managed environment, however, the availability of the provided TransactionManager (and, from there, the Transaction object) is not certain or standardized. Also, it is not the actual transaction manager that is needed but a specific subset of its functionality. So, instead of jumping through hoops to accommodate every application server out there, or hacking up the kernel to remove what does not suit us, I went another way.

The JTA 1.1 spec defines a TransactionSynchronizationRegistry interface that must be made available by conforming app servers. From there, almost all the above functionality can be implemented in a portable way. Check the API for details. Actually, the only thing it does not do is the XAResource enlistment. However, this is OK, because that is not where it is supposed to happen anyway. Instead, the ManagedConnection has a getXAResource() method that returns the XAResource it uses. When the ManagedConnection is first associated with a transaction, the XAResource is retrieved and enlisted by the transaction manager, transparently.
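To make that concrete, here is roughly how the needed subset maps onto the registry. This is a sketch under my assumptions; class and method names are illustrative, not the actual proxy code.

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.transaction.TransactionSynchronizationRegistry;

// Illustrative bridge from the TransactionManager subset the kernel needs to
// the standard TransactionSynchronizationRegistry.
public class ContainerTxBridge {

    private final TransactionSynchronizationRegistry tsr;

    public ContainerTxBridge() throws NamingException {
        // Standard JNDI name mandated by JTA 1.1 for conforming app servers.
        tsr = (TransactionSynchronizationRegistry) new InitialContext()
                .lookup("java:comp/TransactionSynchronizationRegistry");
    }

    // Stand-in for the Transaction object the kernel uses as a map key.
    public Object currentTransactionKey() {
        return tsr.getTransactionKey();
    }

    // Per-transaction resource map, previously kept off the Transaction object.
    public void put(Object key, Object value) {
        tsr.putResource(key, value);
    }

    public Object get(Object key) {
        return tsr.getResource(key);
    }

    public int status() {
        return tsr.getTransactionStatus();
    }

    public void setRollbackOnly() {
        tsr.setRollbackOnly();
    }

    // What the registry cannot do is enlist an XAResource; that happens when the
    // app server calls ManagedConnection.getXAResource() and enlists it itself.
}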
The connector is a bare-minimum implementation of the JCA spec, v1.6, hopefully not entirely wrong. The NeoResourceAdapter does the database instantiation and shutdown, the NeoManagedConnectionFactory keeps the ManagedConnections mapped to the current transactional context, and the NeoManagedConnections return Connections that are simple shells over them and just forward calls. The architecture is actually very simple if you read the spec, and meaningless to discuss otherwise.
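For a feel of the shape, here is a simplified sketch of the handle-forwarding idea; the method names on the managed connection side are assumptions of mine, only the shell-over-ManagedConnection pattern comes from the actual code:

// Hypothetical connection handle forwarding to its NeoManagedConnection.
public class NeoConnectionHandle implements Connection {

    private NeoManagedConnection managed; // the physical connection owned by the container

    NeoConnectionHandle(NeoManagedConnection managed) {
        this.managed = managed;
    }

    public Node createNode() {
        return managed.getGraphDb().createNode(); // plain forwarding to the graph database
    }

    public void close() {
        managed.handleClosed(this); // only the handle is invalidated, not the physical connection
        managed = null;
    }
}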
The jta kernel has a ManagedGraphDbImpl that enforces the use of the "container" transaction manager, returns the XAResource mapped to the current top level tx and disallows indexing operations. Nothing fancy yet.

Set up the environment


What follows has been successfully performed on Glassfish, and unsuccessfully on Geronimo and JonAS. The last two require custom descriptors alongside the ra.xml I provide. If you want to deploy to them, you should probably consult the documentation - JonAS, for example, provides a utility that adds its descriptor to the rar automatically.


The container-provided txm is here. The connector code is here and the jta kernel, as always, here. Download and mvn install; no editing should be needed. To see this in action, you must create an ear that holds an enterprise application, ready for deployment on an application server. The jta kernel jar must replace your main kernel (no functionality has been changed, so it should work as a drop-in replacement) and the txm service should be added in the lib/ of the ear, or anywhere it can be picked up. The .rar that is the connector, however, must be added as an ear module. In your application.xml, next to the EJB jar and the war, you should add

  <module>
    <connector>neo4j-connector.rar</connector>
  </module>

 
and the archive must be on the top level, next to the other ear modules. Bundle up the ear and deploy. Next, go to the administrative console and add a connector connection pool bound to the Neo connector (it should be available), then create a connector resource over that pool, bound to the JNDI location at which you have asked your EJB to find it ("neo" in the example above). You should be set to go.
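
For orientation, the ear I end up with looks roughly like this; every name except neo4j-connector.rar is a placeholder, and where exactly the kernel jar lives depends on how your application packaged the stock kernel:

myapp.ear
|-- META-INF/application.xml      (declares the ejb, web and connector modules)
|-- neo4j-connector.rar           (the connector, as a top-level ear module)
|-- myapp-ejb.jar                 (the EJB module, e.g. the GreeterBean above)
|-- myapp-web.war                 (the web module, if any)
`-- lib/
    |-- neo4j-kernel-jta.jar      (the drop-in replacement for the stock kernel)
    `-- txm-service.jar           (the TransactionSynchronizationRegistry-based txm proxy)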

From here

So, what do you think? Is the approach viable? What I want to create is something that is at least as easy to use as JDBC, with the same functionality/guarantees. Of course, the graph database is embedded for now, there is no indexing, pooling is dubious and there are a whole lot of other issues, but the first step has been taken. I hope that soon it will be possible to seamlessly integrate Neo4j into enterprise applications in all the available layers, from EJBs up to Spring. If only we had Object-to-Graph mapping standards...