Tuesday, November 2, 2010

Neo4j Internals: Interlude - Xa roundup and consistency

The time has come for the most boring of these posts (imagine that!). There are some details that haven't been covered yet, mainly regarding the interactions between the various classes that lead from the persistence store up to the NodeManager and from there to you, the user. There are many classes that have to be explained, with a lot of cross-references, but if you have followed my work thus far, it shouldn't be that difficult to digest. Possibly this post should have come before the talk about transactions, but hey! I am still on the path to enlightenment concerning the internals of Neo myself, so I understood things somewhat out of order. This is the reason I label this article an interlude, since practically no tx talk will take place. Prerequisites are not that demanding in this post, although if this is your first contact with Neo4j internals you will in all probability be overwhelmed. So, we begin.

DataSources and XaConnections over a persistence store

A XaConnection encapsulates a XAResource towards an XA-compatible persistence store. It is not part of the XA specification, but it is a useful abstraction provided by Neo that couples a XaTransaction with a XAResource. The concrete implementation is NeoStoreXaConnection, which holds a NeoStoreXaResource as the implementation of the XAResource and a WriteTransaction as the implementation of the XaTransaction. The XA-related interface exposed by NeoStoreXaConnection is getXaResource(), which returns as an XAResource an instance of the inner class NeoStoreXaResource. That class forwards all XAResource operations to a XaResourceManager implementation and defines isSameRM(), the method XA requires of XAResources, by equality comparison on the filename of the supporting store. Finally, NeoStoreXaConnection returns event consumers for operations on Neo primitives, such as NodeEventConsumer and PropertyIndexEventConsumer, that forward the requested operations to the WriteTransaction encapsulated by the parent NeoStoreXaConnection. These event consumers are used by NioNeoDbPersistenceSource to implement ResourceConnections, but that is discussed in detail later in this post.
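To make the shape of this coupling concrete, here is a minimal sketch. All class names below (SketchXaConnection and friends) are hypothetical stand-ins written for this illustration, not Neo's real classes, and the real XAResource interface has many more methods than the single identity check shown here.

```java
import java.util.Objects;

// Hypothetical stand-in for the transaction held by a connection
// (the role WriteTransaction plays in Neo).
class SketchWriteTransaction {
    private final StringBuilder operations = new StringBuilder();

    void createNode(long id) {
        operations.append("createNode(").append(id).append(")");
    }

    String operations() {
        return operations.toString();
    }
}

// Hypothetical stand-in for NeoStoreXaResource: its identity is the store file name.
class SketchXaResource {
    private final String storeFileName;

    SketchXaResource(String storeFileName) {
        this.storeFileName = Objects.requireNonNull(storeFileName);
    }

    // Same idea as XAResource.isSameRM(): two resources belong to the same
    // resource manager when they are backed by the same store file.
    boolean isSameRM(SketchXaResource other) {
        return other != null && storeFileName.equals(other.storeFileName);
    }
}

// Hypothetical stand-in for NeoStoreXaConnection: couples one resource with
// one transaction and hands out event consumers that forward to that transaction.
class SketchXaConnection {
    interface NodeConsumer {
        void createNode(long id);
    }

    private final SketchXaResource resource;
    private final SketchWriteTransaction tx = new SketchWriteTransaction();

    SketchXaConnection(String storeFileName) {
        this.resource = new SketchXaResource(storeFileName);
    }

    SketchXaResource getXaResource() {
        return resource;
    }

    NodeConsumer getNodeConsumer() {
        return id -> tx.createNode(id); // forward to the encapsulated transaction
    }

    String recordedOperations() {
        return tx.operations();
    }
}
```

Two connections opened over the same (assumed) store file name report isSameRM() as true, while every operation issued through a consumer ends up in the transaction of the connection that handed it out.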
XaDataSource is an abstract class that defines the means of obtaining XaConnections from a data source and some facilities for recovering txs. The idea is that classes extending XaDataSource will encapsulate a transactional resource capable of supporting XAResources, so that they can fit in an XA environment. This is obvious from the extending class LogBackedXaDataSource, where all tx recovery operations are forwarded to an underlying XaLogicalLog. Neo extends this with NeoStoreXaDataSource which, apart from creating XaConnections, is pretty busy: on instantiation it is responsible for creating the NeoStore that is the on-disk storage of the graph, it creates a XaContainer to help with housekeeping (more on that next), and it even creates the IdGenerators for the various Stores. It also provides implementations (as inner classes) for a XaCommandFactory and a XaTransactionFactory that it passes to the XaContainer for recovery purposes. This gives it the role of a Facade over the details of a lot of things I have described previously, summing up XaLogicalLogs, XaResourceManagers, Stores and their paraphernalia into a data source practically ready for fitting into an XA environment.
Before we leave NeoStoreXaDataSource, a note on its instantiation. Instead of the usual new call for creating an instance, there is a more roundabout way for getting a Neo DataSource up and running. When the database starts, the TxModule object held by the Config is asked to register DataSources, as it goes around the various components (the Indexer service is another example of a user of DataSources). For the Neo kernel, when GraphDbInstance is start()ed, the TxModule in the Config object is asked to register a DataSource with an implementing class of NeoStoreXaDataSource and there it is passed to the DataSourceManager which instantiates it via reflection. DataSourceManager keeps a mapping from identifying Strings to XaDataSource instances, maintaining this way a single instance for every data source. The identifying String is kept in the Config as DEFAULT_DATA_SOURCE_NAME.
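The registration step can be sketched roughly as follows. This is an illustration under assumptions: the class names, the constructor taking a parameter Map and the "sketch-store" name are all invented for the example, and only the idea (instantiate by class name via reflection, keep one instance per identifying String) comes from the description above.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base type standing in for XaDataSource.
abstract class SketchDataSource {
    final Map<String, String> params;

    SketchDataSource(Map<String, String> params) {
        this.params = params;
    }
}

// Hypothetical concrete data source (the role NeoStoreXaDataSource plays in Neo).
class SketchStoreDataSource extends SketchDataSource {
    SketchStoreDataSource(Map<String, String> params) {
        super(params);
    }
}

// Hypothetical stand-in for DataSourceManager.
class SketchDataSourceManager {
    // One instance per identifying name.
    private final Map<String, SketchDataSource> dataSources = new HashMap<>();

    SketchDataSource register(String name, String className, Map<String, String> params) {
        SketchDataSource existing = dataSources.get(name);
        if (existing != null) {
            return existing; // already registered: keep the single instance
        }
        try {
            // Look the class up by name and call its (assumed) Map constructor.
            Class<?> clazz = Class.forName(className);
            Constructor<?> constructor = clazz.getDeclaredConstructor(Map.class);
            SketchDataSource created = (SketchDataSource) constructor.newInstance(params);
            dataSources.put(name, created);
            return created;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create data source " + className, e);
        }
    }

    SketchDataSource get(String name) {
        return dataSources.get(name);
    }
}
```

Calling register("sketch-store", SketchStoreDataSource.class.getName(), params) twice hands back the same object both times, which is the single-instance behaviour described for the real manager.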

Management of XaResources

The mapping of a XAResource to a XaTransaction represented by a XaConnection is realized in the XaResourceManager. This class mainly keeps a Map<XAResource,Xid> and a Map<Xid,XidStatus>, XidStatus being an inner class that, with the help of another inner class, TransactionStatus, holds the current status of the tx and the tx itself identified by its xid and mapped by an XAResource. Essentially, from this mapping, all tx operations on an XAResource are forwarded to the related tx. This helps the decoupling of tx operations that XaResources are asked to perform from any implementation details of the XaLogicalLog or the backing store, leaving XaResources, XaTransactions and XaConnections as thin shells that can be useful in higher layers.
XaResourceManager also participates in failure recovery in conjunction with the XaLogicalLog: its methods accept the recreated txs as the logical log reads them and then complete them. In a sense, a XaResourceManager coupled with a XaLogicalLog is the equivalent of the TxManager+TxLog as we saw them last time, but with the addition of a backing persistence store in the form of a DataSource.
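The two-map bookkeeping described above can be sketched like this. It is a simplification under assumptions: xids are plain Strings, resources are plain Objects, and SketchXaResourceManager, SketchTx and the Status enum are hypothetical names invented for the illustration (the real class uses XAResource, Xid and its own inner status classes).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for XaResourceManager.
class SketchXaResourceManager {
    enum Status { ACTIVE, COMMITTED }

    // Hypothetical stand-in for a XaTransaction.
    static class SketchTx {
        boolean committed;

        void commit() {
            committed = true;
        }
    }

    // Plays the role of XidStatus: the tx together with its current status.
    static class SketchXidStatus {
        final SketchTx tx;
        Status status = Status.ACTIVE;

        SketchXidStatus(SketchTx tx) {
            this.tx = tx;
        }
    }

    private final Map<Object, String> xaResourceToXid = new HashMap<>();
    private final Map<String, SketchXidStatus> xidToStatus = new HashMap<>();

    // Bind a resource to a xid and create the tx that will do the actual work.
    SketchTx start(Object xaResource, String xid) {
        if (xidToStatus.containsKey(xid)) {
            throw new IllegalStateException("Xid already started: " + xid);
        }
        SketchTx tx = new SketchTx();
        xaResourceToXid.put(xaResource, xid);
        xidToStatus.put(xid, new SketchXidStatus(tx));
        return tx;
    }

    // An operation arriving at a resource is resolved to its tx and forwarded.
    void commit(Object xaResource) {
        String xid = xaResourceToXid.remove(xaResource);
        if (xid == null) {
            throw new IllegalStateException("Resource is not bound to a tx");
        }
        SketchXidStatus xidStatus = xidToStatus.get(xid);
        xidStatus.tx.commit();
        xidStatus.status = Status.COMMITTED;
    }

    Status statusOf(String xid) {
        SketchXidStatus xidStatus = xidToStatus.get(xid);
        return xidStatus == null ? null : xidStatus.status;
    }
}
```

The resource itself holds no state here; everything it is asked to do is looked up in the manager and handed to the tx, which is what keeps the resource a thin shell.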

Binding related things together

The various components that help out a XaDataSource must be instantiated in a specific order, and it is a nice idea to keep them together since they are closely coupled. This is a job for XaContainer, which keeps a XaDataSource, a XaCommandFactory, a XaTransactionFactory, a XaLogicalLog and a XaResourceManager. The idea is that the XaResourceManager and the XaLogicalLog must have access to a txFactory and a commandFactory before they start, and additionally the log needs a XaResourceManager before being open()ed, otherwise recovery cannot proceed. This leads to a specific serialization of the instantiation/initialization operations, and this is done by the XaContainer. XaContainer in turn is created by NeoStoreXaDataSource when it is instantiated, which passes its internal implementations of CommandFactory and TransactionFactory to the create() method of XaContainer, leaving to it the creation of instances of XaLogicalLog and XaResourceManager. To open the log (and possibly trigger a recovery) you must call XaContainer.open() after initializing it, ensuring that everything is in place.
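The ordering constraint can be illustrated with a small sketch. Everything here is hypothetical (the Sketch* names, the use of plain Object for the two factories, the setResourceManager() method); it only mirrors the rule stated above: factories first, then the resource manager and the log, and the log may be opened only once it knows its resource manager.

```java
// Hypothetical stand-in for XaContainer.
class SketchContainer {
    // Hypothetical stand-in for XaResourceManager: needs the tx factory up front.
    static class SketchRm {
        final Object txFactory;

        SketchRm(Object txFactory) {
            if (txFactory == null) {
                throw new IllegalArgumentException("txFactory is required");
            }
            this.txFactory = txFactory;
        }
    }

    // Hypothetical stand-in for XaLogicalLog: needs both factories up front
    // and a resource manager before it may be opened.
    static class SketchLog {
        private final Object commandFactory;
        private final Object txFactory;
        private SketchRm resourceManager;
        private boolean open;

        SketchLog(Object commandFactory, Object txFactory) {
            if (commandFactory == null || txFactory == null) {
                throw new IllegalArgumentException("both factories are required");
            }
            this.commandFactory = commandFactory;
            this.txFactory = txFactory;
        }

        void setResourceManager(SketchRm resourceManager) {
            this.resourceManager = resourceManager;
        }

        void open() {
            if (resourceManager == null) {
                // Without it, recovered txs would have nowhere to go.
                throw new IllegalStateException("resource manager must be set before open()");
            }
            open = true;
        }

        boolean isOpen() {
            return open;
        }
    }

    private final SketchRm resourceManager;
    private final SketchLog log;

    private SketchContainer(SketchRm resourceManager, SketchLog log) {
        this.resourceManager = resourceManager;
        this.log = log;
    }

    // Serializes the initialization: rm and log get their factories, then the log gets the rm.
    static SketchContainer create(Object commandFactory, Object txFactory) {
        SketchRm rm = new SketchRm(txFactory);
        SketchLog log = new SketchLog(commandFactory, txFactory);
        log.setResourceManager(rm);
        return new SketchContainer(rm, log);
    }

    // Opening is a separate, later step so that everything is in place first.
    void open() {
        log.open();
    }

    boolean isOpen() {
        return log.isOpen();
    }
}
```

Keeping create() as the only way to build the container means a caller cannot get the order wrong; a log constructed by hand and opened without a resource manager fails fast instead.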

An intermediate interface: The ResourceConnection

PersistenceSource defines an interface that exposes the expected functionality for every persistence store that Neo can use to store its data on disk. The operations themselves are abstracted as ResourceConnections that are returned by a PersistenceSource. For that reason, NioNeoDbPersistenceSource implements this as an inner class, NioNeoDbResourceConnection, that accepts a NeoStoreXaDataSource, extracts from it the XaConnection and from there the various primitive event consumers, dispatching to them the operations each is supposed to handle. This 2-level indirection is a purely engineering construct, having no other impact on the logic of any subsystem.

Addressing problems with store-memory consistency: The LockReleaser

There is an issue I haven't touched upon yet. We have seen how the various records are updated in the store and kept locked for the duration of a tx, ensuring their isolation guarantees. However, it remains to be seen how the modifications upon a Primitive are kept in memory for reading within a tx and how overlapping creations/deletions/changes of properties are managed. This is the task assigned to LockReleaser, along with the more general responsibility of locking the entities that are to be modified and releasing the locks upon commit. The core idea is that, per transaction, we keep a set of changes for every element and its properties. The changes in the properties of a primitive are kept as instances of the inner classes CowNodeElement or CowRelElement, for Nodes and Relationships respectively, and the set of those elements (one for each corresponding primitive) is kept in an instance of the inner class PrimitiveElement. The cowMap field is a Map<Transaction,PrimitiveElement> that keeps the mapping from each tx to its changes.
The easy part is deletion, where calling delete() on a primitive passes the call to NodeManager, which forwards it first to LockReleaser, marking the corresponding CowElement as deleted via a boolean field, and then to the PersistenceManager, which updates the XaTransaction (WriteTransaction in the case of NioNeoDbPersistenceSource). The great management overhead and the bulk of the code lie in the addition, deletion and changing of properties for Nodes and Relationships. Two maps are kept for each primitive, a propertyAddMap and a propertyRemoveMap. When a property is added, it is put in the propertyAddMap for the primitive, while removals are put in the propertyRemoveMap.
Asking a primitive for a property passes from the Proxy (that implements the Node or Relationship interface and is the user visible class) to the NodeManager, which retrieves the corresponding NodeImpl or RelationshipImpl and there propertyAddMap and propertyRemoveMap are consolidated, keeping the actual changeset and finally retrieving the requested property, if present. To make this clear, let's see an example.
Say you have a Node and you add a property via setProperty("foo","bar"). Initially, the NodeProxy simply forwards the call to the corresponding (based on id) NodeImpl. There it is locked (in the Primitive.setProperty() method) for WRITE by the LockReleaser. The full propertyMap is brought into memory if not already there (NodeManager.loadProperties()) and the propertyAddMap and propertyRemoveMap for this primitive are obtained. Note that there are now three sets of properties in memory for this primitive: the ones loaded from the store (the propertyMap), the ones added so far in this tx (the propertyAddMap) and the ones removed so far (the propertyRemoveMap). These have to be aggregated into a currently consistent set so that we can decide whether to create a new property or to change an existing one.
There are three stages for this. First, we check the currently known property indexes that are resident in memory. If the index is not there, we make sure we bring all property indexes into memory and we check those. If it is still not there, then we create it. In the meantime, if the property value was found in either the stored property set or in the add map (it was added previously in this tx), then we retrieve it, removing it from the propertyRemoveMap. Its value is changed, it is added to the propertyAddMap and the release of the WRITE lock is requested (the actual release is deferred until the tx completes, as the next section explains). Similar paths are followed in all other operations, including additions and removals of Relationships for Nodes.
Finally, before commit(), the propertyAddMap, propertyRemoveMap and propertyMap are consolidated into the final version at Primitive.commitPropertyMaps(), which adds all properties in propertyAddMap to propertyMap and then removes from it all properties in propertyRemoveMap. This brings the in-memory copy of this Primitive back to a state consistent with the now updated in-file version, getting rid of all the versions in the temporary add and remove maps.
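The interplay of the three maps can be sketched in a few lines. This is a deliberately simplified illustration and not Neo's code: properties are keyed by name instead of by property index, the add/remove maps live on the primitive itself instead of per tx inside LockReleaser, locking is left out, and SketchPrimitive and committedView() are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Primitive and its copy-on-write maps.
class SketchPrimitive {
    private final Map<String, Object> propertyMap = new HashMap<>();       // as loaded from the store
    private final Map<String, Object> propertyAddMap = new HashMap<>();    // added or changed in this tx
    private final Map<String, Object> propertyRemoveMap = new HashMap<>(); // removed in this tx

    SketchPrimitive(Map<String, Object> stored) {
        propertyMap.putAll(stored);
    }

    void setProperty(String key, Object value) {
        propertyRemoveMap.remove(key); // a re-added property is no longer removed
        propertyAddMap.put(key, value);
    }

    Object removeProperty(String key) {
        Object current = getProperty(key);
        if (current == null) {
            return null; // nothing visible to remove
        }
        propertyAddMap.remove(key);
        if (propertyMap.containsKey(key)) {
            propertyRemoveMap.put(key, current); // hide the stored value until commit
        }
        return current;
    }

    // Reads inside the tx see the stored state overlaid with the tx-local changes.
    Object getProperty(String key) {
        if (propertyRemoveMap.containsKey(key)) {
            return null;
        }
        if (propertyAddMap.containsKey(key)) {
            return propertyAddMap.get(key);
        }
        return propertyMap.get(key);
    }

    // Consolidation: apply the additions, then the removals, then drop the temporary maps.
    void commitPropertyMaps() {
        propertyMap.putAll(propertyAddMap);
        for (String key : propertyRemoveMap.keySet()) {
            propertyMap.remove(key);
        }
        propertyAddMap.clear();
        propertyRemoveMap.clear();
    }

    // What a reader outside the tx would find before the commit.
    Map<String, Object> committedView() {
        return new HashMap<>(propertyMap);
    }
}
```

Until commitPropertyMaps() runs, getProperty() already reflects the changes while committedView() does not, which is the read-your-own-writes behaviour the add and remove maps exist to provide.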
LockReleaser is also used by WriteTransaction to invalidate cached entries. The various removeFromCache() calls are there to ensure that after a tx which deletes a primitive is committed, the corresponding entry in the cache is removed so that it cannot be referenced again. This is used in WriteTransaction, where after a delete command, a rollback of a creation command or the execution of a recovered command, the matching removeFromCache call is made to the LockReleaser, which forwards it to the keeper of the cache, the omnipotent NodeManager.

Managing locks: Again, the LockReleaser (with help from the NodeManager)

LockManager and RagManager were described in a previous post as the core mechanism that provides isolation between txs. Now we will see where the lock acquisition and release are done. First, WRITE locks are acquired on Node and Relationship creation events from NodeManager (in createNode() and createRelationship()) and from Primitives (via NodeManager.acquireLock()) on removeProperty, setProperty and delete events. Releasing them is not done right away, although the call to NodeManager.releaseLock() is made at the end of every method that acquires a lock (i.e., the aforementioned ones). Obviously, we cannot release a WRITE lock before the tx completes, since that would lead to other txs reading uncommitted changes (Neo currently guarantees only SERIALIZABLE isolation levels). So, we must postpone the releasing of the lock until commit. This is done in LockReleaser.addLockToTransaction(), which adds a LockElement to a List<LockElement> that is mapped by the current tx (kept in lockMap, a Map<Transaction, List<LockElement>>) and also adds a tx Synchronization hook to the current tx whose afterCompletion() releases all locks held by this tx.
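A minimal sketch of this deferred release follows. It is an illustration under assumptions: SketchTx stands in for a JTA Transaction with its Synchronization hooks (modelled here as plain Runnables), and "releasing" a lock just records the resource in a list instead of talking to a real LockManager. Only the names lockMap, LockElement and addLockToTransaction() are taken from the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the lock bookkeeping half of LockReleaser.
class SketchLockReleaser {
    // Hypothetical stand-in for a transaction that supports after-completion hooks.
    static class SketchTx {
        private final List<Runnable> afterCompletionHooks = new ArrayList<>();

        void registerAfterCompletion(Runnable hook) {
            afterCompletionHooks.add(hook);
        }

        // Called once the tx has committed or rolled back.
        void complete() {
            for (Runnable hook : new ArrayList<>(afterCompletionHooks)) {
                hook.run();
            }
        }
    }

    static class LockElement {
        final Object resource;

        LockElement(Object resource) {
            this.resource = resource;
        }
    }

    private final Map<SketchTx, List<LockElement>> lockMap = new HashMap<>();
    private final List<Object> released = new ArrayList<>();

    // Instead of releasing right away, remember the lock under the current tx.
    void addLockToTransaction(SketchTx tx, Object resource) {
        List<LockElement> locks = lockMap.get(tx);
        if (locks == null) {
            locks = new ArrayList<>();
            lockMap.put(tx, locks);
            // Register the hook once per tx; it releases everything when the tx completes.
            tx.registerAfterCompletion(() -> releaseLocks(tx));
        }
        locks.add(new LockElement(resource));
    }

    private void releaseLocks(SketchTx tx) {
        List<LockElement> locks = lockMap.remove(tx);
        if (locks == null) {
            return;
        }
        for (LockElement element : locks) {
            released.add(element.resource); // a real implementation would unlock here
        }
    }

    int pendingLockCount(SketchTx tx) {
        List<LockElement> locks = lockMap.get(tx);
        return locks == null ? 0 : locks.size();
    }

    List<Object> releasedResources() {
        return new ArrayList<>(released);
    }
}
```

The hook is registered only when the first lock of a tx is recorded, so a tx that touches many primitives still releases all of its locks in one pass after completion.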

Almost there

This post concludes a description of the core classes that provide the performance and ACID guarantees of Neo. What remains to be seen is a walkthrough of the path from creating a new EmbeddedGraphDatabase to its shutdown. This will be the topic of the next post.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
