Tuesday, August 21, 2012

Entry Processors - Part 1

Hello! This is the first of a two-part series on entry processors.
Entry processors are a very powerful tool in our Coherence toolbox, and Coherence 3.7 improved the entry processor API even further. In case you haven’t looked at the documentation, below is a quick description:

Entry Processors:
Entry processors process entries. I say this because I cannot be the only one who interpreted the name as something that processes things as they enter the cache, which is actually the job of a trigger, but I digress.
Entry processors allow us to execute code server side against entries or sets of entries, either passed as a parameter or matching a filter that has been passed as a parameter to the processor. This is the preferred method of updating the value of entries in a cache: the work is performed in parallel on each node, an implicit lock is acquired on each entry, and multiple entry processors changing the same entry are queued, guaranteeing the ordering of updates and thus ensuring data consistency.

Why use entry processors?
The main reason is that Coherence is, at heart, a distributed hash map, so all the basic operations are puts and gets, which makes it quite easy to end up in an inconsistent state. The diagram below explains the most common scenario.



●      At T0 both of the object’s fields have the value “initial”.
●      At T4 the first field has the value “changed” and the second one “initial”.
●      At T6 the object has the values “initial” and “changed”.
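The lost update described above is easy to reproduce in plain Java. Below is a self-contained sketch in which a local HashMap stands in for the cache; the Value class and its fields are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    // A stand-in for the cached object with two fields.
    static class Value {
        String first, second;
        Value(String first, String second) { this.first = first; this.second = second; }
        Value copy() { return new Value(first, second); }
    }

    public static void main(String[] args) {
        Map<String, Value> cache = new HashMap<>();
        cache.put("key", new Value("initial", "initial"));        // T0

        Value client1 = cache.get("key").copy();                  // T1: client 1 gets a copy
        Value client2 = cache.get("key").copy();                  // T2: client 2 gets a (soon stale) copy

        client1.first = "changed";
        cache.put("key", client1);                                // T4: first field is now "changed"

        client2.second = "changed";
        cache.put("key", client2);                                // T6: client 2's stale copy wins

        Value result = cache.get("key");
        // Client 1's change to the first field has been lost:
        System.out.println(result.first + ", " + result.second);  // prints "initial, changed"
    }
}
```

Client 1’s put at T4 is silently overwritten at T6, because client 2 modified a copy taken before T4.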

From this we can see that the changes made by client 1 were overridden by client 2, because client 2 had a stale version of the object.
The simplest, and most heavy-handed, way to solve this is:


●      Before actually getting the objects, the clients attempt to acquire a lock on them.
●      At T3 client 2 asks for a lock; the call will either return false immediately, or block until the lock is released if client 2 passed a long specifying the maximum time to wait for the lock. (Blocking avoids repeated network calls to retry the lock, but you must set a value smaller than the service guardian timeout, or risk it killing your thread.)
●      At T7 client 2 finally has the lock and can get the object.
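The lock-then-get pattern above can be sketched in plain Java. In Coherence the lock would come from the cache itself, via the ConcurrentMap interface’s lock(key, timeout); here a ReentrantLock stands in so the sketch is self-contained, and the key and field names are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockedUpdateDemo {
    static final Map<String, String[]> cache = new HashMap<>();
    // Stands in for Coherence's per-key lock(key, timeout).
    static final ReentrantLock lock = new ReentrantLock();

    // Get-modify-put under the lock, waiting at most timeoutMillis to acquire it.
    static boolean update(String key, int fieldIndex, String newValue, long timeoutMillis)
            throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;                  // could not acquire the lock in time
        }
        try {
            String[] fields = cache.get(key).clone();  // safe: we hold the lock
            fields[fieldIndex] = newValue;
            cache.put(key, fields);
            return true;
        } finally {
            lock.unlock();                 // always release, as with cache.unlock(key)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        cache.put("key", new String[] {"initial", "initial"});
        update("key", 0, "changed", 1000); // client 1
        update("key", 1, "changed", 1000); // client 2 waits, then sees fresh data
        String[] result = cache.get("key");
        System.out.println(result[0] + ", " + result[1]);  // prints "changed, changed"
    }
}
```

Because client 2 cannot read the object until client 1 releases the lock, both changes survive. The cost is that every client pays for the locking round trips, which is exactly the problem discussed next.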

This will protect you from stale data. However, we are not doing government work here: our implementations shouldn’t only work, they need to be efficient and easy to maintain, and this solution has quite a few problems. For starters, it slows down the cache as a whole, as the clients are constantly locking the data. Secondly, each locking step requires extra network hops, further slowing down the application.

Simple Entry Processor solution:
In order to solve this problem, when writing a processor we need to ensure it is as simple as possible and that its code is deployed on the member that holds the data, as the entry processor will be serialized and sent over the wire. Below is the example code for the firstField entry processor; the one for the second field looks exactly the same, except that it changes a different value. In production it would be more efficient to use a POF updater, like the one used in the Versioning post, but I wanted to keep this example as simple as possible.
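A minimal sketch of such a processor against the Coherence 3.7 API could look like the following. The value class MyObject and its setFirstField method are assumed for illustration:

```java
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Changes the first field of the cached value to "changed".
// Deploy this class on the storage-enabled members, as the processor
// is serialized and executed where the data lives.
public class FirstFieldProcessor extends AbstractProcessor {

    public Object process(InvocableMap.Entry entry) {
        MyObject value = (MyObject) entry.getValue();  // MyObject is an assumed value class
        value.setFirstField("changed");
        entry.setValue(value);                         // write the change back to the cache
        return null;                                   // nothing to return to the caller
    }
}
```

The client then simply calls cache.invoke(key, new FirstFieldProcessor()), and the get-modify-put happens entirely on the owning member, under the entry’s implicit lock.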

And the graph:



●      If the entry processor invoked at T2 arrives while the one from T1 is still executing, it will be queued and executed after the lock is released, as entry processors take an implicit lock on the entries they process.

Hopefully this has made the advantages of using entry processors clear.
Entry Processor Particulars:
As explained above, locking is very important to the correct functioning of entry processors. I found the documentation a bit confusing in this regard; the description is: “EntryProcessors are individually executed atomically, however multiple EntryProcessor invocations by using InvocableMap.invokeAll() do not execute as one atomic unit.” A simpler way I found to think about it is that the keys are split into one group per member, and each group locks and executes atomically for the duration of its own execution. For instance, let’s say our invokeAll() evaluates to keys 1, 3, 4, 7 and 9 (for these purposes it is inconsequential whether it evaluates to those keys through a filter or because you passed in a set of keys). The partitions that contain keys 1, 3 and 7 are on member 1, and the ones with keys 4 and 9 are on member 2. Therefore the changes made to each of those groups will lock, and be transactional, for the duration of the execution against all of the keys on that member.
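The per-member grouping can be sketched as follows (a hypothetical example against the Coherence 3.7 API; the cache name and the FirstFieldProcessor class are assumed, not from any real deployment):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InvokeAllExample {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("example-cache");

        // Keys 1, 3 and 7 live on member 1; keys 4 and 9 on member 2
        // (per the example above).
        Set keys = new HashSet(Arrays.asList(1, 3, 4, 7, 9));

        // Each member executes its own group of keys atomically, but the
        // five keys as a whole are NOT one atomic unit.
        Map results = cache.invokeAll(keys, new FirstFieldProcessor());
    }
}
```

So member 1 processes {1, 3, 7} as one locked, transactional unit and member 2 processes {4, 9} as another, but there is no transaction spanning both members.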

Entry Processors on a replicated cache:
Although not very common, as any write to a replicated cache is expensive, entry processors are sometimes executed against replicated caches. Unlike in partitioned caches, where the entry processor is executed on the nodes that own the entries, in a replicated cache the execution takes place on the member that invoked the entry processor. It is important to note that locking still takes place; however, the locks are implicit, without the extra network calls that would be needed if we were using the methods of the ConcurrentMap interface.

Cheers,
