Saturday, August 25, 2012

Entry Processors part 2 - Deadlocking

This is part two of my entry processor series and deals with the advantages and disadvantages of cross-cache entry processors.

Cross cache Entry processors
As mentioned in the previous post, entry processors let us update values in the most efficient manner. Often, however, you will want to update values in multiple caches, or update a value derived from related entries. The obvious way to do this is to call CacheFactory.getCache() from inside the entry processor, for instance:
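The original snippet is not reproduced here, so below is a minimal sketch of what such a processor might look like (the cache name and value handling are illustrative). As the rest of this post explains, this is exactly the pattern to avoid:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// A naive cross-cache entry processor: it re-enters the cache service
// from inside process(), which is the behavior this post warns against.
public class NaiveCrossCacheProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        // Calling back into the service from a service thread:
        NamedCache second = CacheFactory.getCache("second-cache");
        Object related = second.get(entry.getKey()); // extra network hop, deadlock risk
        entry.setValue(related);
        return null;
    }
}
```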

By looking at the diagram we can see the problem straight away: our code performs an extra network hop to fetch one of the blue objects, because nothing guarantees that object lives in the same node. So how can we solve this?

Coherence offers something called key association, which lets us ensure that two or more related entries sharing the same business key are co-located; this requires both caches to share the same service. I have previously shown some benefits of key association in the versioning post, and the documentation is here, so I’ll assume everyone is familiar with it. If not, take a look at the mechanism; I’ll wait and check my emails.
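As a quick refresher, key association boils down to implementing one interface on the key class. A sketch (the class and field names are made up for illustration):

```java
import java.io.Serializable;

import com.tangosol.net.cache.KeyAssociation;

// Hypothetical child key whose entries are co-located with the parent's:
// Coherence partitions by the associated key, not by the key itself.
public class ChildKey implements KeyAssociation, Serializable {
    private final String childId;
    private final String parentId;

    public ChildKey(String childId, String parentId) {
        this.childId = childId;
        this.parentId = parentId;
    }

    public Object getAssociatedKey() {
        return parentId; // all keys sharing a parentId land in the same partition
    }

    // equals() and hashCode() omitted for brevity -- both are required
    // for any cache key, and must be consistent across the cluster.
}
```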
Welcome back. Now, after we implement key association, our cache looks like this:

Which is exactly what we wanted, but when we run the code we get this exception:
2012-08-07 11:49:41,510 ERROR [Logger@9252516] Coherence(3) - 2012-08-07 11:49:41.507/14.742 Oracle Coherence GE <Error> (thread=DistributedCache:first-service, member=1): Assertion failed: poll() is a blocking call and cannot be called on the Service thread

This is very dangerous because, if you have multiple threads for the service, instead of an exception all you will get is this warning:
2012-08-07 11:40:50,040 WARN  [Logger@9243192] Coherence(3) - 2012-08-07 11:40:50.040/5.833 Oracle Coherence GE <Warning> (thread=first-serviceWorker:1, member=1): Application code running on "first-service" service thread(s) should not call ensureCache as this may result in deadlock. The most common case is a CacheFactory call from a custom CacheStore implementation.

This happens because you are accessing the service from a thread of that same service, and here is where the behavior starts getting tricky: if you have two or more free threads it will work as expected, barring the warning, but if you only have one free thread, or only one extra thread on your service, the execution of the entry processor will deadlock until it is killed by the service guardian. You should never do this. So how should we go cross cache properly?

Direct backing map cross cache access 
As the title, which is a mouthful, explains, we should access the entries directly in the backing map. Because we are using key association we can be sure the entries are co-located, as shown in figure 2.
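A sketch of what this looks like in practice, assuming key association and a shared service (the cache name is illustrative). There is no network hop and no re-entering the service:

```java
import java.util.Map;

import com.tangosol.net.BackingMapManagerContext;
import com.tangosol.util.Binary;
import com.tangosol.util.BinaryEntry;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Reads the related entry straight out of the local backing map.
public class BackingMapCrossCacheProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        BinaryEntry binEntry = (BinaryEntry) entry;
        BackingMapManagerContext ctx = binEntry.getContext();

        // Key association guarantees the related entry is in this
        // member's backing map for the second cache.
        Map backingMap = ctx.getBackingMap("second-cache");
        Binary binKey = binEntry.getBinaryKey(); // same key in both caches
        Binary binValue = (Binary) backingMap.get(binKey);

        // Deserialize the related value and use it to update this entry.
        Object related = ctx.getValueFromInternalConverter().convert(binValue);
        entry.setValue(related);
        return null;
    }
}
```

Note that this raw backing-map access bypasses the second cache's locking, which is exactly what changes in 3.7, as described next.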

Entry Processors in Coherence 3.7

Entry processors got better in 3.7: their transactionality and locking have been extended to cover cross-cache access, so all of the wonderful transactional guarantees and locking explained in my previous post now apply to cross-cache access as well. This, however, creates quite a big problem, as you can now deadlock your application. It might not be clear at first glance, but the diagram below should make it clearer:
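In 3.7 the idiomatic route is through the backing map context, which enlists the related entry in the processor's transaction. A sketch (cache name and values are illustrative):

```java
import com.tangosol.util.Binary;
import com.tangosol.util.BinaryEntry;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// getBackingMapEntry() returns an entry enlisted in this processor's
// sandbox: it is locked, and both updates commit atomically together.
public class TransactionalCrossCacheProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        BinaryEntry binEntry = (BinaryEntry) entry;
        Binary binKey = binEntry.getBinaryKey(); // co-located via key association

        InvocableMap.Entry related = binEntry.getContext()
                .getBackingMapContext("second-cache")
                .getBackingMapEntry(binKey);

        // Both setValue() calls are part of one atomic, locked unit of work.
        related.setValue("updated-second");
        entry.setValue("updated-first");
        return null;
    }
}
```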

Because all access locks, the example in the diagram will deadlock: the entry processor that was invoked last will fail due to the deadlock, while the one that arrived first will continue its execution. This is a deceptively simple problem to avoid with good design: create and document the expected flow through the caches, e.g. all cross-cache entry processors should go from the first cache to the second. That way they queue, and provide the behavior expected from entry processors.
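The flow rule above is classic lock ordering, which can be sketched with plain Java locks standing in for the two caches: as long as every "processor" takes the locks in the same order, concurrent invocations queue instead of deadlocking.

```java
import java.util.concurrent.locks.ReentrantLock;

// Two locks stand in for the entry locks of the two caches.
public class LockOrdering {
    static final ReentrantLock firstCache = new ReentrantLock();
    static final ReentrantLock secondCache = new ReentrantLock();

    static void crossCacheUpdate(String name) {
        firstCache.lock();       // always lock the first cache first...
        try {
            secondCache.lock();  // ...then the second, in every processor
            try {
                System.out.println(name + " updated both caches");
            } finally {
                secondCache.unlock();
            }
        } finally {
            firstCache.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two concurrent "entry processors" queue on firstCache instead
        // of deadlocking, because they agree on the lock order.
        Thread a = new Thread(() -> crossCacheUpdate("EP-A"));
        Thread b = new Thread(() -> crossCacheUpdate("EP-B"));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("done");
    }
}
```

If one thread took the locks in the opposite order, the two could each hold one lock and wait forever on the other, which is precisely the cross-cache deadlock in the diagram.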


Tuesday, August 21, 2012

Entry Processors - Part 1

Hello! This is the first of a two part series on entry processors:
Entry processors are a very powerful tool in our Coherence toolbox, and Coherence 3.7 improved the entry processor API even further. In case you haven’t looked at the documentation, below is a quick description:

Entry Processors:
Entry processors process entries. I say this because I cannot be the only one who interpreted the name as something that processes things as they enter, which is the job of a trigger, but I digress.
Entry processors allow us to execute code server side against an entry or set of entries, either passed as a parameter or matching a filter passed to the processor. This is the preferred method of updating the values of entries in a cache: the work is performed in parallel on each node, an implicit lock is acquired on each object, and multiple entry processors changing the same object are queued, guaranteeing the ordering of updates and thus ensuring data consistency.

Why use entry processors?
The main reason is that Coherence is, at heart, a distributed hash map, so all operations are puts and gets, which makes it quite easy to end up with an inconsistent state. The diagram below shows the most common scenario.

●      At T0 the object’s fields have the values “Initial” and “Initial”.
●      At T4 the first field has a value of “Changed” and the second one of “Initial”.
●      At T6 the object has the values “Initial” and “Changed”.
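The timeline above can be reproduced with a plain map standing in for the cache (a sketch; the field names are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Reproducing the diagram's lost update with plain get/put: both clients
// read the object, each changes a different field, and the stale copy
// put last wipes out the first client's change.
public class LostUpdate {

    static class Value {
        String first, second;
        Value(String first, String second) { this.first = first; this.second = second; }
        Value copy() { return new Value(first, second); } // a get() hands back a detached copy
    }

    static Value run() {
        Map<String, Value> cache = new HashMap<>();
        cache.put("key", new Value("Initial", "Initial")); // T0

        Value client1 = cache.get("key").copy();           // T1: client 1 reads
        Value client2 = cache.get("key").copy();           // T2: client 2 reads

        client1.first = "Changed";
        cache.put("key", client1);                         // T4: "Changed" / "Initial"

        client2.second = "Changed";
        cache.put("key", client2);                         // T6: stale copy wins

        return cache.get("key");
    }

    public static void main(String[] args) {
        Value v = run();
        System.out.println(v.first + " / " + v.second);    // prints "Initial / Changed"
    }
}
```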

From this we can see that the changes made by client 1 were overwritten by client 2, because client 2 had a stale version of the object.
The simplest, and most heavy-handed, way to solve this is explicit locking:

●      Before actually getting the objects, the clients attempt to acquire a lock on them.
●      At T3 client 2 asks for the lock; this will either return false immediately, or block until the lock is released if he passed a long specifying the maximum time to wait (blocking avoids repeated network calls to retry the lock, but the wait must be smaller than the service guardian timeout, or you risk it killing your thread).
●      At T7 client 2 finally has the lock and can get the object.
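A sketch of this explicit-locking approach using NamedCache's ConcurrentMap methods (the cache name, key and timeout are made up); note how many network hops it takes:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

// Lock, get, put, unlock -- each a network call of its own.
public class ExplicitLockExample {
    public static void updateFirstField() {
        NamedCache cache = CacheFactory.getCache("my-cache");

        // Wait up to 5 seconds; keep this below the service guardian timeout.
        if (cache.lock("key-1", 5000L)) {
            try {
                Object value = cache.get("key-1"); // network hop
                // ... modify the value here ...
                cache.put("key-1", value);         // network hop
            } finally {
                cache.unlock("key-1");             // network hop
            }
        }
    }
}
```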

This will protect you from stale data. However, we are not doing government work here: our implementations shouldn’t merely work, they need to be efficient and easy to maintain, and this solution has quite a few problems. For starters, it slows down the cache as a whole, as the clients are constantly locking the data. Secondly, each locking step requires extra network hops, further slowing down the application.

Simple Entry Processor solution:
In order to solve this problem, when writing a processor we need to ensure it is as simple as possible and that its code is deployed on the member that holds the data, as the entry processor will be serialized and sent over the wire. Below is the example code for the firstField entry processor; the one for the second field looks exactly the same, except it changes a different field. In production it would be more efficient to use a POF updater, like the one used in the versioning post, but I wanted to keep this example as simple as possible.
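The original snippet is not shown here, so below is a sketch of what the firstField processor could look like (MyValue and the field names are hypothetical):

```java
import java.io.Serializable;

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Sets a single field on the cached value, server side.
public class FirstFieldProcessor extends AbstractProcessor implements Serializable {

    private final String newValue;

    public FirstFieldProcessor(String newValue) {
        this.newValue = newValue;
    }

    public Object process(InvocableMap.Entry entry) {
        // The entry is implicitly locked while process() runs.
        MyValue value = (MyValue) entry.getValue();
        value.setFirstField(newValue);
        entry.setValue(value); // setValue() marks the entry as modified
        return null;
    }

    // Hypothetical domain class, included only to make the sketch complete.
    public static class MyValue implements Serializable {
        private String firstField;
        private String secondField;
        public void setFirstField(String v) { firstField = v; }
    }
}
```

Invoking it is then a one-liner: cache.invoke("key-1", new FirstFieldProcessor("Changed"));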

And the graph:

●      If the entry processor at T2 arrives while the one from T1 is still executing, it is queued and executed after the lock is released, as entry processors take an implicit lock on the objects they process.

Hopefully this has made the advantages of using entry processors clear.
Entry processor Particulars:
As explained above, locking is very important to the correct functioning of entry processors. I found the documentation a bit confusing in this regard; the description is: “EntryProcessors are individually executed atomically, however multiple EntryProcessor invocations by using InvocableMap.invokeAll() do not execute as one atomic unit.” A simpler way I found to think about it is that each group of keys on a member is executed atomically, and the locks are held for the duration of that group’s execution. For instance, let’s say our invokeAll() evaluates to keys 1, 3, 4, 7 and 9 (for these purposes it is inconsequential whether it evaluates to those keys through a filter or because you passed in a key set). The partitions that contain keys 1, 3 and 7 are on member 1, and the ones with keys 4 and 9 are on member 2. The changes made to each of those groups will therefore lock, and be transactional, for the duration of the execution over all the keys on that member.
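A sketch of that scenario using one of Coherence's built-in processors (the cache name and value are made up):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.AlwaysFilter;
import com.tangosol.util.processor.ConditionalPut;

// The five keys are processed as one atomic, locked group per member,
// not as one cluster-wide transaction.
public class InvokeAllExample {
    public static void run() {
        NamedCache cache = CacheFactory.getCache("my-cache");
        Set keys = new HashSet(Arrays.asList(1, 3, 4, 7, 9));

        // ConditionalPut with AlwaysFilter simply overwrites each entry;
        // the interesting part is the locking: keys 1, 3 and 7 lock
        // together on member 1, keys 4 and 9 together on member 2.
        cache.invokeAll(keys, new ConditionalPut(AlwaysFilter.INSTANCE, "newValue"));
    }
}
```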

Entry Processors on a replicated cache:
Although not very common, as any write to a replicated cache is expensive, entry processors are sometimes executed against replicated caches. Unlike partitioned caches, where the entry processor executes on the nodes that own the entries, in a replicated cache the execution takes place on the member that invoked the entry processor. It is important to note that locking still takes place, but the locks are implicit, without the extra network calls that would be needed if we were using the ConcurrentMap interface methods.