Monday 15 December 2014

Thread safe Jedis client

Jedis is a neat client library for Redis. The Jedis client object provided by the library is not thread safe. Hence, a multi-threaded application should ideally use a JedisPool object (which is thread safe); otherwise it will face unpredictable behavior and weird exceptions. JedisPool is a pool of Jedis clients implemented using the Apache Commons Pool 2 API.

However, managing a pool and its resources can add complexity and length to an app's implementation. So, I thought abstracting this resource management away might help developers a lot. I've hosted the implementation on GitHub. It introduces a new ThreadSafeJedis class that internally manages pooled Jedis objects and provides an interface similar to that of the Jedis class, but is awesomely thread safe!

Background: The Jedis class currently uses a single Client object in all its methods, which is what makes it thread unsafe. All the commands (which are methods) use this one client object to launch commands, so the class cannot simply be extended. To make it thread safe, each command must be launched using a client freshly fetched from a pool, which means that all the methods need to be re-implemented so that they:
  • borrow a client,
  • launch the operation,
  • record the response,
  • return the client to the pool,
  • and return the response to the caller.
And that's it!
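
The five steps above can be sketched in a few lines. This is only an illustration of the borrow/launch/return pattern, not the code hosted on GitHub: FakeClient and ThreadSafeClient are hypothetical names, the client talks to an in-memory map instead of a Redis server, and a plain BlockingQueue stands in for JedisPool. With the real library, the borrow step would be a call to JedisPool.getResource().

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a non-thread-safe client such as Jedis.
// It writes to a shared in-memory map instead of a Redis server.
class FakeClient {
    private final Map<String, String> store;

    FakeClient(Map<String, String> store) { this.store = store; }

    String set(String key, String value) { store.put(key, value); return "OK"; }

    String get(String key) { return store.get(key); }
}

// Hypothetical wrapper: every command borrows a client, runs, and gives it back.
class ThreadSafeClient {
    private final BlockingQueue<FakeClient> pool;

    ThreadSafeClient(int size) {
        Map<String, String> store = new ConcurrentHashMap<>();
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new FakeClient(store));
        }
    }

    private <T> T execute(Function<FakeClient, T> command) {
        FakeClient client;
        try {
            client = pool.take();            // borrow a client (blocks if none is idle)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
        try {
            return command.apply(client);    // launch the operation and record the response
        } finally {
            pool.add(client);                // return the client to the pool, even on failure
        }
    }

    // Each public command is a thin delegation through execute().
    String set(String key, String value) { return execute(c -> c.set(key, value)); }

    String get(String key) { return execute(c -> c.get(key)); }

    int idleClients() { return pool.size(); }
}
```

The finally block matters here: if a command throws, the client still goes back to the pool, so a failing operation cannot slowly drain it.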

Monday 1 December 2014

Running expect scripts in the background

Expect scripts are very useful for automating interaction with other processes. But the usage of expect shown in this post might not be very handy when the script has to run as a cron job or in the background.

So, here's an alternative with a small tweak:

Cheers!



Redis pipeline explained

So, I had made some conceptual assumptions in my previous post that turned out to be incorrect in practice, according to some feedback I got from the Redis Google group. So here's another post which attempts to be closer to the actual implementation of Redis pipelining.

Here's the thing: pipelining is purely a client-side implementation. The Redis server has very little to do with it.

Now, that makes it difficult to generalize, as this client-side implementation may differ among client libraries. Please have a look at this post for why pipelining is useful. So, here's how it could go when operations are pipelined by a client:
  • Buffer the redis commands/operations on the client side
  • Synchronously or asynchronously flush the buffer periodically depending on the client library implementation
  • Redis executes these operations as soon as they are received at the server side
  • Subsequent redis commands are sent without waiting for the response of the previous commands. Meanwhile, the client is generally programmed to return a constant string for every operation in the sequence as an immediate response
  • The tricky part: the final responses. Very often it is wrongly assumed that all the responses are always read in one shot and that the responses may be completely buffered on the server side. Even though the responses to all the operations seem to arrive in one shot when the client closes the pipeline, they are actually partially buffered on both the client and the server. Again, this depends on the client implementation. It is possible for a client to read the buffered responses periodically to avoid a huge accumulation on the server side, but it is also possible that it doesn't. For example: the current implementation of the Jedis client library reads all responses of a pipeline sequence at once.
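
The flow above can be illustrated with a toy model. This is a sketch under stated assumptions, not how Jedis or any other client is implemented: FakeServer and SketchPipeline are hypothetical classes, the "server" is an in-memory map that counts round trips, and "PENDING" is a made-up placeholder for the immediate response. The model follows the simplest variant described above, where the client buffers everything and reads all responses at once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a Redis server; it counts round trips.
class FakeServer {
    private final Map<String, String> data = new HashMap<>();
    int roundTrips = 0;

    // Executes the commands in arrival order and returns one reply per command.
    List<String> send(List<String[]> commands) {
        roundTrips++;
        List<String> replies = new ArrayList<>();
        for (String[] cmd : commands) {
            if (cmd[0].equals("SET")) {
                data.put(cmd[1], cmd[2]);
                replies.add("OK");
            } else if (cmd[0].equals("GET")) {
                replies.add(data.get(cmd[1]));
            } else {
                replies.add("ERR unknown command");
            }
        }
        return replies;
    }
}

// Hypothetical pipeline: buffers commands on the client, reads all replies at sync().
class SketchPipeline {
    private final FakeServer server;
    private final List<String[]> buffer = new ArrayList<>();

    SketchPipeline(FakeServer server) { this.server = server; }

    // Queues the command and immediately returns a placeholder, not the real reply.
    String set(String key, String value) {
        buffer.add(new String[] {"SET", key, value});
        return "PENDING";
    }

    String get(String key) {
        buffer.add(new String[] {"GET", key});
        return "PENDING";
    }

    // Flushes the whole buffer in one round trip; replies come back in command order.
    List<String> sync() {
        List<String> replies = server.send(new ArrayList<>(buffer));
        buffer.clear();
        return replies;
    }
}
```

Note that the real value of a get() is only available after sync(), which is why a pipelined command cannot depend on the response of an earlier command in the same sequence.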
What this means and how it needs to be interpreted in practice:
  • Pipelining is super fast and is a good option for multiple subsequent client-server interactions in a high latency environment.
  • Pipelining does not guarantee atomicity. That is the job of a transaction.
  • Pipelining is not suitable if some redis operations are to be performed depending on the response of preceding operations. 
  • The performance of pipelining depends on the client library being used.
  • There must be a reasonable upper limit on the number of operations in a pipeline sequence, as the server's memory may run out from buffering too many responses.
  • Issues with pipelining might be observed even in a low latency environment because of bandwidth issues like a low MTU (Maximum Transmission Unit).
Now, why this approach might have been faster can be answered from the last point in the list above: the approach made better use of the available bandwidth and didn't do much to cut down on latency.
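
One way to respect the upper limit mentioned in the list above is to split a long sequence of operations into bounded batches and sync after each batch. The helper below is a minimal sketch of just the splitting step; PipelineBatches and maxBatchSize are hypothetical names, and a sensible batch size depends on the workload and the client library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a command list into batches of bounded size.
class PipelineBatches {
    // Returns consecutive batches, each holding at most maxBatchSize commands.
    static <T> List<List<T>> split(List<T> commands, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < commands.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, commands.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(commands.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then be sent as its own pipeline sequence, so neither side ever has to hold more than maxBatchSize responses for it.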

Here are some links I read through to arrive at these conclusions:
  • https://groups.google.com/forum/m/#!topic/redis-db/3kRJdugPTNM
  • https://github.com/xetorthio/jedis/issues/229
  • https://groups.google.com/forum/#!topic/redis-db/BMe1uTOZbpc
  • http://redis.io/topics/pipelining
  • https://github.com/xetorthio/jedis/pull/349