Monday 15 December 2014

Thread safe Jedis client

Jedis is a neat client library for Redis. The Jedis object provided by the library is not thread safe. Hence, if an application is multi-threaded, it should ideally use a JedisPool object (which is thread-safe), or it will face a lot of unpredictable behavior and weird exceptions. JedisPool is a pool of Jedis clients implemented using the Apache commons pool 2 API.

However, managing a pool and its resources can add complexity and length to the app's implementation. So, I thought abstracting away this resource management might help the developer out a lot. I've hosted the implementation on github. A new ThreadSafeJedis class has been introduced which internally manages pooled Jedis objects and provides an interface similar to that of the Jedis class, but is awesomely thread safe!

Background: The Jedis class currently uses a single Client object in all its methods, which is the reason for its "thread unsafe" nature. Hence, the class cannot simply be extended. All the commands (which are methods) use the client object to launch commands. To introduce thread safety, each command must be launched using a client freshly fetched from a pool, which means that all the methods need to be re-implemented in a fashion where they would:
  • borrow a client, 
  • launch the operation, 
  • record response,
  • return the client back to the pool 
  • and return the response back to the caller. 
And that's it!
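As a rough sketch, each re-implemented command could follow this pattern (the class body here is illustrative, not the exact code hosted on github; on older Jedis versions, returning the resource would be pool.returnResource(jedis) instead of close()):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Illustrative sketch: every command borrows a Jedis instance from the
// pool, launches the operation, returns the instance to the pool, and
// only then hands the response back to the caller.
public class ThreadSafeJedis {

    private final JedisPool pool;

    public ThreadSafeJedis(JedisPool pool) {
        this.pool = pool;
    }

    // Each Jedis command method would be re-implemented in this shape.
    public String set(String key, String value) {
        Jedis jedis = pool.getResource();   // borrow a client
        try {
            return jedis.set(key, value);   // launch the operation, record the response
        } finally {
            jedis.close();                  // return the client to the pool
        }
    }

    public String get(String key) {
        Jedis jedis = pool.getResource();
        try {
            return jedis.get(key);
        } finally {
            jedis.close();
        }
    }
}
```

The try/finally is important: the client must go back to the pool even when the command throws, or the pool would slowly leak connections.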

Monday 1 December 2014

Running expect scripts in the background

Expect scripts are very useful for automating interaction with other processes. But using expect as shown in this post might not be very handy when the script needs to run as a cron job or in the background.

So, here's an alternative with a small tweak:


Redis pipeline explained

So, I had made some conceptual assumptions in my previous post which turned out to be incorrect in practice, according to some feedback I got from the redis google group. So here's another post which attempts to be closer to the actual implementation of Redis pipelining.

Here's the thing: pipelining is purely a client-side implementation; Redis itself has very little to do with it. 

Now, that makes it difficult to generalize, as this client-side implementation may differ among client libraries. Please have a look at this post for why pipelining matters. So, here's how it could go when operations are pipelined by a client:
  • Buffer the redis commands/operations on the client side
  • Synchronously or asynchronously flush the buffer periodically depending on the client library implementation
  • Redis executes these operations as soon as they are received at the server side
  • Subsequent redis commands are sent without waiting for the response of the previous commands. Meanwhile, the client is generally programmed to return a constant string for every operation in the sequence as an immediate response
  • The tricky part: the final responses. It is often wrongly assumed that all the responses are always read in one shot and that they may be completely buffered on the server's side. Even though the responses to all the operations seem to arrive in one shot when the client closes the pipeline, they are actually partially buffered on both the client and the server. Again, this depends on the client implementation. There does exist a possibility that the client reads the buffered responses periodically to avoid a huge accumulation on the server side. But it is also possible that it doesn't. For example: the current implementation of the Jedis client library reads all the responses of a pipeline sequence at once. 
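To make the buffering concrete, here is what the sequence above looks like with Jedis (assuming a redis server on localhost): the Response objects are just placeholders until sync() flushes the command buffer and reads back all the replies at once.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

// Illustrative: commands are buffered on the client side; the Response
// objects hold no value until sync() sends everything and reads the replies.
public class PipelineDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost")) {
            Pipeline p = jedis.pipelined();
            Response<String> r1 = p.set("k1", "v1"); // buffered, not yet sent
            Response<String> r2 = p.get("k1");       // buffered too
            p.sync();                                // flush commands, read all responses
            System.out.println(r1.get());            // now holds the reply, e.g. "OK"
            System.out.println(r2.get());
        }
    }
}
```

Calling r1.get() before sync() would throw, which is a nice illustration that the responses genuinely don't exist on the client until the pipeline is drained.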
What this means and how it needs to be interpreted in practice:
  • Pipelining is super fast and is a good option for multiple subsequent client-server interactions in a high latency environment.
  • Pipelining does not guarantee atomicity. That is the job of a transaction.
  • Pipelining is not suitable if some redis operations are to be performed depending on the response of preceding operations. 
  • The performance of pipelining depends on the client library being used.
  • There must be a reasonable upper limit on the number of operations in a pipeline sequence, as the server may run out of memory buffering too many responses.
  • Issues with pipelining might be observed even in a low latency environment because of bandwidth constraints like a low MTU (Maximum Transmission Unit).
Now, why the approach in my earlier post might have been faster can be answered by the last point above: it exploited the available bandwidth and didn't do much to cut down on latency. 
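As for the upper-limit point above, one way to enforce it with Jedis is to sync() the pipeline every fixed number of commands. This is a sketch under the assumption that intermediate responses can simply be drained and discarded; the batch size is an illustrative number, not a recommendation:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

// Sketch: cap the number of in-flight operations by draining the pipeline
// every BATCH commands, so the server never buffers too many responses.
public class ChunkedPipeline {
    private static final int BATCH = 10_000; // illustrative limit

    public static void setAll(Jedis jedis, String[] keys, String[] values) {
        Pipeline p = jedis.pipelined();
        for (int i = 0; i < keys.length; i++) {
            p.set(keys[i], values[i]);
            if ((i + 1) % BATCH == 0) {
                p.sync(); // flush and read responses before continuing
            }
        }
        p.sync(); // drain whatever is left in the final partial batch
    }
}
```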

Here are some links I read through to arrive at these conclusions:

Sunday 5 October 2014

Bufferedis - Faster than redis pipeline?!

Redis is a blazing fast in-memory database. It works using a request-response protocol. Every request (command) made to redis is followed by a response. This might be a problem, say, if you want to write a million key-value pairs to redis, because every write command can be launched only after receiving the response to the previous command. This can pose a serious issue if the client and server experience significant network latency.

UPDATE: The pipelining concept portrayed here is based on some incorrect assumptions. A corrected version of the concept with more details can be found in this later post. The rest of the stuff in this post pretty much still holds.

Pipelining (a technique offered by redis) is considered to be one of the fastest methods for bulk reads/writes to redis because it cuts down the round trip time (rtt) by half. In this technique, the client sends a command without waiting for the response of the previous command, which results in just half the rtt. The responses are read from the server in bulk (for all commands launched using the pipeline) once the client has closed the pipeline. This technique is pretty damn fast. It exploits the idea that you might not be interested in the response of a command immediately after it's launched, hence cutting down on the time spent across the network. Awesome eh?

So, is it possible to write faster than a pipelined write? Apparently it is with the simple technique of buffering. But how?

First, let's talk about an ideal use case for Bufferedis. Consider a scenario in which you are not interested in the individual response of every command. All that interests you is that you want to write/delete lots of data to/from a redis server over a client-server connection which suffers significant network latency.

Bufferedis (currently implemented in java) is simply a wrapper around jedis (the java redis client) that exploits the capability of sending multiple arguments to a single command. It buffers the arguments of multiple commands and then launches them in bulk as a single command.
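Here's a minimal sketch of the buffering idea for the del command (illustrative only; the actual Bufferedis implementation on github may differ): buffer the keys handed to del(), and flush them to the server as one variadic DEL command.

```java
import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.Jedis;

// Illustrative sketch of the Bufferedis idea for DEL: many logical del()
// calls collapse into a single network round trip per bufferSize keys.
public class BufferedDel {
    private final Jedis jedis;
    private final int bufferSize;
    private final List<String> buffer = new ArrayList<>();

    public BufferedDel(Jedis jedis, int bufferSize) {
        this.jedis = jedis;
        this.bufferSize = bufferSize;
    }

    public synchronized void del(String key) {
        buffer.add(key);                    // no network call yet
        if (buffer.size() >= bufferSize) {
            flush();                        // one round trip for the whole batch
        }
    }

    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            jedis.del(buffer.toArray(new String[0])); // one DEL, many keys
            buffer.clear();
        }
    }
}
```

The same trick works for writes via MSET, which likewise accepts many key-value pairs in a single command.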

But why would this be faster?

Theoretically: Because with pipelining, the typical time taken to launch n write commands is on the order of n, whereas for bufferedis the time taken is on the order of n/m, where m is the size of the buffer.

Mathematically: Take a simple example of n writes to redis. The total time taken = time taken over the network + time taken for redis to execute. Hence time taken for

  • Launching n writes = (n * rtt) + (n * et)
  • Pipelining n writes = (1/2 * n * rtt) + (n * et)
  • Launching n writes using bufferedis = (n/m * rtt) + (n * et)
    (with a buffer size of m)
where et is the execution time for one write command and rtt is the round trip time for one command's request to the redis server. Remember that redis executes commands really fast, so rtt is the bitch, not et. 
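Plugging some assumed numbers into the three formulas above (rtt = 200 ms, et = 0.01 ms, n = 1,000,000 writes, buffer size m = 100,000; all values are made up for illustration, not measurements) shows how far apart the estimates land:

```java
// Back-of-the-envelope estimates using the three formulas above,
// with assumed values: rtt = 200 ms, et = 0.01 ms, n = 1e6, m = 1e5.
public class RttEstimate {
    public static void main(String[] args) {
        double rtt = 200, et = 0.01;  // milliseconds (assumed)
        long n = 1_000_000, m = 100_000;

        double plain     = n * rtt + n * et;                 // one rtt per write
        double pipelined = 0.5 * n * rtt + n * et;           // half the rtt per write
        double buffered  = (n / (double) m) * rtt + n * et;  // one rtt per batch of m

        System.out.printf("plain=%.0f ms pipelined=%.0f ms buffered=%.0f ms%n",
                plain, pipelined, buffered);
        // → plain=200010000 ms pipelined=100010000 ms buffered=12000 ms
    }
}
```

With these numbers, bufferedis comes out orders of magnitude ahead because it pays rtt only n/m = 10 times instead of n or n/2 times.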

Practically: Things are never quite ideal as a result of the assumptions a theoretical hypothesis makes. So let's do some quick benchmarking for the set command. 

Background of locations: Client - India, Server - South Central US. 

Time taken for
  • Launching a million sets = 208 sec
  • Pipelining a million sets = 94 sec
  • Launching a million sets using bufferedis = 38.028 sec
    (using a buffer size of 100k)
Also, bufferedis has the added advantage of launching these commands asynchronously. It functions in a non-blocking fashion, as a result of which the application using bufferedis never has to wait for the requests to be launched or for the responses to be received. It simply has to add the keys/values to the buffer and not worry about the time taken to launch commands or receive responses from the redis server. Hence, the awesome non-blocking behavior adds to the better performance. 

  • We are not trying to make redis faster, but use it faster.
  • Bufferedis simply exploits the space-time tradeoff in computer science.
  • Redis Mass Insertion may or may not be faster. I am not yet sure what it does internally. I'll leave that comparison for another post.
  • Bufferedis comes with a number of setbacks as tradeoffs for speed. I'll discuss these in another post.
  • Bufferedis is currently under construction. Implementation approaches may change with time. Stay tuned to the implementation here.
Feel free to use/fork/improvise my code on github or do some benchmarking.

Sunday 13 July 2014

Script to automate ssh

Some of you might be aware that you can use sshpass on linux to achieve this job, but for mac users, unfortunately, brew doesn't provide sshpass because they think it spoils ssh's security. Anyway, you could always download the source code of sshpass and build it yourself. If you're lazy, use this script. It uses expect to make a non-interactive ssh connection. You could provide the password as an argument to make it more secure.

The point you could take away from this post is that expect can be used to automate interaction with programs. Hence, you could do wonders interacting with ANY command or code which executes in the terminal and interacts via STDIN/STDOUT.

Tuesday 20 May 2014

Saving energy and my F5 key

Results day is a nightmare for your F5 key if you are from NIT Calicut. Our servers are pretty bad at handling a huge number of requests. Very often the server continuously resets incoming connections. It can be very annoying to manually refresh and check for updates every single time. Our results aren't put up at any specific time. They can be updated on the page at any time of the day (sometimes even the next day!). This can be very annoying especially if it is your last semester and you have a subject on the line. Instead of criticizing our servers like I always used to, this time I decided to find a solution to save me the energy and my F5 key.

Nothing groundbreaking - just a simple bash script that makes multiple (tons of) requests to our server and checks if the updates I am looking for have been put up, with wget -t inf to take care of the annoying connection resets. Should be an easy read even for newbies at the terminal.

DISCLAIMER: I am not going to claim it is a great script, but well, it did its intended job. I wanted to know the link at which the results were put up as soon as it was put up. (It was a matter of another semester at NITC. You can't blame my curiosity.) And from what I observed from our online group's activity, I just might have been the first one to see the results!

Here's what I wrote:

After a few crashes and fixes, it ran for about 3000+ iterations before I could see "Some update!". Phew.

And yes, I passed all. Woohoo.

Sunday 2 March 2014

Graceful degradation - not so 'graceful'

"Graceful degradation, well, is not so graceful" were the words of a young entrepreneur who is a co-founder of a mobile first design based startup. Driven by curiosity, I decided to dig into the subject a little. This article is simply a mix of my opinions with what I read and heard about.

Mobile first design, Graceful Degradation & Progressive enhancement   

Put in simple words, mobile first design is a software design strategy in which software is designed initially for the mobile environment, instead of following the traditional method where software is designed and developed for desktops and support for mobile is provided eventually. With the increasing number of mobile web users, the mobile first method has been adopted by many software companies these days. But why? You'll see.

Graceful degradation is simply the route which the traditional method adopts. Fully functional web applications are developed for a desktop environment with support for many browsers. These applications are then 'degraded' for a mobile environment with fewer features. Let's take Facebook for example. If you use Facebook's desktop website and the Facebook mobile application almost equally often, you wouldn't fail to notice that many features of the main site aren't provided in the mobile application. The 'Get notifications' feature, for instance, is missing from the mobile application. But why? The answer is, if a web application/website were to be ported to a mobile platform with exactly the same behavior as in a desktop's browser, it would simply end up being too complex or heavy, resulting in lesser responsiveness.

Progressive enhancement is the exact reverse of the graceful degradation strategy i.e., developing for small platforms with minimal (but primary) features and then gradually adding functions as required while continuing to develop for more complex platforms. Mobile first development can be thought of as an instance of progressive enhancement.

Why progressive enhancement is better than graceful degradation

The advantage of progressive enhancement is that an application would be developed with most of its important functional requirements, and gradually more requirements can be added as we continue developing for bigger platforms which provide support for better features. The hard part of cutting down on features does not have to be done here. Also, the user wouldn't notice any 'degradation' in the application. If Facebook were to follow mobile-first design and release their mobile application first, eventually when they release their desktop site I wouldn't be complaining about the missing 'Get notifications' feature on my mobile application. Instead I would be excited to see new features on their newly released website! In the case of graceful degradation, many features of the application may need to be cut down keeping the responsiveness of the application in mind. Well, not so graceful, isn't it? That is the idea.

If you would like some illustration, statistics and more detail visit my reference article.