Sunday, 5 October 2014

Bufferedis - Faster than redis pipeline?!

Redis is a blazing fast in-memory database. It works using a request-response protocol: every request (command) made to redis is followed by a response. This can be a problem if, say, you want to write a million key-value pairs to redis, because each write command can be launched only after receiving the response to the previous one. This becomes a serious issue when the client and server experience significant network latency.

UPDATE: The pipelining concept portrayed here is based on some incorrect assumptions. A corrected version of the concept, with more details, can be found in this later post. The rest of this post pretty much still holds.

Pipelining (a technique offered by redis) is considered one of the fastest methods for bulk reads/writes to redis because it cuts the round trip time (rtt) down by half. In this technique, the client sends a command without waiting for the response of the previous command, which results in just half the rtt. The responses are read from the server in bulk (for all commands launched using the pipeline) once the client has closed the pipeline. This technique is pretty damn fast. It exploits the idea that you might not be interested in the response of a command immediately after it's launched, cutting down on the time spent across the network. Awesome, eh?
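In jedis this flow is exposed through jedis.pipelined(): you queue commands on the pipeline object and call sync() to read all the responses at once. Since real client code needs a running server, here is a dependency-free sketch that just counts round trips; ToyClient and its methods are illustrative stand-ins, not the actual Jedis API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative stand-in for a Redis connection: every exchange with
// the server (send + wait for replies) is counted as a round trip.
class ToyClient {
    int roundTrips = 0;
    private final Queue<String> pipeline = new ArrayDeque<>();

    // Plain request-response: send one command, wait for its reply.
    String set(String key, String value) {
        roundTrips++;
        return "OK";
    }

    // Pipelined: queue the command without waiting for a reply.
    void pipelineSet(String key, String value) {
        pipeline.add("SET " + key + " " + value);
    }

    // Close the pipeline: the whole batch shares the network trip(s),
    // and all responses are read back in bulk.
    List<String> sync() {
        roundTrips++;
        List<String> replies = new ArrayList<>();
        for (int i = pipeline.size(); i > 0; i--) replies.add("OK");
        pipeline.clear();
        return replies;
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        ToyClient plain = new ToyClient();
        for (int i = 0; i < 1000; i++) plain.set("k" + i, "v" + i);

        ToyClient piped = new ToyClient();
        for (int i = 0; i < 1000; i++) piped.pipelineSet("k" + i, "v" + i);
        List<String> replies = piped.sync();

        System.out.println(plain.roundTrips + " vs " + piped.roundTrips
                + ", replies=" + replies.size());
    }
}
```

The point of the sketch is only the bookkeeping: a thousand plain sets pay for a thousand waits, while the pipelined batch pays far fewer.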

So, is it possible to write faster than a pipelined write? Apparently it is, with the simple technique of buffering. But how?

First, let's talk about an ideal use case for Bufferedis. Consider a scenario in which you are not interested in the individual response of every command. All you want is to write/delete lots of data to/from a redis server over a client-server connection that suffers significant network latency.

Bufferedis (currently implemented in java) is simply a wrapper around jedis (the java redis client) that exploits the ability of some redis commands to take multiple arguments. It buffers the arguments of many commands and then launches them in bulk as a single command (for example, many sets become one mset, and many dels become one multi-key del).
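A minimal, dependency-free sketch of that buffering idea: collect key/value pairs until the buffer holds m pairs, then hand the whole batch to one multi-argument command (in jedis terms, something like jedis.mset(...)). Class and method names here are illustrative, not the actual Bufferedis API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers key/value pairs and ships them as one multi-argument
// command (think MSET) once the buffer holds bufferSize pairs.
class BufferedRedisWriter {
    private final int bufferSize;
    private final Consumer<List<String>> msetCommand; // e.g. a jedis mset call
    private final List<String> buffer = new ArrayList<>();

    BufferedRedisWriter(int bufferSize, Consumer<List<String>> msetCommand) {
        this.bufferSize = bufferSize;
        this.msetCommand = msetCommand;
    }

    // Each set() just appends key and value; a network call happens
    // only once per bufferSize writes.
    void set(String key, String value) {
        buffer.add(key);
        buffer.add(value);
        if (buffer.size() >= 2 * bufferSize) flush();
    }

    // One round trip for everything currently buffered.
    void flush() {
        if (buffer.isEmpty()) return;
        msetCommand.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}

public class BufferedDemo {
    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        BufferedRedisWriter w = new BufferedRedisWriter(3, batches::add);
        for (int i = 0; i < 7; i++) w.set("k" + i, "v" + i);
        w.flush(); // push the final partial batch
        System.out.println(batches.size()); // 3 batches of 3 + 3 + 1 pairs
    }
}
```

With a buffer of size m, n writes collapse into roughly n/m network calls, which is the whole trick.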

But why would this be faster?

Theoretically: because in pipelining the typical network cost of launching n write commands is on the order of n, while for bufferedis it is on the order of n/m, where m is the size of the buffer.

Mathematically: Take a simple example of n writes to redis. The total time taken = time taken over the network + time taken for redis to execute. Hence time taken for

  • Launching n writes = (n * rtt) + (n * et)
  • Pipelining n writes = (1/2 * n * rtt) + (n * et)
  • Launching n writes using bufferedis = (n/m * rtt) + (n * et)
    (with a buffer size of m)
where et is the execution time for one write command and rtt is the round trip time for one command's request to the redis server. Remember that redis executes commands really fast, so rtt is the bottleneck, not et.
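Plugging illustrative numbers into those three formulas makes the gap obvious. The rtt and et values below are made up for the sake of the arithmetic (they are not measurements from the benchmark that follows), but the ordering they produce holds for any setup where rtt dominates et.

```java
public class RttMath {
    public static void main(String[] args) {
        double rtt = 1.0;       // round trip time per request, ms (illustrative)
        double et  = 0.01;      // execution time per write, ms (illustrative)
        double n   = 1_000_000; // number of writes
        double m   = 100_000;   // buffer size

        double launching  = n * rtt + n * et;        // one trip per command
        double pipelining = 0.5 * n * rtt + n * et;  // the half-rtt model above
        double bufferedis = (n / m) * rtt + n * et;  // one trip per m commands

        System.out.printf("launching=%.0fms pipelining=%.0fms bufferedis=%.0fms%n",
                launching, pipelining, bufferedis);
    }
}
```

With these numbers the network term shrinks from a million rtts to just ten, leaving the (tiny) execution time as almost the entire cost of the buffered run.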

Practically: things are never quite ideal, thanks to the assumptions a theoretical hypothesis makes. So let's do some quick benchmarking with the set command.

Locations: Client - India, Server - South Central US.

Time taken for
  • Launching a million sets = 208 sec
  • Pipelining a million sets = 94 sec
  • Launching a million sets using bufferedis = 38.028 sec
    (using a buffer size of 100k)
Also, bufferedis has the added advantage of launching these commands asynchronously. It functions in a non-blocking fashion: the application using bufferedis never has to wait for requests to be launched or for responses to be received. It simply adds the keys/values to the buffer and doesn't worry about the time taken to launch commands or receive responses from the redis server. This non-blocking behavior adds to the better performance.
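One way to get that non-blocking behavior is to hand each full buffer to a background thread, so the caller returns as soon as its key/value pair is appended. The sketch below (dependency-free; the actual Bufferedis internals may differ) extends the earlier buffering idea with a single-threaded executor doing the sends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Non-blocking buffer: set() only appends to the in-memory buffer;
// full batches are shipped to the server by a background thread.
class AsyncBufferedWriter {
    private final int bufferSize;
    private final Consumer<List<String>> msetCommand; // e.g. a jedis mset call
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();

    AsyncBufferedWriter(int bufferSize, Consumer<List<String>> msetCommand) {
        this.bufferSize = bufferSize;
        this.msetCommand = msetCommand;
    }

    synchronized void set(String key, String value) {
        buffer.add(key);
        buffer.add(value);
        if (buffer.size() >= 2 * bufferSize) {
            final List<String> batch = buffer;
            buffer = new ArrayList<>();
            sender.submit(() -> msetCommand.accept(batch)); // caller doesn't wait
        }
    }

    // Flush any leftovers and wait for the background sends to finish.
    synchronized void close() throws InterruptedException {
        if (!buffer.isEmpty()) {
            final List<String> batch = buffer;
            buffer = new ArrayList<>();
            sender.submit(() -> msetCommand.accept(batch));
        }
        sender.shutdown();
        sender.awaitTermination(10, TimeUnit.SECONDS);
    }
}

public class AsyncDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> batchSizes =
                java.util.Collections.synchronizedList(new ArrayList<>());
        AsyncBufferedWriter w =
                new AsyncBufferedWriter(1000, b -> batchSizes.add(b.size()));
        for (int i = 0; i < 5000; i++) w.set("k" + i, "v" + i);
        w.close();
        System.out.println(batchSizes.size()); // 5 full batches of 1000 pairs
    }
}
```

The caller's cost per set() is one list append plus, occasionally, an executor submit; all network waiting happens off the caller's thread.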

  • We are not trying to make redis faster, but use it faster.
  • Bufferedis simply exploits the space-time tradeoff in computer science.
  • Redis Mass Insertion may or may not be faster. I am not yet sure what it does internally. I'll leave that comparison for another post.
  • Bufferedis comes with a number of setbacks as tradeoffs for speed. I'll discuss these in another post.
  • Bufferedis is currently under construction. Implementation approaches may change with time. Stay tuned to the implementation here.
Feel free to use/fork/improvise my code on github or do some benchmarking.