In my previous post post, I had talked about a challenge I had with my supervisor and discussed some thoughts on functional programming for efficiently building real world distributed systems. This post is an update of what happened :)
I pitched my idea during the CERN Webfest 2017 and (rather surprisingly) attracted some interest. After the presentation, me and some fellow summer students (Ophir, FabioL, Badissa and Martyn) who found the idea exciting hacked away during the last weekend on the project we called FADE (A Framework for Distributed Execution). We didn't win (duh!) as a presentation with some benchmarks was no match for flashing LEDs, an augmented reality particle collider and a HoloLens particle motion game! But this was expected. The goal was to showcase, and I hope we caught at least one person's attention.
We expected to build an Erlang backend which offers a Web API through which one can submit jobs and get results, and a Python client which internally uses this API to provide a nice ThreadPoolExecutor like interface:
No communication, no irrelevant error handling, no orchestration, no installation, no non-sense. As clean and simple as executing on a pool of threads.
And guess what? We did manage to make this work for simple functions with native Python code! :) How? At a high level it looks like this: Serialize a function and it's arguments, transport it over the network, deserialize on the worker, and well, run!
This assumes that the code was already available on the worker - so well, it obviously works. But, the real challenge lies in transporting the code and especially it's dependancies: what if the submitted job calls some functions from some other module? In Erlang, one would simply have to do a nl(module) to ensure that the module is made available throughout the cluster. We aimed to crack this by "building the job" using a build tool in Python during runtime, but this was a lot harder than expected.
The Erlang backend came along quite nicely as expected: Jobs are lazily dispatched to workers (as the workers poll a master) which provides a nice work based load balancing. If a worker or node fails, the jobs are re-scheduled on available workers. Hence, the backend offers a distributed, load balanced and fault tolerant workpool - a child's play in Erlang. The backend was merely aimed to be used for routing and managing communication. The job and how it is to be executed has to be provided by the client implementation.
But, the plan isn't to stop here. I would like to see the above API work without ANY additional effort from the programmer. Since my Java-foo is better, I might try out a Java client. Overall, a fun experience and a lot of thought into distribution of code dependancies and the Erlang magic :)
I did a demo of FADE to my supervisors, and offered to show it to a few other people at CERN. The reaction was a lot less enthusiastic than I expected. Nevertheless, I found it rather fascinating that the implementation was *incredibly* small and easily adaptable. Just 95 lines!
For now, I'm a little tired. I will continue to leverage the incredibly powerful functional programming languages and their principles in the code I write, but - for now - not going to preach anymore.
For the curious reader, here's the ENTIRE implementation of the core logic - only the http wrapper is missing (You can find the rest of the code here: https://github.com/fade-cern/):