From the various talks and lectures at CERN, I've understood that computing plays a key role in CERN's mission to understand the universe through particle physics experiments. Beyond its own computing needs (handling and analyzing the massive volumes of data generated by the LHC), it is also very much in CERN's interest to innovate and push the current limits of computation and technology.
I've been lucky enough to be selected for their Summer student program and be a minuscule part of their huge mission. I'm working on a project to develop a framework called MolR, which would allow AccTesting to delegate, manage and control the execution of tests remotely on its test execution servers. AccTesting is a framework currently used for commissioning and testing the software and electronic systems of the various machines that are part of the LHC. Briefly, MolR is expected to distribute computational tasks over various machines, i.e., to be a distributed work pool framework. Since MolR is expected to interface with various programs and systems written in Java, the obvious choice of technology for the project was Java.
Requirements for MolR include being able to orchestrate executions of varying durations and outcomes (some may fail, some may succeed, some may return a result, etc.), provide fault tolerance (the network may break, machines or applications may crash), and manage the complexity involved in communicating with a large number of machines.
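To make "varying durations and outcomes" a bit more concrete, here is a minimal single-machine sketch in Java. It is not MolR code: the class `OutcomeSketch`, the `Outcome` enum and the `run` method are hypothetical names I made up for illustration. It only shows how one execution can be classified as succeeded, failed or timed out, and it ignores the network and the remote machines entirely.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration, not part of MolR or AccTesting.
public class OutcomeSketch {

    // The three outcomes this sketch distinguishes.
    public enum Outcome { SUCCEEDED, FAILED, TIMED_OUT }

    // Runs one task on a dedicated worker thread and classifies how it ended.
    public static Outcome run(Callable<?> task, long timeoutMillis) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = worker.submit(task);
            try {
                // Wait for the task, but never longer than the timeout.
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return Outcome.SUCCEEDED;
            } catch (ExecutionException e) {
                // The task itself threw an exception.
                return Outcome.FAILED;
            } catch (TimeoutException e) {
                // The task took too long; interrupt it and report the timeout.
                future.cancel(true);
                return Outcome.TIMED_OUT;
            }
        } finally {
            // Stop the worker thread, interrupting the task if it is still running.
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(() -> 42, 1000));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }, 1000));
        System.out.println(run(() -> { Thread.sleep(2000); return null; }, 100));
    }
}
```

A real distributed version would also have to cope with a worker machine that disappears without reporting anything, which is exactly where the fault tolerance requirement gets interesting.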
But this is exactly the kind of problem that Erlang excels at! In fact, Erlang is so good at this that such a system can even be built by a bunch of inexperienced student programmers over a weekend ;) This is exactly the challenge I have with JC (one of my project supervisors). I believe that the Webfest at CERN would be a nice platform to work on this challenge. My supervisors - although slightly skeptical - are largely supportive and are keen to see the outcome.
Building the core (a fault tolerant work pool) in Erlang is the easy part. The challenge actually lies in interfacing with Java programs (which are the executable computational tasks) and managing the dependencies (both code and resources) of these tasks on the machines on which they are executed. Docker? Maybe. I don't know yet. And of course, the project is useless without a nice web front-end and not so interesting without some physics data to crunch.
Why the **** ?
Although the project was initiated by a specific use case, it has a wide and generic application:
Thread pools are well understood and widely used for executing tasks over a fixed set of threads. The specific thread on which a task is executed is irrelevant to the programmer. All that she/he cares about is that the task is executed. This idea of decoupling the execution of a task from where it is executed is also called location transparency. So, how does one achieve location transparency over a cluster of machines (as opposed to threads running on the same machine)? The outcome of this project would be a generic tool to tackle this problem.
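For readers who have not used one, this is roughly what the thread pool idea looks like in Java with the standard `ExecutorService`. It is only a sketch: the class `ThreadPoolSketch` and the method `runAll` are names I invented for this post. Note that the caller never says which thread runs which task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a fixed thread pool, not MolR code.
public class ThreadPoolSketch {

    // Hands all tasks to a fixed pool of threads and returns the results
    // in the order the tasks were given. The pool decides which thread
    // runs which task; the caller never finds out.
    public static <T> List<T> runAll(List<Callable<T>> tasks, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // invokeAll blocks until every task has completed.
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        // Five tasks, two threads.
        System.out.println(runAll(tasks, 2));
    }
}
```

The question for a cluster is what should take the place of `Executors.newFixedThreadPool` when the workers are machines rather than threads, while the calling code stays just as ignorant of where its tasks run.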
At the first openlab lecture, Helge Meinhard explained that most of the computationally intensive tasks at CERN are embarrassingly parallel, and that as a result one does not need expensive supercomputers. What he means is that the individual tasks can be computed independently and do not require shared memory. When I asked him to clarify, he confirmed this. This enables one to exploit distributed execution with ease! It also explains the large amount of interest and investment in distributed systems at CERN. Distributed computing is indeed one of the hottest computing problems at CERN ;)
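Here is a toy illustration in Java of what "embarrassingly parallel" buys you. The `analyse` function is a made-up stand-in for crunching one event (real physics analysis looks nothing like it), and the class name is hypothetical. Because each call depends only on its own input and touches no shared state, the work can be split across threads (or, in principle, machines) in any way, and the combined result stays the same.

```java
import java.util.stream.LongStream;

// Hypothetical illustration, not real analysis code.
public class EmbarrassinglyParallelSketch {

    // Made-up stand-in for analysing a single event: it depends only on
    // its own input and reads or writes no shared state.
    static long analyse(long eventId) {
        return (eventId * eventId) % 7;
    }

    // Processes the events one after another on the calling thread.
    public static long sequential(long eventCount) {
        return LongStream.range(0, eventCount)
                .map(EmbarrassinglyParallelSketch::analyse)
                .sum();
    }

    // Processes the same events on several threads and combines the
    // partial sums at the end. No locks or shared memory are needed.
    public static long parallel(long eventCount) {
        return LongStream.range(0, eventCount)
                .parallel()
                .map(EmbarrassinglyParallelSketch::analyse)
                .sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(sequential(n) == parallel(n));
    }
}
```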
A tool of this nature could be valuable to any group at CERN (or outside) crunching some big numbers!
Why it matters
My interest in this challenge is twofold: to show that
1. Erlang is a very fast way to build robust distributed systems, and that
2. Functional programming is real, powerful and practical.
Functional programming is here and it IS "state of the art". Most of the arguments its advocates make for functional programming languages are highly technical (often using terms which are only understood by functional programmers themselves!). Even when a passionate functional programmer makes the effort to explain the expressiveness of functional code, the imperative minded listener often overlooks it, since the same expressiveness could potentially be achieved in their favorite imperative language using a sophisticated design pattern or a framework. Additionally, efforts to convey the beauty and elegance of code often fail, as beauty - by definition - is subjective. I often find myself having trouble explaining why a functional program looks beautiful and why it is an elegant solution. Although it is often visibly concise, it doesn't seem to impress the imperative minded listener, who is by now lost in the unfamiliarity of the language itself.
So, no more talking. It's time to show what functional programming languages can do. Join me? :)