Have you ever been able to confidently write Java for hours without executing it? I think exploiting Java 8 and generic types can help with that :) Although I consider Java a practical language, I've never really been a fan. But after my recent ventures into Java 8's functional features, I like it a little better! In this article, I'm going to use a specific example to demonstrate how one can leverage these features to write clean and reliable Java code.
Functions can return a result of one type. But what if we wanted to return a result containing a value of one of two types? For instance, a remote service could return an Exception or a Result as a response. One way would be to wrap both of them inside a Response object and provide some methods to check which type of answer is available, and give them suitable naming/javadoc to help the programmer find out how to retrieve the appropriate answer to the request.
This is exactly the kind of *nonsense* we hope to avoid: contracts which aren't obviously visible at code or type level.
It would be nice to have the request return an object of type Either<Exception,Result>. Now this is very different from a Response object, as Either is very generic and as you will see, much more robust. In Either<L,R>, L and R denote Left and Right values. By convention, R is used to hold the "right" or correct result. Let's look at instances Left and Right later, and look at the Either methods first:
Here's a sample use:
getComponent returns an exception when the component is not found, and the component when it is. It uses Left<Exception,Result>(Exception e) and Right<Exception,Result>(Result r) constructors to construct an object of type Either<Exception,Result>. To achieve this specific use case, one might as well simply use Optional, but assume this code is on a remote service which offers responses using Either, and just hold on :)
Now that we have a response, we would want to do something useful with it - like print it! To examine the result, we use pattern matching!
Neat, isn't it? Here's the definition of instances Left and Right:
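The original listings did not survive on this page, so here is a minimal sketch of what the pieces described so far could look like. The method names match, bind and fmap follow the prose, but their exact signatures, the ComponentLookup class and the sample component are my assumptions, not the code from the linked repository.

```java
import java.util.function.Function;

// Hypothetical reconstruction of the Either API described in the text.
interface Either<L, R> {
    // "Pattern matching": run exactly one of the two handlers.
    <T> T match(Function<L, T> onLeft, Function<R, T> onRight);
    // Chain another failure-prone step onto a Right value.
    <S> Either<L, S> bind(Function<R, Either<L, S>> f);
    // Apply a plain function to a Right value.
    <S> Either<L, S> fmap(Function<R, S> f);
}

// Left holds the failure; bind and fmap just pass it along untouched.
final class Left<L, R> implements Either<L, R> {
    private final L value;
    Left(L value) { this.value = value; }
    public <T> T match(Function<L, T> onLeft, Function<R, T> onRight) { return onLeft.apply(value); }
    public <S> Either<L, S> bind(Function<R, Either<L, S>> f) { return new Left<>(value); }
    public <S> Either<L, S> fmap(Function<R, S> f) { return new Left<>(value); }
}

// Right holds the "right" result; bind and fmap operate on it.
final class Right<L, R> implements Either<L, R> {
    private final R value;
    Right(R value) { this.value = value; }
    public <T> T match(Function<L, T> onLeft, Function<R, T> onRight) { return onRight.apply(value); }
    public <S> Either<L, S> bind(Function<R, Either<L, S>> f) { return f.apply(value); }
    public <S> Either<L, S> fmap(Function<R, S> f) { return new Right<>(f.apply(value)); }
}

final class ComponentLookup {
    // Assumption: components are plain strings and only "cpu" exists.
    static Either<Exception, String> getComponent(String id) {
        if ("cpu".equals(id)) {
            return new Right<>("CPU (id=cpu)");
        }
        return new Left<>(new Exception("component not found: " + id));
    }

    // The caller must handle both cases; there is no unchecked getter to forget about.
    static String describe(Either<Exception, String> response) {
        return response.match(
            e -> "failed: " + e.getMessage(),
            c -> "found: " + c);
    }
}
```

With this sketch, `ComponentLookup.describe(ComponentLookup.getComponent("gpu"))` takes the Left branch and yields "failed: component not found: gpu".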
Now, what if we wanted to find a component's price using a component's ID? Say, by making another call which could fail?
Now that we have the pieces to get a component and find its price, both of which return an Either value of a different type, how does one compose them? This is where the magical bind comes in:
The bind ensures that if the response returned a failure (exception), the failure is propagated; otherwise the correct computation takes place on the "Right" value. But we don't always have Either computations (in our context, failure-prone ones); we might have simple functions which we would like to re-use, like this:
How does one re-use this to apply a discount on our component's price wrapped inside an Either? This is where fmap is incredibly useful: it lets us use functions of simple types on an Either result. Here's a modified getComponentPrice which applies a discount:
Now, time to put these together: assume a product is made of components, and to buy a product one would need to sum the prices of all its components. The idea is: if any phase fails, the failure should be propagated. If a component is not available or its price is not available, it shouldn't be possible to buy the product!
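Since the listings are missing here as well, the following is a rough sketch of how bind and fmap could fit together for the pricing example. To keep the snippet compilable on its own, Either is restated in a compact form with bind and fmap derived from match. The catalog, the prices in cents, the 10% discount and the helper names lookupPrice, applyDiscount and show are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Compact, hypothetical Either: only match is primitive.
abstract class Either<L, R> {
    abstract <T> T match(Function<L, T> onLeft, Function<R, T> onRight);

    static <L, R> Either<L, R> left(L value) {
        return new Either<L, R>() {
            <T> T match(Function<L, T> onLeft, Function<R, T> onRight) { return onLeft.apply(value); }
        };
    }

    static <L, R> Either<L, R> right(R value) {
        return new Either<L, R>() {
            <T> T match(Function<L, T> onLeft, Function<R, T> onRight) { return onRight.apply(value); }
        };
    }

    // A Left is rebuilt unchanged; a Right is handed to the next failure-prone step.
    <S> Either<L, S> bind(Function<R, Either<L, S>> f) {
        return match(l -> Either.<L, S>left(l), f);
    }

    // Lift a plain function so that it works on the Right value.
    <S> Either<L, S> fmap(Function<R, S> f) {
        return bind(r -> Either.<L, S>right(f.apply(r)));
    }
}

final class PricingSketch {
    // Assumed sample data: "fan" is a known component without a price.
    static final Set<String> CATALOG = new HashSet<>(Arrays.asList("cpu", "ram", "fan"));
    static final Map<String, Integer> PRICE_IN_CENTS = new HashMap<>();
    static {
        PRICE_IN_CENTS.put("cpu", 100);
        PRICE_IN_CENTS.put("ram", 50);
    }

    static Either<Exception, String> getComponent(String id) {
        if (CATALOG.contains(id)) {
            return Either.right(id);
        }
        return Either.left(new Exception("no such component: " + id));
    }

    static Either<Exception, Integer> lookupPrice(String component) {
        Integer price = PRICE_IN_CENTS.get(component);
        if (price == null) {
            return Either.left(new Exception("no price for: " + component));
        }
        return Either.right(price);
    }

    // A simple function that knows nothing about Either: 10% off.
    static int applyDiscount(int cents) {
        return cents - cents / 10;
    }

    // bind chains the two failure-prone calls, fmap re-uses the plain function.
    static Either<Exception, Integer> getComponentPrice(String id) {
        return getComponent(id)
            .bind(PricingSketch::lookupPrice)
            .fmap(PricingSketch::applyDiscount);
    }

    // Sum the component prices; the first failure wins and no sum is produced.
    static Either<Exception, Integer> getProductPrice(List<String> componentIds) {
        Either<Exception, Integer> total = Either.right(0);
        for (String id : componentIds) {
            Either<Exception, Integer> price = getComponentPrice(id);
            total = total.bind(sum -> price.fmap(p -> sum + p));
        }
        return total;
    }

    static String show(Either<Exception, Integer> result) {
        return result.match(e -> "error: " + e.getMessage(), cents -> "total: " + cents);
    }
}
```

In this sketch, `PricingSketch.show(PricingSketch.getProductPrice(Arrays.asList("cpu", "ram")))` gives "total: 135" (90 + 45 after the discount), while a list containing "gpu" or "fan" gives the corresponding error message instead of a total.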
How cool is that? Notice how all the core functions - getComponentPrice and getProductPrice - containing the important business logic do not contain any unnecessary and clutter-y error handling code. This is the real beauty of this pattern. All the error handling is done under the hood! Time to test it:
Works like a charm on the very second run! The problem with my first run was that I forgot to initialize something, and the only place where I do an unsafe .get() (guess where?) caused a NullPointerException. Yet another reason to use Optional everywhere.
The really nice part is that I spent more time coding than debugging and reshaping my code - very rewarding indeed! The types restrict you to the point where there is very little room left to screw up. This is the main reason I can confidently write code for hours without running it. Can you? :)
p.s. Congratulations, you have (hopefully) learnt to write some monadic code in Java: Either is a monad. This was also my hidden agenda ;) All the code can be found here: https://github.com/nachivpn/fj
Hard Code
Sunday, 30 July 2017
Saturday, 29 July 2017
Mission partially accomplished?
In my previous post, I had talked about a challenge I had with my supervisor and discussed some thoughts on functional programming for efficiently building real world distributed systems. This post is an update of what happened :)
I pitched my idea during the CERN Webfest 2017 and (rather surprisingly) attracted some interest. After the presentation, some fellow summer students (Ophir, FabioL, Badissa and Martyn) who found the idea exciting and I hacked away during the last weekend on the project we called FADE (A Framework for Distributed Execution). We didn't win (duh!) as a presentation with some benchmarks was no match for flashing LEDs, an augmented reality particle collider and a HoloLens particle motion game! But this was expected. The goal was to showcase, and I hope we caught at least one person's attention.
We expected to build an Erlang backend which offers a Web API through which one can submit jobs and get results, and a Python client which internally uses this API to provide a nice ThreadPoolExecutor-like interface:
No communication, no irrelevant error handling, no orchestration, no installation, no nonsense. As clean and simple as executing on a pool of threads.
And guess what? We did manage to make this work for simple functions with native Python code! :) How? At a high level it looks like this: serialize a function and its arguments, transport it over the network, deserialize on the worker, and well, run!
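The FADE client was written in Python, but the same serialize, transport, deserialize, run cycle can be sketched in Java with serializable lambdas. This is only an illustration of the idea, not the FADE code: the network hop is replaced by a byte array, and RemoteFunction, pack and unpackAndRun are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

final class SerializedJobSketch {
    // A function type whose lambdas Java is willing to serialize (hypothetical name).
    interface RemoteFunction<A, B> extends Function<A, B>, Serializable {}

    // Client side: turn the function and its argument into bytes.
    static byte[] pack(RemoteFunction<?, ?> f, Serializable arg) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
            out.writeObject(arg);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Worker side: rebuild the function and its argument, then run the job.
    @SuppressWarnings("unchecked")
    static Object unpackAndRun(byte[] payload) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            Function<Object, Object> f = (Function<Object, Object>) in.readObject();
            Object arg = in.readObject();
            return f.apply(arg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The worker does not have the class that the job refers to.
            throw new IllegalStateException("job code is not available on the worker", e);
        }
    }

    // Stand-in for "submit over the network": both sides run in the same JVM here.
    static Object squareRemotely(int n) {
        RemoteFunction<Integer, Integer> square = x -> x * x;
        return unpackAndRun(pack(square, n));
    }
}
```

The serialized form of a Java lambda carries a reference to the class that defined it rather than the code itself, so the sketch only works when that class is already present on the receiving side.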
This assumes that the code was already available on the worker - so well, it obviously works. But the real challenge lies in transporting the code and especially its dependencies: what if the submitted job calls some functions from some other module? In Erlang, one would simply have to do a nl(module) to ensure that the module is made available throughout the cluster. We aimed to crack this by "building the job" using a build tool in Python during runtime, but this was a lot harder than expected.
The Erlang backend came along quite nicely as expected: jobs are lazily dispatched to workers (as the workers poll a master), which provides nice work-based load balancing. If a worker or node fails, the jobs are re-scheduled on available workers. Hence, the backend offers a distributed, load-balanced and fault-tolerant work pool - child's play in Erlang. The backend was merely meant to be used for routing and managing communication. The job and how it is to be executed have to be provided by the client implementation.
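To make the pull-based dispatch and re-scheduling a bit more concrete, here is a small single-machine analogy in Java. It is not the Erlang backend: threads stand in for workers, a queue stands in for the master, and a thrown exception stands in for a worker crash. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class PullWorkPoolSketch {
    static final class Job {
        final int id;
        final Supplier<Integer> task;
        Job(int id, Supplier<Integer> task) { this.id = id; this.task = task; }
    }

    // Runs every job to completion and returns the results keyed by job id.
    // Note: a job that fails on every attempt would be retried forever here.
    static Map<Integer, Integer> runAll(List<Job> jobs, int workers) {
        BlockingQueue<Job> pending = new LinkedBlockingQueue<>(jobs);
        Map<Integer, Integer> results = new ConcurrentHashMap<>();
        CountDownLatch remaining = new CountDownLatch(jobs.size());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                while (remaining.getCount() > 0) {
                    Job job;
                    try {
                        // Workers pull work when they are free, so idle workers get the next job.
                        job = pending.poll(20, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        return;
                    }
                    if (job == null) {
                        continue;
                    }
                    Integer value;
                    try {
                        value = job.task.get();
                    } catch (RuntimeException failure) {
                        // Simulated crash: put the job back so any worker can retry it.
                        pending.add(job);
                        continue;
                    }
                    results.put(job.id, value);
                    remaining.countDown();
                }
            });
        }
        try {
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    // One job fails on its first attempt and succeeds when re-scheduled.
    static Map<Integer, Integer> demo() {
        AtomicInteger attempts = new AtomicInteger();
        Job flaky = new Job(1, () -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated worker crash");
            }
            return 10;
        });
        Job steady = new Job(2, () -> 20);
        return runAll(Arrays.asList(flaky, steady), 2);
    }
}
```

The design choice worth noticing is that nothing ever assigns a job to a specific worker; balancing falls out of workers asking for work, and fault tolerance falls out of putting unfinished jobs back.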
But the plan isn't to stop here. I would like to see the above API work without ANY additional effort from the programmer. Since my Java-fu is better, I might try out a Java client. Overall, a fun experience, with a lot of thought put into the distribution of code dependencies and the Erlang magic :)
I did a demo of FADE to my supervisors, and offered to show it to a few other people at CERN. The reaction was a lot less enthusiastic than I expected. Nevertheless, I found it rather fascinating that the implementation was *incredibly* small and easily adaptable. Just 95 lines!
For now, I'm a little tired. I will continue to leverage the incredibly powerful functional programming languages and their principles in the code I write, but - for now - not going to preach anymore.
For the curious reader, here's the ENTIRE implementation of the core logic - only the http wrapper is missing (You can find the rest of the code here: https://github.com/fade-cern/):
Monday, 10 July 2017
Challenge accepted!
Background
From the various talks and lectures at CERN, I've understood that computing plays a key role in CERN's mission to understand the universe through particle physics experiments. In addition to CERN's needs for computing (being able to handle and analyze the massive volumes of data generated by the LHC), it is also largely within CERN's interest to innovate and push the boundaries of current limits of computation and technology.
I've been lucky enough to be selected for their Summer Student programme and be a minuscule part of their huge mission. I'm working on a project to develop a framework called MolR which would allow AccTesting to delegate, manage and control execution of tests remotely on its test execution servers. AccTesting is a framework which is currently used for commissioning and testing software and electronic systems consisting of various machines which are a part of the LHC. Briefly, MolR is expected to be able to distribute computational tasks over various machines, i.e., a distributed work pool framework. Since MolR is expected to interface with various programs and systems written in Java, the obvious choice of technology for the project was Java.
The challenge
Requirements for MolR include being able to orchestrate executions of varying durations and outcomes (some may fail, some may succeed, some may have a result etc), provide fault tolerance (network may break, machines or applications may crash), and manage the complexity involved in communicating with a large number of machines.
But this is exactly the kind of problem that Erlang excels at! In fact, Erlang is so good at this that such a system can even be built by a bunch of inexperienced student programmers over a weekend ;) This is exactly the challenge I have with JC (one of my project supervisors). I believe that the Webfest at CERN would be a nice platform to work on this challenge. My supervisors - although slightly skeptical - are largely supportive and are keen to see the outcome.
Building the core (fault tolerant work pool) in Erlang is the easy part. The challenge actually lies in being able to interface with Java programs (which are the executable computational tasks) and managing dependencies (both code and resources) of these tasks on the machines on which they are executed. Docker? Maybe. I don't know yet. And of course, the project is useless without a nice web front-end and not so interesting without some physics data to crunch.
Why the **** ?
Although the project was initiated by a specific use case, it has a wide and generic application:
Thread pools are well understood and widely used for executing tasks over a fixed set of threads. The specific thread on which an executable task is executed is irrelevant to the programmer. All that she/he cares about is that the task is executed. This idea of decoupling the execution of the task from where it is executed is also called location transparency. So, how does one achieve location transparency over a cluster of machines (as opposed to threads running on same machine)? The outcome of this project would be a generic tool to tackle this problem.
At the first openlab lecture, Helge Meinhard explained that most of the computationally intensive tasks at CERN are embarrassingly parallel, and that as a result one does not need expensive supercomputers. What he means is that the individual tasks can be computed independently and do not require shared memory. This enables one to exploit distributed execution with ease! When I asked for clarification, he confirmed this. It also explains the large amounts of interest and investment in distributed systems at CERN. Distributed computing is indeed one of the hottest computing problems at CERN ;)
A tool of this nature could be valuable to any group at CERN (or outside) crunching some big numbers!
Why it matters
My interests in this challenge are twofold: to show that
1. Erlang is a very fast way to build robust distributed systems, and that
2. Functional programming is real, powerful and practical.
Functional programming is here and it IS "state of the art". Most of the arguments for the use of functional programming languages by the advocates are highly technical (often using terms which are only understood by functional programmers themselves!). Even if a passionate functional programmer took the effort to explain the expressiveness of functional code, this is often overlooked by the imperative minded listener since the expressiveness could potentially be achieved in their favorite imperative language using a sophisticated design pattern or a framework. Additionally, efforts to convey beauty and elegance of code often fail to be successful as beauty - by definition - is subjective. I often find myself having trouble explaining why a functional program looks beautiful and why it appears to be an elegant solution. Although it is often visibly concise, it doesn't seem to impress the imperative minded listener who is by now lost in the unfamiliarity of the language itself.
So, no more talking. It's time to show what functional programming languages can do. Join me? :)