Friday, 28 October 2016

Proving mathematical induction using predicate logic

Mathematical induction is a very useful proof technique, commonly used to reason in many areas of mathematics and computer science. It seems to work out of the box. This article explores the ideas behind induction, why it is correct, and where it works - it would be quite embarrassing if we couldn't establish the correctness of our own reasoning tool!

Mathematical induction over natural numbers can be stated as follows:
  • Base case: A property P(x) is true (or "holds") for x = 1, i.e., P(1) holds.
  • Inductive step: If P(k) holds for an arbitrary natural number k, then P(k+1) holds.
If both of the above are provable, then P(n) is true for every natural number n.

Ok, but why?

The answer lies in the fundamentals of logic.

Proof of mathematical induction


Say we have proved that P(1) holds (the base case - B)
  • P(1)
and that P(k+1) holds whenever P(k) is assumed to hold - in other words, that P(k) implies P(k+1) (the inductive step - I)
  • P(k) → P(k+1)

Now, using I, we can substitute k = 1 and hence: P(1) implies that P(2) holds. Since we have B, which states that P(1) holds, we can infer that P(2) holds (using Modus Ponens). Similarly, substituting k = 2, we know P(3) holds. Repeating this argument all the way up to n, we conclude that P(n) holds.

P(1) → P(2) → P(3) → P(4) → P(5) → ... → P(n)
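To make the chain concrete, here is a minimal Python sketch (the property P and the function names are illustrative) that "unrolls" induction as repeated Modus Ponens, for the textbook property that 1 + 2 + ... + x = x(x+1)/2:

    # Example property P(x): 1 + 2 + ... + x == x * (x + 1) / 2
    def P(x):
        return sum(range(1, x + 1)) == x * (x + 1) // 2

    def reach_by_modus_ponens(n):
        # Base case B: P(1) holds.
        if not P(1):
            return False
        # Walk the chain P(1) -> P(2) -> ... -> P(n), checking each
        # instance of the inductive step I: P(k) -> P(k+1).
        for k in range(1, n):
            if not ((not P(k)) or P(k + 1)):  # the implication P(k) -> P(k+1)
                return False
        # Each loop iteration is one application of Modus Ponens.
        return True

    print(reach_by_modus_ponens(100))  # True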

More formally, using predicate logic, we can describe proof by induction for natural numbers as follows. For a property P, the following formula is true in the natural numbers:

f = ( P(1) ∧ ∀k. (P(k) → P(k+1)) ) → ∀n. P(n)
It is crucial to notice that the formula f cannot be proved using natural deduction alone, because f is not valid in every model: it is satisfied only by specific models, such as the model M described as follows:

Universal set A = N = {0, 1, 2, 3, 4, ...}

Set of predicates IP = {P} where
  • P is a unary predicate
Set of functions IF = {succ, 0} where
  • succ: A → A maps a number to its successor
  • 0 is a nullary function, i.e., a constant
Note: k+1 is syntactic sugar for succ(k)

The proof above establishes semantic entailment, i.e., it proves that our model M satisfies the formula f. This means that induction can be used only within the defined model, with its specific domain, and will fail when applied to an arbitrary model or domain. For example, f cannot be used to prove a property P(x) on a model M' with A' = {0, 32, -98, 1097, 41}. f only works for an ordered set of numbers in which the (i+1)-th number is exactly 1 greater than the i-th. But, on the brighter side, induction (or a variant of it) can be used whenever it can be proved to hold in the domain of concern.

Another way to prove induction is to use proof by contradiction:

Say we have already proved the base case and the inductive step. Assume that P(n) does not hold, where n = m+1 for some m ≥ 1. As P(m+1) does not hold, by contraposition of the implication (Modus Tollens), P(m) does not hold. We can repeat this argument until we reach P(2), and since P(2) does not hold, P(1) does not hold. But this contradicts our base case, which we have already proved. Hence, our assumption must be wrong, and P(n) must hold.

Induction is not strictly limited to the natural numbers. From our proof above, we can see that it can be used as a proof technique for "similar" models. Examples are:

  • A = {0, 1, 2, 3, 4, ...} where P(0) is the base case
  • A = {-2, -1, 0, 1, 2, 3, 4, ...} where P(-2) is the base case
  • A = {12, 13, 14, ...} where P(12) is the base case
etc. 

Since we have chosen an arbitrary property P, we can claim that proof by induction holds for any property! Unfortunately, first-order predicate logic can't express this, so we turn to a second-order formalization:


∀P. ( P(m) ∧ ∀k. (P(k) → P(k+1)) ) → ∀n. P(n)

where m, k and n belong to an applicable domain and k, n ≥ m.
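For the natural numbers, this second-order principle is exactly what proof assistants build in. As a sketch, here is how it can be stated and proved in Lean 4 (the theorem name is illustrative; the base case is 0, matching Lean's Nat):

    theorem induction_principle (P : Nat → Prop)
        (base : P 0)
        (step : ∀ k, P k → P (k + 1)) :
        ∀ n, P n := by
      intro n
      induction n with
      | zero => exact base
      | succ k ih => exact step k ih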

Structural Induction

Induction is not limited to just sequences of numbers - it can be used on some structures too! Let us take the example of proving that a tree with n nodes has exactly n-1 edges.

Let P(x) be the property: a tree (an undirected connected graph with no cycles) with x nodes always has x-1 edges.

Base case: P(1) holds as a tree with 1 node has 0 edges.
Inductive hypothesis: Assume P(k) holds, i.e., a tree with k nodes has k-1 edges.
To prove: P(k+1), i.e., a tree with k+1 nodes has k edges.
Now consider a tree T of k nodes. 

Let us try to add a new node x to T to make a new tree T'. In doing so, we can add exactly one edge from any one of the existing nodes of T to x: if we added more than one edge to x from other nodes in the tree, we would introduce a cycle, and the result would no longer be a tree by definition. As a result, our new tree T' of k+1 nodes has (k-1)+1 = k edges. This proves that P(k+1) holds, and hence P(n) holds for all n.
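The inductive step can be mirrored in code. Below is a small Python sketch (the function name is illustrative) that grows random trees one node and one edge at a time and checks P(n) along the way:

    import random

    def grow_tree(n):
        # Start from a single node (node 0) and attach nodes 1 .. n-1,
        # each via exactly one new edge - mirroring the inductive step.
        edges = []
        for x in range(1, n):
            parent = random.randrange(x)  # attach to any existing node
            edges.append((parent, x))
        return edges

    for n in range(1, 20):
        assert len(grow_tree(n)) == n - 1  # P(n): n nodes, n-1 edges
    print("P(n) held for all sampled trees")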

There are other variants of induction, like course-of-values (strong) induction. But the underlying idea behind all of them is the same: a formal representation of induction must be formulated and proved to hold in a given model. After that, we can use the formulated induction to prove any property in that model.

Isn't proof by induction just beautiful? It's a very powerful, yet simple and elegant tool.

Thursday, 11 February 2016

Notes on Concurrency and Distribution: Programming Languages

In my previous posts, I had mentioned programming languages multiple times as the right place to implement solutions to concurrency and distribution. I'm going to elaborate on this a bit more.

A programming language is a tool which provides powerful abstractions to build software systems. Much of our approach while building systems is guided by the languages that we know or use. For example, there is a significant difference in approach between people who use a functional language and people who use an imperative language.

Functional programmers seem to have a data-driven approach, i.e., they think about how the input data can be transformed to generate the desired output. They use functions as black-box entities to transform the data, and the program as a whole is a function. By definition, a function must return the same output every time for a given input - this encourages the programmer to write stateless programs, i.e., programs that don't rely on intermediate data structures representing the current state of the execution.

Imperative programmers, on the other hand, approach the problem as a sequence of steps which need to be performed on the given data to generate the desired output. Here, the programmer often stores the result of a certain sequence of instructions in an intermediate data structure, which is then used by the remaining instructions.
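As a toy illustration of the two mindsets (in Python, with made-up data), here is the same task written as a pipeline of transformations and as a stateful sequence of steps:

    data = [3, 1, 4, 1, 5, 9]

    # Functional flavor: the program is a pipeline of pure transformations.
    functional_result = sum(x * x for x in data if x % 2 == 1)

    # Imperative flavor: intermediate state is updated step by step.
    imperative_result = 0
    for x in data:
        if x % 2 == 1:
            imperative_result += x * x

    assert functional_result == imperative_result == 117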

The key idea is to realize that programming languages, to a large extent, affect the way programmers break down a given problem into sub-problems. More details on this specific topic can be found in this paper.

This is precisely the reason I (and many more people) think a programming language is the most effective way to provide abstractions to effectively build concurrent and distributed systems. But this is also where the problem lies, as most widely used languages are designed primarily for sequential computation and merely "support concurrency" (let's not even get into distribution).

It is not that these languages cannot be used to build distributed systems (people have done that); rather, it is very difficult to effectively build and maintain distributed systems in such languages (the same people will tell you that). It would be great to have the intricacies involved hidden from the programmer and implemented (and hence natively supported) by the language itself. When was the last time you said "Oh, I forgot to deallocate that memory block"? (I feel sorry for you if you have an immediate answer.)

More on the specifics later.

p.s. Yes, Erlang folks, I hear you.

Wednesday, 10 February 2016

Notes on Concurrency and Distribution: 101

While reading up on distributed systems, I couldn't help noticing the amount of overlap between the ideas of distribution, concurrency and parallelism. At one point in a discussion with some friends, it almost seemed like they meant the same thing. In this post, I'm going to study the background necessary to understand the differences and lay a foundation for understanding the relationship between concurrency and distribution.

Distributed, Concurrent and Parallel

Here's one of my favorite statements to start with:

"the processors in a typical distributed system run concurrently in parallel" - Wikipedia

Interesting? Indeed.

Let's begin with distributed systems: systems consisting of software components located on networked computers. These components have their own individual private memory and work towards achieving the system's goal. Working towards a common goal creates a need to communicate for various reasons (like reporting the status of a task execution, sending an output, triggering an event etc.), and this is facilitated by the network.

Concurrency is the property of a system in which multiple computational tasks are in an execution state during overlapping periods of time; one task may even start and complete within the lifetime of another.

On the other hand, a parallel system executes a set of computational tasks simultaneously, i.e., they run at exactly the same time on different execution points. An execution point (or thread of control, as Wikipedia puts it) could be a processor or even a computer as a whole.

Confusion mainly arises between the latter two. A few examples did the trick for me:
  • parallel processes can't be executed on the same execution point, say a processor; concurrent execution of two processes, on the other hand, can take place on the same processor by context switching.
  • a concurrent computation MAY be executed in parallel.
I bring in parallelism here just to get the basic understanding of the differences right. We will continue to focus on concurrency and distribution.

A problem with concurrent execution of computational tasks is keeping the outcome deterministic, i.e., if parts of the tasks are run in a different order, they must produce the same outcome. When the processes communicate with each other, the order in which they communicate cannot easily be established, as we don't have control over the order of execution (with context switching, for example). Hence there is an explosion in the number of possible execution paths of the system, and which one is taken isn't known in advance - this introduces non-deterministic behavior into the system. We need a framework that allows us to think about and approach communication in a deterministic manner without having to handle too many constraints. What better place to introduce this than the programming language itself? More on this later.
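To get a feel for that explosion, a back-of-the-envelope Python check: two tasks with n atomic steps each can interleave in C(2n, n) ways.

    from math import comb

    # Number of interleavings of two tasks with n atomic steps each.
    for n in (2, 5, 10):
        print(n, comb(2 * n, n))  # 2 -> 6, 5 -> 252, 10 -> 184756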

Communication

Communication, in general, can be implemented in different ways.

Communication between concurrent processes on the same machine can be achieved by writing to and reading from the same memory block - called shared-memory communication. I find this approach very troubling and difficult to handle: the programmer has to explicitly go through the difficulty of locking, synchronizing and making the program thread-safe.
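Here is a minimal Python sketch of exactly that burden; without the explicit lock, the read-modify-write can interleave and lose updates.

    import threading

    counter = 0
    lock = threading.Lock()

    def increment(times):
        global counter
        for _ in range(times):
            with lock:        # remove this lock and the final count
                counter += 1  # will often come out below 400000

    threads = [threading.Thread(target=increment, args=(100_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 400000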

Another method is message passing: the processes send messages to each other. For example, a message can be sent from one process to another by opening a socket.
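As a sketch of the idea (using Python's multiprocessing queues rather than raw sockets, for brevity):

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        msg = inbox.get()          # block until a message arrives
        outbox.put("ack: " + msg)  # reply with a message, not shared state

    if __name__ == "__main__":
        to_worker, from_worker = Queue(), Queue()
        p = Process(target=worker, args=(to_worker, from_worker))
        p.start()
        to_worker.put("hello")
        print(from_worker.get())   # -> ack: hello
        p.join()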

In the case of distributed systems, shared memory doesn't apply, as all the components have their own individual private memory. That leaves us with passing messages between the components.

In practice, though, while building a distributed system, we often simplify communication by mimicking shared memory with a centralized data store and building stateless components. In this case, I think, we avoid the problem rather than solve it. I am not sure such a system can be called "truly distributed": what happens to the system when the components cannot reach the centralized database due to a network failure?

Such a system trades some fault tolerance for simplicity and may not be suitable for all use cases. Additionally, it brings in the complexity of making the components stateless, and hence having to query the database often - which also affects the performance of the system. A centralized registry can be used as a backup store to restore the state of the system after a partial or total failure; but using it as the primary mode of communication doesn't seem to be the best option, for the reasons stated above.

Along with messaging arise a number of requirements, such as a messaging protocol, a message delivery mechanism, component discovery etc. These are topics we will eventually explore in depth. There are a number of open questions here: What about run-time safety in distributed systems? Can we identify protocol violations while implementing the system, i.e., before run time? How do we debug errors efficiently?

My current understanding is that communication (more specifically, message passing) is what brings these two fields together. Being two different fields, they probably have different approaches and terminology for similar problems, both theoretically and practically. It would be fruitful to study both fields and especially their intersection. In the process, I hope to study relevant formal models and evaluate tools such as Erlang, Lasp lang, Vert.x, Akka, Scribble, Mungo and other similar ones which come up as a part of the study or via suggestions.


Tuesday, 19 January 2016

Notes on Concurrency and Distribution

In the age of distributed software applications, the focus of computing has shifted towards solving communication problems. In the industry, these problems are tackled by making domain-specific assumptions, using strict communication protocols and so on. Such approaches are troublesome, as they require significant development effort and time, and the non-determinism involved makes such applications very difficult to debug and document. I've built large distributed systems and have personally felt this problem. In addition to the nightmare of deployment and maintenance, these systems are largely unpredictable due to their convoluted dependencies, and the problem worsens quickly as the system grows, since the amount of intercommunication between components increases with every component added.

Ideally, we shouldn't have to worry about this at all. In the past, programming language principles have solved such low-level problems, and programming languages have, as a result, been successful in providing powerful abstractions for effectively solving computational problems.

What we need is a system and a programming model that provides a reliable abstraction to easily build and understand distributed applications. It would be fruitful to explore existing programming language principles and relevant ongoing research to solve distribution problems.

I hope to do some research in this area and will be posting a summary of my findings and understanding every weekend. I would also like to understand the role of concurrency in distributed systems. So, I will begin with the basics of concurrency, parallelism and distribution, then move on to defining the actual problem before jumping into papers. I've realized that starting by reading papers can be quite misleading.

During my trip to Bangalore, I got some wonderful questions, comments and suggestions on directions to look into from friends and folks at Papers We Love - Bangalore. I also got some comments on this topic from Prof. Shriram Krishnamurthi (from Brown University), who was a speaker at one of the PWLB meetups. I will try to understand, answer and analyze all of these inputs in my upcoming posts. This should be a good learning exercise for me too!