GSoC 2019: Checking Graphs for Bipartiteness and Monad Transformers

Last summer, I participated in Google Summer of Code, Google's program for students. Every year, the organizers select a number of open source projects, including ones from such well-known organizations as Boost.org and The Linux Foundation. Google invites students from all over the world to work on these projects.

As a Google Summer of Code 2019 participant, I did a project on the Alga library with the Haskell.org organization, which develops the Haskell language, one of the best-known functional programming languages. Alga is a library providing a type-safe representation of graphs in Haskell. It is used, for example, in semantic, GitHub's library that builds semantic trees, call graphs, and dependency graphs from code and can compare them. My project was to add a type-safe representation for bipartite graphs, along with algorithms for that representation.

In this post I will talk about my Haskell implementation of an algorithm that checks whether a graph is bipartite. Although this algorithm is one of the most basic, implementing it beautifully in a functional style took me several iterations and quite a lot of work. In the end, I settled on an implementation with monad transformers.



About myself


My name is Vasily Alferov, and I am a fourth-year student at HSE St. Petersburg. Earlier on this blog I wrote about my project on parameterized algorithms and about a trip to ZuriHac. Right now I am doing an internship at the University of Bergen in Norway, where I work on approaches to the List Colouring problem. My interests include parameterized algorithms and functional programming.

About the implementation of the algorithm


Foreword


Students participating in the program are strongly encouraged to blog. I was given the Summer of Haskell site for my blog. This article is a translation of a post I wrote there in July, in English, with a short introduction added.

The pull request with the code in question can be found here.

You can read about the results of my work (in English) here .

The post assumes that the reader is familiar with the basic concepts of functional programming, although I will try to recall all the terms used as they come up.

Checking graphs for bipartiteness


The algorithm for checking whether a graph is bipartite is usually given in an algorithms course as one of the simplest graph algorithms. Its idea is straightforward: we somehow assign vertices to the left or right part, and when a conflicting edge is detected, we declare that the graph is not bipartite.

In a little more detail: first, we put some vertex into the left part. Obviously, all the neighbours of this vertex must lie in the right part. Further, all the neighbours of those neighbours must lie in the left part, and so on. We keep assigning parts to vertices as long as there are still unassigned vertices in the connected component of the vertex we started from. Then we repeat this for all connected components.

If there is an edge between two vertices that ended up in the same part, it is easy to find an odd cycle in the graph, which, as is well known (and quite obvious), is impossible in a bipartite graph. Otherwise, we have a correct division into two parts, which means the graph is bipartite.

Typically, this algorithm is implemented with breadth-first search or depth-first search. Imperative languages usually use depth-first search, as it is slightly simpler and requires no additional data structures. I also chose depth-first search as the more traditional option.

Thus, we arrive at the following scheme. We traverse the vertices of the graph with a depth-first search and assign them parts, flipping the part each time we move along an edge. If we reach a vertex that already has a part assigned, and it is not the part we expect, we can safely say that the graph is not bipartite. Once every vertex has been assigned a part and we have looked at every edge, we have a valid partition.
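Before getting to monads and transformers, the scheme above can be sketched as a plain pure function. The adjacency-map representation, the AdjMap type, and the twoColour function below are my own illustration, not part of Alga:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- A hypothetical adjacency-map representation, used only in this sketch.
type AdjMap a = Map a [a]

data Colour = Red | Blue deriving (Eq, Show)

other :: Colour -> Colour
other Red  = Blue
other Blue = Red

-- Try to 2-colour the graph with a depth-first search;
-- Nothing means a conflicting edge was found.
twoColour :: Ord a => AdjMap a -> Maybe (Map a Colour)
twoColour g = foldr visitComponent (Just Map.empty) (Map.keys g)
  where
    -- Start a fresh search from every not-yet-coloured vertex.
    visitComponent v acc = acc >>= \m ->
        if v `Map.member` m then Just m else go Red v m
    -- Colour vertex v with colour c, then recurse into its neighbours
    -- with the opposite colour.
    go c v m = case Map.lookup v m of
        Just c' -> if c' == c then Just m else Nothing  -- conflicting edge
        Nothing -> foldr (\u acc -> acc >>= go (other c) u)
                         (Just (Map.insert v c m))
                         (Map.findWithDefault [] v g)
```

A triangle (an odd cycle) comes out as Nothing, while any even cycle gets a consistent 2-colouring. The drawback of this naive version is visible already: failure is reported as a bare Nothing, with no odd cycle attached, which is exactly what the monadic machinery below will fix.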

Pure computations


In Haskell, we assume that all computations are pure. If that were really the case, however, we would have no way to print anything to the screen. In fact, pure computations are so lazy that there would not be a single pure reason to compute anything. All computations that actually happen in a program are forced, one way or another, into the "impure" IO monad.

Monads are a way of representing computations with effects in Haskell. Explaining how they work is beyond the scope of this post; a good, clear description can be read in English here.

Here I want to note that while some monads, such as IO, are implemented through compiler magic, almost all the others are implemented in ordinary library code, and all computations in them are pure.

There are many effects, and each has its own monad. The theory here is very powerful and beautiful: all monads implement a single interface. We will talk about the following three monads:

  • Either e a is a computation that either returns a value of type a or throws an exception of type e. The behaviour of this monad is very similar to exceptions in imperative languages: errors can be caught or passed up the call chain. The main difference is that the monad is implemented entirely in the standard library, in plain Haskell, while imperative languages usually rely on operating system mechanisms.
  • State s a is a computation that returns a value of type a and has access to a mutable state of type s.
  • Maybe a is a computation that either returns a value of type a or returns nothing (Nothing). The MonadPlus instance for Maybe lets us use it for a different effect: stopping the computation at any moment.
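To make these three effects concrete, here is a small sketch. The names safeDiv, tick, and firstEven are my own examples, and State here comes from the mtl package:

```haskell
import Control.Monad       (guard, msum)
import Control.Monad.State (State, get, modify, runState)

-- Either: a division that throws a String "exception" on zero.
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

-- State: a counter threaded through a pure computation.
tick :: State Int Int
tick = do
    modify (+ 1)
    get

-- Maybe as MonadPlus: guard kills a branch, and msum picks the first
-- branch that survives.
firstEven :: [Int] -> Maybe Int
firstEven xs = msum [ guard (even x) >> Just x | x <- xs ]
```

For instance, runState tick 0 yields (1, 1), and firstEven returns the first even element of a list, or Nothing if there is none. guard and msum are exactly the pieces the final algorithm will rely on.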


We have two data types, Graph a and Bigraph a b: the first represents graphs with vertices labelled by values of type a, and the second represents bipartite graphs whose left-part vertices are labelled by values of type a and whose right-part vertices are labelled by values of type b.

These are not types from the Alga library; Alga has no representation for undirected bipartite graphs. I made these types up for clarity.

We will also need auxiliary functions with the following signatures:

-- The list of neighbours of a given vertex.
neighbours :: Ord a => a -> Graph a -> [a]

-- Build a bipartite graph from a graph and a function that,
-- given a vertex, says which part it belongs to.
toBipartiteWith :: (Ord a, Ord b, Ord c) => (a -> Either b c)
                                         -> Graph a
                                         -> Bigraph b c

-- The list of vertices of a graph.
vertexList :: Ord a => Graph a -> [a]

Finally, the signature of the function we are going to write:

type OddCycle a = [a]
detectParts :: Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)

It is easy to see that if we find a conflicting edge during the depth-first search, the odd cycle lies on top of the recursion stack. Thus, to reconstruct it, we need to cut everything off the recursion stack up to the first occurrence of the final vertex.

We implement the depth-first search maintaining an associative array of part assignments for each vertex. The recursion stack will be maintained automatically through the Functor instance of the chosen monad: we only need to prepend all the vertices on the path to the result returned from the recursive function.
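The prepending trick can be seen on a toy example with Maybe: fmap (written <$>) prepends to a present value and passes a missing one through untouched.

```haskell
-- Prepending a vertex onto the result of a deeper recursive call:
-- a present value accumulates the path as recursion unwinds,
-- while a missing one stays missing.
path :: Maybe [Int]
path = (1 :) <$> ((2 :) <$> Just [3])   -- Just [1,2,3]

noPath :: Maybe [Int]
noPath = (1 :) <$> ((2 :) <$> Nothing)  -- Nothing
```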

My first idea was to use the Either monad, which seems to provide exactly the effects we need. The first implementation I wrote was very close to that option. In fact, I had five different implementations at one point, and eventually settled on a different one.

First, we need to maintain an associative array of part assignments; that is a job for State. Second, we need to be able to stop when a conflict is detected. That can be either the Monad instance of Either or the MonadPlus instance of Maybe. The main difference is that Either can return a value when the computation was not stopped, while Maybe in that case only reports that fact. Since we do not need a separate value on success (it is already stored in the State), we choose Maybe. And at the moment when we need to combine the effects of two monads, monad transformers come in, which do exactly that.
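Here is a minimal sketch of combining the two effects with MaybeT stacked over State. It assumes the transformers and mtl packages; pushAll is a made-up example, not code from the project:

```haskell
import Control.Monad             (guard)
import Control.Monad.State       (State, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT (..))

-- Push numbers onto the state, aborting as soon as a negative one is
-- seen.  The state accumulated before the abort survives it, just as
-- the part map survives the discovery of a conflict.
pushAll :: [Int] -> MaybeT (State [Int]) ()
pushAll []       = return ()
pushAll (x : xs) = do
    guard (x >= 0)   -- stop the whole computation on a negative number
    modify (x :)     -- State effect, lifted through MaybeT by mtl
    pushAll xs
```

Running runState (runMaybeT (pushAll [1, -2, 3])) [] gives (Nothing, [1]): the Maybe layer reports the abort, while the State layer keeps everything pushed before it.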

Why did I choose such a complex type? Two reasons. First, the implementation turns out to be very similar to an imperative one. Second, we need to manipulate the value returned in case of a conflict when returning from the recursion, in order to reconstruct the odd cycle, and this is much easier to do in the Maybe monad.

Thus, we arrive at the following implementation.

{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}

import           Control.Monad             (guard, msum)
import           Control.Monad.State       (State, get, modify, runState)
import           Control.Monad.Trans.Maybe (MaybeT (..))
import qualified Data.Map                  as Map
import           Data.Maybe                (fromJust)

data Part = LeftPart | RightPart

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

type PartMap a = Map.Map a Part
type OddCycle a = [a]

toEither :: Ord a => PartMap a -> a -> Either a a
toEither m v = case fromJust (v `Map.lookup` m) of
                    LeftPart  -> Left  v
                    RightPart -> Right v

type PartMonad a = MaybeT (State (PartMap a)) [a]

detectParts :: forall a. Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)
detectParts g = case runState (runMaybeT dfs) Map.empty of
                     (Just c, _)  -> Left  $ oddCycle c
                     (Nothing, m) -> Right $ toBipartiteWith (toEither m) g
    where
        inVertex :: Part -> a -> PartMonad a
        inVertex p v = ((:) v) <$> do modify $ Map.insert v p
                                      let q = otherPart p
                                      msum [ onEdge q u | u <- neighbours v g ]

        {-# INLINE onEdge #-}
        onEdge :: Part -> a -> PartMonad a
        onEdge p v = do m <- get
                        case v `Map.lookup` m of
                             Nothing -> inVertex p v
                             Just q  -> do guard (q /= p)
                                           return [v]

        processVertex :: a -> PartMonad a
        processVertex v = do m <- get
                             guard (v `Map.notMember` m)
                             inVertex LeftPart v

        dfs :: PartMonad a
        dfs = msum [ processVertex v | v <- vertexList g ]

        oddCycle :: [a] -> [a]
        oddCycle c = tail (dropWhile (/= last c) c)

The where block is the core of the algorithm. Let me try to explain what is going on inside it.

  • inVertex is the part of the depth-first search where we visit a vertex for the first time. Here we assign the vertex a part and run onEdge on all its neighbours. This is also where we reconstruct the call stack: if msum returned a value, we prepend the vertex v to it.
  • onEdge is the part where we visit an edge. It is called twice for each edge. Here we check whether the vertex on the other side has been visited, and visit it if not. If it has been visited, we check whether the edge is conflicting. If it is, we return a value: the deepest vertex of the recursion stack, onto which all the other vertices of the path will be prepended on the way back.
  • processVertex checks for each vertex whether it has been visited, and runs inVertex on it if not.
  • dfs runs processVertex on every vertex.
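To show the whole thing running end to end, here is a self-contained variant of the algorithm over a plain adjacency map instead of Alga's Graph. The detectParts' function and the Map-based representation are my adaptation; on success it returns the part map directly rather than building a Bigraph:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import           Control.Monad             (guard, msum)
import           Control.Monad.State       (State, get, modify, runState)
import           Control.Monad.Trans.Maybe (MaybeT (..))
import           Data.Map                  (Map)
import qualified Data.Map                  as Map

data Part = LeftPart | RightPart deriving (Eq, Show)

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

-- Just path = a conflict was found; the path ends at the repeated vertex.
-- Nothing   = this branch completed without finding a conflict.
type PartMonad a = MaybeT (State (Map a Part)) [a]

detectParts' :: forall a. Ord a => Map a [a] -> Either [a] (Map a Part)
detectParts' g = case runState (runMaybeT dfs) Map.empty of
    (Just c,  _) -> Left  (oddCycle c)
    (Nothing, m) -> Right m
  where
    neighbours v = Map.findWithDefault [] v g

    inVertex :: Part -> a -> PartMonad a
    inVertex p v = (v :) <$> do
        modify (Map.insert v p)
        msum [ onEdge (otherPart p) u | u <- neighbours v ]

    onEdge :: Part -> a -> PartMonad a
    onEdge p v = do
        m <- get
        case Map.lookup v m of
            Nothing -> inVertex p v
            Just q  -> do guard (q /= p)   -- conflict: start the cycle
                          return [v]

    processVertex :: a -> PartMonad a
    processVertex v = do
        m <- get
        guard (v `Map.notMember` m)
        inVertex LeftPart v

    dfs :: PartMonad a
    dfs = msum [ processVertex v | v <- Map.keys g ]

    oddCycle :: [a] -> [a]
    oddCycle c = tail (dropWhile (/= last c) c)
```

On a triangle this returns Left with a cycle of three vertices; on an even cycle it returns Right with the parts alternating.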

That's all.

The story behind INLINE


The INLINE pragma was not in the first implementation of the algorithm; it appeared later. While trying to find a better implementation, I found that on some graphs the version without INLINE was noticeably slower. Given that the two versions should behave identically semantically, this surprised me a lot. Even stranger, on another machine with a different version of GHC there was no noticeable difference.

After spending a week reading GHC Core output, I was able to fix the problem with a single line containing an explicit INLINE. At some point between GHC 8.4.4 and GHC 8.6.5, the optimizer stopped inlining this function on its own.

I did not expect to run into this kind of dirt in Haskell programming. Still, even today optimizers sometimes make mistakes, and giving them hints is our job. For example, here we know that the function should be inlined, because it is inlined in the imperative version, and that is a reason to give the compiler a hint.

What happened next?


After that I implemented the Hopcroft-Karp algorithm, with other monads, and with that the program came to an end.

Thanks to Google Summer of Code, I gained practical experience in functional programming. It not only helped me get an internship at Jane Street next summer (I am not sure how well known this place is even among the knowledgeable audience of Habr, but it is one of the few where you can spend a summer doing functional programming), but also introduced me to the wonderful world of applying this paradigm in practice, which is quite different from my experience with traditional languages.

Source: https://habr.com/ru/post/undefined/
