Saturday, September 19, 2009

Haskell, monads, graph theory and the puzzle that took 19 years to solve

Well, 'solve' is actually a misnomer, since it turns out that the puzzle doesn't have a solution (at least that's what my code tells me). Here's the puzzle:

Given the figure below, find a path from point A to point B that crosses every edge line segment exactly once:

I came upon this when I was in college, and spent an hour or so on it, trying out different paths by hand, but could not find a solution.

Nineteen years later, The Communications of the ACM carries an article on the P=NP question, which triggers an impulse in me to read up more on the complexity of algorithms, and I learn about graph theory, DFS, BFS, the entire works. Quite a fascinating subject. Anyway, the upshot of all this is that I realize that there is a formal method of solving the puzzle that had piqued my curiosity nearly two decades ago.

Coincidentally, my copy of Real World Haskell arrives at my doorstep at around the same time I am thinking of whipping up some code to solve the puzzle. Learn Haskell, solve puzzle. One stone, two birds.

The first thing is to come up with the data structures. Not too difficult (apologies for the screwed up formatting -- getting all the tabs correct in Blogger would probably take me four hours):

type Node = Int
type Edge = (Node, Node)
type Path = [Node]
type Graph = ([Node],[Edge])

theGraph :: Graph
theGraph = ([1,2,3,4,5,6], [(1,2),(1,3),(1,4),(1,2),
(2,3),(3,4),(1,4),(2,5),
(3,5),(3,6),(4,6),(1,5),
(5,6),(1,6),(1,5),(1,6)])

One of the options I considered initially was to model the direction of the path, something along the lines of

type Path = [(Edge,Direction)]

with Direction indicating whether we are going forward or backward, but it turns out that this was not required.

Having gotten the data structures in place, it's just a question of routine code to put together the helper functions that will be needed:

-- to check whether two edges are the same [(1,2) is equal to (2,1)]
edgeEq :: Edge -> Edge -> Bool
edgeEq e1 e2 = e11 == e21 && e12 == e22 ||
e11 == e22 && e12 == e21
where e11 = fst e1
e12 = snd e1
e21 = fst e2
e22 = snd e2

-- get all the edges joined at a node
getEdges :: Node -> [Edge]
getEdges node = [edge | edge <- snd theGraph, fst edge == node || snd edge == node]

-- get the node at the other end of an edge
getOtherEnd :: Node -> Edge ->Node
getOtherEnd node edge = if fst edge == node then snd edge else fst edge

-- get all the neighbours for a node
getNeighbours :: Node -> [Node]
getNeighbours node = map (getOtherEnd node) (getEdges node)

-- converts a path into a list of edges (e.g. [1,2,3,2] to [(1,2),(2,3),(3,2)]
buildEdges :: Path -> [Edge]
buildEdges path | null path || length path == 1 = []
| otherwise = (head path, head (tail path)) : buildEdges (tail path)

-- get the list of neighbouring nodes that are yet to be visited in the current path
getUnvisitedNeighbours :: Node -> Path -> [Node]
getUnvisitedNeighbours node path = map (getOtherEnd node) untravelledEdges
where untravelledEdges = deleteFirstsBy edgeEq (getEdges node) (buildEdges path)

-- check whether a given path is a solution
isSolution :: Path -> Bool
isSolution path = head path == 5 && last path == 6 && length path == length (snd theGraph) + 1

The astute reader will observe that the graph object -- is it OK to call things 'objects' in Haskell? -- is baked into the solution; I initially had a version where the graph object was passed as a parameter to every function, but this made things more verbose, and anyway, my objective was not to produce a graph library, but to solve the puzzle.

With that out of the way, time to move on to the actual algorithm. The algorithm is a DFS brute search, where we start at a node (note that the area outside the figure is also modelled as a node), choose one of its unvisited neighbours, choose one of the neighbour's unvisited neighbours, and so on, till we run out of neighbours to visit. Check the path to see if we have covered all the edges and if the path is bookended by the start and end nodes that are of interest to us, and we have our solution. An NP-hard problem, BTW.

It's obvious that recursion is needed here, but I was not sure how to handle the enumeration of the different branches, the backtracking from a dead end, and so on in Haskell, considering that looping is frowned upon, and the strongly typed nature of the language implies that both the if and else clauses should return the same type, i.e. you cannot do the equivalent of:

if <path is a solution>
then <print solution>
else <add next neighbour and try again>

While working on the solution, I was also reading up on monads, and man, are they a pain to wrap your head around. But luckily I ran into [*] Monads as containers (and its sibling Monads as computation, which I'm still digesting), where I learned that a list is also a monad, and we can do stuff like take a list, apply a function that produces a list, and end up with a flat list, so to speak (yeah, we don't need monads for this, a simple map and concat are enough, but I learned this in hindsight, after realizing that >>= was exactly what I was looking for). Anyway, that sort of nails the algorithm:

-- build a list of candidate paths for a given path
getNextChoices :: Path -> [Path]
getNextChoices path = nub (unvisitedNodes >>= (\x -> [reverse (x : reverse path)])) -- hack to append to end of a list
where unvisitedNodes = getUnvisitedNeighbours (last path) path

-- filter out all the solutions from a given list of candidate paths
findSolutions :: [Path] -> [Path]
findSolutions paths = filter isSolution (nub paths)

-- find all the solutions starting from a given list of candidate paths.
-- invoked with a single node path, i.e. solve [[1]]
solve :: [Path] -> [Path]
solve paths = if null choices
then findSolutions paths
else solve choices
where choices = (paths >>= getNextChoices)

solve' :: [Path]
solve' = solve [[5]]

From All About Monads:
One use of functions which return lists is to represent ambiguous computations -- that is computations which may have 0, 1, or more allowed outcomes. In a computation composed from ambiguous subcomputations, the ambiguity may compound, or it may eventually resolve into a single allowed outcome or no allowed outcome at all. During this process, the set of possible computational states is represented as a list. The List monad thus embodies a strategy for performing simultaneous computations along all allowed paths of an ambiguous computation.
The above algorithm is not exactly an ambiguous computation, but the bit about "performing simultaneous computations along all allowed paths" sure resonates with its structure.

And now for the denouement, which by the way, takes a looong time (remember the NP-hardness):

*Main> solve'
[]

Nope, still no solution.

[*] It doesn't reflect too well on a book if I still have to rely on Google to help me out. I'm looking at you, "Real World Haskell".