Efficiently calculating Hamming neighbourhoods in Haskell

How can we compute the d-neighbourhood of a point x? Mathematically it’s no problem, we can just write

N_d(x) = \{y\in\{0,1\}^n\,|\,0 < d(x,y)\leq d\}

The first idea would be to exploit Haskell’s math friendly syntactic feature calld list comprehension which would yield

neigh :: Int -> Searchpoint -> [Searchpoint]
neigh d x = [y | y Searchpoint -> [Searchpoint]
neigh d x tabu = (f $ iterate (g ) [x]) \\ [x]
where f = nub' . concat . take d . tail
g = map flipAt [1 .. n]

What’s iterate (g) [x]? Formally it is

iterate (g) [x] == [[x], g[x], g(g[x]), ...]

Since we’re not ineterested in the first element, we only consider the tail of that iterated list. What are the elements? Let’s see the first three results for the search point x=0^5.

g [x] ==
 [[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]]

Since g is a list of functions, every element of g is applied to every element of [x] which is just x, so we get exactly the neighbourhood of distance 1 of x.

g (g [x])) ==
 [[ , , , , ],[1,1, , , ],[1, ,1, , ],[1, , ,1, ],[1, , , ,1]
 ,[1,1, , , ],[ , , , , ],[ ,1,1, , ],[ ,1, ,1, ],[ ,1, , ,1]
 ,[1, ,1, , ],[ ,1,1, , ],[ , , , , ],[ , ,1,1, ],[ , ,1, ,1]
 ,[1, , ,1, ],[ ,1, ,1, ],[ , ,1,1, ],[ , , , , ],[ , , ,1,1]
 ,[1, , , ,1],[ ,1, , ,1],[ , ,1, ,1],[ , , ,1,1],[ , , , , ]]

Now every flipping function is applied to the first result. We already see that there are several duplicate elements. In fact there are 25 total results whereas there are only 11 unique results.

g (g (g [x])) ==
 [[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[ ,1, , , ],[1, , , , ],[1,1,1, , ],[1,1, ,1, ],[1,1, , ,1]
 ,[ , ,1, , ],[1,1,1, , ],[1, , , , ],[1, ,1,1, ],[1, ,1, ,1]
 ,[ , , ,1, ],[1,1, ,1, ],[1, ,1,1, ],[1, , , , ],[1, , ,1,1]
 ,[ , , , ,1],[1,1, , ,1],[1, ,1, ,1],[1, , ,1,1],[1, , , , ]
 ,[ ,1, , , ],[1, , , , ],[1,1,1, , ],[1,1, ,1, ],[1,1, , ,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1,1,1, , ],[ , ,1, , ],[ ,1, , , ],[ ,1,1,1, ],[ ,1,1, ,1]
 ,[1,1, ,1, ],[ , , ,1, ],[ ,1,1,1, ],[ ,1, , , ],[ ,1, ,1,1]
 ,[1,1, , ,1],[ , , , ,1],[ ,1,1, ,1],[ ,1, ,1,1],[ ,1, , , ]
 ,[ , ,1, , ],[1,1,1, , ],[1, , , , ],[1, ,1,1, ],[1, ,1, ,1]
 ,[1,1,1, , ],[ , ,1, , ],[ ,1, , , ],[ ,1,1,1, ],[ ,1,1, ,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1, ,1,1, ],[ ,1,1,1, ],[ , , ,1, ],[ , ,1, , ],[ , ,1,1,1]
 ,[1, ,1, ,1],[ ,1,1, ,1],[ , , , ,1],[ , ,1,1,1],[ , ,1, , ]
 ,[ , , ,1, ],[1,1, ,1, ],[1, ,1,1, ],[1, , , , ],[1, , ,1,1]
 ,[1,1, ,1, ],[ , , ,1, ],[ ,1,1,1, ],[ ,1, , , ],[ ,1, ,1,1]
 ,[1, ,1,1, ],[ ,1,1,1, ],[ , , ,1, ],[ , ,1, , ],[ , ,1,1,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1, , ,1,1],[ ,1, ,1,1],[ , ,1,1,1],[ , , , ,1],[ , , ,1, ]
 ,[ , , , ,1],[1,1, , ,1],[1, ,1, ,1],[1, , ,1,1],[1, , , , ]
 ,[1,1, , ,1],[ , , , ,1],[ ,1,1, ,1],[ ,1, ,1,1],[ ,1, , , ]
 ,[1, ,1, ,1],[ ,1,1, ,1],[ , , , ,1],[ , ,1,1,1],[ , ,1, , ]
 ,[1, , ,1,1],[ ,1, ,1,1],[ , ,1,1,1],[ , , , ,1],[ , , ,1, ]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]]

In this case there are even 125 total results, but only 15 of those are unique. We can comb out duplicate elements with Haskell’s nub function, and since the order of the elements doesn’t matter in our case we can even use a faster version, namely nub’ = toList . fromList. The functions toList and fromList have to be imported from Data.Set, they are implementations of set/list conversions respectivley. As we’ve seen above a huge effort is being made just to remove duplicate elements. We see that this method has complexity \mathcal{O}(n^d) which is polynomial in n for constant d but exponential in d. Since we’re interested in implementing neighbourhoods efficiently right now we have to come up with something better, preferably a solution that doesn’t even create duplicate elements.

Suppose we need a d-neighbourhood. The search points of Hamming distance d of a specific search point are those points that have exactly three bits flipped. There are exactly \binom{n}{d} of those. The indices of these points can be described by \binom{\haken{n}}{d}. We can calculate these indices efficiently like this:

subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs

Now we have to augment our flipping function flipAt to be able to cope with lists of indices to flip at.

flipAt' :: [Int] -> Searchpoint -> Searchpoint
flipAt' l x = (foldr (.) id $ map flipAt l) x

The function flipAt’ gets a list of indices and a searchpoint and returns the searchpoint flipped at those indices.

Now we can implement our new neighbourhood function expplouting our stripped down flipAt’ function:

neigh' :: Int -> Searchpoint -> [Searchpoint]
neigh' k x = [flipAt'] <*> (concat $ [(flip subsets [1..n])] [1..k]) <*> [x]

Not only is this implementation a lot shorter, it is also more efficient since duplicate elements aren’t produced.

I didn’t perform any profiling to see some actual results, but the improvement is noticable although I had expected it to be more drastic.

Post a Comment

Your email is never shared. Required fields are marked *

*
*