Efficiently calculating Hamming neighbourhoods in Haskell

How can we compute the d-neighbourhood of a point x? Mathematically it’s no problem, we can just write

N_d(x) = \{y\in\{0,1\}^n\,|\,0 < d(x,y)\leq d\}

The first idea would be to exploit Haskell’s math friendly syntactic feature calld list comprehension which would yield

neigh :: Int -> Searchpoint -> [Searchpoint]
neigh d x = [y | y Searchpoint -> [Searchpoint]
neigh d x tabu = (f $ iterate (g ) [x]) \\ [x]
where f = nub' . concat . take d . tail
g = map flipAt [1 .. n]

What’s iterate (g) [x]? Formally it is

iterate (g) [x] == [[x], g[x], g(g[x]), ...]

Since we’re not ineterested in the first element, we only consider the tail of that iterated list. What are the elements? Let’s see the first three results for the search point x=0^5.

g [x] ==
 [[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]]

Since g is a list of functions, every element of g is applied to every element of [x] which is just x, so we get exactly the neighbourhood of distance 1 of x.

g (g [x])) ==
 [[ , , , , ],[1,1, , , ],[1, ,1, , ],[1, , ,1, ],[1, , , ,1]
 ,[1,1, , , ],[ , , , , ],[ ,1,1, , ],[ ,1, ,1, ],[ ,1, , ,1]
 ,[1, ,1, , ],[ ,1,1, , ],[ , , , , ],[ , ,1,1, ],[ , ,1, ,1]
 ,[1, , ,1, ],[ ,1, ,1, ],[ , ,1,1, ],[ , , , , ],[ , , ,1,1]
 ,[1, , , ,1],[ ,1, , ,1],[ , ,1, ,1],[ , , ,1,1],[ , , , , ]]

Now every flipping function is applied to the first result. We already see that there are several duplicate elements. In fact there are 25 total results whereas there are only 11 unique results.

g (g (g [x])) ==
 [[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[ ,1, , , ],[1, , , , ],[1,1,1, , ],[1,1, ,1, ],[1,1, , ,1]
 ,[ , ,1, , ],[1,1,1, , ],[1, , , , ],[1, ,1,1, ],[1, ,1, ,1]
 ,[ , , ,1, ],[1,1, ,1, ],[1, ,1,1, ],[1, , , , ],[1, , ,1,1]
 ,[ , , , ,1],[1,1, , ,1],[1, ,1, ,1],[1, , ,1,1],[1, , , , ]
 ,[ ,1, , , ],[1, , , , ],[1,1,1, , ],[1,1, ,1, ],[1,1, , ,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1,1,1, , ],[ , ,1, , ],[ ,1, , , ],[ ,1,1,1, ],[ ,1,1, ,1]
 ,[1,1, ,1, ],[ , , ,1, ],[ ,1,1,1, ],[ ,1, , , ],[ ,1, ,1,1]
 ,[1,1, , ,1],[ , , , ,1],[ ,1,1, ,1],[ ,1, ,1,1],[ ,1, , , ]
 ,[ , ,1, , ],[1,1,1, , ],[1, , , , ],[1, ,1,1, ],[1, ,1, ,1]
 ,[1,1,1, , ],[ , ,1, , ],[ ,1, , , ],[ ,1,1,1, ],[ ,1,1, ,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1, ,1,1, ],[ ,1,1,1, ],[ , , ,1, ],[ , ,1, , ],[ , ,1,1,1]
 ,[1, ,1, ,1],[ ,1,1, ,1],[ , , , ,1],[ , ,1,1,1],[ , ,1, , ]
 ,[ , , ,1, ],[1,1, ,1, ],[1, ,1,1, ],[1, , , , ],[1, , ,1,1]
 ,[1,1, ,1, ],[ , , ,1, ],[ ,1,1,1, ],[ ,1, , , ],[ ,1, ,1,1]
 ,[1, ,1,1, ],[ ,1,1,1, ],[ , , ,1, ],[ , ,1, , ],[ , ,1,1,1]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]
 ,[1, , ,1,1],[ ,1, ,1,1],[ , ,1,1,1],[ , , , ,1],[ , , ,1, ]
 ,[ , , , ,1],[1,1, , ,1],[1, ,1, ,1],[1, , ,1,1],[1, , , , ]
 ,[1,1, , ,1],[ , , , ,1],[ ,1,1, ,1],[ ,1, ,1,1],[ ,1, , , ]
 ,[1, ,1, ,1],[ ,1,1, ,1],[ , , , ,1],[ , ,1,1,1],[ , ,1, , ]
 ,[1, , ,1,1],[ ,1, ,1,1],[ , ,1,1,1],[ , , , ,1],[ , , ,1, ]
 ,[1, , , , ],[ ,1, , , ],[ , ,1, , ],[ , , ,1, ],[ , , , ,1]]

In this case there are even 125 total results, but only 15 of those are unique. We can comb out duplicate elements with Haskell’s nub function, and since the order of the elements doesn’t matter in our case we can even use a faster version, namely nub’ = toList . fromList. The functions toList and fromList have to be imported from Data.Set, they are implementations of set/list conversions respectivley. As we’ve seen above a huge effort is being made just to remove duplicate elements. We see that this method has complexity \mathcal{O}(n^d) which is polynomial in n for constant d but exponential in d. Since we’re interested in implementing neighbourhoods efficiently right now we have to come up with something better, preferably a solution that doesn’t even create duplicate elements.

Suppose we need a d-neighbourhood. The search points of Hamming distance d of a specific search point are those points that have exactly three bits flipped. There are exactly \binom{n}{d} of those. The indices of these points can be described by \binom{\haken{n}}{d}. We can calculate these indices efficiently like this:

subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs

Now we have to augment our flipping function flipAt to be able to cope with lists of indices to flip at.

flipAt' :: [Int] -> Searchpoint -> Searchpoint
flipAt' l x = (foldr (.) id $ map flipAt l) x

The function flipAt’ gets a list of indices and a searchpoint and returns the searchpoint flipped at those indices.

Now we can implement our new neighbourhood function expplouting our stripped down flipAt’ function:

neigh' :: Int -> Searchpoint -> [Searchpoint]
neigh' k x = [flipAt'] <*> (concat $ [(flip subsets [1..n])] [1..k]) <*> [x]

Not only is this implementation a lot shorter, it is also more efficient since duplicate elements aren’t produced.

I didn’t perform any profiling to see some actual results, but the improvement is noticable although I had expected it to be more drastic.

USB Microscope

This week, the discounter Lidl sold Bresser USB microscopes for only 29,99EUR, so i got myself one, knowing that the quality would be rather poor. Provided I’ll have 10h of fun with it, that makes 3EUR/h, which is a good deal.

As expected, Linux supports the microscope via the UCV drivers, so it’s literally plug and play.

> uvcvideo: Found UVC 1.00 device USB2.0 Camera (1871:0101)

While you can use any tool like ucview or cheese, I prefer using mplayer.

> mplayer tv:// -tv driver=v4l2:width=640:height=480:device=/dev/video1

But let’s skip the details, here are two shots of a dark winged fungus gnat:

20x magnification

350x magnification

20x magification with a 1c-coin

DVBTheremin

Everyone with a DVB-T receiver knows that the tuner signal can get corrupted by the weather or in my case the bus which stops in front of the house I live in. Some DVB-T receivers display the tuner signal in an OSD of your TV. Since I’m using a DVB-T card in my PC with mplayer to watch TV I don’t see anything like that, so I found out that if you’ve tuned to some channel you can use the debugging and analyzing tool dvbsnoop to check the signal which looks like this in my case:

[cgie@talia:~] dvbsnoop -s signal -pd 1 -n 10
dvbsnoop V1.4.50 -- http://dvbsnoop.sourceforge.net/
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537
Sig: 10537

I don’t know what these numbers actually represent. 10537 corresponds with a good signal in my case. 10537 is the product of the two primes 41 and 257… but I guess that’s irrelevant. One might think that the numbers have certain range and that the higher numbers correspond with a better signal. In fact it’s the other way around and there are discrete values which represent several states of signal quality. However, the only interesting fact is that the numbers change if the signal does.

Step 1

Tune your DVB card to some channel, for example:

tzap -c .mplayer/channels.conf -r "Das Erste(NDR)"

Step 2

The next step is to convert these numbers to an audio signal, a perfect job for Pure Data which is “is a visual programming language developed by Miller Puckette in the 1990s for the creation of interactive computer music and multimedia works” (Wikipedia). The problem is to pipe the terminal output of dvbsnoop into Pure Data. I solved this by sending the outputs via UDP to my Pure Data patch, bash makes socket programming very easy. I used this simple script:

SERVER="127.0.0.1"
PORT=3333

exec 0<>/dev/udp/${SERVER}/${PORT} || exit # SDTIN
exec 1>&0                                  # STDOUT

while true; do
  dvbsnoop -s signal -pd 1 -n 1 | sed -ne 's/Sig: //p' | tail -1
done

This script does more than you actually need, you can also pipe the output directly to /dev/udp/${SERVER}/${PORT}. STDIN is only used with TCP connections and the script doesn’t receive anything. Anyway, if you run this script the signal values are constantly sent to port 3333 on your localhost.

Step 3

This the Pure Data patch I’m using.
PD patch for the DVB theremin
Basically the patch receives the output with the netreceive object and feeds an oscillator with it. I put some objects into the patch to adjust the volume and the pitch of the sound. The interesting thing to really make it sound like a theremin are the arithmetic operations on the upper left, some frequencies get interpolated between two outputs of dvbsnoop to create the typical sweeping sounds of a theremin, yeah I know, it’s kind of phony.

Here’s the result:

SciFi Queue

I’ve gathered a lot of sci-fi books in 2010, mostly from flea markets for around about 50c each. The publisher Heyne has its own science fiction series and makes it easy for scifi-nerds to find by printing white “SF” letters onto the black back of the book. If you happen to pile up a certain amount of something you almost feel like a collector.  Recently I’ve obained almost the whole original Dune cycle by Frank Herbert on a book sale at uni for little money, Heyne SF of course.

The following picture shows the books I’ve read so far, I rated them as if I had to rate a movie on IMDB, this is of course highly subjective.

The next picture is a more sophisticated way of rating: I rated some characteristics of the books independently. These numbers don’t correspond with the ones above at all, but they aren’t comparable anyway since my IMDB-ratings reflect my final judgement while the numbers below don’t really render my general impression.

Also: star diagrams look cool.

My DIY Stethophone

My DIY Stethophone
Actually I wanted to build a hydrophone, but making a stethophone is just too easy. Besides: I gave my niece a stethoscope for Christmas, because we happen to play doctor fairly often and she has two “fake” stehoscopes for children that actually serve no purpose but being an ugly dummy. I got myself one, too, the price is 15EUR,  not really cheap, but at least it works. First, I removed the rubber earplugs, they’re not fixed with glue or anything else, you can just unplug them, then I attached an old pair of diy microphones at the ends with hot melt adhesive. The microphones are cheap mic capsules each soldered to an old earplug wire. The capsules are actually quite good, I used electret mic capsules for 1EUR each, they record ferquencies from 20HZ to 20kHz. I then just plugged the mics to my Edirol R09HR (actually i plugged them into my altoids battery box because the power supply of my R09HR seems to be broken).

My recorded samples are somewhat disappointing, I expected some kind of super effective James Bond like gadget, but the sounds you get are very very quiet, and the background noise after normalization/amplification is very dominant. Nonetheless the mic capsules prove to be extremely precise. Here’s a sample of my heartbeat when i put the bell on my cervical artery. It’s the exact up and down movement of the skin induced by the rush of blood.

Come on, purr! PURR HARDER!I experimented with recording the purring sound of my cat, but i only got garbage. The noise of the fur swishing against the bell’s membrane was very dominant and my cat wasn’t really cooperating.

Bash Autocompletion for DVB channels

I finally added autocompletion for my TV channels for mplayer which is far superior to my previous method of looking up the exact channel’s name in my channels.conf and typing mplayer dvb://”channel’s name” in a terminal.

I just added this very simple script to /etc/bash-completion.d/

_channel()
{
local cur
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
local chans=$(cat .mplayer/channels.conf |sed -ne 's/^\([^:]*\).*/"\1"/p')
oifs="$IFS"
IFS=$'\n' COMPREPLY=($(compgen -W "${chans}" $cur))
IFS="$oifs"
}
 
complete -F _channel channel

Then I added the following to my .bashrc:

function channel {
    mplayer dvb://"$1"
}

So it’s channel “<TAB>  for me now.

Done.

My favorite onomatopoeia – Mau!

Every time i listen to this it gets scarier…

Mau!

Quantifying God

Okay, this is pure nonsense, but anyway, I’d like to share some thoughts in order to quantify and in some way “locate” god.

The whole idea is to work with distances, everyone can measure his or her felt distance to some objects, things or even thoughts. While it’s hard to actually give numbers to those felt distances we can always compare them to each other. My distance to the wall is around about one metre right now. My distance to Barack Obama is rather big (physically) and although the physical distance to, let’s say, Geert Wilders is rather small, my felt distance to him is larger than that to Barack Obama. The distance to the thought of becoming president of the United States is large, and there’s some kind of imaginary part to that particular thought because I’d have to be born there. So it makes sense to see all these objects combined by a real and an imaginary part… which motivates considering all of these things in a way complex numbers are considered, so let’s assign every object an imaginary and a real part: x = Re(x)+ i Im(x). Since I feel pretty real Im(\text{myself})=0 and Re(\text{myself})=\text{myself} for example.
Let’s try to phrase it mathematically: let d(x,y) be the “felt distance” of two objects x,y. We certainly have d(x,x)=0, the triangle inequality holds trivially and for the sake of simplicity let’s assume symmetry. Of course the felt distance of two people does not need to be the same, but quasimetrics can simply be transformed into metrics and d(d(x,y),d(y,x)) is rarely very large, so the felt distance is a metric.

Let g denote God and c myself. What’s d(c,g) then? In the Lord’s Prayer we find the first line to be “Our Father who art in heaven” which implies two bounds: d(c,g) < r_\text{heaven} and d(c,g) > r_\text{family}, where r_\text{heaven} is my distance to heaven and r_\text{family} is the distance i have to anything or anyone family-related. The r_\text{heaven}-sphere around me as its center intersects with the reals in exactly two points which would symbolize the real part of heaven which would probably be paradise and paradise is, hands down, very distant, so for all people p, family or not, no matter if they’re alive or not r_\text{family} \leq d(c,p) < r_\text{heaven}

Let j denote Jesus. Since only few historians doubt Jesus’ existence as a person, let’s assume Im(j) = 0. Due to trinity there exists some argument \varphi > 0 such that g = j\cdot e^{i\varphi}. This equality would symbolize Jesus’ resurrection then. From a physical perspective my distance to Jesus is huge, but due to my argumentation above: r_\text{family} < d(c,j) < r_\text{heaven}.
All these considerations boil down to this image:

There are some nice conclusions which I won’t interpret any further:

  1. If God doesn’t exist, then Re(g) = c.
  2. d(c,g) \leq d(c,j) + d(j,g), so according to the church the only way to god, which is Jesus, is longer than the direct path.
  3. No matter if God exists: Re(g) = Re(\bar{g}) which is scary, think about it.

De Morgan’s laws in procmail recipes

I have an account at dimeadozen, a torrent tracker “for audio and video recordings of independent origin (ROIO) which have not been officially released“. If you know me you’re aware of my hobby of recording concerts. Dime is by far the best resource to share those recordings although I appreciate archive.org‘s approach, too.

I don’t want to check the site every day to see if there’s something I’m interested in, so I’m subscribed to their bot mailing list that sends out an email for every new, removed, deleted and banned torrent on the tracker. I like that approach, in times of RSS/ATOM feeds or twitter this seems a bit old-fashioned, but emails are around for some decades now, i guess that makes them a pretty robust data to handle.

Anyway: There are around 100-200 torrent actions a day, each of them is announced by an email. The subject line look like this:

Subject: [bot-dimeadozen-org] NEW on DIME: Grinderman HMVRelentless Garage 23 Sept 2010
Subject: [bot-dimeadozen-org] Removed from DIME: PJ Harvey & John Parish - The Ritz, Manchester, England - 24th April 2009
Subject: [bot-dimeadozen-org] BANNED from DIME: PJ Harvey 1995 Peel & BBC (Foxtrotter)

My filter rule would look like this in pseudocode:

if the list-id is bot-dimeadozen-org then
  if the subject contains "banned" OR contains "removed" etc then
    delete the email
  else if the subject contains artist name 1 OR contains artist name 2 etc then
    save the email in a specific mailbox
  else
    delete the email

Checking the list ID is a one-liner, but all the other conditions are rather lengthy and though easily done would result in awkwardly long regular expressions. As you can see the second condition checks if some keywords occur in the subject line. Each would be represented like this in procmail:

* ^Subject:.*BANNED.from.DIME.*
* ^Subject:.*Removed.from.DIME.*
* ^Subject:.*DIME.BAN.LIFTED.*
* ^Subject:.*Deleted from DIME.*

Subsequent condition lines are logically combined via conjunction (AND that is) so this is not what I need, but using de Morgan’s laws I can simply negate each condition and switch the “then” and “else” parts, so the modified filter rule looks like this:

if the list-id is bot-dimeadozen-org then
  if the subject does NOT contain "banned" AND does not contain "removed" etc then
    if the subject does NOT contain artist name 1 AND does NOT contain artist name 2 etc then
      delete the email
    else
      save to a specific mailbox
  else
    delete the email

This is great because this directly transforms into the following procmail recipe:

:0
* ^List-Id:.*bot-dimeadozen-org\.yahoogroups\.com
{
  :0
  * ! ^Subject:.*BANNED.from.DIME.*
  * ! ^Subject:.*Removed.from.DIME.*
  * ! ^Subject:.*DIME.BAN.LIFTED.*
  * ! ^Subject:.*Deleted from DIME.*
  {
    :0
    * ! ^Subject:.*(pj|polly.jean|dalek|d.lek).*
    * ! ^Subject:.*(hugo.race|dirt.music|dirtmusic|hr.tts|sepiatone|merola.matrix).*
    * ! ^Subject:.*(nick.cave|nicholas.*cave|grinderman|bad.seeds|birthday.party|boys.next.door).*
    * ! ^Subject:.*(f.nt.mas|patton|lovage|tomahawk|bungle|peeping.tom|mondo.cane).*
    * ! ^Subject:.*(mastodon|melt.banana|mono|neurosis|isis|jesu|earth|pelican|red.sparowes).*
    * ! ^Subject:.*(sigur|gun.club|lizaveta|rowland|sunn|goldfrapp|neubauten|cult.of.luna|bohren|blixa).*
    * ! ^Subject:.*(meshuggah).*
    $MAILDIR/.dimebot-all/
    :0 E
    $MAILDIR/.dimebot-wanted/
  }
  :0 E
  $MAILDIR/.dimebot-all/
}

Keeping /etc under control

I like tweaking my config files in a lot so keeping /etc under revision control makes sense. My weapon of choice is git without any special reason. I’m actually used to svn but I don’t see the point in a central repository which leaves git and mercurial and this site nails it.

So what’s actually the problem? Basically I want every change automatically revisioned. This won’t work very well with manual changes, but I can live with that, but I want to autoamtically commit changes every time I install or upgrade a package. There are tools like etckeeper that do the trick, but I don’t want an extra daemon for that purpose, besides: pacman doesn’t have post installation hooks implemented like apt does, I’d have to switch to pacman-g2, but I don’t want that either.

This is how I do it: I mostly use yaourt which is basically a wrapper for pacman package manager and instead of hacking a wrapper for yaourt I just use a wrapper script for pacman which executes pacman and commits any changes in /etc. Then point yaourt to it.

First create your git repo, as root perform:

$ cd /etc
$ git init
$ git commit -m "Initial commit"

Now that your repo is ready, paste this into /usr/local/sbin/etcpacman

#!/bin/sh
# ---------------------------------------------------------------------
# file:     /usr/local/sbin/etcpacman
# author:   Christian Gießen  http://sainthuck.de
# modified: September 2010
# ---------------------------------------------------------------------
 
# The script uses fancy colors
. /etc/bash.colors
# First execute pacman-color with any originally passed parameters
/usr/bin/pacman-color "${@}"
# Then git-commit any changes to the git repo and add an informative
# commit message.
echo "[${yellow}${0}${NC}] ${yellow}Updating git repository /etc${NC}"
cd /etc
git add .
echo "[${yellow}${0}${NC}] ${red}git commit -a -m\"Snapshot after: pacman-color ${*}\"${NC}"
git commit -a -m"Snapshot after: pacman-color ${*}"
echo "[${yellow}${0}${NC}] ${yellow}Done.${NC}"

Then make sure that PACMANBIN=”/usr/local/sbin/etcpacman” in your /etc/yaourtrc and you’re done.

And before anyone asks, this is the color file I’m sourcing in my script, very handy for sexy scripts:

red=$'\e[0;31m'
RED=$'\e[1;31m'
blue=$'\e[0;34m'
BLUE=$'\e[1;34m'
cyan=$'\e[0;36m'
CYAN=$'\e[1;36m'
NC=$'\033[0m' # no color
black=$'\e[0;30m'
BLACK=$'\e[1;30m'
green=$'\e[0;32m'
GREEN=$'\e[1;32m'
yellow=$'\e[0;33m'
YELLOW=$'\e[1;33m'
magenta=$'\e[0;35m'
MAGENTA=$'\e[1;35m'
white=$'\e[0;37m'
WHITE=$'\e[1;37m'