A small TCP server #programming #python #haskell

🕜︎ - 2023-02-04

Yesterday at work I needed access to the output of the locate(1) command remotely.

I wrote a little daemon and set it up as a service on the file server, so that I can run locate there from another machine.

As I was at work, I implemented it in Python - the lingua franca there, and I was, after all, on company time/money.

It looked like this:

#!/usr/bin/env python3

# located.py - a small tcp server that listens on port 10811 for
#              strings to locate, and returns located strings.

import socketserver
import time
import subprocess

class LocateDHandler(socketserver.BaseRequestHandler):
    def handle(self):
        client_ip, client_port = self.client_address
        print("Connection from {}: ".format(client_ip), flush=True, end="")
        if client_ip not in (""):
            print("Not talking to you.")
            return
        search_for = self.request.recv(1024).decode("utf8").strip()
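        if "'" in search_for:
            print("Not searching for '{}'.".format(search_for))
            return
        try:
            print("Searching for '{}' ... ".format(search_for), flush=True, end="")
            start = time.perf_counter()
            found = subprocess.check_output(["/usr/bin/locate", search_for])
            end = time.perf_counter()
            print("found {} hits in {:0.2f}s".format(len(found.splitlines()), end-start))
            # Send the raw bytes from locate back to the client.
            self.request.sendall(found)
        except subprocess.CalledProcessError as e:
            # locate exits with a non-zero status when, for example, nothing matches.
            print("locate exited with status {}.".format(e.returncode))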

if __name__ == "__main__":
    socketserver.TCPServer.allow_reuse_address = True
    with socketserver.ForkingTCPServer(("", 10811), LocateDHandler) as server:
        server.serve_forever()

The port number is ascii l = 108 followed by ascii o = 111, but with a digit cut off, to stay within the allowed port numbers :-)
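Spelled out as a small sketch (the variable names are just for illustration), the arithmetic behind the port number looks like this:

```python
# Derive the port number from the ASCII codes of "l" and "o".
letters = "lo"
codes = [ord(c) for c in letters]            # ASCII codes of each letter
joined = "".join(str(c) for c in codes)      # the codes written one after the other
max_port = 65535                             # highest valid TCP port number

# The joined number is too large to be a port, so the last digit is dropped.
too_big = int(joined) > max_port
port = int(joined[:-1])

print(codes, joined, too_big, port)
```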

I was quite happy to find the socketserver module, and with how easy it was to whip this together. The biggest problem I had was that a lot of the documentation I found online referred to SocketServer, which is what the module was called in Python 2.

You'll notice that it restricts access to a set of IP-addresses - in real life (work) the list is different. It also doesn't do the search if there is a ' in the search term, even though I assume that subprocess handles escaping for me.
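On the escaping question: when subprocess is given a list of arguments (and shell=True is not set), no shell is involved, so each list element reaches the program as one argument with nothing to escape. A minimal sketch to check that, using the Python interpreter itself as a stand-in for locate (the search term is made up):

```python
import subprocess
import sys

# A search term with a quote and shell metacharacters in it.
tricky = "it's; echo injected"

# The child process just prints its first argument back.
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky]
)
echoed = out.decode("utf8").strip()

# The argument arrives unchanged, and nothing after the ';' was run as a command.
print(echoed == tricky)
```

So the check for ' is belt and braces rather than strictly needed.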

The only flourish is logging how much time it takes, and logging on one line, flushing each part, making it more exciting to follow "live".

I'm not thrilled about the hardcoded length in the .recv() call, but I decided it was "good enough".
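One possible way around the single fixed-size read, sketched under the assumption that the client ends its search term with a newline (the original protocol doesn't say so) and with a made-up helper name recv_line:

```python
import socket

def recv_line(sock, max_bytes=4096):
    """Read from sock until a newline, EOF, or max_bytes, whichever comes first."""
    chunks = []
    total = 0
    while total < max_bytes:
        chunk = sock.recv(min(1024, max_bytes - total))
        if not chunk:            # the peer closed the connection
            break
        chunks.append(chunk)
        total += len(chunk)
        if b"\n" in chunk:       # end of the request line (assumed protocol)
            break
    # Return everything before the first newline.
    return b"".join(chunks).split(b"\n", 1)[0]

# Try it out on a local socket pair, sending the term in two pieces.
client, server = socket.socketpair()
client.sendall(b"foo")
client.sendall(b"bar\n")
line = recv_line(server)
client.close()
server.close()
print(line)
```

There is still a cap on the length, but it limits the whole request rather than one recv() call.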

When I got home, I reimplemented it in Haskell - without looking at the Python code (I would have to go to another computer to fetch it, too much faff).

This is what I came up with:

-- located.hs - a small tcp server that listens on port 10811 for
--              strings to locate, and returns located strings.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Network.Simple.TCP
import Network.SockAddr (showSockAddrBS)

import Data.Text as T (pack, strip, unpack, elem)
import Data.Text.IO as T (putStr, putStrLn)
import Data.Text.Encoding as T (decodeUtf8)
import Data.ByteString.Char8 as BS (lines)
import System.IO (hFlush, stdout)

import System.Process.ByteString (readProcessWithExitCode)

import System.Clock
import Formatting
import Formatting.Clock

main :: IO ()
main = do
  serve (Host "") "10811" $ \(connectionSocket, remoteAddr) -> do
    let remoteIP = T.decodeUtf8 (showSockAddrBS remoteAddr)
    T.putStr ("Connection from " <> remoteIP <> ": ")
    hFlush stdout
    case remoteIP of
      "" -> T.putStr ""
      _           -> error "No thank you."
    searchFor <- recv connectionSocket 1024
    case searchFor of
      Just searchBytes -> do
        let search = strip (decodeUtf8 searchBytes)
        case T.elem '\'' search of
          True  -> error ("Won't search for \"" ++ T.unpack search ++ "\"")
          False -> T.putStr ("Searching for '" <> search <> "' ... ")
        hFlush stdout
        start <- getTime Monotonic
        (_, result, _) <- readProcessWithExitCode "/usr/bin/locate" [T.unpack search] ""
        end <- getTime Monotonic
        T.putStrLn ("found " <> T.pack (show (length (BS.lines result))) <> " results in " <> sformat timeSpecs start end)
        send connectionSocket result
      Nothing -> T.putStrLn "nothing."

It's quite similar, and a little longer.

I dislike having to convert to and from String in a couple of places, because that's what the libraries return/expect, i.e. showSockAddrBS and readProcessWithExitCode - and in general there is too much faff to use Data.Text. It would be nicer if I could just import Data.Text.putStrLn as putStrLn, even though the prelude has the String version of putStrLn.

In the first iteration I converted the bytes coming back from locate's stdout to Text, and then I converted them back to bytes before sending them. That failed when I hit file names with invalid UTF-8 encoding, and the explicit conversions made it easy for me to realize that I could skip the conversion and just treat the result as a ByteString, as soon as I found the lines function for ByteStrings.

When I made that change to the Haskell version, I went back and made the same change to the Python version, improving its robustness to rogue filenames.
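A small sketch of why staying with bytes is more robust on the Python side. The file names are invented, and the second one contains a Latin-1 byte that is not valid UTF-8:

```python
# Two made-up paths as locate might return them; \xe9 is Latin-1 "e acute".
raw = b"/srv/ok.txt\n/srv/caf\xe9.txt\n"

# Decoding the whole output fails on the rogue file name ...
try:
    raw.decode("utf8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# ... but counting lines works fine directly on the bytes.
hits = len(raw.splitlines())

print(decoded_ok, hits)
```

The bytes can then be sent back to the client unchanged, which is what the listing above does with found.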

While the concatenation of strings in the Haskell code is a little clunky, at least it doesn't fall into Python's "there are too many ways to format strings" trap (+? .format()? f""?).

Another thing I dislike in the Python version is the awkward idiom if __name__ == "__main__": at the bottom. That's just an ugly hack compared to having the call to serve at the top.

Also, that allow_reuse_address defaults to False in Python feels silly.
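An alternative to changing the attribute on socketserver.TCPServer itself is to set it on a subclass. This is only a sketch, and the class name ReusableTCPServer is made up:

```python
import socketserver

# Hypothetical subclass; only the attribute itself comes from socketserver.
class ReusableTCPServer(socketserver.TCPServer):
    allow_reuse_address = True   # ask for SO_REUSEADDR before binding

default_value = socketserver.TCPServer.allow_reuse_address   # the library default
subclass_value = ReusableTCPServer.allow_reuse_address

print(default_value, subclass_value)
```

That leaves the default alone for any other server created in the same process.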

When started up, the Python version uses 10MB of memory before serving anything, and around 11MB after serving a search with 1.8M hits in 1.89s. On disk the Python version takes up 1.4KB.

Similarly the Haskell version uses 4MB of memory before serving anything, and around 259MB after serving the search with 1.8M hits in 1.86s. On disk the compiled Haskell binary takes up 5.5MB (stripped).

Note that the timings are how long it takes to run locate, so they don't say anything about the languages.

Intransitive dice #math

🕒︎ - 2023-01-31

The introduction to the Quanta Magazine article "Mathematicians Roll Dice and Get Rock-Paper-Scissors" is fun:

As Bill Gates tells the story, Warren Buffett once challenged him to a game of dice. Each would select one of four dice belonging to Buffett, and then they’d roll, with the higher number winning. These weren’t standard dice — they had a different assortment of numbers than the usual 1 through 6. Buffett offered to let Gates choose first, so he could pick the strongest die. But after Gates examined the dice, he returned a counterproposal: Buffett should pick first.

Gates had recognized that Buffett’s dice exhibited a curious property: No one of them was the strongest. If Gates had chosen first, then whichever die he chose, Buffett would have been able to find another die that could beat it (that is, one with more than a 50% chance of winning).

Seemingly simple things can be so very complicated.
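The quote doesn't list the numbers on Buffett's dice, so as a stand-in here is a sketch using Efron's dice, a well-known set of four intransitive dice. It counts, for each pair in the cycle, how often the first die rolls higher than the second:

```python
from fractions import Fraction
from itertools import product

# Efron's dice (an assumed stand-in, not necessarily the dice in the story).
dice = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}

def win_probability(x, y):
    """Exact probability that die x rolls higher than die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

# A beats B, B beats C, C beats D, and D beats A again.
cycle = ["A", "B", "C", "D", "A"]
results = {
    (p, q): win_probability(dice[p], dice[q])
    for p, q in zip(cycle, cycle[1:])
}

for (p, q), prob in results.items():
    print("{} beats {} with probability {}".format(p, q, prob))
```

Whichever die is picked first, the next one around the cycle beats it, so picking second is the better deal.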

Feedbase year 7 #feedbase

🕜︎ - 2023-01-29

Feedbase has been running for 7 years now. Time flies. I still use and enjoy the system.

In 2022 Feedbase got NNTP connections from 506 unique IP-addresses (up from 479). Of these, 137 were IPv6 addresses (down from 155). However, 225 addresses only connected once (up from 138).

I was the user connecting the most times. The number of addresses with a 3 digit number of connections was 6 (down from 15). DHCP and IPv6 might be making this count less interesting.

At the end of 2022 there were 3.9 million articles (up from 3.1 million), and 153 follow-ups (down from 159 last year; did I clean up some spam?). The 8 new follow-ups in 2022 were all by me, as usual.

The number of commits fell to 16 (from 21). Most of the commits were normalization improvements, to avoid updating when the content hasn't really changed, or to better detect crossposts.

On to the next year!

Unix story time: Bringing a Chainsaw #unix

🕥︎ - 2023-01-08

Marc Donner shares a story about "infiltrating" an IBM facility with Unix: Bringing a Chainsaw.

Another interesting story from the TUHS mailing list.

Null-terminated strings #c #programming

🕙︎ - 2023-01-03

On the TUHS mailing list Doug McIlroy corrects an urban legend propagated by Joel on Software:

The ever brief Ken Thompson confirms Doug's suspicion:

Good to have the record set straight.

Tom Lehrer releases all songs into public domain #music

🕘︎ - 2023-01-01

Tom Lehrer is a pretty cool old dude - he released a statement on November 1, 2022:

I, Tom Lehrer, and the Tom Lehrer Trust 2007, hereby grant the following permissions:

All copyrights to lyrics or music written or composed by me have been relinquished, and therefore such songs are now in the public domain.

In short, I no longer retain any rights to any of my songs.

So help yourselves, and don’t send me any money.

Go grab those sound files, lyrics, and sheet music while you can: tomlehrersongs.com.

I have definitely been humming:

"Once the rockets are up, who cares where they come down? That's not my department!" says Wernher von Braun.

to myself.

Ok, now I'm off poisoning pigeons in the park, bye bye.

TUHS - USENIX flame award #unix #community

🕢︎ - 2022-12-31

This year's USENIX flame award went to Warren Toomey for tuhs.org - short for The Unix Heritage Society.

Fittingly, this was announced on their mailing list by Douglas McIlroy, followed by a bunch of congratulatory messages, including one from Ken Thompson. Both are of Unix fame.

The mailing list is worth following.
