A small TCP server #programming #python #haskell

🕝︎ - 2023-02-04

Yesterday at work I needed access to the output of the locate(1) command remotely.

I wrote a little daemon and set it up as a service on the file server, so that I can run locate there from another machine.

As I was at work, I implemented it in Python - the lingua franca there, and I was, after all, on company time/money.

It looked like this:

#!/usr/bin/env python3

# located.py - a small tcp server that listens on port 10811 for
#              strings to locate, and returns located strings.

import socketserver
import time
import subprocess

class LocateDHandler(socketserver.BaseRequestHandler):
    def handle(self):
        client_ip, client_port = self.client_address
        print("Connection from {}: ".format(client_ip), flush=True, end="")
        if client_ip not in (""):
            print("Not talking to you.")
            return
        search_for = self.request.recv(1024).decode("utf8").strip()
        if "'" in search_for:
            print("Not searching for '{}'.".format(search_for))
            return
        try:
            print("Searching for '{}' ... ".format(search_for), flush=True, end="")
            start = time.perf_counter()
            found = subprocess.check_output(["/usr/bin/locate", search_for])
            end = time.perf_counter()
            print("found {} hits in {:0.2f}s".format(len(found.splitlines()), end-start))
            self.request.sendall(found)
        except subprocess.CalledProcessError as e:
            # Reconstructed: locate exits non-zero when nothing matches, so just log it.
            print("locate exited with status {}.".format(e.returncode))

if __name__ == "__main__":
    socketserver.TCPServer.allow_reuse_address = True
    with socketserver.ForkingTCPServer(("", 10811), LocateDHandler) as server:
        server.serve_forever()

The port number is ascii l = 108 followed by ascii o = 111, but with a digit cut off, to stay within the allowed port numbers :-)
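
A quick sketch of that arithmetic (the helper name port_from_letters is made up for illustration): concatenating the two ASCII codes gives 108111, which is above the highest TCP port, 65535, so the last digit has to go.

```python
MAX_PORT = 65535  # highest valid TCP port number

def port_from_letters(letters):
    """Concatenate the ASCII codes of the letters, then drop trailing
    digits until the result fits in the valid port range."""
    digits = "".join(str(ord(c)) for c in letters)
    while int(digits) > MAX_PORT:
        digits = digits[:-1]
    return int(digits)

print(port_from_letters("lo"))  # 10811
```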

I was quite happy to find the socketserver module, and with how easy it was to whip this together. The biggest problem was that a lot of the documentation I found online referred to SocketServer, which was the module's name in Python 2.

You'll notice that it restricts access to a set of IP addresses - in real life (work) the list is different. It also doesn't do the search if there is a ' in the search term, even though I assume that subprocess handles escaping for me.
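
A small sketch of why the quote check is probably redundant: when subprocess is given a list, no shell is involved, so each element reaches the program as a single argument, quotes and all. The child process here is just the Python interpreter echoing its first argument, standing in for locate; echo_argument is a made-up helper.

```python
import subprocess
import sys

def echo_argument(arg):
    """Run a child process with an argument list (no shell) and return
    the first argument exactly as the child received it."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    output = subprocess.check_output([sys.executable, "-c", child_code, arg])
    return output.decode("utf8")

tricky = "it's; echo oops"
print(echo_argument(tricky) == tricky)  # True
```

Nothing in the tricky string is interpreted along the way; it only becomes dangerous if the command is built as one string and run with shell=True.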

The only flourish is logging how much time it takes, and logging on one line, flushing each part, making it more exciting to follow "live".

I'm not thrilled about the hardcoded length in the .recv() call, but I decided it was "good enough".
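
One way around the fixed-size read, sketched under the assumption that clients end the search term with a newline (the server above does not require that, it only strips whitespace): keep calling recv until the newline arrives, with an upper bound so a client cannot feed the server forever. The names read_search_term and MAX_TERM_BYTES are made up for this sketch.

```python
import socket

MAX_TERM_BYTES = 4096  # arbitrary cap, an assumption for this sketch

def read_search_term(conn, limit=MAX_TERM_BYTES):
    """Read from conn until a newline, EOF, or the limit is reached,
    and return the first line as stripped text."""
    buf = b""
    while b"\n" not in buf and len(buf) < limit:
        chunk = conn.recv(1024)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    line = buf.split(b"\n", 1)[0]
    return line[:limit].decode("utf8").strip()
```

In the handler this would replace the single recv call, as in search_for = read_search_term(self.request).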

When I got home, I reimplemented it in Haskell - without looking at the Python code (I would have had to go to another computer to fetch it, too much faff).

This is what I came up with:

-- located.hs - a small tcp server that listens on port 10811 for
--              strings to locate, and returns located strings.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Network.Simple.TCP
import Network.SockAddr (showSockAddrBS)

import Data.Text as T (pack, strip, unpack, elem)
import Data.Text.IO as T (putStr, putStrLn)
import Data.Text.Encoding as T (decodeUtf8)
import Data.ByteString.Char8 as BS (lines)
import System.IO (hFlush, stdout)

import System.Process.ByteString (readProcessWithExitCode)

import System.Clock
import Formatting
import Formatting.Clock

main :: IO ()
main = do
  serve (Host "") "10811" $ \(connectionSocket, remoteAddr) -> do
    let remoteIP = T.decodeUtf8 (showSockAddrBS remoteAddr)
    T.putStr ("Connection from " <> remoteIP <> ": ")
    hFlush stdout
    case remoteIP of
      "" -> T.putStr ""
      _           -> error "No thank you."
    searchFor <- recv connectionSocket 1024
    case searchFor of
      Just searchBytes -> do
        let search = strip (decodeUtf8 searchBytes)
        case T.elem '\'' search of
          True  -> error ("Won't search for \"" ++ T.unpack search ++ "\"")
          False -> T.putStr ("Searching for '" <> search <> "' ... ")
        hFlush stdout
        start <- getTime Monotonic
        (_, result, _) <- readProcessWithExitCode "/usr/bin/locate" [T.unpack search] ""
        end <- getTime Monotonic
        T.putStrLn ("found " <> T.pack (show (length (BS.lines result))) <> " results in " <> sformat timeSpecs start end)
        send connectionSocket result
      Nothing -> T.putStrLn "nothing."

It's quite similar, and a little longer.

I dislike having to convert to and from String in a couple of places, because that's what the libraries return/expect, i.e. showSockAddrBS and readProcessWithExitCode - and in general there is too much faff involved in using Data.Text. It would be nicer if I could just import Data.Text.putStrLn as putStrLn, even though the prelude has the String version of putStrLn.

In the first iteration I converted the bytes coming back from locate's stdout to Text, and then converted them back to bytes before sending them. That failed when I hit file names with invalid UTF-8 encoding. The explicit conversions made it easy for me to realize that I could skip the converting and just treat the result as a ByteString, as soon as I found the lines function for ByteStrings.

When I made that change to the Haskell version, I went back and made the same change to the Python version, improving its robustness to rogue filenames.
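
To illustrate the failure with a made-up file list: file names on Linux are just bytes, so locate can return something that is not valid UTF-8. Decoding then blows up, while counting lines and passing the bytes along works fine.

```python
# Made-up locate output: the second name contains a byte sequence
# that is not valid UTF-8 (a lone 0xE9, as in a Latin-1 encoded name).
found = b"/srv/files/ok.txt\n/srv/files/caf\xe9.txt\n"

try:
    found.decode("utf8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

hits = len(found.splitlines())  # works on bytes, no decoding needed
print(decodable, hits)  # False 2
```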

While the concatenation of strings in the Haskell code is a little clunky, at least it doesn't fall into Python's "there are too many ways to format strings" trap (+? .format()? f""?).

Another thing I dislike in the Python version is the awkward idiom if __name__ == "__main__": at the bottom. That's just an ugly hack compared to having the call to serve at the top.

It also feels silly that allow_reuse_address defaults to False in Python.
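
A slightly tidier variant, as a sketch: instead of flipping the flag on socketserver.TCPServer for every server in the process, set it on a subclass. ReusableForkingTCPServer is a name made up here, and ForkingTCPServer only exists on platforms that have fork().

```python
import socketserver

class ReusableForkingTCPServer(socketserver.ForkingTCPServer):
    # Sets SO_REUSEADDR on the listening socket, so the port can be
    # rebound right after a restart (e.g. while old connections sit
    # in TIME_WAIT).
    allow_reuse_address = True

print(socketserver.TCPServer.allow_reuse_address)    # False
print(ReusableForkingTCPServer.allow_reuse_address)  # True
```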

When started up, the Python version uses 10MB of memory before serving anything, and around 11MB after serving a search with 1.8M hits in 1.89s. On disk the Python version takes up 1.4KB.

Similarly the Haskell version uses 4MB of memory before serving anything, and around 259MB after serving the search with 1.8M hits in 1.86s. On disk the compiled Haskell binary takes up 5.5MB (stripped).

Note that the timings are how long it takes to run locate, so they don't say anything about the languages.

@kas suggests putting -- in as the first argument, to avoid clients passing an option to locate - good point!
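
In the Python version that could look something like this (a sketch; locate_command is a made-up helper, and it assumes the installed locate follows the usual getopt convention that -- ends option parsing):

```python
LOCATE = "/usr/bin/locate"

def locate_command(search_for):
    """Build the argument list, with "--" first so that a search term
    starting with "-" is treated as a pattern, not as an option."""
    return [LOCATE, "--", search_for]

print(locate_command("--help"))  # ['/usr/bin/locate', '--', '--help']
```

The list would then be handed to subprocess.check_output in place of the two-element list in the handler.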

- Adam Sjøgren 🕞︎ - 2023-02-04


Adam Sjøgren wrote:

The port number is ascii l = 108 followed by ascii o = 111, but with a digit cut off, to stay within the allowed port numbers :-)

I sometimes use the digits I would get by typing the name (full or short) of the service with T9. E.g., I have a bookmark manager that is listening on port 2675 (T9: BMRK).

- Klaus Alexander Seistrup 🕧︎ - 2023-02-11
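
The T9 scheme from the comment above, as a small sketch (t9_port is a made-up name; the key layout is the standard phone keypad):

```python
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Invert the keypad: one entry per letter, pointing at its digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in T9_KEYS.items() for letter in letters
}

def t9_port(name):
    """Turn a service name into the digits you would press to type it."""
    return int("".join(LETTER_TO_DIGIT[c] for c in name.lower()))

print(t9_port("BMRK"))  # 2675
```

Longer names can land above 65535, so in practice the name has to be kept short.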

