A small TCP server #programming #python #haskell
Yesterday at work I needed access to the output of the locate(1)
command remotely.
I wrote a little daemon and set it up as a service on the file server,
so that I can run locate on it from another machine.
As I was at work, I implemented it in Python - the lingua franca there, and I was, after all, on company time/money.
It looked like this:
#!/usr/bin/env python3

# located.py - a small tcp server that listens on port 10811 for
# strings to locate, and returns located strings.

import socketserver
import time
import subprocess


class LocateDHandler(socketserver.BaseRequestHandler):
    def handle(self):
        client_ip, client_port = self.client_address
        print("Connection from {}: ".format(client_ip), flush=True, end="")
        # One-element tuple: without the trailing comma this is a plain
        # string, and "in" would do a substring match.
        if client_ip not in ("127.0.0.1",):
            print("Not talking to you.")
            return
        search_for = self.request.recv(1024).decode("utf8").strip()
        if "'" in search_for:
            print("Not searching for '{}'.".format(search_for))
            return
        try:
            print("Searching for '{}' ... ".format(search_for), flush=True, end="")
            start = time.perf_counter()
            found = subprocess.check_output(["/usr/bin/locate", search_for])
            end = time.perf_counter()
            print("found {} hits in {:0.2f}s".format(len(found.splitlines()), end-start))
            self.request.sendall(found)
        except subprocess.CalledProcessError as e:
            print(e)


if __name__ == "__main__":
    socketserver.TCPServer.allow_reuse_address = True
    with socketserver.ForkingTCPServer(("0.0.0.0", 10811), LocateDHandler) as server:
        server.serve_forever()
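For completeness, here is a minimal sketch of what the other end could look like. The post does not show the client, so the function name query_located and its defaults are assumptions; the sketch relies only on the server reading one term and then closing the connection, which socketserver does once handle() returns.

```python
import socket


def query_located(term: str, host: str = "127.0.0.1", port: int = 10811) -> bytes:
    """Send one search term and return everything the server sends back.

    Hypothetical client: name and defaults are illustrative only.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(term.encode("utf8"))
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling query_located("passwd") against a running server would return the raw bytes from locate, one path per line. The reply is kept as bytes on purpose, since file names are not guaranteed to be valid UTF-8.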
The port number is ascii l = 108 followed by ascii o = 111, but with a digit cut off, to stay within the allowed port numbers :-)
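The arithmetic behind that port number, spelled out as a small sketch (the variable names are just for illustration):

```python
# ASCII codes of the first two letters of "locate", written side by side.
digits = "".join(str(ord(ch)) for ch in "lo")  # "108" + "111" -> "108111"

MAX_PORT = 65535  # largest valid TCP port number

port = int(digits)
while port > MAX_PORT:
    port //= 10  # cut off the last digit

print(port)  # 10811
```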
I was quite happy with finding the socketserver module and how
relatively easy it was to whip this together. The biggest problem I had
was that, when searching for documentation online, a lot of it referred
to SocketServer, which is what the module was called in Python 2.
You'll notice that it restricts access to a set of IP addresses - in
real life (work) the list is different. It also doesn't do the search if
there is a ' in the search term, even though I assume that subprocess
handles escaping for me.
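On the escaping question: when subprocess is given a list of arguments and shell=True is not set, no shell is involved, so each list element reaches the child process as-is. The sketch below checks this with a stand-in child process (the Python interpreter itself, not locate); the helper name echo_argument is made up for the demonstration.

```python
import subprocess
import sys


def echo_argument(arg: str) -> bytes:
    """Run a child process that writes its first argument back unchanged."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    # List form, no shell: quotes, semicolons and $VARS are not interpreted.
    return subprocess.check_output([sys.executable, "-c", child_code, arg])


tricky = "it's; echo $HOME"
print(echo_argument(tricky) == tricky.encode())
```

So the check for ' is a belt-and-braces measure rather than something the list form strictly needs.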
The only flourish is logging how much time it takes, and logging on one line, flushing each part, making it more exciting to follow "live".
I'm not thrilled about the hardcoded length in the .recv() call, but I
decided it was "good enough".
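One way around the fixed-size read, sketched under an assumption the original protocol does not make: that the client ends the search term with a newline. The helper recv_line and its limit parameter are hypothetical.

```python
import socket


def recv_line(sock: socket.socket, limit: int = 65536) -> bytes:
    """Read until a newline, EOF, or `limit` bytes, whichever comes first.

    Assumes a newline-terminated request, which the server in this post
    does not require.
    """
    buf = bytearray()
    while len(buf) < limit:
        chunk = sock.recv(min(4096, limit - len(buf)))
        if not chunk:       # peer closed the connection
            break
        buf += chunk
        if b"\n" in chunk:  # end of the request line
            break
    # Keep only what came before the first newline.
    return bytes(buf.split(b"\n", 1)[0])
```

The limit still caps how much a client can make the server buffer, but a term split across several TCP segments is no longer cut short.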
When I got home, I reimplemented it in Haskell - without looking at the Python code (I would have had to go to another computer to fetch it, too much faff).
This is what I came up with:
-- located.hs - a small tcp server that listens on port 10811 for
-- strings to locate, and returns located strings.

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Network.Simple.TCP
import Network.SockAddr (showSockAddrBS)
import Data.Text as T (pack, strip, unpack, elem)
import Data.Text.IO as T (putStr, putStrLn)
import Data.Text.Encoding as T (decodeUtf8)
import Data.ByteString.Char8 as BS (lines)
import System.IO (hFlush, stdout)
import System.Process.ByteString (readProcessWithExitCode)
import System.Clock
import Formatting
import Formatting.Clock

main :: IO ()
main = do
  serve (Host "0.0.0.0") "10811" $ \(connectionSocket, remoteAddr) -> do
    let remoteIP = T.decodeUtf8 (showSockAddrBS remoteAddr)
    T.putStr ("Connection from " <> remoteIP <> ": ")
    hFlush stdout
    case remoteIP of
      "127.0.0.1" -> T.putStr ""
      _ -> error "No thank you."
    searchFor <- recv connectionSocket 1024
    case searchFor of
      Just searchBytes -> do
        let search = strip (decodeUtf8 searchBytes)
        case T.elem '\'' search of
          True -> error ("Won't search for \"" ++ T.unpack search ++ "\"")
          False -> T.putStr ("Searching for '" <> search <> "' ... ")
        hFlush stdout
        start <- getTime Monotonic
        (_, result, _) <- readProcessWithExitCode "/usr/bin/locate" [T.unpack search] ""
        end <- getTime Monotonic
        T.putStrLn ("found " <> T.pack (show (length (BS.lines result))) <> " results in " <> sformat timeSpecs start end)
        send connectionSocket result
      Nothing -> T.putStrLn "nothing."
It's quite similar, and a little longer.
I dislike having to convert to and from String in a couple of places,
because that's what the libraries return/expect, i.e. showSockAddrBS
and readProcessWithExitCode - and in general there is too much faff
involved in using Data.Text. It would be nicer if I could just import
Data.Text.putStrLn as putStrLn, even though the prelude has the String
version of putStrLn.
In the first iteration I converted the bytes coming back from locate's
stdout to Text, and then I converted them to bytes before sending them.
That failed when I hit file names with invalid UTF-8 encoding, and the
explicit conversions made it easy for me to realize that I could skip
the converting and just treat the result as ByteString, as soon as I
found the lines function for ByteStrings.
When I made that change to the Haskell version, I went back and made the same change to the Python version, improving its robustness to rogue filenames.
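The failure mode is easy to reproduce in Python. In this sketch the two paths are invented for the example; the first contains a Latin-1 encoded é (byte 0xE9), which is not valid UTF-8.

```python
# Made-up locate output; \xe9 is Latin-1 "é" and breaks UTF-8 decoding.
raw = b"/srv/files/caf\xe9.txt\n/srv/files/ok.txt\n"

try:
    raw.decode("utf8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Counting hits works directly on bytes, no decoding needed.
hits = raw.splitlines()

print(decoded_ok, len(hits))
```

Since the server only counts lines and forwards the result, it never needs the text form, and passing the bytes through untouched sidesteps the problem.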
While the concatenation of strings in the Haskell code is a little
clunky, at least it doesn't fall into Python's "there are too many ways
to format strings" trap (+? .format()? f""?).
Another thing I dislike in the Python version is the awkward idiom
if __name__ == "__main__": at the bottom. That's just an ugly hack
compared to having the call to serve at the top.
Also, it feels silly that allow_reuse_address defaults to False in
Python.
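A slightly tidier way to flip that default, as a sketch: set the attribute on a subclass instead of on socketserver.TCPServer itself, so other servers in the same process keep the stock behaviour. The class name ReusableForkingTCPServer is made up here, and ForkingTCPServer is only available on platforms with fork().

```python
import socketserver


class ReusableForkingTCPServer(socketserver.ForkingTCPServer):
    """ForkingTCPServer that sets SO_REUSEADDR before binding."""

    # Class attribute consulted by TCPServer.server_bind().
    allow_reuse_address = True
```

It would be used exactly like ForkingTCPServer, i.e. ReusableForkingTCPServer(("0.0.0.0", 10811), LocateDHandler) in the program above.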
When started up, the Python version uses 10MB of memory before serving anything, and around 11MB after serving a search with 1.8M hits in 1.89s. On disk the Python version takes up 1.4KB.
Similarly the Haskell version uses 4MB of memory before serving anything, and around 259MB after serving the search with 1.8M hits in 1.86s. On disk the compiled Haskell binary takes up 5.5MB (stripped).
Note that the timings are how long it takes to run locate, so they don't say anything about the languages.
@kas suggests putting -- in as the first argument, to avoid clients passing an option to locate - good point!
- Adam Sjøgren 🕞︎ - 2023-02-04
Adam Sjøgren wrote:
I sometimes use the digits I would get by typing the name (full or short) of the service with T9. E.g., I have a bookmark manager that is listening on port 2675 (T9: BMRK).
- Klaus Alexander Seistrup 🕧︎ - 2023-02-11