koldfront

Naïve quicksort speeds #haskell #python

🕢︎ - 2021-02-17

I was watching Graham Hutton's lectures on Haskell and he used the naïve straight-forward implementation of quicksort as the first code example, working out what it does.

For fun, I wrote the same in Python, to compare elegance. Running the code for even small lists gave an exception about recursion depth. Fair enough, recursion is common in functional programming and less so in imperative languages like Python (the Haskell version doesn't throw such exceptions).

As it is the naïve version, I had to scramble the list to fix this, as running on the sorted list gives us the worst case (the pivot element is näively taken as the first in the list, so the algorithm splits the list in one element and the rest, oops). Here is the Python code:

from random import sample

def qsort(list):
    if list == []:
        return list
    else:
        x=list[0]
        xs=list[1:]
        return qsort([y for y in xs if y < x]) + [x] + qsort([z for z in xs if z >= x])

n=10000000
g=qsort(sample(range(1, n+1), n))
print(g[-1])

The code is short, but perhaps not what you would call elegant? The list comprehension part does look quite close to the Haskell version, but the pattern matching elegance is missing (although recent PEPs are going to add pattern matching to Python, I am not sure if it is going to look more elegant). Also, I skipped the conventional if __name__ == "__main__": boiler-plate, which would have made it look worse.

Running it on a list of the integers from 1 to 10 million (scrambled) and printing the last (to not make printing be the limiting factor) takes ~48 seconds on my laptop, and makes the fan spin up.

I was expecting the Haskell version to be more elegant to look at and run faster, as it is - after all - compiled. When I printed the entire list, it was very slow, so again I chose to just print the last entry of the result:

import System.Random
import System.Random.Shuffle

f [] = []
f (x:xs) = f ys ++ [x] ++ f zs
  where
    ys = [ y | y <- xs, y < x ]
    zs = [ z | z <- xs, z >= x ]

main = do
  gen <- getStdGen
  let g = f (shuffle' [1..10000000] 10000000 gen)
  putStrLn (show (last g))

Much to my surprise, running this takes ~55 seconds - around 14% slower than Python?! Usually people talk about Python being slow. I know this is the naïve(st) implementation, but so is the Python one I'm comparing with.

It would be interesting to know what part of the Haskell version the most time is spent in.

Whoa, just changing the last line from:

  putStrLn (show (last g))

to:

  putStrLn (show (head g))

changes the running time to ~39 seconds. I guess lists really are implemented very inefficiently. Or am I fooling myself with laziness?

This was with Python 3.9.1 and GHC 8.8.4 on an Intel i7-9750H equipped laptop running Debian unstable.

Add comment

To avoid spam many websites make you fill out a CAPTCHA, or log in via an account at a corporation such as Twitter, Facebook, Google or even Microsoft GitHub.

I have chosen to use a more old school method of spam prevention.

To post a comment here, you need to:

¹ Such as Thunderbird, Pan, slrn or Gnus (part of Emacs).

Or, you can fill in this form:

+=