Sunday, May 19, 2013

Sorry, but no, even merge sort sucks in scala

This is a reply to an article which compares merge sort in Scala and Java. As anyone who knows their sorting algorithms, merge-sort is the poster boy for a functional sorting -- as its a super simple stable divide-and-conquer algorithm, that enjoys a great worst-case complexity. If there's any that scala would be good at, this is it.


So first looking at the Java code, its 40 lines of pretty ugly and verbose code, but it's well commented and not too bad to follow. Just looking at the implementation, it's not easy to see if bug-free (it is, I tested it) but its pretty see that its quite efficient. It however does a great job of showing what's wrong with Java, and why I hate it.


So now lets jump to the Scala code. First off, as someone with experience in both Scala and Java, the Scala code is *a lot* easier to read, and more obviously correct. The only problem, is this code is incredibly misleading.

Not using an array

This is quite subtle, but one of the big wins in his scala code is being able to pattern match. However, had he been directly comparable to the Java implementation, it would've also used Arrays which would require far uglier/complex pattern matching, namely case (Array(x, xs1 @ _*), Array(y, ys1 @ _*)) and what's worse, is the type of xs1 and ys1 are now Seq[Int] and not Array[Int] which precludes the recursive call...

It's uses O(n) stack space

Which is a complete non-starter. Even a small list with cause a stack overflow.

So lets write it tail-recursive

object MergeSort {
  def apply(xs: Seq[Int]): Seq[Int] = {
    val n = xs.length / 2
    if (n == 0) xs
    else {
      def merge(xs: Seq[Int], ys: Seq[Int], res: Seq[Int]) =
        (xs, ys) match {
          case (Seq(), ys) => res ++ ys
          case (xs, Seq()) => res ++ xs
          case (Seq(x, xs1 @ _*), Seq(y, ys1 @ _*)) =>
            if (x < y) merge(xs1, ys, res :+ x)
            else merge(xs, ys1, res :+ y)
        }
      val (left, right) = xs splitAt (n)
      merge(apply(left), apply(right), Seq())
    }
  }
}
The way this works, is by using the fact that Scala arguments are strict -- so instead of building up a big stack of cons's -- I can collapse it each time immediately. And to make it more comparable with Java's, it now uses Seq instead of a linked list. It's definitely not as elegant, but I still find it quite readable.

The problem

It now is honest and works, but it's slow as shit. My quick benchmarks with 100k Ints
Timing Java MergeSort with random ... ..took 19.97ms
Timing Java MergeSort with ordered ... ..took 4.72ms
Timing Java MergeSort with reversed ... ..took 5.21ms
Timing Spire Merge sort with random ... ..took 86.92ms
Timing Spire Merge sort with ordered ... ..took 4.4ms
Timing Spire Merge sort with reversed ... ..took 3.78ms
Timing Scala MergeSort random ... ..took 365.83s
Timing Scala MergeSort ordered ... ..took 91.8s
Timing Scala MergeSort reverse ... ..took 91.31s
AKA its so slow it is completely unusable. On linked lists, on arrays, on everything. And the variability of its slowness completely depends on the implementations of collections that are passed into it.

Conclusion

Scala is an awesome language compared to Java. But lets be honest with our praise, writing sort algorithms is a domain that Scala thoroughly sucks at. (But fortunately we can write imperative code, like Java, but neater)

If anyone's interested, all code / benchmarks are availabe at my playground