This one is explored in depth over at nelhagedebugsshit.tumblr.com.
MongoDB has two different approaches to building indexes, selectable by the operator at index-creation time.
- “foreground” index builds lock the the entire server for the duration of the build, but promise faster builds and more compact indexes, since they can look at all of the data up front.
- “background” builds are slower and lead to more disk fragmentation, but allow other queries (reads and writes) to procede during the index build, by building the B-Tree in parallel with ongoing operations, and merging new writes into the index-in-progress.
It’s a fine theory. But until MongoDB 2.6, foreground index builds were actually dramatically slower on large collections!
It turns out that background builds did the obvious naïve B-Tree construction, of just creating an empty B-Tree and inserting each record one at a time. This approach has some weaknesses, but it’s pretty clearly O(n*log n): O(n) inserts, each of which is an O(log n) tree insert.
Foreground builds, able to stop the world, tried to be clever by doing an external-memory sort, and then building the B-Tree in place on top of the sorted records. This approach is in fact much faster if implemented correctly, since it reduces disk I/O (even though it will also be O(n*log n) comparisons asymptotically – you’re still doing a comparison sort, after all).
However, the MongoDB 2.4 and earlier external sort misses this goal! The sort divides the N records into (N/k) chunks, for a roughly-constant k, sorts each chunk, and then merges them by repeatedly scanning each chunk for the global minimum. But for constant k, N/k = O(N), and so it ends up doing N * O(N) comparisons, for O(N²) work. As long as N is small enough that N/k is small, it does beat out the incremental B-Tree build, but somewhere around 1M records it tips over, and gets rapidly slower.
Mongo 2.6 fixes this by using a min-heap to do the merge, restoring O(n*log n) asymptotic performance, while retaining the I/O efficiency of the external sort. Hooray.