Begin main content

Parallel sorting in Erlang

Of course super-light threads for parallel programming is Erlang's killer feature, so let's apply that to a simple merge sort. Erlang's spawning and message passing interface is fairly straightforward, but a lot of parallel tasks are simple start -> process -> return type tasks, so we can simplify our syntax even further by using some nice helper functions such as Joe Armstrong's parallel map pmap or Anonymous Matthew's parallel eval pareval (via Philip Robinson's blog).

Parallel map makes light work of a massively parallel merge sort:

-module(mergesort).
-export([msort/1]).

%% terminal cases for list of less than 2 entries

msort([]) ->
    [];
msort([A]) ->
    [A];

% split list into 2 & mergesort them

msort(L) ->
    Mid = trunc(length(L) / 2),
    {A, B} = lists:split(Mid, L),
    lists:umerge( pmap(fun(I) -> msort(I) end, [A, B]) ).
This gets really interesting in certain cases eg. where acquisition of the data is expensive but randomly accessible.

Instead of passing in a simple list, our sort function could accept a future that would promise to return any requested linear subset of the data. Assuming we know the total length of the source list, no data is needed until we reach the merge phase at the bottom of the recursion stack, and incrementally more data is required as we bubble back up to the final result.

In this example we are doing our network retrieval in parallel with our sorting (which itself is happening in parallel). If the remote data comes from multiple hosts we could parallelise the retrieval by using multiple port instances.

03:58 PM, 03 Dec 2007 by Mark Aufflick Permalink | Short Link

Add comment