Rowland Carlson

Erlang's is Optimized


Here is Erlang's

map(F, [H|T]) ->
	[F(H)|map(F, T)];
map(F, []) when is_function(F, 1) -> [].

A colleague noted that this implementation doesn't seem tail call optimized.
They noted the list constructor [] is outside of the call to map(F, T),
Which implies that the list constructor will need to wait until after all of the recursive maps are called.

However, lists are core data type in Erlang and Elixir,
So I would be surprised if it weren't optimized.

I wanted to investigate.

First, I implemented map in elixir with the recursive call in the tail:

defmodule Tco do
  def tco_map(f, [h|t]), do: [f.(h)] ++ tco_map(f, t)
  def tco_map(f, []) when is_function(f, 1), do: []

(There's an argument to be made that I should use Erlang here,
But as we'll see that won't matter.)

To compare the two functions,
I found Erlang's :beam_disasm module.

:beam_disasm.file takes a beam file binary and translates it to beam assembly.
I knew that :code.which would return the path of a given module's beam file.


def find_asm(module) do
  |> :code.which(module)
  |> :beam_disasm.file

Calling find_asm(:lists) returns a tuple with disassembled code.
The part I wanted was in the 6th element of the tuple,
A list of functions with their disassembled code:

   {:function, :keyfind, 3, 2,  
      {:label, 1}, 
      {:line, 1}, 
      {:func_info, {:atom, :lists}, {:atom, :keyfind}, 3}, 
      {:label, 2}, 
      {:move, {:atom, :undef}, {:x, 0}},
      {:line, 2}, 
      {:call_ext_only, 1, {:extfunc, :erlang, :nif_error, 1}} 
   {:function, :keymember, 3, 4,  
      {:line, 3}, 
      {:label, 3}, 
      {:func_info, {:atom, :lists}, {:atom, :keymember}, 3}, 
      {:label, 4}, 
      {:move, {:atom, :undef}, {:x, 0}},
      {:line, 4}, 
      {:call_ext_only, 1, {:extfunc, :erlang, :nif_error, 1}} 

The function name appears to be an atom in the second position of 5-tuple.
So I can modify find_asm:

def find_asm(module, fun) do
  |> :code.which(module)
  |> :beam_disasm.file
  |> elem(5)
  |> Enum.find(fn 
     {_, ^fun, _, _, _} -> true
     _ -> false

Now I can call find_asm on both functions,
and get the diff between them:

$ diff -ub map tco_map
--- map.txt     2022-04-28 12:35:00.100609000 -1000
+++ tco_map.txt 2022-04-28 12:34:30.571972000 -1000
@@ -1,27 +1,26 @@
-{:function, :map, 2, 352,
+{:function, :tco_map, 2, 11,
-   {:line, 173},
-   {:label, 351},
-   {:func_info, {:atom, :lists}, {:atom, :map}, 2},
-   {:label, 352},
-   {:test, :is_nonempty_list, {:f, 353}, [x: 1]},
+   {:line, 7},
+   {:label, 10},
+   {:func_info, {:atom, Tco}, {:atom, :tco_map}, 2},
+   {:label, 11},
+   {:test, :is_nonempty_list, {:f, 12}, [x: 1]},
    {:allocate, 2, 2},
    {:move, {:x, 0}, {:y, 1}},
    {:get_list, {:x, 1}, {:x, 0}, {:y, 0}},
    {:move, {:y, 1}, {:x, 1}},
-   {:line, 174},
    {:call_fun, 1},
    {:swap, {:y, 1}, {:x, 0}},
    {:move, {:y, 0}, {:x, 1}},
    {:trim, 1, 1},
-   {:call, 2, {:lists, :map, 2}},
+   {:call, 2, {Tco, :tco_map, 2}},
    {:test_heap, 2, 1},
    {:put_list, {:y, 0}, {:x, 0}, {:x, 0}},
    {:deallocate, 1},
-   {:label, 353},
-   {:test, :is_nil, {:f, 351}, [x: 1]},
-   {:test, :is_function2, {:f, 351}, [x: 0, integer: 1]},
+   {:label, 12},
+   {:test, :is_nil, {:f, 10}, [x: 1]},
+   {:test, :is_function2, {:f, 10}, [x: 0, integer: 1]},
    {:move, nil, {:x, 0}},

The two functions differ in four ways:

All of the other assembly is identical.
So both functions produce roughly the same assembly.
What's going on?

The Erlang documentation explains the mystery:
The Seven Myths of Erlang Performance

At the end of Myth 2.2, we get this note:

(Or it would be more efficient if the compiler did not automatically rewrite [H]++Acc to [H|Acc].)

Myth 2.1 explains that tail recursive functions aren't always faster than recursive functions, and links to Fred Hebert's article:
Erlang's Tail Recursion is Not a Silver Bullet

The entire article is worth reading,
and he puts it well:

One thing that's to remember no matter what is that writing clean code generally leads to better performances. This is something that many languages can not claim as true, but Erlang certainly can. In the general case, the optimisations in the Erlang VM come from observing idiomatic code in production, and clean one at that. This is why body recursion might be better, and why many other optimizations were made. The best recent example I can think of has to do with synchronous messaging patterns, where mailbox scanning has been optimized a little while ago. Clean code is the best code.

If you would like the raw assembly from each function,
I've included both with the code for this article on Github: here.