Tail Recursion and Tail Call Optimization
What is a tail call?
Informally, a tail call is when a function calls another function, and returns immediately. In assembly, it's when a CALL and a RET instructions are found right next to each other. For example, in the following code:
myFunction:
CALL foo
CALL bar
RET
CALL bar
is a tail call, but CALL foo
is not.
Note that, it's important that there is no operation between CALL bar
and RET
. Consider the following example:
int foo(x){
return bar(x) * 2;
}
Even though this looks like a tail call, the caller has to modify the return value after the CALL
operation. As such, it's not actually a tail call. In assembly:
CALL bar
SHL EAX, 1
RET
However the following is a tail call:
int foo(x){
return bar(x * 2);
}
In assembly:
SHL EAX, 1
CALL bar
RET
As you can see, there is no operation between CALL
and RET
this time.
Optimizing a tail call
A tail call can be optimized by simply replacing CALL
with JMP
and removing RET
. So looking at this example, one more time:
int foo(x){
return bar(x * 2);
}
In assembly, no tail call optimization:
SHL EAX, 1
CALL bar
RET
In assembly, with tail call optimization:
SHL EAX, 1
JMP bar
This has a minor speed up as you no longer have to save and restore the instruction pointer. The callee (in this case, bar
) will return all the way back to the caller of this function.
It doesn't have much of an impact on a function like this, but it can have immense impact on recursive calls. A recursive call that takes 1000 recursions will have to save and restore the instruction pointer 1001 times without tail call optimization, but only once with it.
Epilogues and Prologues
In the above examples, it is assumed that functions don't have a prologue or epilogue which is highly impractical. As such a proper implementation of tail call optimization is a lot trickier than shown here. This is only a rudimentary example.