Tail Recursion and Tail Call Optimization

From OSDev Wiki
Jump to: navigation, search



If you’ve done much programming, then you know what recursion is: a function that invokes itself to further its implementation. The canonical example is the Factorial function. (Note that the other canonical example, the Fibonacci sequence, is not conducive to tail recursion.)


In mathematics, the factorial function is written using the ! symbol. When placed after a (whole) number, it means “the result after multiplying this number with every number less than it, down to 1”. Thus 4! is the same as 4x3x2x1, or 24.

Non-recursion implementation

In code, this function could be written many different ways. For example, in the C language you could write it as follows

 unsigned long long Factorial(unsigned start) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= start; ++i) {
       result *= i;
    } // for
    return result;
 } // Factorial(start)

Note that this function uses long long because the successive multiplications get very big very quickly.

Recursion implementation

Another implementation takes advantage of the fact that factorial is based on a series of numbers, so you can implement it in terms of itself. That is, 4! is the same as 4x3!, which is the same as 4x3x2! etc.

 unsigned long long Factorial(unsigned start) {
    if (start > 1) {
       return Factorial(start - 1) * start;
    } // if
    return 1;
 } // Factorial(start)

A lot simpler code!

Problem with Recursion

To highlight the biggest problem with recursion, though, you need to know what the code is actually making the processor do. Every time a function is called, the processor needs to make room on the stack to hold the intermediate calculations. If you were to look at the processor stack at the moment that it hit the return 1; in the above code (assuming a call to Factorial(4)), you could see something like this:

 [Parameter: 4]
 [Return address: Call to Factorial(4)]
 [Return value: ?]
 [Parameter: 3]
 [Return address: Call to Factorial(3)]
 [Return value: ?]
 [Parameter: 2]
 [Return address: Call to Factorial(2)]
 [Return value: ?]

And, of course, with a larger start parameter, the deeper the stack would be.

Tail Recursion

To help fix this, the programmer can write the code differently - to hint to the compiler that although the code is written recursively, it doesn’t have to be implemented that way.

Rather than calling Factorial(start - 1) in the middle of the function like above, the code could instead be written like this:

 unsigned long long Factorial(unsigned start) {
    if (start <= 1) {
       return 1;
    } // if
    return start * Factorial(start - 1);
 } // Factorial(start)

This code is essentially identical to the previous code, except the if test is reversed, and the recursive call to Factorial(start - 1) is the last thing in the function - at the tail.

Tail-Call Optimisation

So: how does that help? Frankly, it doesn’t - unless the compiler implements “tail-call optimisation”. If the compiler doesn’t, then the above code would produce exactly the same stack as before. But if it does implement it, then the stack depth would be only 1, no matter what the start value was.

Tail-call optimisation is not only useful with recursion. Its hallmark is when there is a call to a new function right at the end of the current function (hence “tail-call”). The compiler can easily realise that there is no point in keeping the current function’s state - it’s no longer going to be needed any more. It can directly modify the current function’s intermediate stack area in preparation for the new function call - and instead of CALLing that code, it instead JMPs to it.

The JMP will be into somewhere in the middle of the function, bypassing the function’s normal initialisation. That doesn’t need to be performed since it’s already been set up by the tail-call optimisation. And the JMP instead of CALL is important since there will not be a corresponding RETurn from the function - other than the RET at the end of the final calculation back to the original CALLing function.


The tail-call optimisation is independent from tail recursion - but it is tricky to implement generically. It is most easily implemented when tail recursion is used, since the compiler is busy compiling the very function that can best take advantage of it - it can modify the way the function is implemented to take full advantage of the situation.

But to get the compiler to even consider using the tail-call optimisation, it is incumbent on the programmer to ensure that the recursive call to itself is the last thing that the function does - the “tail recursion”.

The benefit is faster, more efficient code - in both processor time and memory (stack) usage.

Personal tools