# Tail Recursion and Tail Call Optimization

## Contents |

## Recursion

If you’ve done much programming, then you know what recursion is: a function that invokes itself to further its implementation. The canonical example is the Factorial function. (Note that the other canonical example, the Fibonacci sequence, is not conducive to tail recursion.)

### Factorial

In mathematics, the factorial function is written using the `!`

symbol. When placed after a (whole) number, it means “the result after multiplying this number with every number less than it, down to 1”. Thus `4!`

is the same as `4x3x2x1`

, or `24`

.

#### Non-recursion implementation

In code, this function could be written many different ways. For example, in the C language you could write it as follows

unsigned long long Factorial(unsigned start) { unsigned long long result = 1; for (unsigned i = 2; i <= start; ++i) { result *= i; } // for return result; } // Factorial(start)

Note that this function uses `long long`

because the successive multiplications get very big very quickly.

#### Recursion implementation

Another implementation takes advantage of the fact that factorial is based on a series of numbers, so you can implement it *in terms of itself*. That is, `4!`

is the same as `4x3!`

, which is the same as `4x3x2!`

etc.

unsigned long long Factorial(unsigned start) { if (start > 1) { return Factorial(start - 1) * start; } // if return 1; } // Factorial(start)

A lot simpler code!

## Problem with Recursion

To highlight the biggest problem with recursion, though, you need to know what the code is actually making the processor do. Every time a function is called, the processor needs to make room on the stack to hold the intermediate calculations. If you were to look at the processor stack at the moment that it hit the `return 1;`

in the above code (assuming a call to `Factorial(4)`

), you could see something like this:

[Parameter: 4] [Return address: Call to Factorial(4)] [Return value: ?] [Parameter: 3] [Return address: Call to Factorial(3)] [Return value: ?] [Parameter: 2] [Return address: Call to Factorial(2)] [Return value: ?]

And, of course, with a larger `start`

parameter, the deeper the stack would be.

### Tail Recursion

To help fix this, the programmer can write the code differently - to hint to the compiler that although the code is written recursively, it doesn’t have to be implemented that way.

Rather than calling `Factorial(start - 1)`

in the middle of the function like above, the code could instead be written like this:

unsigned long long Factorial(unsigned start) { if (start <= 1) { return 1; } // if return start * Factorial(start - 1); } // Factorial(start)

This code is essentially identical to the previous code, except the `if`

test is reversed, and the recursive call to `Factorial(start - 1)`

is the last thing in the function - at the **tail**.

### Tail-Call Optimisation

So: how does that help? Frankly, it doesn’t - unless the compiler implements “tail-call optimisation”. If the compiler doesn’t, then the above code would produce exactly the same stack as before. But if it *does* implement it, then the stack depth would be only `1`

, no matter what the `start`

value was.

Tail-call optimisation is not only useful with recursion. Its hallmark is when there is a call to a new function right at the end of the current function (hence “tail-call”). The compiler can easily realise that there is no point in keeping the current function’s state - it’s no longer going to be needed any more. It can directly modify the current function’s intermediate stack area in preparation for the new function call - and instead of `CALL`

ing that code, it instead `JMP`

s to it.

The `JMP`

will be into somewhere in the middle of the function, bypassing the function’s normal initialisation. That doesn’t need to be performed since it’s already been set up by the tail-call optimisation. And the `JMP`

instead of `CALL`

is important since there will not be a corresponding `RET`

urn from the function - other than the `RET`

at the end of the final calculation back to the original `CALL`

ing function.

## Summary

The tail-call optimisation is independent from tail recursion - but it is tricky to implement generically. It is most easily implemented when tail recursion is used, since the compiler is busy compiling the very function that can best take advantage of it - it can modify the way the function is implemented to take full advantage of the situation.

But to get the compiler to even consider using the tail-call optimisation, it is incumbent on the programmer to ensure that the recursive call to itself is the last thing that the function does - the “tail recursion”.

The benefit is faster, more efficient code - in both processor time and memory (stack) usage.