Tail-Call Optimization (TCO) in Catnip

Catnip supports Tail-Call Optimization to transform recursive tail calls into iterative loops, preventing stack overflow and improving performance.

What is Tail-Call Optimization?

The Problem

Recursive functions can cause stack overflow for deep recursion:

factorial = (n, acc=1) => {
    if n <= 1 { acc } else { factorial(n-1, n*acc) }
}

factorial(10000)  # Stack overflow!

The Solution

TCO transforms tail-recursive calls into loops:

pragma("tco", True)

factorial = (n, acc=1) => {
    if n <= 1 { acc } else { factorial(n-1, n*acc) }
}

factorial(10000)  # No stack overflow!

The compiler transforms this into (conceptually):

def factorial(n, acc=1):
    while True:
        if n <= 1:
            return acc
        # Update parameters and continue loop
        n_new = n - 1
        acc_new = n * acc
        n = n_new
        acc = acc_new
        # Loop continues…

What is a Tail Call?

A tail call is a function call that is the last operation before returning:

✔ Tail Calls (Can be optimized)

sum = (n, acc=0) => {
    if n == 0 { acc }           # Not a call
    else { sum(n-1, acc+n) }    # ✔ Tail call
}

even = (n) => {
    if n == 0 { true }
    elif n == 1 { false }
    else { even(n-2) }          # ✔ Tail call
}

process = (list) => {
    match list {
        [] => 0
        [x, …rest] => process(rest)  # ✔ Tail call
    }
}

✘ Not Tail Calls (Cannot be optimized)

factorial_bad = (n) => {
    if n <= 1 { 1 }
    else { n * factorial_bad(n-1) }  # ✘ Not tail: multiply after call
}

fib = (n) => {
    if n <= 1 { n }
    else { fib(n-1) + fib(n-2) }    # ✘ Not tail: addition after calls
}

double_sum = (n) => {
    2 * sum_to(n)                    # ✘ Not tail: multiplication after call
}

Enabling TCO

Global Enable (Pragma)

pragma("tco", True)

# All functions in this file can be TCO-optimized
countdown = (n) => {
    if n == 0 { "Done!" }
    else { countdown(n-1) }
}

Global Disable

pragma("tco", False)

# TCO disabled for debugging or compatibility
recursive_func = (n) => {
    if n == 0 { 1 }
    else { recursive_func(n-1) }
}

Default Behavior

TCO is enabled by default in Catnip. Use pragma("tco", False) to turn it off.

TCO Detection

Catnip automatically detects tail calls:

  1. Analyzes function body for recursive calls
  2. Checks tail position: Is it the last operation?
  3. Verifies self-recursion: Does it call itself?
  4. Transforms if valid: Converts to iterative loop

Detection Rules

A call is in tail position if:

  • It's the last expression in a function
  • It's in the last branch of an if/match
  • It's the last statement in a block
  • Its result is directly returned

Examples

Example 1: Factorial (Tail-Recursive)

Before TCO:

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }
}

factorial(5)  # 120
factorial(1000)  # Works with TCO!

What TCO does:

# Conceptual transformation
while True:
    if n <= 1:
        return acc
    # Tail call → parameter update
    n, acc = n-1, n*acc

Example 2: Sum (Tail-Recursive)

pragma("tco", True)

sum_to = (n, acc=0) => {
    if n == 0 { acc }
    else { sum_to(n-1, acc+n) }
}

sum_to(10000)  # 50005000 (no stack overflow!)

Example 3: List Length (Tail-Recursive)

length = (list, acc=0) => {
    match list {
        [] => acc
        [_, …rest] => length(rest, acc+1)
    }
}

length(range(10000))  # 10000 ✔

Example 4: Non-Tail Recursion (Cannot Optimize)

pragma("tco", True)

fib = (n) => {
    if n <= 1 { n }
    else { fib(n-1) + fib(n-2) }  # ✘ Not tail-recursive
}

# TCO has no effect here - still exponential time

Converting Non-Tail to Tail Recursion

Pattern: Add Accumulator

Non-tail:

sum_list = (list) => {
    match list {
        [] => 0
        [x, …rest] => x + sum_list(rest)  # ✘ Addition after call
    }
}

Tail-recursive:

sum_list = (list, acc=0) => {
    match list {
        [] => acc
        [x, …rest] => sum_list(rest, acc+x)  # ✔ Tail call
    }
}

Pattern: Continuation-Passing Style (CPS)

Non-tail:

factorial = (n) => {
    if n <= 1 { 1 }
    else { n * factorial(n-1) }  # ✘ Not tail
}

Tail-recursive with accumulator:

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }  # ✔ Tail call
}

Performance Impact

Benchmark: Factorial

Implementation n=100 n=1000 n=10000
Non-tail 0.5ms Stack overflow -
Tail + no TCO 0.6ms Stack overflow -
Tail + TCO 0.3ms 3ms 30ms ✔

Benchmark: Sum to N

Implementation n=1000 n=10000 n=100000
No TCO 2ms Stack overflow -
With TCO 1ms 10ms 100ms ✔

Performance Benefits

  • No stack growth: O(1) stack space instead of O(n)
  • Faster execution: Loop is faster than function calls
  • No overhead: Eliminated call/return instructions
  • Cache friendly: Better CPU cache utilization

Implementation Details

Transformation Algorithm

Catnip uses a trampoline execution strategy for TCO:

  1. Detect tail calls in semantic analysis python # In semantic/analyzer.py detector = TailCallDetector(function_name) tail_calls = detector.find_tail_calls(function_body) if tail_calls: mark_as_tail_optimizable(function)

  2. Execute with trampoline loop ```python # In core/nodes_core.pyx Function.call() def call(self, args, *kwargs): ctx.push_scope() try: current_func = self current_args = args current_kwargs = kwargs

       while True:  # Trampoline loop
           # Bind parameters in current scope
           bind_parameters(current_func, current_args, current_kwargs)
    
           # Execute function body
           result = execute_body(current_func.body)
    
           # Check if result is a tail call signal
           if isinstance(result, TailCall):
               # Update params and continue (no new scope!)
               current_func = result.func
               current_args = result.args
               current_kwargs = result.kwargs
               ctx.locals.clear()  # Clear for rebinding
               continue
    
           # Normal return
           return result
    

    finally: ctx.pop_scope() ```

  3. Tail call detection at runtime python # Tail position calls return TailCall instead of executing if node.tail: # Marked by semantic analyzer return TailCall(func, args, kwargs) else: return func(*args, **kwargs)

Key Implementation Details

Trampoline Pattern: Instead of transforming the AST, Catnip uses a runtime trampoline:

  • Function creates scope once at entry
  • Tail calls return TailCall signal instead of executing
  • Trampoline loop rebinds parameters without growing the call stack
  • Only one scope used, regardless of recursion depth

Advantages:

  • Simple: No complex AST transformation
  • Robust: Works with all control flow (if/match/blocks)
  • Debuggable: Can trace virtual call stack
  • Fast: C-optimized parameter binding (Cython)

Performance:

  • O(1) stack space (constant single frame)
  • Fast parameter rebinding via Cython
  • No function call overhead on tail calls

Tail Position Analysis

The detector checks these positions:

  • Block: Last statement
  • If/Match: Each branch's last expression
  • Return: Returned expression
  • Direct call: Function call itself

Limitations

Current TCO limitations:

  1. Self-recursion focus: Best optimization for direct self-calls
  2. Limited mutual recursion: Basic support for f()g() patterns

Planned improvements:

  • [ ] Enhanced mutual tail-call optimization
  • [x] Lambda tail-call support ✔
  • [ ] Automatic accumulator insertion
  • [x] Trampoline for tail recursion ✔

Debugging TCO

CLI Flags

Catnip provides a --tco flag to control TCO behavior:

# Enable TCO (default)
catnip -o tco:on script.cat

# Disable TCO completely
catnip -o tco:off script.cat

# Auto mode: respect pragma directives (default)
catnip -o tco:auto script.cat

TCO Tracing

Enable verbose tracing to see TCO in action:

# Run with TCO trace enabled
catnip -o tco:on --trace-tco script.cat

Example output:

TCO Call Stack:
  → factorial(n=5, acc=1)
  ↺ factorial(n=4, acc=5)    [tail call optimization]
  ↺ factorial(n=3, acc=20)   [tail call optimization]
  ↺ factorial(n=2, acc=60)   [tail call optimization]
  ↺ factorial(n=1, acc=120)  [tail call optimization]
  ← return 120

TCO Statistics:
  Function: factorial
    Total tail calls: 4
    Max depth: 1 (constant stack space!)
    Stack saved: 4 frames

Disable for Debugging

pragma("tco", False)

# Easier to debug with normal stack traces
buggy_recursive = (n) => {
    if n == 0 { 1 }
    else { buggy_recursive(n-1) }
}

Programmatic Inspection

from catnip import Catnip

code = '''
pragma("tco", True)

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }
}
factorial(10)
'''

cat = Catnip()
result = cat.run(code)

# Check TCO statistics
stats = cat.context.tco_stats
print(f"Tail calls: {stats.get_total_tail_calls()}")
print(f"Functions: {stats.get_function_names()}")

Best Practices

✔ DO

  • Use tail recursion for deep recursion
  • Add accumulator parameters
  • Enable TCO for recursive algorithms
  • Test with large inputs

✘ DON'T

  • Use TCO for shallow recursion (overhead not worth it)
  • Assume all recursion is tail-recursive
  • Rely on TCO for non-tail calls
  • Forget about base cases

Common Patterns

Pattern 1: Accumulator

process = (list, result=[]) => {
    match list {
        [] => result
        [x, …rest] => {
            new_result = […result, transform(x)]
            process(rest, new_result)  # ✔ Tail call
        }
    }
}

Pattern 2: Multiple Accumulators

fold = (list, acc1=0, acc2=1) => {
    match list {
        [] => [acc1, acc2]
        [x, …rest] => fold(rest, acc1+x, acc2*x)  # ✔ Tail call
    }
}

Pattern 3: State Machine

parse = (tokens, state="start") => {
    match [tokens, state] {
        [[], _] => "done"
        [[t, …rest], "start"] => {
            if t == "(" { parse(rest, "paren") }
            else { parse(rest, "start") }  # ✔ Tail calls
        }
        …
    }
}

Comparison with Other Languages

Language TCO Support Default
Scheme Yes Always
Haskell Yes Always
JavaScript Limited ES6+
Python No Never
Catnip Yes Enabled ✔

See Also