Tail-Call Optimization (TCO) in Catnip

Catnip supports Tail-Call Optimization to transform recursive tail calls into iterative loops, preventing stack overflow and improving performance.

What is Tail-Call Optimization?

The Problem

Recursive functions can cause stack overflow for deep recursion:

factorial = (n, acc=1) => {
    if n <= 1 { acc } else { factorial(n-1, n*acc) }
}

factorial(10000)  # Stack overflow!

The Solution

TCO transforms tail-recursive calls into loops:

pragma("tco", True)

factorial = (n, acc=1) => {
    if n <= 1 { acc } else { factorial(n-1, n*acc) }
}

factorial(10000)  # No stack overflow!

The compiler transforms this into (conceptually):

def factorial(n, acc=1):
    while True:
        if n <= 1:
            return acc
        # Update parameters and continue loop
        n_new = n - 1
        acc_new = n * acc
        n = n_new
        acc = acc_new
        # Loop continues…

What is a Tail Call?

A tail call is a function call that is the last operation before returning:

✔ Tail Calls (Can be optimized)

sum = (n, acc=0) => {
    if n == 0 { acc }           # Not a call
    else { sum(n-1, acc+n) }    # ✔ Tail call
}

even = (n) => {
    if n == 0 { true }
    elif n == 1 { false }
    else { even(n-2) }          # ✔ Tail call
}

process = (list) => {
    match list {
        [] => 0
        [x, …rest] => process(rest)  # ✔ Tail call
    }
}

✘ Not Tail Calls (Cannot be optimized)

factorial_bad = (n) => {
    if n <= 1 { 1 }
    else { n * factorial_bad(n-1) }  # ✘ Not tail: multiply after call
}

fib = (n) => {
    if n <= 1 { n }
    else { fib(n-1) + fib(n-2) }    # ✘ Not tail: addition after calls
}

double_sum = (n) => {
    2 * sum_to(n)                    # ✘ Not tail: multiplication after call
}

Enabling TCO

Global Enable (Pragma)

pragma("tco", True)

# All functions in this file can be TCO-optimized
countdown = (n) => {
    if n == 0 { "Done!" }
    else { countdown(n-1) }
}

Global Disable

pragma("tco", False)

# TCO disabled for debugging or compatibility
recursive_func = (n) => {
    if n == 0 { 1 }
    else { recursive_func(n-1) }
}

Default Behavior

TCO is enabled by default in Catnip. Use pragma("tco", False) to turn it off.

TCO Detection

Catnip automatically detects tail calls:

Analyzes function body for recursive calls
Checks tail position: Is it the last operation?
Verifies self-recursion: Does it call itself?
Transforms if valid: Converts to iterative loop

Detection Rules

A call is in tail position if:

It's the last expression in a function
It's in the last branch of an if/match
It's the last statement in a block
Its result is directly returned

Examples

Example 1: Factorial (Tail-Recursive)

Before TCO:

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }
}

factorial(5)  # 120
factorial(1000)  # Works with TCO!

What TCO does:

# Conceptual transformation
while True:
    if n <= 1:
        return acc
    # Tail call → parameter update
    n, acc = n-1, n*acc

Example 2: Sum (Tail-Recursive)

pragma("tco", True)

sum_to = (n, acc=0) => {
    if n == 0 { acc }
    else { sum_to(n-1, acc+n) }
}

sum_to(10000)  # 50005000 (no stack overflow!)

Example 3: List Length (Tail-Recursive)

length = (list, acc=0) => {
    match list {
        [] => acc
        [_, …rest] => length(rest, acc+1)
    }
}

length(range(10000))  # 10000 ✔

Example 4: Non-Tail Recursion (Cannot Optimize)

pragma("tco", True)

fib = (n) => {
    if n <= 1 { n }
    else { fib(n-1) + fib(n-2) }  # ✘ Not tail-recursive
}

# TCO has no effect here - still exponential time

Converting Non-Tail to Tail Recursion

Pattern: Add Accumulator

Non-tail:

sum_list = (list) => {
    match list {
        [] => 0
        [x, …rest] => x + sum_list(rest)  # ✘ Addition after call
    }
}

Tail-recursive:

sum_list = (list, acc=0) => {
    match list {
        [] => acc
        [x, …rest] => sum_list(rest, acc+x)  # ✔ Tail call
    }
}

Pattern: Continuation-Passing Style (CPS)

Non-tail:

factorial = (n) => {
    if n <= 1 { 1 }
    else { n * factorial(n-1) }  # ✘ Not tail
}

Tail-recursive with accumulator:

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }  # ✔ Tail call
}

Performance Impact

Benchmark: Factorial

Implementation	n=100	n=1000	n=10000
Non-tail	0.5ms	Stack overflow	-
Tail + no TCO	0.6ms	Stack overflow	-
Tail + TCO	0.3ms	3ms	30ms ✔

Benchmark: Sum to N

Implementation	n=1000	n=10000	n=100000
No TCO	2ms	Stack overflow	-
With TCO	1ms	10ms	100ms ✔

Performance Benefits

No stack growth: O(1) stack space instead of O(n)
Faster execution: Loop is faster than function calls
No overhead: Eliminated call/return instructions
Cache friendly: Better CPU cache utilization

Implementation Details

Transformation Algorithm

Catnip uses a trampoline execution strategy for TCO:

Detect tail calls in semantic analysis python # In semantic/analyzer.py detector = TailCallDetector(function_name) tail_calls = detector.find_tail_calls(function_body) if tail_calls: mark_as_tail_optimizable(function)

Execute with trampoline loop ```python # In core/nodes_core.pyx Function.call() def call(self, args, *kwargs): ctx.push_scope() try: current_func = self current_args = args current_kwargs = kwargs

   while True:  # Trampoline loop
       # Bind parameters in current scope
       bind_parameters(current_func, current_args, current_kwargs)

       # Execute function body
       result = execute_body(current_func.body)

       # Check if result is a tail call signal
       if isinstance(result, TailCall):
           # Update params and continue (no new scope!)
           current_func = result.func
           current_args = result.args
           current_kwargs = result.kwargs
           ctx.locals.clear()  # Clear for rebinding
           continue

       # Normal return
       return result

finally: ctx.pop_scope() ```

Tail call detection at runtime python # Tail position calls return TailCall instead of executing if node.tail: # Marked by semantic analyzer return TailCall(func, args, kwargs) else: return func(*args, **kwargs)

Key Implementation Details

Trampoline Pattern: Instead of transforming the AST, Catnip uses a runtime trampoline:

Function creates scope once at entry
Tail calls return TailCall signal instead of executing
Trampoline loop rebinds parameters without growing the call stack
Only one scope used, regardless of recursion depth

Advantages:

Simple: No complex AST transformation
Robust: Works with all control flow (if/match/blocks)
Debuggable: Can trace virtual call stack
Fast: C-optimized parameter binding (Cython)

Performance:

O(1) stack space (constant single frame)
Fast parameter rebinding via Cython
No function call overhead on tail calls

Tail Position Analysis

The detector checks these positions:

Block: Last statement
If/Match: Each branch's last expression
Return: Returned expression
Direct call: Function call itself

Limitations

Current TCO limitations:

Self-recursion focus: Best optimization for direct self-calls
Limited mutual recursion: Basic support for f() → g() patterns

Planned improvements:

[ ] Enhanced mutual tail-call optimization
[x] Lambda tail-call support ✔
[ ] Automatic accumulator insertion
[x] Trampoline for tail recursion ✔

Debugging TCO

CLI Flags

Catnip provides a --tco flag to control TCO behavior:

# Enable TCO (default)
catnip -o tco:on script.cat

# Disable TCO completely
catnip -o tco:off script.cat

# Auto mode: respect pragma directives (default)
catnip -o tco:auto script.cat

TCO Tracing

Enable verbose tracing to see TCO in action:

# Run with TCO trace enabled
catnip -o tco:on --trace-tco script.cat

Example output:

TCO Call Stack:
  → factorial(n=5, acc=1)
  ↺ factorial(n=4, acc=5)    [tail call optimization]
  ↺ factorial(n=3, acc=20)   [tail call optimization]
  ↺ factorial(n=2, acc=60)   [tail call optimization]
  ↺ factorial(n=1, acc=120)  [tail call optimization]
  ← return 120

TCO Statistics:
  Function: factorial
    Total tail calls: 4
    Max depth: 1 (constant stack space!)
    Stack saved: 4 frames

Disable for Debugging

pragma("tco", False)

# Easier to debug with normal stack traces
buggy_recursive = (n) => {
    if n == 0 { 1 }
    else { buggy_recursive(n-1) }
}

Programmatic Inspection

from catnip import Catnip

code = '''
pragma("tco", True)

factorial = (n, acc=1) => {
    if n <= 1 { acc }
    else { factorial(n-1, n*acc) }
}
factorial(10)
'''

cat = Catnip()
result = cat.run(code)

# Check TCO statistics
stats = cat.context.tco_stats
print(f"Tail calls: {stats.get_total_tail_calls()}")
print(f"Functions: {stats.get_function_names()}")

Best Practices

✔ DO

Use tail recursion for deep recursion
Add accumulator parameters
Enable TCO for recursive algorithms
Test with large inputs

✘ DON'T

Use TCO for shallow recursion (overhead not worth it)
Assume all recursion is tail-recursive
Rely on TCO for non-tail calls
Forget about base cases

Common Patterns

Pattern 1: Accumulator

process = (list, result=[]) => {
    match list {
        [] => result
        [x, …rest] => {
            new_result = […result, transform(x)]
            process(rest, new_result)  # ✔ Tail call
        }
    }
}

Pattern 2: Multiple Accumulators

fold = (list, acc1=0, acc2=1) => {
    match list {
        [] => [acc1, acc2]
        [x, …rest] => fold(rest, acc1+x, acc2*x)  # ✔ Tail call
    }
}

Pattern 3: State Machine

parse = (tokens, state="start") => {
    match [tokens, state] {
        [[], _] => "done"
        [[t, …rest], "start"] => {
            if t == "(" { parse(rest, "paren") }
            else { parse(rest, "start") }  # ✔ Tail calls
        }
        …
    }
}

Comparison with Other Languages

Language	TCO Support	Default
Scheme	Yes	Always
Haskell	Yes	Always
JavaScript	Limited	ES6+
Python	No	Never
Catnip	Yes	Enabled ✔