Tail-Call Optimization (TCO) in Catnip
Catnip supports Tail-Call Optimization to transform recursive tail calls into iterative loops, preventing stack overflow and improving performance.
What is Tail-Call Optimization?
The Problem
Recursive functions can cause stack overflow for deep recursion:
factorial = (n, acc=1) => {
if n <= 1 { acc } else { factorial(n-1, n*acc) }
}
factorial(10000) # Stack overflow!
The Solution
TCO transforms tail-recursive calls into loops:
pragma("tco", True)
factorial = (n, acc=1) => {
if n <= 1 { acc } else { factorial(n-1, n*acc) }
}
factorial(10000) # No stack overflow!
The compiler transforms this into (conceptually):
def factorial(n, acc=1):
while True:
if n <= 1:
return acc
# Update parameters and continue loop
n_new = n - 1
acc_new = n * acc
n = n_new
acc = acc_new
# Loop continues…
What is a Tail Call?
A tail call is a function call that is the last operation before returning:
✔ Tail Calls (Can be optimized)
sum = (n, acc=0) => {
if n == 0 { acc } # Not a call
else { sum(n-1, acc+n) } # ✔ Tail call
}
even = (n) => {
if n == 0 { true }
elif n == 1 { false }
else { even(n-2) } # ✔ Tail call
}
process = (list) => {
match list {
[] => 0
[x, …rest] => process(rest) # ✔ Tail call
}
}
✘ Not Tail Calls (Cannot be optimized)
factorial_bad = (n) => {
if n <= 1 { 1 }
else { n * factorial_bad(n-1) } # ✘ Not tail: multiply after call
}
fib = (n) => {
if n <= 1 { n }
else { fib(n-1) + fib(n-2) } # ✘ Not tail: addition after calls
}
double_sum = (n) => {
2 * sum_to(n) # ✘ Not tail: multiplication after call
}
Enabling TCO
Global Enable (Pragma)
pragma("tco", True)
# All functions in this file can be TCO-optimized
countdown = (n) => {
if n == 0 { "Done!" }
else { countdown(n-1) }
}
Global Disable
pragma("tco", False)
# TCO disabled for debugging or compatibility
recursive_func = (n) => {
if n == 0 { 1 }
else { recursive_func(n-1) }
}
Default Behavior
TCO is enabled by default in Catnip. Use pragma("tco", False) to turn it off.
TCO Detection
Catnip automatically detects tail calls:
- Analyzes function body for recursive calls
- Checks tail position: Is it the last operation?
- Verifies self-recursion: Does it call itself?
- Transforms if valid: Converts to iterative loop
Detection Rules
A call is in tail position if:
- It's the last expression in a function
- It's in the last branch of an
if/match - It's the last statement in a block
- Its result is directly returned
Examples
Example 1: Factorial (Tail-Recursive)
Before TCO:
factorial = (n, acc=1) => {
if n <= 1 { acc }
else { factorial(n-1, n*acc) }
}
factorial(5) # 120
factorial(1000) # Works with TCO!
What TCO does:
# Conceptual transformation
while True:
if n <= 1:
return acc
# Tail call → parameter update
n, acc = n-1, n*acc
Example 2: Sum (Tail-Recursive)
pragma("tco", True)
sum_to = (n, acc=0) => {
if n == 0 { acc }
else { sum_to(n-1, acc+n) }
}
sum_to(10000) # 50005000 (no stack overflow!)
Example 3: List Length (Tail-Recursive)
length = (list, acc=0) => {
match list {
[] => acc
[_, …rest] => length(rest, acc+1)
}
}
length(range(10000)) # 10000 ✔
Example 4: Non-Tail Recursion (Cannot Optimize)
pragma("tco", True)
fib = (n) => {
if n <= 1 { n }
else { fib(n-1) + fib(n-2) } # ✘ Not tail-recursive
}
# TCO has no effect here - still exponential time
Converting Non-Tail to Tail Recursion
Pattern: Add Accumulator
Non-tail:
sum_list = (list) => {
match list {
[] => 0
[x, …rest] => x + sum_list(rest) # ✘ Addition after call
}
}
Tail-recursive:
sum_list = (list, acc=0) => {
match list {
[] => acc
[x, …rest] => sum_list(rest, acc+x) # ✔ Tail call
}
}
Pattern: Continuation-Passing Style (CPS)
Non-tail:
factorial = (n) => {
if n <= 1 { 1 }
else { n * factorial(n-1) } # ✘ Not tail
}
Tail-recursive with accumulator:
factorial = (n, acc=1) => {
if n <= 1 { acc }
else { factorial(n-1, n*acc) } # ✔ Tail call
}
Performance Impact
Benchmark: Factorial
| Implementation | n=100 | n=1000 | n=10000 |
|---|---|---|---|
| Non-tail | 0.5ms | Stack overflow | - |
| Tail + no TCO | 0.6ms | Stack overflow | - |
| Tail + TCO | 0.3ms | 3ms | 30ms ✔ |
Benchmark: Sum to N
| Implementation | n=1000 | n=10000 | n=100000 |
|---|---|---|---|
| No TCO | 2ms | Stack overflow | - |
| With TCO | 1ms | 10ms | 100ms ✔ |
Performance Benefits
- No stack growth: O(1) stack space instead of O(n)
- Faster execution: Loop is faster than function calls
- No overhead: Eliminated call/return instructions
- Cache friendly: Better CPU cache utilization
Implementation Details
Transformation Algorithm
Catnip uses a trampoline execution strategy for TCO:
-
Detect tail calls in semantic analysis
python # In semantic/analyzer.py detector = TailCallDetector(function_name) tail_calls = detector.find_tail_calls(function_body) if tail_calls: mark_as_tail_optimizable(function) -
Execute with trampoline loop ```python # In core/nodes_core.pyx Function.call() def call(self, args, *kwargs): ctx.push_scope() try: current_func = self current_args = args current_kwargs = kwargs
while True: # Trampoline loop # Bind parameters in current scope bind_parameters(current_func, current_args, current_kwargs) # Execute function body result = execute_body(current_func.body) # Check if result is a tail call signal if isinstance(result, TailCall): # Update params and continue (no new scope!) current_func = result.func current_args = result.args current_kwargs = result.kwargs ctx.locals.clear() # Clear for rebinding continue # Normal return return resultfinally: ctx.pop_scope() ```
-
Tail call detection at runtime
python # Tail position calls return TailCall instead of executing if node.tail: # Marked by semantic analyzer return TailCall(func, args, kwargs) else: return func(*args, **kwargs)
Key Implementation Details
Trampoline Pattern: Instead of transforming the AST, Catnip uses a runtime trampoline:
- Function creates scope once at entry
- Tail calls return
TailCallsignal instead of executing - Trampoline loop rebinds parameters without growing the call stack
- Only one scope used, regardless of recursion depth
Advantages:
- Simple: No complex AST transformation
- Robust: Works with all control flow (if/match/blocks)
- Debuggable: Can trace virtual call stack
- Fast: C-optimized parameter binding (Cython)
Performance:
- O(1) stack space (constant single frame)
- Fast parameter rebinding via Cython
- No function call overhead on tail calls
Tail Position Analysis
The detector checks these positions:
- Block: Last statement
- If/Match: Each branch's last expression
- Return: Returned expression
- Direct call: Function call itself
Limitations
Current TCO limitations:
- Self-recursion focus: Best optimization for direct self-calls
- Limited mutual recursion: Basic support for
f()→g()patterns
Planned improvements:
- [ ] Enhanced mutual tail-call optimization
- [x] Lambda tail-call support ✔
- [ ] Automatic accumulator insertion
- [x] Trampoline for tail recursion ✔
Debugging TCO
CLI Flags
Catnip provides a --tco flag to control TCO behavior:
# Enable TCO (default)
catnip -o tco:on script.cat
# Disable TCO completely
catnip -o tco:off script.cat
# Auto mode: respect pragma directives (default)
catnip -o tco:auto script.cat
TCO Tracing
Enable verbose tracing to see TCO in action:
# Run with TCO trace enabled
catnip -o tco:on --trace-tco script.cat
Example output:
TCO Call Stack:
→ factorial(n=5, acc=1)
↺ factorial(n=4, acc=5) [tail call optimization]
↺ factorial(n=3, acc=20) [tail call optimization]
↺ factorial(n=2, acc=60) [tail call optimization]
↺ factorial(n=1, acc=120) [tail call optimization]
← return 120
TCO Statistics:
Function: factorial
Total tail calls: 4
Max depth: 1 (constant stack space!)
Stack saved: 4 frames
Disable for Debugging
pragma("tco", False)
# Easier to debug with normal stack traces
buggy_recursive = (n) => {
if n == 0 { 1 }
else { buggy_recursive(n-1) }
}
Programmatic Inspection
from catnip import Catnip
code = '''
pragma("tco", True)
factorial = (n, acc=1) => {
if n <= 1 { acc }
else { factorial(n-1, n*acc) }
}
factorial(10)
'''
cat = Catnip()
result = cat.run(code)
# Check TCO statistics
stats = cat.context.tco_stats
print(f"Tail calls: {stats.get_total_tail_calls()}")
print(f"Functions: {stats.get_function_names()}")
Best Practices
✔ DO
- Use tail recursion for deep recursion
- Add accumulator parameters
- Enable TCO for recursive algorithms
- Test with large inputs
✘ DON'T
- Use TCO for shallow recursion (overhead not worth it)
- Assume all recursion is tail-recursive
- Rely on TCO for non-tail calls
- Forget about base cases
Common Patterns
Pattern 1: Accumulator
process = (list, result=[]) => {
match list {
[] => result
[x, …rest] => {
new_result = […result, transform(x)]
process(rest, new_result) # ✔ Tail call
}
}
}
Pattern 2: Multiple Accumulators
fold = (list, acc1=0, acc2=1) => {
match list {
[] => [acc1, acc2]
[x, …rest] => fold(rest, acc1+x, acc2*x) # ✔ Tail call
}
}
Pattern 3: State Machine
parse = (tokens, state="start") => {
match [tokens, state] {
[[], _] => "done"
[[t, …rest], "start"] => {
if t == "(" { parse(rest, "paren") }
else { parse(rest, "start") } # ✔ Tail calls
}
…
}
}
Comparison with Other Languages
| Language | TCO Support | Default |
|---|---|---|
| Scheme | Yes | Always |
| Haskell | Yes | Always |
| JavaScript | Limited | ES6+ |
| Python | No | Never |
| Catnip | Yes | Enabled ✔ |
See Also
- Pragmas - Pragma directives including
pragma("tco") - Optimizations - Other optimization techniques
- Language Reference - Function definitions