Tutorial: Lua-Lua Calls in LuaVela Interpreter¶
Introduction¶
This HOWTO discusses calling Lua functions from Lua code (Lua-Lua calls) and the way they are implemented in the LuaVela interpreter. Special attention is paid to implementation of tail calls.
Common Stack Layout and General CALL Semantics¶
Consider a simple Lua chunk where some Lua function is called from the Lua code:
function foo(bar)
print('bar=' .. tostring(bar))
end
foo(42)
Corresponding byte code for the chunk is as follows:
$ ujit -b- ./lua-lua-call.lua
-- BYTECODE -- lua-lua-call.lua:1-3
0001 GGET 1 0 ; "print"
0002 KSTR 2 1 ; "bar="
0003 GGET 3 2 ; "tostring"
0004 MOV 4 0
0005 CALL 3 2 2
0006 CAT 2 2 3
0007 CALL 1 1 2
0008 RET0 0 1
-- BYTECODE -- lua-lua-call.lua:0-7
0001 FNEW 0 0 ; lua-lua-call.lua:1
0002 GSET 0 1 ; "foo"
0003 GGET 0 1 ; "foo"
0004 KSHORT 1 42
0005 CALL 0 1 2
0006 RET0 0 1
In the dump above, one can see a byte code CALL
which obviously performs a call from one function to another. Let’s look closer what actually happens when semantics of this byte code is executed. First, here is the expected layout of Lua stack (RA
and RC
are operands of the CALL
byte code):
BASE - 1 | tp | func | <-- caller's TValue (notation follows declarations in lj_obj.h)
BASE | TValue | <-- caller's base
... | TValue | {
... | TValue | { other slots allocated during caller's execution
... | TValue | {
BASE + RA | TValue | <-- callee (LFUNC or an object with __call metamethod), RA holds its stack offset relative to caller's BASE
... | TValue | {
... | TValue | { callee's args (RC - 1 slots)
... | TValue | {
TOP |\\\\\\\\\\\\\| <-- first free slot on the stack
Before we go forward, note the stack slot stored at [BASE - 1]
. Its func
is actually a caller which is currently being executed and its tp
holds program counter (PC
) of the caller’s caller. This is LuaVela’s standard mechanism of linking stack frames together. Now what happens during CALL
execution:
- If
[BASE + RA]
is not a function object, its__call
metamethod is resolved. If this operation fails, an error is thrown. Otherwise the interpreter proceeds. - Caller’s
PC
is stored to thetp
part of the[BASE + RA]
slot. BASE
is set to[BASE + RA + 1]
effectively pointing to the first argument of the callee.- First byte code of the callee gets decoded and dispatched, and actual execution of the callee begins.
In other words, the stack looks like:
OLD_BASE - 1 +-->| tp' | func'| <-- caller's TValue (tp is a caller caller's PC; used as a frame link)
OLD_BASE | | TValue | <-- caller's base
... | | TValue | {
... | | TValue | { other slots allocated during caller's execution
... | | TValue | {
BASE - 1 +---| tp | func | <-- callee's TValue (tp is a caller's PC; used as a frame link)
BASE | TValue | <-- callee's base (points to the first argument)
... | TValue | { callee's arguments
... | TValue | { (from 2nd to last)
TOP |\\\\\\\\\\\\\| <-- first free slot on the stack
OLD_BASE
is not actually stored anywhere and is shown on this figure only for convenience. When the interpreter needs to set BASE
back to OLD_BASE
(e.g. during return from the callee), caller’s PC
is used to retrieve the CALL
byte code, read its RA
and set BASE
to the correct value.
CALL
is not the only only byte code for calling functions. The others are:
CALLT
: For tail-calling functionsCALLM
: For calling functions with variable number of argumentsCALLMT
: For tail-calling functions with variable number of arguments
All CALL*
byte codes expect the same stack layout as described above, but differ in semantics. In the next section, we’ll discuss implementation of tail calls.
Tail calls¶
Chunk below illustrates the concept of a tail call:
function bar(x)
return x + 40
end
function foo(x)
return bar(x) -- tail call to bar, no foo's code to execute after bar's execution
end
print(foo(2))
This chunk results in following byte code:
$ ujit -b- tailcall.lua
-- BYTECODE -- tailcall.lua:1-3
0001 KSHORT 1 40
0002 ADD 1 0 1
0003 RET1 1 2
-- BYTECODE -- tailcall.lua:4-6
0001 GGET 1 0 ; "bar"
0002 MOV 2 0
0003 CALLT 1 2
-- BYTECODE -- tailcall.lua:0-9
0001 FNEW 0 0 ; tailcall.lua:1
0002 GSET 0 1 ; "bar"
0003 FNEW 0 2 ; tailcall.lua:4
0004 GSET 0 3 ; "foo"
0005 GGET 0 4 ; "print"
0006 GGET 1 3 ; "foo"
0007 KSHORT 2 2
0008 CALL 1 0 2
0009 CALLM 0 1 0
0010 RET0 0 1
As one can see, LuaVela emits a special opcode CALLT
to handle a tail call. Its semantics is as follows (initial stack layout is the same as in the section above):
If [BASE + RA]
is not a function object, its__call
metamethod is resolved. If this operation fails, an error is thrown. Otherwise the interpreter proceeds.- Caller’s
func
at[BASE - 1]
is overwritten with callee’sfunc
. Please note that original caller’stp
is preserved. - All callee’s arguments are moved from from
[BASE + RA + 1]
to[BASE]
. - First byte code of the callee gets decoded and dispatched, and actual execution of the callee begins.
In other words, original caller’s stack gets destroyed during steps 2 and 3. The interpreter does not care about it as the nature of the call (tail call) guarantees that we will never return to the caller, so no need to maintain its stack frame. Besides, the described approach allows us to save some CPU cycles: After tail call finishes, we will use the original caller’s tp
to return to the parent (=caller’s caller) frame right away instead of performing an extra stack hop in case of preserving caller’s stack frame.
The figure below demonstrates the stack layout after CALLT
is executed:
BASE - 1 | tp | func | <-- callee's func + original tp (caller caller's PC; used as a frame link)
BASE | TValue | <-- callee's base (points to the first argument), written over caller's frame
... | TValue | { callee's arguments from 2nd to last
... | TValue | { (written over caller's frame)
TOP |\\\\\\\\\\\\\| <-- first free slot on the stack
In the next section we will discuss incompatibility between LuaVela and the stock PUC-Rio Lua interpreter caused by this implementation.
Tail Call Incompatibility: A Case Study¶
Consider a chunk tailcall-incompat.lua.
local function foo(env)
print(bar() == baz())
print(bar() == _G)
print(baz() == _G)
end
setfenv(foo, {
bar = function() local env = getfenv(1); return env end,
baz = function() return getfenv(1) end, -- NB! getfenv(1) is tail-called
print = print,
_G = _G,
})
foo()
In this chunk, we set a custom environment for the foo
function. Both bar
and baz
return their environments, with the latter doing it via a tail call.
Let’s run the chunk with the PUC-Rio interpreter:
$ lua5.1 ./tailcall-incompat.lua
true
true
true
Repeat the same with LuaVela:
$ ujit ./tailcall-incompat.lua
false
true
false
Please note following:
- According to the documentation,
gentfenv(1)
returns environment of the function that calledgetfenv
. - In Lua 5.1 environments are not inherited across function calls, the “default” environment is stored in a special variable
_G
.
Now the table below summarizes what happens:
Function / Interpreter | PUC-Rio Lua | LuaVela |
---|---|---|
bar() |
bar() is a called for getfenv(1) . bar() ’s environment is returned. |
bar() is a called for getfenv(1) . bar() ’s environment is returned. |
baz() |
baz() is a called for getfenv(1) . baz() ’s environment is returned. |
foo() is a run-time caller for getfenv(1) (due to tail call implementation in LuaVela). foo() ’s environment is returned. |
The takeaway is simple: Be careful when tail-calling functions that deal with the call stack in LuaVela. As caller frame gets overwritten by a tail-called callee, you’ll certainly not get what you want. First of all it applies to debug
library and functions like getfenv
/setfenv
.