Tutorial: buildvm Gotchas

Introduction

This document sheds some light on how functions from the Lua standard library (builtins) are “registered” to become available to the platform’s end users. The following platform peculiarities should be taken into account:

  • The implementation of a builtin may be scattered across the code base. For example, the “fast path” may be implemented as a fast function inside the VM code, while the “slow path”, which requires more processing or throws an error, may be implemented in C. Other builtins are implemented as fast functions only, and a third group is implemented in C only.
  • Some builtins may share a common “slow path”.
  • Implementation of some builtins may require access to certain upvalues.

You may be interested in this talk about LuaJIT’s original build chain.

buildvm Utility

Prior to building the platform’s core, a special utility called buildvm is built and run. This utility scans the core code base and, based on special macros, generates some headers (*def.h) which are used inside uj_lib.c for building the standard library.
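To get a feeling for what “scanning for special macros” means, here is a minimal sketch of a marker scanner. It is not how buildvm is implemented: the function toy_scan_markers and the fixed 32-byte name buffers are hypothetical and exist only for this illustration. The markers LJLIB_ASM and LJLIB_ASM_ are the ones discussed later in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of buildvm: finds every occurrence of
 * `marker` (which must end with an opening parenthesis) in `src` and copies
 * the text between the parentheses into `names`. Returns the number of
 * occurrences stored, at most `cap`.
 */
static size_t toy_scan_markers(const char *src, const char *marker,
                               char names[][32], size_t cap)
{
    size_t mlen;
    size_t n = 0;
    const char *p = src;

    assert(src != NULL && marker != NULL);
    mlen = strlen(marker);

    while (n < cap && (p = strstr(p, marker)) != NULL) {
        const char *start = p + mlen;
        const char *end = strchr(start, ')');
        size_t len;

        if (end == NULL)
            break; /* unterminated macro argument: stop scanning */

        len = (size_t)(end - start);
        if (len > 31)
            len = 31; /* truncate to fit the fixed-size buffer */

        memcpy(names[n], start, len);
        names[n][len] = '\0';
        n++;
        p = end + 1;
    }
    return n;
}
```

Because the marker string includes the opening parenthesis, scanning for "LJLIB_ASM(" does not match "LJLIB_ASM_(", so the two kinds of declarations are kept apart.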

Sharing Common Fallback

Consider the following code:

LJLIB_ASM(math_abs) LJLIB_REC(.)
{
    lj_lib_checknum(L, 1);
    return FFH_RETRY;
}
LJLIB_ASM_(math_floor) LJLIB_REC(.)

In this notation, LJLIB_ASM_ means that the Lua function math.floor is implemented as a fast function inside the VM (the LJLIB_ASM part of the macro), but its fallback is the body of the last function declared with the LJLIB_ASM macro. Please note the presence or absence of the final underscore: this character tells where to find the fallback part. In the session below, the binary contains no lj_ffh_math_floor symbol, and calling math.floor with a string argument ends up in lj_ffh_math_abs:

$ nm ujit | fgrep -e ffh_math_floor -e ffh_math_abs
000000000041715d t lj_ffh_math_abs
$ gdb --args ./ujit -e 'print(math.floor("1.5"))'
...
(gdb) b lj_ffh_math_abs
...
(gdb) r
...
Breakpoint 1, lj_ffh_math_abs (L=0x0) at /vagrant/ujit/src/lib/math.c:28
28 {
(gdb) bt
#0 lj_ffh_math_abs (L=0x0) at /vagrant/ujit/src/lib/math.c:28
#1 0x00000000004c4d38 in lj_fff_fallback ()
#2 0x000000000040ebda in lua_pcall (L=0x7ffff7fcc378, nargs=0, nresults=0, errfunc=2) at /vagrant/ujit/src/uj_capi.c:884
#3 0x0000000000405f5b in docall (L=0x7ffff7fcc378, narg=0, clear=1) at /vagrant/ujit/src/ujit.c:118
...

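Please be careful when modifying code in src/lib/*.c: because the fallback is looked up by position, inserting or moving a function declared with LJLIB_ASM can silently change the fallback of the LJLIB_ASM_ declarations that follow it.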
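The sharing rule from this section can be modelled with a short sketch. Everything here is hypothetical: the toy_* names, the declaration table and the “ceil” and “sqrt” entries are made up for illustration and do not reflect the platform’s real data structures. The only thing taken from the text above is the rule itself: a declaration without its own body uses the body of the closest preceding declaration that has one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a fallback handler. */
typedef int (*toy_fallback)(void);

enum toy_kind {
    TOY_ASM,        /* models LJLIB_ASM: has its own fallback body */
    TOY_ASM_SHARED  /* models LJLIB_ASM_: reuses the previous body */
};

struct toy_decl {
    const char *name;
    enum toy_kind kind;
    toy_fallback body; /* NULL for TOY_ASM_SHARED */
};

static int toy_ffh_math_abs(void)  { return 1; }
static int toy_ffh_math_sqrt(void) { return 2; }

/* Declaration order matters, exactly as in the real source files. */
static const struct toy_decl toy_math_decls[] = {
    { "abs",   TOY_ASM,        toy_ffh_math_abs  },
    { "floor", TOY_ASM_SHARED, NULL              },
    { "ceil",  TOY_ASM_SHARED, NULL              },
    { "sqrt",  TOY_ASM,        toy_ffh_math_sqrt },
};

/* Returns the fallback used by `name`, or NULL if `name` is not declared. */
static toy_fallback toy_resolve_fallback(const char *name)
{
    toy_fallback last = NULL;
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(toy_math_decls) / sizeof(toy_math_decls[0]); i++) {
        const struct toy_decl *d = &toy_math_decls[i];

        if (d->kind == TOY_ASM)
            last = d->body; /* remember the most recent explicit body */
        if (strcmp(d->name, name) == 0)
            return last;
    }
    return NULL;
}
```

In this model, moving the “sqrt” entry above “floor” would make “floor” and “ceil” fall back to toy_ffh_math_sqrt, which is the kind of silent change to watch for when editing src/lib/*.c.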

Builtins Accessing Upvalues

As you probably know, pairs and ipairs are implemented via a call to next. Because of late binding, it is crucial to know the exact location of the original next at run time, and _G.next is obviously not an option, since user code may reassign it. To solve the issue, the “native” implementation of next is set as an upvalue for both pairs and ipairs. In other words, the platform executes something like:

local next = _G.next -- here, _G.next is our builtin
_G.pairs = function(...)
    -- next is used somewhere here
end

But how can we emulate this behaviour? Using the following magic macros:

LJLIB_ASM(next)
{
    lj_lib_checktab(L, 1);
    return FFH_UNREACHABLE;
}
...
LJLIB_PUSH(lastcl)
LJLIB_ASM(pairs)
{
    return ffh_pairs(L, MM_pairs);
}

buildvm scans src/lib/base.c and, upon parsing LJLIB_PUSH(lastcl), stores the byte 253 (0xfd, aka LIBINIT_LASTCL) into the array lj_lib_init_base (auto-generated in lj_libdef.h). Later, uj_lib_register is run inside luaopen_base. At that point, the LIBINIT_LASTCL byte is read from the lj_lib_init_base array, and the last registered builtin (next in our case; the order of definitions does matter here) is pushed onto the coroutine’s stack, where it is accessed as an upvalue while the pairs builtin is being registered.
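The registration step can be pictured as a tiny interpreter over the init byte array. The sketch below is a simplified model and not the code of uj_lib_register: the toy_* names, the TOY_LIBINIT_FUNC and TOY_LIBINIT_END byte values and the single-upvalue closure struct are assumptions made for illustration. Only the value 0xfd for the “last closure” byte comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LIBINIT_FUNC   0x01 /* hypothetical: "register next builtin" */
#define TOY_LIBINIT_LASTCL 0xfd /* 253, as described above */
#define TOY_LIBINIT_END    0xff /* hypothetical: end of the init stream */

/* Hypothetical closure model with at most one upvalue. */
struct toy_closure {
    int id;
    const struct toy_closure *upvalue;
};

/*
 * Walks the init stream and fills `out` with registered closures.
 * Returns the number of closures registered, at most `cap`.
 */
static size_t toy_lib_register(const unsigned char *init,
                               struct toy_closure *out, size_t cap)
{
    /* Models the value pushed on the stack by TOY_LIBINIT_LASTCL. */
    const struct toy_closure *pending = NULL;
    size_t n = 0;

    assert(init != NULL && out != NULL);
    for (;; init++) {
        switch (*init) {
        case TOY_LIBINIT_LASTCL:
            /* Push the most recently registered builtin, if any. */
            pending = n > 0 ? &out[n - 1] : NULL;
            break;
        case TOY_LIBINIT_FUNC:
            if (n == cap)
                return n;
            out[n].id = (int)n;
            out[n].upvalue = pending; /* consume the pushed value */
            pending = NULL;
            n++;
            break;
        default: /* TOY_LIBINIT_END or an unknown byte: stop */
            return n;
        }
    }
}
```

With the stream FUNC, LASTCL, FUNC, FUNC, END, the second closure (playing the role of pairs) receives the first one (playing the role of next) as its upvalue, while the third closure gets none. Swapping the first two declarations would leave “pairs” without a usable upvalue, which is why the order of definitions matters.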