Tutorial: Assembling Lookup in Tables

Introduction

Table is the only data structure in Lua, which makes understanding of its internals crucial for understanding our platform. Loads from / stores to tables are supported in the JIT compiler, but it may be difficult to understand corresponding code right off (the most obvious reason is that IR recording and machine code generation happen in opposite directions). This article provides a commented example.

Table Lookup in Lua, Byte Code and IR

Consider following Lua code:

function Class:foo(sbc, request)
    -- ...
    local user_id = request.user.id
    -- ...
end

We will be interested in how LuaVela interprets and compiles a load of value request.user.id. The byte code from accessing this value will look like:

0063 TGETS 9 2 25 ; "user" -- Class:foo has an implicit "self" argument, hence 'request' occupies the 2nd slot in frame
0064 TGETS 9 9 12 ; "id"

During trace recording, this byte code will be converted to an IR which will look like (see here for notation):

0009 [8]   >  tab SLOAD  #3    T
...
0166 rbx      p32 HREF   0009  "user"
0167 rdx   >  tab HLOAD  0166
0168 rbx      p32 HREF   0167  "id"
0169 rbx   >  str HLOAD  0168

Following should be taken into account:

  • Typed IR results like tab, str etc. should be interpreted as GCobj*. In other words, IR instruction #167 specifies that if the type assertion (denoted with >) succeeds, register rdx will contain a GCtab* value.
  • The HREF instruction returns a reference to a value in a table, which is currently a TValue* (see definition of struct Node for details).

That said, IR instruction #166 can be read as: Take the result of #9 (if we have reached #166, then we know for sure that this is a table), take the string key “user” (this is a compile-time constant as it originates from a string literal in the Lua code), look up a value corresponding to this key and place a pointer to this value to rbx. And the next instruction #167 is simply a (pseudo-assembly, Intel syntax):

mov rdx, tabV(rbx)

Assembling HREF

Machine code for various cases of HREF is one of the most tricky things in the code generator. Here is the commented example of the output for instruction #168:

; Prerequisites:
;  * rdx is a (GCtab *)
;  * Information about the "id" literal will be spread across the machine code
; size_t rbx = rdx->hmask
; 32-bit ebx is enough for holding hmask and derivatives, but I'll denote it as rbx for consistency
05fdeeaf mov ebx, [rdx+0x38]

; rbx &= hash("id")
; As we have a string literal as a key, we can insert its hash at the time of code generation:
05fdeeb2 and ebx, 0x9fcae5e1

; rbx *= sizeof(struct Node)
05fdeeb8 imul ebx, ebx, 0x28

; Node *rbx = (Node *)(((char *)(rdx->node))[rbx]) /* NB! rbx changes its type now */
05fdeebb add rbx, [rdx+0x28]

; if (!tvisstr(rbx->key)) goto next_key_in_bucket
check_key_in_bucket:
05fdeebf cmp dword [rbx+0x18], 0xfffffffb
05fdeec3 jnz 0x5fdeed5

; GCstr *r11 = (GCstr *)"id"
; As we have a string literal as a key, we can insert its address directly into machine code:
05fdeec5 mov r11, 0x7f7fdc36e670

; if (r11 == strV(rbx->key)) goto value_found
; Because strings are interned, we can compare them by comparing corresponding pointers to GCstr *
05fdeecf cmp r11, [rbx+0x10]
05fdeed3 jz 0x5fdeee8

; rbx = rbx->next
next_key_in_bucket:
05fdeed5 mov rbx, [rbx+0x20]
; if (rbx != NULL) goto check_key_in_bucket
05fdeed9 test rbx, rbx
05fdeedc jnz 0x5fdeebf

; rbx = niltvg(g)
05fdeede mov rbx, 0x7f7fdc3554f0

value_found:
; rbx is either a (TValue *) pointing to a value corresponding to the given key (offsetof(struct Node, val) == 0)
; ... or a pointer to a common nil value (a single value per global_State).
05fdeee8 ...

For completeness, here is the same code as it looks in the raw dump:

05fdeeaf mov ebx, [rdx+0x38]
05fdeeb2 and ebx, 0x9fcae5e1
05fdeeb8 imul ebx, ebx, 0x28
05fdeebb add rbx, [rdx+0x28]
05fdeebf cmp dword [rbx+0x18], 0xfffffffb
05fdeec3 jnz 0x5fdeed5
05fdeec5 mov r11, 0x7f7fdc36e670
05fdeecf cmp r11, [rbx+0x10]
05fdeed3 jz 0x5fdeee8
05fdeed5 mov rbx, [rbx+0x20]
05fdeed9 test rbx, rbx
05fdeedc jnz 0x5fdeebf
05fdeede mov rbx, 0x7f7fdc3554f0
05fdeee8 ...