Tutorial: Assembling Lookup in Tables¶
Introduction¶
Table is the only data structure in Lua, which makes understanding of its internals crucial for understanding our platform. Loads from / stores to tables are supported in the JIT compiler, but it may be difficult to understand corresponding code right off (the most obvious reason is that IR recording and machine code generation happen in opposite directions). This article provides a commented example.
Table Lookup in Lua, Byte Code and IR¶
Consider following Lua code:
function Class:foo(sbc, request)
-- ...
local user_id = request.user.id
-- ...
end
We will be interested in how LuaVela interprets and compiles a load of value request.user.id
. The byte code from accessing this value will look like:
0063 TGETS 9 2 25 ; "user" -- Class:foo has an implicit "self" argument, hence 'request' occupies the 2nd slot in frame
0064 TGETS 9 9 12 ; "id"
During trace recording, this byte code will be converted to an IR which will look like (see here for notation):
0009 [8] > tab SLOAD #3 T
...
0166 rbx p32 HREF 0009 "user"
0167 rdx > tab HLOAD 0166
0168 rbx p32 HREF 0167 "id"
0169 rbx > str HLOAD 0168
Following should be taken into account:
- Typed IR results like tab, str etc. should be interpreted as
GCobj*
. In other words, IR instruction #167 specifies that if the type assertion (denoted with >) succeeds, register rdx will contain aGCtab*
value. - The
HREF
instruction returns a reference to a value in a table, which is currently aTValue*
(see definition ofstruct Node
for details).
That said, IR instruction #166 can be read as: Take the result of #9 (if we have reached #166, then we know for sure that this is a table), take the string key “user” (this is a compile-time constant as it originates from a string literal in the Lua code), look up a value corresponding to this key and place a pointer to this value to rbx. And the next instruction #167 is simply a (pseudo-assembly, Intel syntax):
mov rdx, tabV(rbx)
Assembling HREF
¶
Machine code for various cases of HREF
is one of the most tricky things in the code generator. Here is the commented example of the output for instruction #168:
; Prerequisites:
; * rdx is a (GCtab *)
; * Information about the "id" literal will be spread across the machine code
; size_t rbx = rdx->hmask
; 32-bit ebx is enough for holding hmask and derivatives, but I'll denote it as rbx for consistency
05fdeeaf mov ebx, [rdx+0x38]
; rbx &= hash("id")
; As we have a string literal as a key, we can insert its hash at the time of code generation:
05fdeeb2 and ebx, 0x9fcae5e1
; rbx *= sizeof(struct Node)
05fdeeb8 imul ebx, ebx, 0x28
; Node *rbx = (Node *)(((char *)(rdx->node))[rbx]) /* NB! rbx changes its type now */
05fdeebb add rbx, [rdx+0x28]
; if (!tvisstr(rbx->key)) goto next_key_in_bucket
check_key_in_bucket:
05fdeebf cmp dword [rbx+0x18], 0xfffffffb
05fdeec3 jnz 0x5fdeed5
; GCstr *r11 = (GCstr *)"id"
; As we have a string literal as a key, we can insert its address directly into machine code:
05fdeec5 mov r11, 0x7f7fdc36e670
; if (r11 == strV(rbx->key)) goto value_found
; Because strings are interned, we can compare them by comparing corresponding pointers to GCstr *
05fdeecf cmp r11, [rbx+0x10]
05fdeed3 jz 0x5fdeee8
; rbx = rbx->next
next_key_in_bucket:
05fdeed5 mov rbx, [rbx+0x20]
; if (rbx != NULL) goto check_key_in_bucket
05fdeed9 test rbx, rbx
05fdeedc jnz 0x5fdeebf
; rbx = niltvg(g)
05fdeede mov rbx, 0x7f7fdc3554f0
value_found:
; rbx is either a (TValue *) pointing to a value corresponding to the given key (offsetof(struct Node, val) == 0)
; ... or a pointer to a common nil value (a single value per global_State).
05fdeee8 ...
For completeness, here is the same code as it looks in the raw dump:
05fdeeaf mov ebx, [rdx+0x38]
05fdeeb2 and ebx, 0x9fcae5e1
05fdeeb8 imul ebx, ebx, 0x28
05fdeebb add rbx, [rdx+0x28]
05fdeebf cmp dword [rbx+0x18], 0xfffffffb
05fdeec3 jnz 0x5fdeed5
05fdeec5 mov r11, 0x7f7fdc36e670
05fdeecf cmp r11, [rbx+0x10]
05fdeed3 jz 0x5fdeee8
05fdeed5 mov rbx, [rbx+0x20]
05fdeed9 test rbx, rbx
05fdeedc jnz 0x5fdeebf
05fdeede mov rbx, 0x7f7fdc3554f0
05fdeee8 ...