Tutorial: IR Loads and Stores
Introduction
Info from http://wiki.luajit.org/SSA-IR-2.0:
OP      Left   Right    Description
ALOAD   aref            Array load
HLOAD   href            Hash load
ULOAD   uref            Upvalue load
FLOAD   obj    #field   Object field load
XLOAD   xref   #flags   Extended load
SLOAD   #slot  #flags   Stack slot load
VLOAD   aref            Vararg slot load
ASTORE  aref   val      Array store
HSTORE  href   val      Hash store
USTORE  uref   val      Upvalue store
FSTORE  fref   val      Object field store
XSTORE  xref   val      Extended store
Note
Loads and stores operate on memory references and either load a value (result of the instruction) or store a value (the right operand). To preserve higher-level semantics and to simplify alias analysis they are not unified or decomposed into lower-level operations.
FLOAD and SLOAD inline their memory references, all other loads and all stores have a memory reference as their left operand. All loads except FLOAD and XLOAD work on tagged values and simultaneously function as a guarded assertion that checks the loaded type.
FLOAD and FSTORE access specific fields inside objects, identified by the field ID of their reference (e.g. the metatable field in table or userdata objects).
XLOAD works on lower-level types and the memory reference is either a STRREF or decomposed into lower-level operations, a combination of ADD, MUL or BSHL of pointers, offsets or indexes.
The slot number of SLOAD is relative to the starting frame of a trace, where #0 indicates the closure/frame slot and #1 the first variable slot (corresponding to slot 0 of the bytecode). Note that RETF shifts down BASE and subsequent SLOAD instructions refer to slots of the lower frame(s).
Also note that there are no store operations for stack slots or vararg slots. All stores to stack slots are effectively sunk into exits or side traces. Snapshots efficiently manage the references that are to be stored. Vararg slots are read-only from the perspective of the called vararg function.
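To relate the SLOAD slot numbering described above to the addresses that show up in the machine code dumps of this tutorial, here is a minimal C sketch. It assumes a 16-byte stack slot with the payload at offset 0 and the type tag at offset 8, which is only what the x86-64 dumps below suggest (slot #1 is accessed as [r10] and [r10+0x8], slot #2 as [r10+0x10], slot #0 as [r10-0x10]). The names SLOT_SIZE, TAG_OFFSET, sload_payload_offset, sload_tag_offset and bytecode_slot are hypothetical and do not come from the ujit sources.

```c
#include <stddef.h>

/* Assumption: one stack slot is a 16-byte tagged value (payload, then tag). */
enum { SLOT_SIZE = 16, TAG_OFFSET = 8 };

/* Byte offset of the payload of IR slot #n relative to BASE.
 * IR slot #1 sits at BASE itself, slot #0 (closure/frame slot) just below. */
static ptrdiff_t sload_payload_offset(int ir_slot)
{
  return (ptrdiff_t)(ir_slot - 1) * SLOT_SIZE;
}

/* Byte offset of the type tag of IR slot #n relative to BASE. */
static ptrdiff_t sload_tag_offset(int ir_slot)
{
  return sload_payload_offset(ir_slot) + TAG_OFFSET;
}

/* IR slot number -> bytecode slot number (meaningful for ir_slot >= 1 only). */
static int bytecode_slot(int ir_slot)
{
  return ir_slot - 1;
}
```

Under these assumptions, sload_payload_offset(2) gives 0x10 and sload_tag_offset(1) gives 0x8, matching the operands [r10+0x10] and [r10+0x8] in the first example.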
For the possible values of the field ID in FLOAD and the flags in SLOAD and XLOAD, see IRFLDEF, IRSLOAD_* and IRXLOAD_* in src/lj_ir.h.
Use the following command line to run the code examples:
./ujit -j on -p- example.lua
FLOAD, SLOAD
FLOAD loads a field of an object, e.g. (GCtab *)t->metatable.
SLOAD
SLOAD loads the payload from a stack slot, e.g. (TValue *)tv->gcr, optionally with a type check (grep for IRT_GUARD).
Let’s run this code:
jit.opt.start("hotloop=1", "nohrefk")
local t = {}
for i = 1, 3 do
  local o = t[0]
end
And take a closer look at the IR and mcode:
---- TRACE 1 start example.lua:5
0012 TGETB 5 0 0
0014 FORL 1 => 0012
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbp int SLOAD #2 CI
0002 rdx > tab SLOAD #1 T
0003 rsi int FLOAD 0002 tab.asize
0004 > int ULE 0003 +0
0005 rbx int FLOAD 0002 tab.hmask
0006 > int EQ 0005 +0
0007 rax tab FLOAD 0002 tab.meta
0008 > tab EQ 0007 [NULL]
0009 rbp + int ADD 0001 +1
.... SNAP #1 [ ---- ---- ]
0010 > int LE 0009 +3
.... SNAP #2 [ ---- ---- 0009 ---- ---- 0009 ]
0011 ------------ LOOP ------------
0012 rbp + int ADD 0009 +1
.... SNAP #3 [ ---- ---- ]
0013 > int LE 0012 +3
0014 rbp int PHI 0009 0012
---- TRACE 1 mcode 100
// Standard prologue, see emit_vmstate(..) in asm_head_root() from lj_asm.h
/* PRL */ 0bd5ff99 mov r11, 0x7f37f9fe3620 // &g->vmstate field VA
/* PRL */ 0bd5ffa3 mov dword [r11], 0x1 // 1 is the current trace number
/* K */ 0bd5ffaa xor ecx, ecx // NULL constant
/* 001 */ 0bd5ffac cvtsd2si ebp, qword [r10+0x10] // Convert the double in slot #2 (the loop counter) to a 32-bit signed integer.
// (CI: C = number-to-integer conversion, I = inherited by exits/side traces. Grep IRSLOAD_ for more info.)
/* 002 */ 0bd5ffb2 cmp dword [r10+0x8], 0xfffffff4 // Type check: slot #1 must contain a table
/* 002 */ 0bd5ffb7 jnz 0xbd50010 ->0 // Guard, exit to snapshot #0
/* 002 */ 0bd5ffbd mov rdx, [r10] // Pointer to the table from slot #1: (TValue *)->gcr
/* 003 */ 0bd5ffc0 mov esi, [rdx+0x30] // esi = tab->asize
/* 004 */ 0bd5ffc3 cmp esi, 0x0 // compare the size of the array part with zero
/* 004 */ 0bd5ffc6 ja 0xbd50010 ->0 // exit if the array part is not empty
/* 005 */ 0bd5ffcc mov ebx, [rdx+0x38] // ebx = tab->hmask
/* 006 */ 0bd5ffcf test ebx, ebx // same check for zero
/* 006 */ 0bd5ffd1 jnz 0xbd50010 ->0
/* 007 */ 0bd5ffd7 mov rax, [rdx+0x18] // rax = tab->metatable
/* 008 */ 0bd5ffdb cmp rax, rcx // compare with NULL
/* 008 */ 0bd5ffde jnz 0xbd50010 ->0 // has metatable? exit
/* 009 */ 0bd5ffe4 add ebp, 0x1 // add step (immediate constant)
/* 010 */ 0bd5ffe7 cmp ebp, 0x3 // compare for exit (immediate constant)
/* 010 */ 0bd5ffea jg 0xbd50014 ->1
-> LOOP:
/* 012 */ 0bd5fff0 add ebp, 0x1 // add step (immediate constant)
/* 013 */ 0bd5fff3 cmp ebp, 0x3 // compare for exit (immediate constant)
/* 013 */ 0bd5fff6 jle 0xbd5fff0 ->LOOP // continue loop
/* end */ 0bd5fff8 jmp 0xbd5001c ->3 // normal exit
---- TRACE 1 stop -> loop
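The guarded load `0002 rdx > tab SLOAD #1 T` from the dump above compiles to a tag comparison, a conditional exit and a payload load. The following C sketch models that pattern. It is an illustration under assumptions, not ujit code: the TaggedValue layout (8-byte payload followed by a 32-bit tag) is inferred from the operands [r10] and [r10+0x8], the tag constant is copied from the dump, and the names TaggedValue, TAG_TAB and guarded_load_tab are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 16-byte tagged value: 8-byte payload followed by a 32-bit type tag. */
typedef struct TaggedValue {
  union { void *gcr; double n; } u;
  uint32_t tag;
} TaggedValue;

#define TAG_TAB 0xfffffff4u /* tag constant compared against in the dump */

/* Guarded load of a table pointer.
 * Returns false when the guard fails, i.e. the trace would exit. */
static bool guarded_load_tab(const TaggedValue *slot, void **out)
{
  if (slot->tag != TAG_TAB)
    return false;       /* cmp dword [slot+0x8], 0xfffffff4 ; jnz ->exit */
  *out = slot->u.gcr;   /* mov reg, [slot] */
  return true;
}
```

The point of the sketch is that the type check and the load are one IR instruction: a single SLOAD with the T flag both asserts the type and yields the payload.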
HLOAD
Loads the payload from a tagged value in the hash part of a table; uses the result of HREF as input.
jit.opt.start("hotloop=1", "nohrefk")
local t = {}
for i = 1, 3 do
  setmetatable(t, nil)
end
---- TRACE 1 start example.lua:5
0012 GGET 5 5 ; "setmetatable"
0013 MOV 6 0
0014 KPRI 7 0
0015 CALL 5 1 3
0000 . FUNCC ; setmetatable
0017 FORL 1 => 0012
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbp int SLOAD #2 CI
0002 r12 fun SLOAD #0 R
0003 r9 tab FLOAD 0002 func.env
0004 r8 p32 HREF 0003 "setmetatable"
0005 rsi > fun HLOAD 0004
0006 rcx > tab SLOAD #1 T
0007 > fun EQ 0005 setmetatable
0008 rbx u8 FLOAD 0006 gco.marked
0009 u8 BAND 0008 +64
0010 > int EQ 0009 +0
0011 rdx tab FLOAD 0006 tab.meta
0012 > tab EQ 0011 [NULL]
0013 p32 FREF 0006 tab.meta
0014 tab FSTORE 0013 [NULL]
0015 rbp + int ADD 0001 +1
.... SNAP #1 [ ---- ---- ]
0016 > int LE 0015 +3
.... SNAP #2 [ ---- ---- 0015 ---- ---- 0015 ]
0017 ------------ LOOP ------------
0018 rbp + int ADD 0015 +1
.... SNAP #3 [ ---- ---- ]
0019 > int LE 0018 +3
0020 rbp int PHI 0015 0018
---- TRACE 1 mcode 200
/* PRL */ 0bd5ff35 mov r11, 0x7ff48c16e620 // Default root trace prologue, see FLOAD / SLOAD example
/* PRL */ 0bd5ff3f mov dword [r11], 0x1
/* K */ 0bd5ff46 mov rdi, 0x7ff48c172b10 // constant, setmetatable VA
/* K */ 0bd5ff50 xor eax, eax // NULL constant
/* 001 */ 0bd5ff52 cvtsd2si ebp, qword [r10+0x10] // load signed integer from slot 2
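/* 002 */ 0bd5ff58 mov r12, [r10-0x10] // callee function object (slot #0), read-only
/* 003 */ 0bd5ff5c mov r9, [r12+0x10] // r9 = (GCfunc *)->env
/* 004 */ 0bd5ff61 mov r8d, [r9+0x38] // r8d = env->hmask
/* 004 */ 0bd5ff65 and r8d, 0x5950030a // mask the constant hash of the key "setmetatable"
/* 004 */ 0bd5ff6c imul r8d, r8d, 0x28 // scale the index by the node size (0x28 bytes)
/* 004 */ 0bd5ff70 add r8, [r9+0x28] // r8 = address of the first node in the chain
/* 004 */ 0bd5ff74 cmp dword [r8+0x18], 0xfffffffb // is the node key a string?
/* 004 */ 0bd5ff79 jnz 0xbd5ff8b // no: go to the next node
/* 004 */ 0bd5ff7b mov r11, 0x7ff48c172b48 // constant, VA of the key string
/* 004 */ 0bd5ff85 cmp r11, [r8+0x10] // same string object?
/* 004 */ 0bd5ff89 jz 0xbd5ff9e // yes: key found, r8 points to the node value
/* 004 */ 0bd5ff8b mov r8, [r8+0x20] // r8 = next node in the collision chain
/* 004 */ 0bd5ff8f test r8, r8
/* 004 */ 0bd5ff92 jnz 0xbd5ff74 // chain not exhausted: check the next node
/* 004 */ 0bd5ff94 mov r8, 0x7ff48c16e540 // key not found: fall back to the constant nil value
/* 005 */ 0bd5ff9e cmp dword [r8+0x8], 0xfffffff7 // type check: the loaded value must be a function
/* 005 */ 0bd5ffa3 jnz 0xbd50010 ->0 // guard, exit to snapshot #0
/* 005 */ 0bd5ffa9 mov rsi, [r8] // rsi = function object
/* 006 */ 0bd5ffac cmp dword [r10+0x8], 0xfffffff4 // type check: slot #1 must contain a table
/* 006 */ 0bd5ffb1 jnz 0xbd50010 ->0
/* 006 */ 0bd5ffb7 mov rcx, [r10] // rcx = pointer to the table
/* 007 */ 0bd5ffba cmp rsi, rdi // is the loaded function really setmetatable?
/* 007 */ 0bd5ffbd jnz 0xbd50010 ->0
/* 008 */ 0bd5ffc3 movzx ebx, byte [rcx+0x8] // ebx = gco.marked of the table
/* 009 */ 0bd5ffc7 test ebx, 0x40 // test flag bit 0x40 (+64)
/* 010 */ 0bd5ffcd jnz 0xbd50010 ->0 // exit if the flag is set
/* 011 */ 0bd5ffd3 mov rdx, [rcx+0x18] // rdx = tab->metatable
/* 012 */ 0bd5ffd7 cmp rdx, rax // compare with NULL
/* 012 */ 0bd5ffda jnz 0xbd50010 ->0 // has metatable? exit
/* 013-014*/ 0bd5ffe0 mov [rcx+0x18], rax // FSTORE: tab->metatable = NULL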
/* 015 */ 0bd5ffe4 add ebp, 0x1
/* 016 */ 0bd5ffe7 cmp ebp, 0x3
/* 016 */ 0bd5ffea jg 0xbd50014 ->1
-> LOOP:
/* 018 */ 0bd5fff0 add ebp, 0x1
/* 019 */ 0bd5fff3 cmp ebp, 0x3
/* 019 */ 0bd5fff6 jle 0xbd5fff0 ->LOOP
/* end */ 0bd5fff8 jmp 0xbd5001c ->3
---- TRACE 1 stop -> loop
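The machine code emitted for HREF in the dump above (the instructions marked 004) walks a hash collision chain. The C sketch below restates that loop in a readable form. It is a model under assumptions rather than the ujit implementation: the node layout (value at +0x00, key at +0x10, chain link at +0x20, 0x28 bytes in total on x86-64) is inferred from the operands in the dump, the string tag is copied from it, and the names Value, Node, TAG_STR and href_str are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_STR 0xfffffffbu /* key tag compared against in the dump */

/* Assumed tagged value: 8-byte payload followed by a 32-bit tag. */
typedef struct Value {
  union { void *gcr; double n; } u;
  uint32_t tag;
} Value;

/* Assumed node layout: value, key, link to the next node in the chain. */
typedef struct Node {
  Value val;
  Value key;
  struct Node *next;
} Node;

/* Look up an interned string key by pointer identity.
 * Returns the address of the value, or nilval if the key is absent. */
static const Value *href_str(const Node *nodes, uint32_t hmask,
                             const void *str, uint32_t hash,
                             const Value *nilval)
{
  const Node *n = &nodes[hash & hmask];   /* and / imul / add */
  do {
    if (n->key.tag == TAG_STR && n->key.u.gcr == str)
      return &n->val;                     /* key found */
    n = n->next;                          /* follow the collision chain */
  } while (n != NULL);
  return nilval;                          /* key not found */
}
```

HLOAD then operates on the returned address exactly like the other guarded loads: it compares the tag at offset 8 and loads the payload at offset 0.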
ALOAD, ASTORE
ALOAD may be just a type check after the AREF IR (see the Lua code from the FLOAD example again), or it may load the value, e.g. a GCobj, from a TValue:
jit.opt.start("hotloop=1", "nohrefk")
local t = {1, 2, 3, 4}
for i = 1, 3 do
  t[i] = t[i + 1]
end
---- TRACE 1 start example.lua:6
0012 KSHORT 5 1
0013 ADD 5 4 5
0014 TGETV 5 0 5
0015 TSETV 5 0 4
0017 FORL 1 => 0012
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 r9 int SLOAD #2 CI
0003 rdi > tab SLOAD #1 T
0004 [8] + int ADD 0001 +1
0005 r10 int FLOAD 0003 tab.asize
0006 > p32 ABC 0005 +4
0007 rax p32 FLOAD 0003 tab.array
0008 rbp + p32 AREF 0007 0004
0009 xmm0 > flt ALOAD 0008
0010 rcx p32 AREF 0007 0001
0011 r8 u8 FLOAD 0003 gco.marked
0012 u8 BAND 0011 +64
0013 > int EQ 0012 +0
0014 rdx tab FLOAD 0003 tab.meta
0015 > tab EQ 0014 [NULL]
0016 flt ASTORE 0010 0009
.... SNAP #1 [ ---- ---- ]
0017 > int LE 0004 +3
.... SNAP #2 [ ---- ---- 0004 ---- ---- 0004 ]
0018 ------------ LOOP ------------
0019 rbx + int ADD 0004 +1
0020 rbp + p32 AREF 0007 0019
0021 xmm7 > flt ALOAD 0020
0022 flt ASTORE 0008 0021
.... SNAP #3 [ ---- ---- ]
0023 > int LE 0019 +3
0024 rbx int PHI 0004 0019
0025 rbp p32 PHI 0008 0020
0026 r14 nil RENAME 0004 #32767
0027 r15 nil RENAME 0008 #2
---- TRACE 1 mcode 215
0bd5ff28 mov r11, 0x7f9243b22620
0bd5ff32 mov dword [r11], 0x1
0bd5ff39 xor esi, esi
0bd5ff3b cvtsd2si r9d, qword [r10+0x10]
0bd5ff41 cmp dword [r10+0x8], 0xfffffff4
0bd5ff46 jnz 0xbd50010 ->0
0bd5ff4c mov rdi, [r10]
0bd5ff4f lea ebx, [r9+0x1]
0bd5ff53 mov [rsp+0x8], ebx
0bd5ff57 mov r10d, [rdi+0x30]
0bd5ff5b cmp r10, 0x4
0bd5ff5f jbe 0xbd50010 ->0
0bd5ff65 mov rax, [rdi+0x10]
0bd5ff69 mov ebp, ebx
0bd5ff6b shl ebp, 0x4
0bd5ff6e add rbp, rax
0bd5ff71 cmp dword [rbp+0x8], 0xfffffff2
0bd5ff78 jnz 0xbd50010 ->0
0bd5ff7e movsd xmm0, qword [rbp]
0bd5ff84 mov ecx, r9d
0bd5ff87 shl ecx, 0x4
0bd5ff8a add rcx, rax
0bd5ff8d movzx r8d, byte [rdi+0x8]
0bd5ff92 test r8d, 0x40
0bd5ff99 jnz 0xbd50010 ->0
0bd5ff9f mov rdx, [rdi+0x18]
0bd5ffa3 cmp rdx, rsi
0bd5ffa6 jnz 0xbd50010 ->0
0bd5ffac mov dword [rcx+0x8], 0xfffffff2
0bd5ffb3 movsd [rcx], xmm0
0bd5ffb7 cmp ebx, 0x3
0bd5ffba jg 0xbd50014 ->1
-> LOOP:
0bd5ffc0 mov [rsp+0x8], ebx
0bd5ffc4 mov r15, rbp
0bd5ffc7 mov r14d, ebx
0bd5ffca add ebx, 0x1
0bd5ffcd mov ebp, ebx
0bd5ffcf shl ebp, 0x4
0bd5ffd2 add rbp, rax
0bd5ffd5 cmp dword [rbp+0x8], 0xfffffff2
0bd5ffdc jnz 0xbd50018 ->2
0bd5ffe2 movsd xmm7, qword [rbp]
0bd5ffe8 mov dword [r15+0x8], 0xfffffff2
0bd5fff0 movsd [r15], xmm7
0bd5fff5 cmp ebx, 0x3
0bd5fff8 jle 0xbd5ffc0 ->LOOP
0bd5fffa jmp 0xbd5001c ->3
---- TRACE 1 stop -> loop
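The dump above carries no comments, so here is a small C model of what ABC, AREF, ALOAD and ASTORE do in it. This is a sketch under assumptions: the 16-byte element size is inferred from the `shl ..., 0x4` instructions, the number tag 0xfffffff2 is copied from the comparisons that precede each movsd, and the names Value, TAG_NUM, abc_ok, aref, aload_num and astore_num are hypothetical, not ujit identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_NUM 0xfffffff2u /* tag checked before each movsd in the dump */

/* Assumed tagged value: 8-byte payload followed by a 32-bit tag. */
typedef struct Value {
  union { void *gcr; double n; } u;
  uint32_t tag;
} Value;

/* ABC: array bounds check, passes only if idx < asize (unsigned). */
static bool abc_ok(uint32_t asize, uint32_t idx)
{
  return idx < asize;
}

/* AREF: address of array[idx], i.e. array + idx * 16 (shl by 4, then add). */
static Value *aref(Value *array, uint32_t idx)
{
  return &array[idx];
}

/* ALOAD with a flt guard: false means the trace would exit. */
static bool aload_num(const Value *ref, double *out)
{
  if (ref->tag != TAG_NUM)
    return false;
  *out = ref->u.n;
  return true;
}

/* ASTORE of a number: writes both the tag and the payload. */
static void astore_num(Value *ref, double n)
{
  ref->tag = TAG_NUM;
  ref->u.n = n;
}
```

In this model, one iteration of `t[i] = t[i + 1]` is aload_num(aref(array, i + 1), &v) followed by astore_num(aref(array, i), v). The loop part of the trace reuses the AREF from the previous iteration as the store address, which is why the IR contains a PHI for the p32 reference.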
HLOAD, HSTORE
Same as ALOAD and ASTORE, but for the hash part of the table.
jit.opt.start("hotloop=1", "nohrefk")
local t = {["key"] = 1}
for i = 1, 3 do
  t["newkey"] = t["key"]
end
And take a closer look at the IR and mcode:
---- TRACE 1 start example.lua:5
0012 TGETS 5 0 7 ; "key"
0013 TSETS 5 0 6 ; "newkey"
0015 FORL 1 => 0012
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbp int SLOAD #2 CI
0002 rax > tab SLOAD #1 T
0003 r9 p32 HREF 0002 "key"
0004 xmm0 > flt HLOAD 0003
0005 rcx p32 HREF 0002 "newkey"
0006 r8 u8 FLOAD 0002 gco.marked
0007 u8 BAND 0006 +64
0008 > int EQ 0007 +0
0009 > p32 NE 0005 [0x7f48c628b540]
0010 rdx tab FLOAD 0002 tab.meta
0011 > tab EQ 0010 [NULL]
0012 flt HSTORE 0005 0004
0013 nil TBAR 0002
0014 rbp + int ADD 0001 +1
.... SNAP #1 [ ---- ---- ]
0015 > int LE 0014 +3
.... SNAP #2 [ ---- ---- 0014 ---- ---- 0014 ]
0016 ------------ LOOP ------------
0017 rbp + int ADD 0014 +1
.... SNAP #3 [ ---- ---- ]
0018 > int LE 0017 +3
0019 rbp int PHI 0014 0017
---- TRACE 1 mcode 302
0bd5fecf mov r11, 0x7f48c628b620
0bd5fed9 mov dword [r11], 0x1
0bd5fee0 mov rsi, 0x7f48c628b540
0bd5feea xor ebx, ebx
0bd5feec cvtsd2si ebp, qword [r10+0x10]
0bd5fef2 cmp dword [r10+0x8], 0xfffffff4
0bd5fef7 jnz 0xbd50010 ->0
0bd5fefd mov rax, [r10]
0bd5ff00 mov r9d, [rax+0x38]
0bd5ff04 and r9d, 0x68ca1d79
0bd5ff0b imul r9d, r9d, 0x28
0bd5ff0f add r9, [rax+0x28]
0bd5ff13 cmp dword [r9+0x18], 0xfffffffb
0bd5ff18 jnz 0xbd5ff2a
0bd5ff1a mov r11, 0x7f48c628d438
0bd5ff24 cmp r11, [r9+0x10]
0bd5ff28 jz 0xbd5ff3d
0bd5ff2a mov r9, [r9+0x20]
0bd5ff2e test r9, r9
0bd5ff31 jnz 0xbd5ff13
0bd5ff33 mov r9, 0x7f48c628b540
0bd5ff3d cmp dword [r9+0x8], 0xfffffff2
0bd5ff45 jnz 0xbd50010 ->0
0bd5ff4b movsd xmm0, qword [r9]
0bd5ff50 mov ecx, [rax+0x38]
0bd5ff53 and ecx, 0x7eaa3afd
0bd5ff59 imul ecx, ecx, 0x28
0bd5ff5c add rcx, [rax+0x28]
0bd5ff60 cmp dword [rcx+0x18], 0xfffffffb
0bd5ff64 jnz 0xbd5ff76
0bd5ff66 mov r11, 0x7f48c628d5b8
0bd5ff70 cmp r11, [rcx+0x10]
0bd5ff74 jz 0xbd5ff89
0bd5ff76 mov rcx, [rcx+0x20]
0bd5ff7a test rcx, rcx
0bd5ff7d jnz 0xbd5ff60
0bd5ff7f mov rcx, 0x7f48c628b540
0bd5ff89 movzx r8d, byte [rax+0x8]
0bd5ff8e test r8d, 0x40
0bd5ff95 jnz 0xbd50010 ->0
0bd5ff9b cmp rcx, rsi
0bd5ff9e jz 0xbd50010 ->0
0bd5ffa4 mov rdx, [rax+0x18]
0bd5ffa8 cmp rdx, rbx
0bd5ffab jnz 0xbd50010 ->0
0bd5ffb1 mov dword [rcx+0x8], 0xfffffff2
0bd5ffb8 movsd [rcx], xmm0
0bd5ffbc test byte [rax+0x8], 0x4
0bd5ffc0 jz 0xbd5ffe4
0bd5ffc2 and byte [rax+0x8], 0xfb
0bd5ffc6 mov r11, 0x7f48c628b4e0
0bd5ffd0 mov rdi, [r11]
0bd5ffd3 mov r11, 0x7f48c628b4e0
0bd5ffdd mov [r11], rax
0bd5ffe0 mov [rax+0x20], rdi
0bd5ffe4 add ebp, 0x1
0bd5ffe7 cmp ebp, 0x3
0bd5ffea jg 0xbd50014 ->1
-> LOOP:
0bd5fff0 add ebp, 0x1
0bd5fff3 cmp ebp, 0x3
0bd5fff6 jle 0xbd5fff0 ->LOOP
0bd5fff8 jmp 0xbd5001c ->3
---- TRACE 1 stop -> loop
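The tail of the dump above, right after the HSTORE (`test byte [rax+0x8], 0x4` up to `mov [rax+0x20], rdi`), corresponds to the TBAR instruction, a table write barrier. The C sketch below models that sequence. It rests on assumptions: the bit 0x4 is taken to mean "black" and the global list head is taken to be a list of objects to re-traverse, as in upstream LuaJIT, and the names Obj, GC_BLACK, table_barrier and grayagain are used for illustration only; the ujit sources may name or lay out these things differently.

```c
#include <stdint.h>

#define GC_BLACK 0x04u /* bit tested by `test byte [rax+0x8], 0x4` */

/* Minimal stand-in for a collectable object: mark bits and a list link. */
typedef struct Obj {
  uint8_t marked;
  struct Obj *gclist;
} Obj;

/* Backward barrier: a black table that was just written to loses its
 * black bit and is pushed onto a list of objects to traverse again. */
static void table_barrier(Obj **grayagain, Obj *t)
{
  if (!(t->marked & GC_BLACK))
    return;                          /* jz: not black, nothing to do */
  t->marked &= (uint8_t)~GC_BLACK;   /* and byte [t+0x8], 0xfb */
  t->gclist = *grayagain;            /* old list head becomes the link */
  *grayagain = t;                    /* the table is the new list head */
}
```

In the common case the table is not black, so the barrier costs one test and one branch on the trace.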