Tutorial: IR Loads and Stores

Introduction

Info from http://wiki.luajit.org/SSA-IR-2.0:

OP      Left   Right   Description
ALOAD   aref           Array load
HLOAD   href           Hash load
ULOAD   uref           Upvalue load
FLOAD   obj    #field  Object field load
XLOAD   xref   #flags  Extended load
SLOAD   #slot  #flags  Stack slot load
VLOAD   aref           Vararg slot load
ASTORE  aref   val     Array store
HSTORE  href   val     Hash store
USTORE  uref   val     Upvalue store
FSTORE  fref   val     Object field store
XSTORE  xref   val     Extended store

Note

Loads and stores operate on memory references and either load a value (result of the instruction) or store a value (the right operand). To preserve higher-level semantics and to simplify alias analysis they are not unified or decomposed into lower-level operations.

FLOAD and SLOAD inline their memory references; all other loads and all stores have a memory reference as their left operand. All loads except FLOAD and XLOAD work on tagged values and simultaneously function as a guarded assertion that checks the loaded type.
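
As a rough illustration of a load that doubles as a type guard, here is a minimal C sketch. The TaggedValue layout (payload at offset 0, tag at offset 8) and the tag constants are read off the mcode listings in this tutorial; the struct and function names are hypothetical and are not the VM's real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte tagged value: payload at offset 0, type tag at offset 8. */
typedef struct {
    union { double n; void *gcr; } u;
    uint32_t tag;
    uint32_t pad;
} TaggedValue;

/* Tag values as they appear in the mcode dumps of this tutorial. */
#define TAG_TAB 0xfffffff4u
#define TAG_NUM 0xfffffff2u

/* Guarded load of a table pointer: returns true and writes the payload
 * when the tag matches; returns false where a trace would take a side exit. */
static bool guarded_load_tab(const TaggedValue *slot, void **out)
{
    if (slot->tag != TAG_TAB)
        return false;          /* guard failed */
    *out = slot->u.gcr;        /* payload load */
    return true;
}
```

The compare-and-branch followed by the payload move corresponds to the cmp / jnz / mov triple emitted for a guarded SLOAD in the listings.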

FLOAD and FSTORE access specific fields inside objects, identified by the field ID of their reference (e.g. the metatable field in table or userdata objects).

XLOAD works on lower-level types and the memory reference is either a STRREF or decomposed into lower-level operations, a combination of ADD, MUL or BSHL of pointers, offsets or indexes.

The slot number of SLOAD is relative to the starting frame of a trace, where #0 indicates the closure/frame slot and #1 the first variable slot (corresponding to slot 0 of the bytecode).
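
The slot numbering can be sketched in C as follows. The 16-byte slot size is an assumption taken from the addresses in the mcode listings of this tutorial ([r10-0x10], [r10] and [r10+0x10] for slots #0, #1 and #2); the helper names are made up for illustration, and the mapping only holds while BASE has not been shifted.

```c
#include <stddef.h>

/* Assumed size of one stack slot in bytes (read off the mcode listings). */
#define SLOT_SIZE 16

/* Bytecode slot 0 corresponds to SLOAD #1; SLOAD #0 is the closure/frame slot. */
static int sload_slot_from_bc(int bc_slot)
{
    return bc_slot + 1;
}

/* Byte offset of an SLOAD slot relative to BASE at the start of the trace. */
static ptrdiff_t sload_offset(int sload_slot)
{
    return (ptrdiff_t)(sload_slot - 1) * SLOT_SIZE;
}
```

For example, sload_offset(2) gives 0x10, which matches the address used for the loop counter in the listings.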

Note that RETF shifts down BASE and subsequent SLOAD instructions refer to slots of the lower frame(s). Also note that there are no store operations for stack slots or vararg slots. All stores to stack slots are effectively sunk into exits or side traces. Snapshots efficiently manage the references that are to be stored. Vararg slots are read-only from the perspective of the called vararg function.

For the possible values of the field ID in FLOAD and the flags in SLOAD and XLOAD, see IRFLDEF, IRSLOAD_* and IRXLOAD_* in src/lj_ir.h.
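
To read the flag letters printed after SLOAD in the IR listings, a small decoder can help. The bit assignments below are an assumption based on upstream LuaJIT (P = 0x01, F = 0x02, T = 0x04, C = 0x08, R = 0x10, I = 0x20); they may differ in other trees, so check IRSLOAD_* in src/lj_ir.h before relying on them. The function name is hypothetical.

```c
#include <stddef.h>

/* Convert an SLOAD flag mask into the letters shown in IR dumps.
 * Assumed bit order (upstream LuaJIT): P, F, T, C, R, I from bit 0 upward.
 * The output buffer must hold at least 7 bytes. */
static void sload_flags_to_str(unsigned flags, char out[7])
{
    static const char letters[6] = { 'P', 'F', 'T', 'C', 'R', 'I' };
    size_t n = 0;
    unsigned bit;

    for (bit = 0; bit < 6; bit++) {
        if (flags & (1u << bit))
            out[n++] = letters[bit];
    }
    out[n] = '\0';
}
```

Under these assumptions, the mask 0x28 decodes to the letters CI and 0x04 to T.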

Use these command-line arguments to run the code examples:

./ujit -j on -p- example.lua

FLOAD, SLOAD

FLOAD loads a field of an object, e.g. ((GCtab *)t)->metatable.

SLOAD

Loads the payload from a stack slot, e.g. ((TValue *)tv)->gcr, optionally with a type check (grep for IRT_GUARD).

Let’s run this code:

jit.opt.start("hotloop=1", "nohrefk")

local t = {}

for i = 1, 3 do
        local o = t[0]
end

And take a closer look at the IR and mcode:

---- TRACE 1 start example.lua:5
0012    TGETB    5   0   0
0014    FORL     1 => 0012
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbp      int SLOAD  #2    CI
0002 rdx   >  tab SLOAD  #1    T
0003 rsi      int FLOAD  0002  tab.asize
0004       >  int ULE    0003  +0
0005 rbx      int FLOAD  0002  tab.hmask
0006       >  int EQ     0005  +0
0007 rax      tab FLOAD  0002  tab.meta
0008       >  tab EQ     0007  [NULL]
0009 rbp    + int ADD    0001  +1
....              SNAP   #1   [ ---- ---- ]
0010       >  int LE     0009  +3
....              SNAP   #2   [ ---- ---- 0009 ---- ---- 0009 ]
0011 ------------ LOOP ------------
0012 rbp    + int ADD    0009  +1
....              SNAP   #3   [ ---- ---- ]
0013       >  int LE     0012  +3
0014 rbp      int PHI    0009  0012
---- TRACE 1 mcode 100
// Standard prologue, see emit_vmstate(..) in asm_head_root() from lj_asm.h
/* PRL */ 0bd5ff99  mov r11, 0x7f37f9fe3620 // &g->vmstate field VA
/* PRL */ 0bd5ffa3  mov dword [r11], 0x1 // 1 is the current trace number
/*  K  */ 0bd5ffaa  xor ecx, ecx // NULL constant
/* 001 */ 0bd5ffac  cvtsd2si ebp, qword [r10+0x10] // Load a 32-bit signed integer from the 2nd slot (loop counter).
                                                // (CI: C = number converted to integer, I = inherited by exits / side traces. Grep IRSLOAD_ for more info.)
/* 002 */ 0bd5ffb2  cmp dword [r10+0x8], 0xfffffff4 // Type check: the 1st slot must contain a table
/* 002 */ 0bd5ffb7  jnz 0xbd50010               ->0 // Guard: on failure, exit via snapshot #0
/* 002 */ 0bd5ffbd  mov rdx, [r10] // Pointer to the table from the 1st slot: (TValue *)->gcr
/* 003 */ 0bd5ffc0  mov esi, [rdx+0x30] // esi = tab->asize
/* 004 */ 0bd5ffc3  cmp esi, 0x0 // compare the array part size with the key (0)
/* 004 */ 0bd5ffc6  ja 0xbd50010                ->0 // asize > 0 (key 0 falls inside the array part)? exit
/* 005 */ 0bd5ffcc  mov ebx, [rdx+0x38] // ebx = tab->hmask
/* 006 */ 0bd5ffcf  test ebx, ebx // check that the hash part is empty
/* 006 */ 0bd5ffd1  jnz 0xbd50010               ->0 // hmask != 0? exit
/* 007 */ 0bd5ffd7  mov rax, [rdx+0x18] // rax = tab->metatable
/* 008 */ 0bd5ffdb  cmp rax, rcx // compare with NULL
/* 008 */ 0bd5ffde  jnz 0xbd50010               ->0 // has metatable? exit
/* 009 */ 0bd5ffe4  add ebp, 0x1 // add step (immediate constant)
/* 010 */ 0bd5ffe7  cmp ebp, 0x3 // compare for exit (immediate constant)
/* 010 */ 0bd5ffea  jg 0xbd50014                ->1
-> LOOP:
/* 012 */ 0bd5fff0  add ebp, 0x1 // add step (immediate constant)
/* 013 */ 0bd5fff3  cmp ebp, 0x3 // compare for exit (immediate constant)
/* 013 */ 0bd5fff6  jle 0xbd5fff0               ->LOOP // continue loop
/* end */ 0bd5fff8  jmp 0xbd5001c               ->3 // normal exit
---- TRACE 1 stop -> loop
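
One way to read the three guards in this trace (ULE on tab.asize, EQ on tab.hmask, EQ on tab.meta) is that together they prove t[0] is nil without performing any lookup. The C sketch below models that reading; the Tab struct is a simplified stand-in with hypothetical names, not the real GCtab.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a table header; only the fields the guards read. */
typedef struct Tab {
    struct Tab *metatable;
    uint32_t asize;   /* size of the array part */
    uint32_t hmask;   /* hash part mask, 0 for an empty hash part */
} Tab;

/* Returns true when all three guards hold, i.e. t[0] can be treated as nil.
 * Returning false models a side exit through snapshot #0. */
static bool index0_is_known_nil(const Tab *t)
{
    if (t->asize > 0u)          /* guard: ULE asize, 0 (key 0 not in the array part) */
        return false;
    if (t->hmask != 0u)         /* guard: EQ hmask, 0 (hash part is empty) */
        return false;
    if (t->metatable != NULL)   /* guard: EQ meta, NULL (no __index to consult) */
        return false;
    return true;
}
```

Because every guard is loop-invariant, all of them stay in the part before LOOP and the loop body only increments and compares the counter.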

FSTORE

Stores a value to a field of the given object, e.g. ((GCfunc *)fn)->env = val;

HLOAD

Loads the payload from a tagged value; takes the result of HREF as its left operand.

jit.opt.start("hotloop=1", "nohrefk")

local t = {}

for i = 1, 3 do
    setmetatable(t, nil)
end

And take a closer look at the IR and mcode:

---- TRACE 1 start example.lua:5
0012    GGET     5   5      ; "setmetatable"
0013    MOV      6   0
0014    KPRI     7   0
0015    CALL     5   1   3
0000    . FUNCC               ; setmetatable
0017    FORL     1 => 0012
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbp      int SLOAD  #2    CI
0002 r12      fun SLOAD  #0    R
0003 r9       tab FLOAD  0002  func.env
0004 r8       p32 HREF   0003  "setmetatable"
0005 rsi   >  fun HLOAD  0004
0006 rcx   >  tab SLOAD  #1    T
0007       >  fun EQ     0005  setmetatable
0008 rbx      u8  FLOAD  0006  gco.marked
0009          u8  BAND   0008  +64
0010       >  int EQ     0009  +0
0011 rdx      tab FLOAD  0006  tab.meta
0012       >  tab EQ     0011  [NULL]
0013          p32 FREF   0006  tab.meta
0014          tab FSTORE 0013  [NULL]
0015 rbp    + int ADD    0001  +1
....              SNAP   #1   [ ---- ---- ]
0016       >  int LE     0015  +3
....              SNAP   #2   [ ---- ---- 0015 ---- ---- 0015 ]
0017 ------------ LOOP ------------
0018 rbp    + int ADD    0015  +1
....              SNAP   #3   [ ---- ---- ]
0019       >  int LE     0018  +3
0020 rbp      int PHI    0015  0018
---- TRACE 1 mcode 200
/* PRL */ 0bd5ff35  mov r11, 0x7ff48c16e620 // Default root trace prologue, see FLOAD / SLOAD example
/* PRL */ 0bd5ff3f  mov dword [r11], 0x1
/*  K  */ 0bd5ff46  mov rdi, 0x7ff48c172b10 // constant, setmetatable VA
/*  K  */ 0bd5ff50  xor eax, eax // NULL constant
/* 001 */ 0bd5ff52  cvtsd2si ebp, qword [r10+0x10] // load signed integer from slot 2
/* 002 */ 0bd5ff58  mov r12, [r10-0x10] // callee function object, read-only
/* 003 */ 0bd5ff5c  mov r9, [r12+0x10] // r9 = (GCfunc *)->env
/* 004 */ 0bd5ff61  mov r8d, [r9+0x38]
/* 004 */ 0bd5ff65  and r8d, 0x5950030a
/* 004 */ 0bd5ff6c  imul r8d, r8d, 0x28
/* 004 */ 0bd5ff70  add r8, [r9+0x28]
/* 004 */ 0bd5ff74  cmp dword [r8+0x18], 0xfffffffb
/* 004 */ 0bd5ff79  jnz 0xbd5ff8b
/* 004 */ 0bd5ff7b  mov r11, 0x7ff48c172b48
/* 004 */ 0bd5ff85  cmp r11, [r8+0x10]
/* 004 */ 0bd5ff89  jz 0xbd5ff9e
/* 004 */ 0bd5ff8b  mov r8, [r8+0x20]
/* 004 */ 0bd5ff8f  test r8, r8
/* 004 */ 0bd5ff92  jnz 0xbd5ff74
/* 004 */ 0bd5ff94  mov r8, 0x7ff48c16e540
/* 005 */ 0bd5ff9e  cmp dword [r8+0x8], 0xfffffff7
/* 005 */ 0bd5ffa3  jnz 0xbd50010               ->0
/* 005 */ 0bd5ffa9  mov rsi, [r8]
/* 006 */ 0bd5ffac  cmp dword [r10+0x8], 0xfffffff4
/* 006 */ 0bd5ffb1  jnz 0xbd50010               ->0
/* 006 */ 0bd5ffb7  mov rcx, [r10]
/* 007 */ 0bd5ffba  cmp rsi, rdi
/* 007 */ 0bd5ffbd  jnz 0xbd50010               ->0
/* 008 */ 0bd5ffc3  movzx ebx, byte [rcx+0x8]
/* 009 */ 0bd5ffc7  test ebx, 0x40
/* 010 */ 0bd5ffcd  jnz 0xbd50010               ->0
/* 011 */ 0bd5ffd3  mov rdx, [rcx+0x18]
/* 012 */ 0bd5ffd7  cmp rdx, rax
/* 012 */ 0bd5ffda  jnz 0xbd50010               ->0
/* 013-014*/ 0bd5ffe0  mov [rcx+0x18], rax
/* 015 */ 0bd5ffe4  add ebp, 0x1
/* 016 */ 0bd5ffe7  cmp ebp, 0x3
/* 016 */ 0bd5ffea  jg 0xbd50014                ->1
-> LOOP:
/* 018 */ 0bd5fff0  add ebp, 0x1
/* 019 */ 0bd5fff3  cmp ebp, 0x3
/* 019 */ 0bd5fff6  jle 0xbd5fff0               ->LOOP
/* end */ 0bd5fff8  jmp 0xbd5001c               ->3
---- TRACE 1 stop -> loop
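
The instructions tagged 004 in the mcode above (HREF) appear to walk a hash chain: mask the key hash with hmask, scale by the node size 0x28, then follow next pointers until the key matches or the chain ends, in which case a shared nil value is used. A C sketch of that walk, with a hypothetical node layout derived from the offsets in the listing (value at 0x0, key at 0x10, next at 0x20):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte tagged value: payload at offset 0, tag at offset 8. */
typedef struct {
    union { double n; const void *gcr; } u;
    uint32_t tag;
    uint32_t pad;
} TV;

/* Hypothetical hash node: 0x28 bytes on a 64-bit target. */
typedef struct Node {
    TV val;             /* offset 0x00 */
    TV key;             /* offset 0x10 */
    struct Node *next;  /* offset 0x20 */
} Node;

/* String tag as it appears in the mcode dump. */
#define TAG_STR 0xfffffffbu

/* Look up an interned string key by pointer identity. Returns the value slot,
 * or niltv when the chain ends without a match. */
static const TV *href_str(const Node *nodes, uint32_t hmask, uint32_t hash,
                          const void *key, const TV *niltv)
{
    const Node *n = &nodes[hash & hmask];
    do {
        if (n->key.tag == TAG_STR && n->key.u.gcr == key)
            return &n->val;
        n = n->next;
    } while (n != NULL);
    return niltv;
}
```

HLOAD (the lines tagged 005) then checks the tag of the returned slot and loads its payload, which is why a missing key still yields a well-defined guard failure rather than a crash.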

ALOAD, ASTORE

ALOAD may be just a type check after the AREF IR; see the Lua code from the FLOAD example again.

Or it may also load the payload from the TValue:

jit.opt.start("hotloop=1", "nohrefk")

local t = {1, 2, 3, 4}

for i = 1, 3 do
    t[i] = t[i + 1]
end

And take a closer look at the IR and mcode:

---- TRACE 1 start example.lua:6
0012    KSHORT   5   1
0013    ADD      5   4   5
0014    TGETV    5   0   5
0015    TSETV    5   0   4
0017    FORL     1 => 0012
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 r9       int SLOAD  #2    CI
0003 rdi   >  tab SLOAD  #1    T
0004 [8]    + int ADD    0001  +1
0005 r10      int FLOAD  0003  tab.asize
0006       >  p32 ABC    0005  +4
0007 rax      p32 FLOAD  0003  tab.array
0008 rbp    + p32 AREF   0007  0004
0009 xmm0  >  flt ALOAD  0008
0010 rcx      p32 AREF   0007  0001
0011 r8       u8  FLOAD  0003  gco.marked
0012          u8  BAND   0011  +64
0013       >  int EQ     0012  +0
0014 rdx      tab FLOAD  0003  tab.meta
0015       >  tab EQ     0014  [NULL]
0016          flt ASTORE 0010  0009
....              SNAP   #1   [ ---- ---- ]
0017       >  int LE     0004  +3
....              SNAP   #2   [ ---- ---- 0004 ---- ---- 0004 ]
0018 ------------ LOOP ------------
0019 rbx    + int ADD    0004  +1
0020 rbp    + p32 AREF   0007  0019
0021 xmm7  >  flt ALOAD  0020
0022          flt ASTORE 0008  0021
....              SNAP   #3   [ ---- ---- ]
0023       >  int LE     0019  +3
0024 rbx      int PHI    0004  0019
0025 rbp      p32 PHI    0008  0020
0026 r14      nil RENAME 0004  #32767
0027 r15      nil RENAME 0008  #2
---- TRACE 1 mcode 215
0bd5ff28  mov r11, 0x7f9243b22620
0bd5ff32  mov dword [r11], 0x1
0bd5ff39  xor esi, esi
0bd5ff3b  cvtsd2si r9d, qword [r10+0x10]
0bd5ff41  cmp dword [r10+0x8], 0xfffffff4
0bd5ff46  jnz 0xbd50010             ->0
0bd5ff4c  mov rdi, [r10]
0bd5ff4f  lea ebx, [r9+0x1]
0bd5ff53  mov [rsp+0x8], ebx
0bd5ff57  mov r10d, [rdi+0x30]
0bd5ff5b  cmp r10, 0x4
0bd5ff5f  jbe 0xbd50010             ->0
0bd5ff65  mov rax, [rdi+0x10]
0bd5ff69  mov ebp, ebx
0bd5ff6b  shl ebp, 0x4
0bd5ff6e  add rbp, rax
0bd5ff71  cmp dword [rbp+0x8], 0xfffffff2
0bd5ff78  jnz 0xbd50010             ->0
0bd5ff7e  movsd xmm0, qword [rbp]
0bd5ff84  mov ecx, r9d
0bd5ff87  shl ecx, 0x4
0bd5ff8a  add rcx, rax
0bd5ff8d  movzx r8d, byte [rdi+0x8]
0bd5ff92  test r8d, 0x40
0bd5ff99  jnz 0xbd50010             ->0
0bd5ff9f  mov rdx, [rdi+0x18]
0bd5ffa3  cmp rdx, rsi
0bd5ffa6  jnz 0xbd50010             ->0
0bd5ffac  mov dword [rcx+0x8], 0xfffffff2
0bd5ffb3  movsd [rcx], xmm0
0bd5ffb7  cmp ebx, 0x3
0bd5ffba  jg 0xbd50014              ->1
-> LOOP:
0bd5ffc0  mov [rsp+0x8], ebx
0bd5ffc4  mov r15, rbp
0bd5ffc7  mov r14d, ebx
0bd5ffca  add ebx, 0x1
0bd5ffcd  mov ebp, ebx
0bd5ffcf  shl ebp, 0x4
0bd5ffd2  add rbp, rax
0bd5ffd5  cmp dword [rbp+0x8], 0xfffffff2
0bd5ffdc  jnz 0xbd50018             ->2
0bd5ffe2  movsd xmm7, qword [rbp]
0bd5ffe8  mov dword [r15+0x8], 0xfffffff2
0bd5fff0  movsd [r15], xmm7
0bd5fff5  cmp ebx, 0x3
0bd5fff8  jle 0xbd5ffc0             ->LOOP
0bd5fffa  jmp 0xbd5001c             ->3
---- TRACE 1 stop -> loop
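
The ABC, AREF and ALOAD steps of this trace can be modelled in C roughly as below. This is a simplified sketch: it checks the bound on every access, whereas the trace above compares tab.asize against +4 once before the loop, which looks like a single check for the largest index the loop will touch. The TV layout and names are hypothetical; the 16-byte element size and the number tag come from the mcode (shl by 4, cmp with 0xfffffff2).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte tagged value: payload at offset 0, tag at offset 8. */
typedef struct {
    union { double n; void *gcr; } u;
    uint32_t tag;
    uint32_t pad;
} TV;

/* Number tag as it appears in the mcode dump. */
#define TAG_NUM 0xfffffff2u

/* Bounds-checked, type-guarded load of a number from the array part.
 * Returns false where the trace would take a side exit. */
static bool aload_num(const TV *array, uint32_t asize, uint32_t idx, double *out)
{
    const TV *ref;

    if (idx >= asize)           /* ABC: index must be below asize */
        return false;
    ref = &array[idx];          /* AREF: array + (idx << 4) */
    if (ref->tag != TAG_NUM)    /* ALOAD guard on the loaded type */
        return false;
    *out = ref->u.n;            /* ALOAD payload */
    return true;
}

/* ASTORE counterpart: writes both the tag and the payload. */
static void astore_num(TV *array, uint32_t idx, double val)
{
    array[idx].tag = TAG_NUM;
    array[idx].u.n = val;
}
```

The two stores in astore_num mirror the pair of mov dword / movsd instructions emitted for ASTORE in the listing.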

HLOAD, HSTORE

Same as ALOAD and ASTORE, but for the hash part of a table.

jit.opt.start("hotloop=1", "nohrefk")

local t = {["key"] = 1}

for i = 1, 3 do
    t["newkey"] = t["key"]
end

And take a closer look at the IR and mcode:

---- TRACE 1 start example.lua:5
0012    TGETS    5   0   7  ; "key"
0013    TSETS    5   0   6  ; "newkey"
0015    FORL     1 => 0012
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbp      int SLOAD  #2    CI
0002 rax   >  tab SLOAD  #1    T
0003 r9       p32 HREF   0002  "key"
0004 xmm0  >  flt HLOAD  0003
0005 rcx      p32 HREF   0002  "newkey"
0006 r8       u8  FLOAD  0002  gco.marked
0007          u8  BAND   0006  +64
0008       >  int EQ     0007  +0
0009       >  p32 NE     0005  [0x7f48c628b540]
0010 rdx      tab FLOAD  0002  tab.meta
0011       >  tab EQ     0010  [NULL]
0012          flt HSTORE 0005  0004
0013          nil TBAR   0002
0014 rbp    + int ADD    0001  +1
....              SNAP   #1   [ ---- ---- ]
0015       >  int LE     0014  +3
....              SNAP   #2   [ ---- ---- 0014 ---- ---- 0014 ]
0016 ------------ LOOP ------------
0017 rbp    + int ADD    0014  +1
....              SNAP   #3   [ ---- ---- ]
0018       >  int LE     0017  +3
0019 rbp      int PHI    0014  0017
---- TRACE 1 mcode 302
0bd5fecf  mov r11, 0x7f48c628b620
0bd5fed9  mov dword [r11], 0x1
0bd5fee0  mov rsi, 0x7f48c628b540
0bd5feea  xor ebx, ebx
0bd5feec  cvtsd2si ebp, qword [r10+0x10]
0bd5fef2  cmp dword [r10+0x8], 0xfffffff4
0bd5fef7  jnz 0xbd50010             ->0
0bd5fefd  mov rax, [r10]
0bd5ff00  mov r9d, [rax+0x38]
0bd5ff04  and r9d, 0x68ca1d79
0bd5ff0b  imul r9d, r9d, 0x28
0bd5ff0f  add r9, [rax+0x28]
0bd5ff13  cmp dword [r9+0x18], 0xfffffffb
0bd5ff18  jnz 0xbd5ff2a
0bd5ff1a  mov r11, 0x7f48c628d438
0bd5ff24  cmp r11, [r9+0x10]
0bd5ff28  jz 0xbd5ff3d
0bd5ff2a  mov r9, [r9+0x20]
0bd5ff2e  test r9, r9
0bd5ff31  jnz 0xbd5ff13
0bd5ff33  mov r9, 0x7f48c628b540
0bd5ff3d  cmp dword [r9+0x8], 0xfffffff2
0bd5ff45  jnz 0xbd50010             ->0
0bd5ff4b  movsd xmm0, qword [r9]
0bd5ff50  mov ecx, [rax+0x38]
0bd5ff53  and ecx, 0x7eaa3afd
0bd5ff59  imul ecx, ecx, 0x28
0bd5ff5c  add rcx, [rax+0x28]
0bd5ff60  cmp dword [rcx+0x18], 0xfffffffb
0bd5ff64  jnz 0xbd5ff76
0bd5ff66  mov r11, 0x7f48c628d5b8
0bd5ff70  cmp r11, [rcx+0x10]
0bd5ff74  jz 0xbd5ff89
0bd5ff76  mov rcx, [rcx+0x20]
0bd5ff7a  test rcx, rcx
0bd5ff7d  jnz 0xbd5ff60
0bd5ff7f  mov rcx, 0x7f48c628b540
0bd5ff89  movzx r8d, byte [rax+0x8]
0bd5ff8e  test r8d, 0x40
0bd5ff95  jnz 0xbd50010             ->0
0bd5ff9b  cmp rcx, rsi
0bd5ff9e  jz 0xbd50010              ->0
0bd5ffa4  mov rdx, [rax+0x18]
0bd5ffa8  cmp rdx, rbx
0bd5ffab  jnz 0xbd50010             ->0
0bd5ffb1  mov dword [rcx+0x8], 0xfffffff2
0bd5ffb8  movsd [rcx], xmm0
0bd5ffbc  test byte [rax+0x8], 0x4
0bd5ffc0  jz 0xbd5ffe4
0bd5ffc2  and byte [rax+0x8], 0xfb
0bd5ffc6  mov r11, 0x7f48c628b4e0
0bd5ffd0  mov rdi, [r11]
0bd5ffd3  mov r11, 0x7f48c628b4e0
0bd5ffdd  mov [r11], rax
0bd5ffe0  mov [rax+0x20], rdi
0bd5ffe4  add ebp, 0x1
0bd5ffe7  cmp ebp, 0x3
0bd5ffea  jg 0xbd50014              ->1
-> LOOP:
0bd5fff0  add ebp, 0x1
0bd5fff3  cmp ebp, 0x3
0bd5fff6  jle 0xbd5fff0             ->LOOP
0bd5fff8  jmp 0xbd5001c             ->3
---- TRACE 1 stop -> loop
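
The TBAR instruction after HSTORE expands to the short sequence starting at "test byte [rax+0x8], 0x4". It looks like a backward write barrier: if bit 0x04 of the table's mark byte is set (upstream LuaJIT calls this the black bit), the bit is cleared and the table is pushed onto a list kept in the global state. A C sketch under those assumptions, with hypothetical type and function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a collectable table; only what the barrier touches. */
typedef struct GCTable {
    struct GCTable *gclist;  /* link used while the table sits on a GC list */
    uint8_t marked;          /* GC mark bits */
} GCTable;

/* Bit tested and cleared in the mcode listing (0x4 / mask 0xfb). */
#define MARK_BLACK 0x04u

/* Backward barrier sketch: un-blacken the table and push it onto the list
 * whose head is *list_head, so the collector revisits it later. */
static void table_barrier_back(GCTable **list_head, GCTable *t)
{
    if (t->marked & MARK_BLACK) {
        t->marked &= (uint8_t)~MARK_BLACK;
        t->gclist = *list_head;
        *list_head = t;
    }
}
```

When the bit is not set, the mcode jumps straight to the counter increment, so the common case costs one test and one branch.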