.. _tut-mem-layout:

Tutorial: Memory Layout for Machine Code and Exit Stubs
=======================================================

.. contents:: :local:

Introduction
------------

This article explains some details of the machine code (mcode) memory management performed by the |PROJECT| compiler.

Memory Area for mcode: Overview
-------------------------------

Each time the platform detects a hot execution path in the Lua code, it tries to compile it. Such a path is called a trace. Compilation consists of translating the Lua bytecode of the trace into an intermediate representation (IR; this phase is also known as "trace recording"), running optimizations on the generated IR, and finally assembling the IR, i.e. transforming it into actual machine code, which is the ultimate goal of all JIT compilation efforts.

All machine code generated by |PROJECT| is kept in ``jit_State.mcarea``. ``mcarea`` is a singly linked list of memory chunks, each of them ``LJ_PAGESIZE``-aligned. Each chunk has the following structure:

.. code::

   +---------------------------+ <-- lowest address
   |link to the next area chunk|
   +---------------------------+
   |mcbot: area bottom         |
   +---------------------------+
   |##### area red zone #######|
   +---------------------------+
   |///////////////////////////|
   |///////free space//////////|
   |///////////////////////////|
   +---------------------------+
   |mctop: area top            |
   +---------------------------+
   |\\\possible padding crap\\\|
   +---------------------------+ <-- highest address

``mcbot`` and ``mctop`` grow towards each other: ``mcbot`` grows from lower to higher addresses, while ``mctop`` grows from higher to lower addresses.
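
The two-ended growth can be pictured with a minimal sketch. The ``McArea`` type and the ``mcarea_*`` helpers below are hypothetical names introduced only for illustration; they are not the actual |PROJECT| structures, and the chunk header and the red zone are deliberately left out.

.. code:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical model of the free space inside one mcode chunk. */
   typedef struct McArea {
     uint8_t *bot;  /* Models mcbot: first free byte at the low end. */
     uint8_t *top;  /* Models mctop: one past the last free byte. */
   } McArea;

   static void mcarea_init(McArea *a, uint8_t *mem, size_t size)
   {
     a->bot = mem;
     a->top = mem + size;
   }

   /* Bytes still available between the two ends. */
   static size_t mcarea_free(const McArea *a)
   {
     assert(a->bot <= a->top);
     return (size_t)(a->top - a->bot);
   }

   /* Reserve n bytes at the low end: bot moves towards higher addresses. */
   static uint8_t *mcarea_alloc_low(McArea *a, size_t n)
   {
     uint8_t *p;
     if (n > mcarea_free(a)) return NULL;  /* The two ends would collide. */
     p = a->bot;
     a->bot += n;
     return p;
   }

   /* Reserve n bytes at the high end: top moves towards lower addresses. */
   static uint8_t *mcarea_alloc_high(McArea *a, size_t n)
   {
     if (n > mcarea_free(a)) return NULL;  /* The two ends would collide. */
     a->top -= n;
     return a->top;
   }

In this model, both kinds of allocation draw from the same free space, so neither end needs a fixed quota: the area is exhausted only when the two pointers meet.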

Memory at ``mctop`` and higher is occupied by machine code of traces:

.. code::

   +---------------------------+
   |\\\\\\lower addresses\\\\\\|
   +---------------------------+
   |mctop                      |
   +---------------------------+ <-- trace2->mcode
   |/////mcode for trace2//////|
   +---------------------------+ <-- trace1->mcode
   |/////mcode for trace1//////|
   +---------------------------+

Each time a new trace is about to be assembled, its array of IR instructions is processed *from bottom to top*, and the output machine code is copied to ``mcarea``, shifting ``mctop`` towards lower addresses. After that, the processed trace, which originally resided partly in ``jit_State.cur`` and partly in other ``jit_State`` buffers (e.g. ``jit_State.irbuf``), is compacted into a separate ``GCtrace`` object. The ``mcode`` pointer of the new ``GCtrace`` object still points to memory owned by ``jit_State``.
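
One way to picture bottom-to-top processing combined with a downward-moving top pointer is the sketch below. It assumes that every instruction is already encoded as a short byte sequence; the ``Insn`` type and ``emit_backwards`` function are hypothetical and only illustrate the direction of emission, not the real assembler.

.. code:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical pre-encoded instruction: up to 4 bytes of machine code. */
   typedef struct Insn {
     uint8_t bytes[4];
     size_t len;
   } Insn;

   /*
   ** Emit insns[0..n-1] so that the code ends right below `top`, walking
   ** the array from the last instruction to the first. Returns the new top
   ** (the address of the first emitted byte), or NULL if the code would
   ** cross `limit`.
   */
   static uint8_t *emit_backwards(uint8_t *top, const uint8_t *limit,
                                  const Insn *insns, size_t n)
   {
     size_t i;
     for (i = n; i-- > 0; ) {
       if ((size_t)(top - limit) < insns[i].len) return NULL;
       top -= insns[i].len;  /* The top pointer moves to lower addresses. */
       memcpy(top, insns[i].bytes, insns[i].len);
     }
     return top;
   }

Although the instructions are visited last to first, the bytes end up in normal program order when read from the returned address upwards, which is why the returned pointer can serve as the start of the emitted code.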

Memory at ``mcbot`` and lower is occupied by trace-independent machine code, which in the x86-64 case consists of exit stubs only. Exit stubs are divided into groups, and the addresses of the groups are kept in ``jit_State.exitstubgroup``.

Group 0 holds exit stubs 0..31, group 1 holds exit stubs 32..63, etc. Each group has the following structure:

.. code::

   exit_stub_0:
   push i8                   ; 2 bytes: 1 for pushi8 opcode + 1 for immediate value
   jmp exit_handler_prologue ; 2 bytes: 1 for short jump    + 1 for 8-bit relative offset
   exit_stub_1:              ; ...
   push i8
   jmp exit_handler_prologue
   ...
   exit_handler_prologue:
   push i8
   mov r11, i64              ; This i64 is the address of the dispatch table
   mov [rsp+0x10], r11
   mov r11, @lj_vm_exit_handler
   jmp r11

The maximum number of stubs per group is limited because the relative offset in ``jmp exit_handler_prologue`` has to fit into 1 byte. Stub groups can be made trace-independent because whenever a new trace starts execution, it records its ID in ``global_State.vmstate``. So once we know the location of ``global_State`` (and we do, because it is saved in ``exit_handler_prologue``, see below), we can treat trace IDs and exit numbers independently.
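
The group size limit can be checked with a little arithmetic. The sketch below assumes the layout exactly as drawn in the listing above: every stub is 4 bytes long, ends with a short jump, and the prologue immediately follows the last stub. All names in it are hypothetical.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   enum {
     STUB_SIZE = 4,        /* push i8 (2 bytes) + short jmp (2 bytes). */
     STUBS_PER_GROUP = 32  /* Group 0: exits 0..31, group 1: 32..63, ... */
   };

   /* Index of the group that holds the stub for exit number exitno. */
   static unsigned stub_group(unsigned exitno)
   {
     return exitno / STUBS_PER_GROUP;
   }

   /* Byte offset of that stub from the start of its group. */
   static unsigned stub_offset(unsigned exitno)
   {
     return (exitno % STUBS_PER_GROUP) * STUB_SIZE;
   }

   /*
   ** Displacement of the short jmp in stub idx of a group with nstubs
   ** stubs, measured from the end of the jmp to the prologue.
   */
   static int stub_jmp_rel(unsigned idx, unsigned nstubs)
   {
     assert(idx < nstubs);
     return (int)((nstubs - idx - 1) * STUB_SIZE);
   }

   /* A short jmp encodes its displacement as a signed 8-bit value. */
   static int fits_rel8(int rel)
   {
     return rel >= INT8_MIN && rel <= INT8_MAX;
   }

Under these assumptions, the first stub of a 32-stub group needs a displacement of 124 bytes, which still fits into a signed byte, while a 33rd stub would push it to 128, which does not.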

Stub groups are written to ``mcarea`` "on demand", i.e. when we assemble a trace with ``n`` exits, we create as many groups as are needed to hold ``n`` exits. When we assemble the next trace with ``m <= n`` exits, nothing is done, as the code is trace-independent and already assembled.

If ``m > n``, we add the group(s) needed to hold the ``m - n`` missing stubs (stubs ``n, n + 1, ..., m - 1``).
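
The on-demand rule boils down to a rounded-up division. Here is a minimal sketch, assuming 32 stubs per group as described above; the function names are hypothetical.

.. code:: c

   #include <stddef.h>

   enum { STUBS_PER_GROUP = 32 };

   /* Number of groups needed to hold stubs 0..nexits-1 (rounded up). */
   static size_t groups_for_exits(size_t nexits)
   {
     return (nexits + STUBS_PER_GROUP - 1) / STUBS_PER_GROUP;
   }

   /*
   ** Number of groups to add for a trace with nexits exits when
   ** have_groups groups have already been written.
   */
   static size_t groups_to_add(size_t have_groups, size_t nexits)
   {
     size_t need = groups_for_exits(nexits);
     return need > have_groups ? need - have_groups : 0;
   }

For example, with one group already in place, a trace with 20 exits adds nothing, while a trace with 70 exits needs three groups in total and therefore adds two.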

The code in ``lj_vm_exit_handler`` restores the exit number and the trace number, pushes registers onto the stack to form an ``ExitState`` struct, and eventually calls:

.. code::

   call extern lj_trace_exit(jit_State *J, ExitState *ex)

To restore the trace number, the dispatch table address is needed (its value is saved in ``exit_handler_prologue``). It is retrieved like this:

.. code::

   lea rbp, [rsp+88] ; the magic offset accounts for the registers pushed on the stack
   ; now rbp points at the slot which was addressed as [rsp+0x10]
   ; in exit_handler_prologue

mcode Area Protection Details
-----------------------------

TBD.