.. _tut-folding-engine:

Tutorial: How Folding Engine Works
==================================

.. contents:: :local:

When folding happens
---------------------

Every time you invoke ``emitir``, you actually pass IR to folding engine:

.. code::

                /* Pass IR on to next optimization in chain (FOLD). */
                #define emitir(ot, a, b)        (lj_ir_set(J, (ot), (a), (b)), lj_opt_fold(J))

IR emission, in turn, takes place on trace recording, on loop optimization phase and on snapshot state replay during side trace setup.

Folding engine is the first stage in a group of optimizations:

    -  **FOLD** – folding rule based engine.
    -  **CSE** – common subexpression elimination.
    -  **FWD** – load/store forwarding.
    -  **DSE** – dead store elimination.

And all these optimizations will be followed by an actual IR emission phase (in case it was not optimized away), which is simply placing IR in trace IR buffer while maintaining 'same opcode' chains of already emitted IRs. One can entirely bypass the folding engine and optimizations mentioned above via ``emitir_raw`` macro.

So when you deal with folding engine you should expect an type of IR as an input. As a consequence, for example, (remember that loads/stores optimizations cannot be  performed without proper alias analysis) there must be rules for loads and stores to skip folding and CSE for these IRs and dispatch them to FWD or DSE instead.

How folding works
-----------------

``emitir`` macro assigns passed IR to ``J->fold.ins`` and calls ``lj_opt_fold(J)`` , which does the following:

  -  Check compiler flags for early exit, which may be:

     -  CSE pass for normal IRs in case FOLD pass is disabled
     -  Emitter (i.e. ``emitir`` becomes effectively equal to ``emitir_raw`` ) for loads (except SLOAD, in case not all of FOLD, FWD and CSE are enabled) and stores (in case not both FOLD and DSE are enabled)

  -  Lookup for folding rule:

     -  Construct key, which is

          .. code::

                 |23_____17|16___10|9_____________0|
                 |folded ir|left ir|right ir opcode|
                 | opcode  | opcode| or literal    |

        At this stage ``J->fold.left`` and ``J->fold.right`` are also filled with folded IR operands to save typing while implementing fold rules.

     -  Apply any-masking. Folding engine tries to lookup rules not only for exact combination of opcode and left/right operands, but also successively masks operands with 'any' wildcard, from most specific to least specific:

        -  opcode left right
        -  opcode any right
        -  opcode left any
        -  opcode any any 

     -  Lookup a folding rule by masked key in semi-perfect
        `hash table <#Tutorial:HowFoldingEngineworks-hash_table>`__ (which is implemented as an array generated by ``buildvm`` that searches folding rules by ``LJFOLD`` defines, look into ``buildvm_fold.c`` for details).
     -  Run rule on successful lookup.

  -  Handle folding rule result, which may be:

     -  NEXTFOLD.
        Means that no folding was applied, because of additional test inside the folding rule. Matching continues with the next iteration of any-masking. If no rule was found for the least specific variant of instruction (with any-any operands), is passed to CSE.
     -  RETRYFOLD.
        Means instruction was modified in-place. Folding will
        be retried as if this instruction had just been
        received.
     -  FAILFOLD, DROPFOLD.
        Applied to instructions which have no result, for
        example guarded assertions.
        FAILFOLD means that guard will always fail and trace
        should be aborted.
        DROPFOLD means that guard is always successful.
     -  IR reference (IRRef).
        This reference may be a result of actual emission,
        reference to already emitted IR, result of CSE, etc.

How folding rule looks like
---------------------------

Here are examples of folding rules:

.. code-block:: c

  LJFOLD(EQ FLOAD KNULL)
  LJFOLD(NE FLOAD KNULL)
  LJFOLDX(lj_opt_cse)

  LJFOLD(ABS ABS KNUM)
  LJFOLDF(shortcut_left)
  {
      return LEFTFOLD;  /* f(g(x)) ==> g(x) */
  }

Defines used for description of folding rules:

- LJFOLD/LJFOLDX/LJFOLDF. They are used by ``buildvm`` to generate folding engine hash table. ``LJFOLD(opcode left right)`` declares folding rule for ``opcode`` with ``left`` and ``right`` operands, folding function is specified below with either already defined function (``LJFOLDX(funcname)``) or function defined in-place (``LJFOLDF(funcname)``). For the first example above, folding rule applies CSE to comparisons of FLOAD results with IR_KNULL. And the last example simplifies chains of successive IR_ABS opcodes (KNUM here matches ABS auxiliary implementation specific constant).
- LEFTFOLD/RIGHTFOLD.
            Means result is a reference provided by left/right operand.
- EMITFOLD/CSEFOLD.
            Means result is raw emission or instruction is passed to CSE.
- INTFOLD(i).
            Means that ``i`` will be interned and corresponding reference will be returned as folding result.
- NEXTFOLD, RETRYFOLD, FAILFOLD, DROPFOLD.
            Described above.
- CONDFOLD(cond).
            Is equal to "``cond ? DROPFOLD : FAILFOLD"``
- Defines to save typing:

  -  ``#define fins (&J->fold.ins)``
  -  ``#define fleft (&J->fold.left)``
  -  ``#define fright (&J->fold.right)``
  -  ``#define knumleft (ir_knum(fleft)->n)``
  -  ``#define knumright (ir_knum(fright)→n)``


Please take the following into account while creating/modifying folding rules:

  -  Fold rules must preserve destination type, i.e. use ``INTFOLD`` for KINT result ``lj_ir_knum`` for KNUM result, ``DROPFOLD/FAILFOLD/CONDFOLD`` for no result and never use ``lj_ir_knumint``.
  -  Fold rules should not create **new** instructions which reference operands **across** PHIs. E.g. a RETRYFOLD with ``fins->op1 = fleft->op1`` is invalid if the left operand is a PHI. One could workaround the requirement by issuing new PHIs, but this should be considered as counterproductive. The solution is to abort folding rule via NEXTFOLD if ``fleft`` is PHI, which could be shorthanded by equivalent ``PHIBARRIER(fleft)`` define. Even though this requirement is easy to slip while writing folding rules, you will likely be punished with either of consequences of corrupted control flow, for example with test timed out in an infinite loop. Note: returning  **existing** instructions (e.g. LEFTFOLD) is ok. 
  -  Fold rules must be monotonic to guarantee termination. That means rules should point towards eliminating instructions or replacing with one or more simpler instructions or moving constants to right operand to enable other folding rules.

List of available folding rules with examples
---------------------------------------------

-  **Constant folding for FP numbers**
-  **Constant folding of conversions**
-  **Constant folding of pointer arithmetic**
-  **Constant folding of equality checks**
-  **Constant folding for 32 bit integers**
-  **Constant folding for 64 bit integers**

   -  Computes result for constant operands and returns reference to it instead of actual emission of instruction. ``uj_math_foldarith`` and ``uj_math_foldfpm`` are used for FP folding.

-  **Constant folding for strings**

   -  Interns strings with constant payload, zero size or result of ``strcmp`` on constant strings.

-  **Algebraic shortcuts**

   -  round(round(x)) = round(x)
   -  abs(abs(x)) = abs(x)
   -  abs(neg(x)) = abs(x)
   -  neg(neg(x)) = x; same for bnot and bswap

- **Bit operations simplifications**

   -  i & 0 = 0; i & -1 = i
   -  i \| 0 = i; i \| -1 = -1
   -  i ^ 0 = i; i ^ -1 = ~i
   -  shifts with constants and masks simplifications

- **Integer algebraic simplifications**

   -  i + 0 = i; same for subtraction
   -  i \* 0 = 0
   -  i \* 1 = i
   -  i \* 2 = i + i
   -  i \* 2^k = i << k
   -  (i + j) - i = j and similar
   -  (i + j1) - (i + j2) = j1 - j2 and similar

- **FP algebraic simplifications**

   -  a + (-b) = a - b
   -  a - (-b) = a + b
   -  (-x) - k = (-k) - x
   -  x \* 1 = x; x \* (-1) = -x; same for division
   -  x - 0.0 = x
   -  x \* 2 = x + x
   -  (-x) \* (-x) = x \* x
   -  2.0 ^ i = ldexp(1.0, tonum(i))
   -  x ^ 0 = 1; x ^ 1 = x; x^k = x*x*...*x

- **Conversion simplifications**

   -  tonum(tonum(x)) = tonum(x)
   -  int(int(x)) = int(x)
   -  eliminate widening to 64-bits for results of 32-bit
      operations as they do an implicit zero-extension
   -  special CSE rule for conversions

- Reassociation
   -  (i + k1) + k2 = i + (k1 + k2); same for multiplication, bit operations
   -  min(min(a,b), a) = min(a,b); same for max 

- **Array bounds check elimination**

   -  ABC(asize, (i + k) + (-k)) → ABC(asize, i) if it
      already exists
   -  ABC(asize, k1), ABC(asize, k2) → ABC(asize, max(k1,
      k2)), drop second ABC if k2 is lower, otherwise patch
      first ABC with k2.
   -  Eliminates invariant ABC inside loop.

- **Commutativity folding**

    - Canonicalizes commutative instructions with lower refs going
      to the right operand (remember, that constants
      have lower refs than instructions). It helps to reduce
      folding rules (constants are always right operands), helps
      CSE to find more matches, helps assembler to generate better
      code with constants at the right.

- **Simplification of compound expressions**

   -  Turns

      .. code::

               string.sub(str, a, b) == kstr

       into

      .. code::

               string.byte(str, a) == string.byte(kstr, 1) && string.byte(str, a + 1) == string.byte(kstr, 2) && etc.

- **Load folding rule**

    -  Dispatches LOADs to various load forwarding optimizations.
    -  CSE for upvalue references.
    -  CSE for loads from strings as they are immutable.
    -  *Probably CSE for loads from immutable tables should also reside here.*

- **Write barrier folding rules**

   -  CSE for barriers between GC steps.

- **Store and allocation folding rules**

   -  Dispatches STOREs to various DSE optimizations.
   -  Rules to emit IRs with side effects and bypass CSE for them.

- **Concatenation folding**

    -  Handles folding concatenation of constant strings.