Tutorial: How Folding Engine Works¶
When folding happens¶
Every time you invoke emitir, you actually pass IR to folding engine:
/* Pass IR on to next optimization in chain (FOLD). */
#define emitir(ot, a, b) (lj_ir_set(J, (ot), (a), (b)), lj_opt_fold(J))
IR emission, in turn, takes place on trace recording, on loop optimization phase and on snapshot state replay during side trace setup.
Folding engine is the first stage in a group of optimizations:
FOLD – folding rule based engine.
CSE – common subexpression elimination.
FWD – load/store forwarding.
DSE – dead store elimination.
And all these optimizations will be followed by an actual IR emission phase (in case it was not optimized away), which is simply placing IR in trace IR buffer while maintaining ‘same opcode’ chains of already emitted IRs. One can entirely bypass the folding engine and optimizations mentioned above via emitir_raw macro.
So when you deal with folding engine you should expect an type of IR as an input. As a consequence, for example, (remember that loads/stores optimizations cannot be performed without proper alias analysis) there must be rules for loads and stores to skip folding and CSE for these IRs and dispatch them to FWD or DSE instead.
How folding works¶
emitir macro assigns passed IR to J->fold.ins and calls lj_opt_fold(J) , which does the following:
Check compiler flags for early exit, which may be:
CSE pass for normal IRs in case FOLD pass is disabled
Emitter (i.e.
emitirbecomes effectively equal toemitir_raw) for loads (except SLOAD, in case not all of FOLD, FWD and CSE are enabled) and stores (in case not both FOLD and DSE are enabled)Lookup for folding rule:
Construct key, which is
|23_____17|16___10|9_____________0| |folded ir|left ir|right ir opcode| | opcode | opcode| or literal |At this stage
J->fold.leftandJ->fold.rightare also filled with folded IR operands to save typing while implementing fold rules.Apply any-masking. Folding engine tries to lookup rules not only for exact combination of opcode and left/right operands, but also successively masks operands with ‘any’ wildcard, from most specific to least specific:
opcode left right
opcode any right
opcode left any
opcode any any
Lookup a folding rule by masked key in semi-perfect hash table (which is implemented as an array generated by
buildvmthat searches folding rules byLJFOLDdefines, look intobuildvm_fold.cfor details).Run rule on successful lookup.
Handle folding rule result, which may be:
NEXTFOLD. Means that no folding was applied, because of additional test inside the folding rule. Matching continues with the next iteration of any-masking. If no rule was found for the least specific variant of instruction (with any-any operands), is passed to CSE.
RETRYFOLD. Means instruction was modified in-place. Folding will be retried as if this instruction had just been received.
FAILFOLD, DROPFOLD. Applied to instructions which have no result, for example guarded assertions. FAILFOLD means that guard will always fail and trace should be aborted. DROPFOLD means that guard is always successful.
IR reference (IRRef). This reference may be a result of actual emission, reference to already emitted IR, result of CSE, etc.
How folding rule looks like¶
Here are examples of folding rules:
LJFOLD(EQ FLOAD KNULL)
LJFOLD(NE FLOAD KNULL)
LJFOLDX(lj_opt_cse)
LJFOLD(ABS ABS KNUM)
LJFOLDF(shortcut_left)
{
return LEFTFOLD; /* f(g(x)) ==> g(x) */
}
Defines used for description of folding rules:
LJFOLD/LJFOLDX/LJFOLDF. They are used by
buildvmto generate folding engine hash table.LJFOLD(opcode left right)declares folding rule foropcodewithleftandrightoperands, folding function is specified below with either already defined function (LJFOLDX(funcname)) or function defined in-place (LJFOLDF(funcname)). For the first example above, folding rule applies CSE to comparisons of FLOAD results with IR_KNULL. And the last example simplifies chains of successive IR_ABS opcodes (KNUM here matches ABS auxiliary implementation specific constant).- LEFTFOLD/RIGHTFOLD.
Means result is a reference provided by left/right operand.
- EMITFOLD/CSEFOLD.
Means result is raw emission or instruction is passed to CSE.
- INTFOLD(i).
Means that
iwill be interned and corresponding reference will be returned as folding result.
- NEXTFOLD, RETRYFOLD, FAILFOLD, DROPFOLD.
Described above.
- CONDFOLD(cond).
Is equal to “
cond ? DROPFOLD : FAILFOLD"
Defines to save typing:
#define fins (&J->fold.ins)#define fleft (&J->fold.left)#define fright (&J->fold.right)#define knumleft (ir_knum(fleft)->n)#define knumright (ir_knum(fright)→n)
Please take the following into account while creating/modifying folding rules:
Fold rules must preserve destination type, i.e. use
INTFOLDfor KINT resultlj_ir_knumfor KNUM result,DROPFOLD/FAILFOLD/CONDFOLDfor no result and never uselj_ir_knumint.Fold rules should not create new instructions which reference operands across PHIs. E.g. a RETRYFOLD with
fins->op1 = fleft->op1is invalid if the left operand is a PHI. One could workaround the requirement by issuing new PHIs, but this should be considered as counterproductive. The solution is to abort folding rule via NEXTFOLD iffleftis PHI, which could be shorthanded by equivalentPHIBARRIER(fleft)define. Even though this requirement is easy to slip while writing folding rules, you will likely be punished with either of consequences of corrupted control flow, for example with test timed out in an infinite loop. Note: returning existing instructions (e.g. LEFTFOLD) is ok.Fold rules must be monotonic to guarantee termination. That means rules should point towards eliminating instructions or replacing with one or more simpler instructions or moving constants to right operand to enable other folding rules.
List of available folding rules with examples¶
Constant folding for FP numbers
Constant folding of conversions
Constant folding of pointer arithmetic
Constant folding of equality checks
Constant folding for 32 bit integers
Constant folding for 64 bit integers
Computes result for constant operands and returns reference to it instead of actual emission of instruction.
uj_math_foldarithanduj_math_foldfpmare used for FP folding.
Constant folding for strings
Interns strings with constant payload, zero size or result of
strcmpon constant strings.
Algebraic shortcuts
round(round(x)) = round(x)
abs(abs(x)) = abs(x)
abs(neg(x)) = abs(x)
neg(neg(x)) = x; same for bnot and bswap
Bit operations simplifications
i & 0 = 0; i & -1 = i
i | 0 = i; i | -1 = -1
i ^ 0 = i; i ^ -1 = ~i
shifts with constants and masks simplifications
Integer algebraic simplifications
i + 0 = i; same for subtraction
i * 0 = 0
i * 1 = i
i * 2 = i + i
i * 2^k = i << k
(i + j) - i = j and similar
(i + j1) - (i + j2) = j1 - j2 and similar
FP algebraic simplifications
a + (-b) = a - b
a - (-b) = a + b
(-x) - k = (-k) - x
x * 1 = x; x * (-1) = -x; same for division
x - 0.0 = x
x * 2 = x + x
(-x) * (-x) = x * x
2.0 ^ i = ldexp(1.0, tonum(i))
x ^ 0 = 1; x ^ 1 = x; x^k = x*x*…*x
Conversion simplifications
tonum(tonum(x)) = tonum(x)
int(int(x)) = int(x)
eliminate widening to 64-bits for results of 32-bit operations as they do an implicit zero-extension
special CSE rule for conversions
- Reassociation
(i + k1) + k2 = i + (k1 + k2); same for multiplication, bit operations
min(min(a,b), a) = min(a,b); same for max
Array bounds check elimination
ABC(asize, (i + k) + (-k)) → ABC(asize, i) if it already exists
ABC(asize, k1), ABC(asize, k2) → ABC(asize, max(k1, k2)), drop second ABC if k2 is lower, otherwise patch first ABC with k2.
Eliminates invariant ABC inside loop.
Commutativity folding
Canonicalizes commutative instructions with lower refs going to the right operand (remember, that constants have lower refs than instructions). It helps to reduce folding rules (constants are always right operands), helps CSE to find more matches, helps assembler to generate better code with constants at the right.
Simplification of compound expressions
Turns
string.sub(str, a, b) == kstr
into
string.byte(str, a) == string.byte(kstr, 1) && string.byte(str, a + 1) == string.byte(kstr, 2) && etc.
Load folding rule
Dispatches LOADs to various load forwarding optimizations.
CSE for upvalue references.
CSE for loads from strings as they are immutable.
Probably CSE for loads from immutable tables should also reside here.
Write barrier folding rules
CSE for barriers between GC steps.
Store and allocation folding rules
Dispatches STOREs to various DSE optimizations.
Rules to emit IRs with side effects and bypass CSE for them.
Concatenation folding
Handles folding concatenation of constant strings.