.. _test-benchmark:

Case Study: Writing Benchmarks for |PROJECT|
============================================

Introduction
------------

Suppose that you want to micro-benchmark a certain part of the platform, say you want to know how much ``tonumber`` costs you. You know that |PROJECT| can be downloaded and used as a stand-alone application (see Obtaining |PROJECT|), and you have prepared a simple benchmark, like this:

.. code-block:: lua

    -- file1.lua:
    local x
    for i = 1, 100000000 do
        x = tonumber("320")
    end

    -- file2.lua:
    local x
    for i = 1, 100000000 do
        x = 320
    end

Then you run it like this:

.. code::

    $ /usr/bin/time -f%U ujit file1.lua
    ...
    $ /usr/bin/time -f%U ujit file2.lua
    ...

Will the results be representative? Definitely not. Want to know why? Then read on!

Analyzing the Benchmark
-----------------------

First, please note that the body of the loop in either original benchmark contains only loop invariants, i.e. expressions that do not depend on the loop variable. It means that the first benchmark can be rewritten as follows, with the loop body *hoisted* out of the loop:

.. code-block:: lua

    local x = tonumber("320")
    for i = 1, 100000000 do
    end

Fortunately, the JIT compiler is smart enough to perform this hoisting, too. Unfortunately for this particular case, the ``ujit`` stand-alone application starts with the JIT compiler on. Effectively, the two original benchmarks compare the speed of executing the same empty loop, which is definitely not what you wanted. If you dump the progress of the JIT compiler, you will see something like this:

.. code::

    $ ./ujit -p- -e 'local x; for i = 1, 100000000 do x = tonumber("320") end'
    ---- TRACE 1 start =(command line):1
    0006  GGET     5   0      ; "tonumber"
    0007  KSTR     6   1      ; "320"
    0008  CALL     5   2   2
    0000  . FUNCC               ; tonumber
    0009  MOV      0   5
    0011  FORL     1 => 0006
    ---- TRACE 1 IR
    ....        SNAP   #0   [ ---- ]
    0001 rbp      int SLOAD  #2    CI
    0002 r9       fun SLOAD  #0    R
    0003 rsi      tab FLOAD  0002  func.env
    0004 r8       int FLOAD  0003  tab.hmask
    0005       >  int EQ     0004  +63
    0006 rbx      p32 FLOAD  0003  tab.node
    0007 rdx   >  p32 HREFK  0006  "tonumber" @13
    0008 rax   >  fun HLOAD  0007
    0009       >  fun EQ     0008  tonumber
    0010 rbp    + int ADD    0001  +1
    ....        SNAP   #1   [ ---- 320 ]
    0011       >  int LE     0010  +100000000
    ....        SNAP   #2   [ ---- 320 0010 ---- ---- 0010 ]
    0012 ------------ LOOP ------------
    0013 rbp    + int ADD    0010  +1
    ....        SNAP   #3   [ ---- 320 ]
    0014       >  int LE     0013  +100000000
    0015 rbp      int PHI    0010  0013
    ---- TRACE 1 mcode 155
    0bd6ff65  mov r11, 0x7fc692377620
    0bd6ff6f  mov dword [r11], 0x1
    0bd6ff76  mov rcx, 0x7fc69237be98
    0bd6ff80  cvtsd2si ebp, qword [r10+0x10]
    0bd6ff86  mov r9, [r10-0x10]
    0bd6ff8a  mov rsi, [r9+0x10]
    0bd6ff8e  mov r8d, [rsi+0x38]
    0bd6ff92  cmp r8d, 0x3f
    0bd6ff96  jnz 0xbd60010 ->0
    0bd6ff9c  mov rbx, [rsi+0x28]
    0bd6ffa0  mov rdi, 0x7fc69237bed0
    0bd6ffaa  cmp rdi, [rbx+0x218]
    0bd6ffb1  jnz 0xbd60010 ->0
    0bd6ffb7  cmp dword [rbx+0x220], 0xfffffffb
    0bd6ffbe  jnz 0xbd60010 ->0
    0bd6ffc4  lea rdx, [rbx+0x208]
    0bd6ffcb  cmp dword [rdx+0x8], 0xfffffff7
    0bd6ffcf  jnz 0xbd60010 ->0
    0bd6ffd5  mov rax, [rdx]
    0bd6ffd8  cmp rax, rcx
    0bd6ffdb  jnz 0xbd60010 ->0
    0bd6ffe1  add ebp, 0x1
    0bd6ffe4  cmp ebp, 0x5f5e100
    0bd6ffea  jg 0xbd60014 ->1
    -> LOOP:
    0bd6fff0  add ebp, 0x1
    0bd6fff3  cmp ebp, 0x5f5e100
    0bd6fff9  jle 0xbd6fff0 ->LOOP
    0bd6fffb  jmp 0xbd6001c ->3
    ---- TRACE 1 stop -> loop

Without diving into the details, please pay attention to the ``------------ LOOP ------------`` line: everything above it is, roughly speaking, loop-invariant code. As you can see, everything that resembles a ``tonumber`` call is above that line, while only a few pieces of, ehm... something (IR instructions, but this really does not matter now) are below the line. Luckily, there is no need to examine dumps and dive into the gory details of the compiler each time you benchmark something, just remember this:

.. note::

    When benchmarking any code by wrapping the code into a loop, do one of the following:

    1. Switch the compiler off;
    2. Ensure that the benchmarked code is **not** loop-invariant, i.e. it must depend on the loop variable.

Benchmarking with Compiler Off
------------------------------

Running ``ujit`` with the compiler off lets you avoid any invisible side effects on your code: in this case the interpreter literally executes what you have written. Simply start ``ujit`` with ``-joff``, or switch the compiler off directly in the Lua chunk with ``jit.off()``.

With this technique, you gain the following:

- You can estimate how your hypotheses perform relative to each other.
- You can estimate how your code will perform if this particular part of the code fails to JIT-compile (say, in case it unluckily becomes a part of a trace that contains a non-JITtable thing like a call to the C API).

With this technique, you obviously lose the following:

- You'll never know what performance you can achieve if the benchmarked code JIT-compiles.

Now let's see how we can fix the benchmark:

.. code-block:: lua

    assert(jit.status() == false, "This benchmark is designed to run without JIT, please either pass -joff on the command line or call jit.off() in Lua")

    local N = 1e8

    local function assign1()
        print("With tonumber")
        local n
        for i = 1, N do
            n = tonumber("320")
        end
        return n
    end

    local function assign2()
        print("Without tonumber")
        local n
        for i = 1, N do
            n = 320
        end
        return n
    end

    -- Pick the variant by the first command-line argument.
    local function benchmark(name)
        if name == "tonumber" then
            assign1()
        else
            assign2()
        end
    end

    benchmark(arg[1])

And run it:

.. code::

    $ /usr/bin/time -f%U ./ujit -joff tonumber-no-jit.lua
    ...
    $ /usr/bin/time -f%U ./ujit -joff tonumber-no-jit.lua tonumber
    ...

Benchmarking with Compiler On
-----------------------------

As demonstrated above, running the benchmark with the compiler on immediately exposes your code to various transformations done by the compiler. On the other hand, this is much more fun!
Just be careful:

- No loop-invariants, remember?
- You may want to read :ref:`Reduce test cases ` to gain more inspiration and knowledge about interacting with the compiler.
- You may ultimately want to learn the dump format, at least to verify that you benchmark what you intended to. This one may be really tricky; the |PROJECT| team understands that it is too cruel to MAKE you do so and always welcomes any questions if you feel you are stuck benchmarking with the compiler turned on.

And despite these pitfalls, you definitely gain the following:

- You can estimate how your hypotheses perform relative to each other with JIT compilation on.
- You can estimate how your code will perform if this particular part of the code is lucky enough to JIT-compile.

So let's see one possible variant of the benchmark. First, let's generate some data:

.. code::

    $ ujit -e 'print("local t1 = {"); for i = 100, 999 do print("\t\"" .. i .. "\",") end; print "}"; print "local t2 = {"; for i = 100, 999 do print("\t" .. i .. ",") end; print "}"; print("return {t1 = t1, t2 = t2}") ' >data.lua

And the benchmark itself:

.. code-block:: lua

    assert(jit.status() == true, "This benchmark is designed to run with JIT, please either pass -jon on the command line or call jit.on() in Lua")

    local N = 1e8

    local data = require("data")

    assert(type(data) == "table")
    assert(type(data.t1) == "table")
    assert(type(data.t2) == "table")

    local function assign1(t)
        print("With tonumber")
        local nels = #t
        local n
        for i = 1, N do
            -- The index depends on i, so the call is not loop-invariant.
            n = tonumber(t[i % nels + 1])
        end
        return n
    end

    local function assign2(t)
        print("Without tonumber")
        local nels = #t
        local n
        for i = 1, N do
            n = t[i % nels + 1]
        end
        return n
    end

    -- Pick the variant by the first command-line argument.
    local function benchmark(name, t1, t2)
        if name == "tonumber" then
            assign1(t1)
        else
            assign2(t2)
        end
    end

    benchmark(arg[1], data.t1, data.t2)

And run it:

.. code::

    $ /usr/bin/time -f%U ./ujit -jon tonumber-jit.lua
    ...
    $ /usr/bin/time -f%U ./ujit -jon tonumber-jit.lua tonumber
    ...
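
If you would rather time both variants from inside a single script than rely on ``/usr/bin/time``, a minimal sketch could look like the one below. It assumes only standard Lua library functions (``os.clock``, ``string.format``, ``tostring``); the names ``time_it``, ``with_tonumber`` and ``without_tonumber`` are hypothetical helpers introduced for illustration and are not part of |PROJECT|. The inputs mirror the tables in ``data.lua``, and each loop accumulates a sum so that every iteration contributes to the returned value. ``N`` is deliberately small here; raise it for real measurements.

.. code-block:: lua

    -- Hypothetical helper (not a |PROJECT| API): runs fn(...) once and
    -- reports the CPU time measured by os.clock().
    local function time_it(label, fn, ...)
        local start = os.clock()
        local result = fn(...)
        local elapsed = os.clock() - start
        print(string.format("%-18s %.3f s", label, elapsed))
        return result, elapsed
    end

    -- Same shape as data.lua: strings "100".."999" and numbers 100..999.
    local strings, numbers = {}, {}
    for i = 100, 999 do
        strings[#strings + 1] = tostring(i)
        numbers[#numbers + 1] = i
    end

    local N = 1e6 -- assumption: small enough for a quick run

    local function with_tonumber(t, n)
        local nels = #t
        local sum = 0
        for i = 1, n do
            -- The index depends on i, so the call is not loop-invariant.
            sum = sum + tonumber(t[i % nels + 1])
        end
        return sum
    end

    local function without_tonumber(t, n)
        local nels = #t
        local sum = 0
        for i = 1, n do
            sum = sum + t[i % nels + 1]
        end
        return sum
    end

    local sum1, elapsed1 = time_it("with tonumber", with_tonumber, strings, N)
    local sum2, elapsed2 = time_it("without tonumber", without_tonumber, numbers, N)

    -- Sanity check: both variants must have processed the same values.
    assert(sum1 == sum2, "the two variants disagree")

Comparing the two sums is a cheap way to confirm that both loops walked the same data before you trust the timings. Keep in mind that both variants now share one process, so it may still be worth cross-checking against separate runs as shown above.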
Conclusion
----------

- Be careful when benchmarking with the compiler turned on: in particular, do not allow loop hoisting to spoil your benchmark and make it totally non-representative.
- Do two-fold benchmarking:

  - To evaluate worst-case performance, benchmark with the JIT compiler off;
  - To evaluate best-case performance, benchmark with the JIT compiler on.

- Do not hesitate to contact the |PROJECT| team if you get puzzled by any result you observe.