LuaVela ujit.iprof: Platform-level Instrumenting Profiler for Lua Code

Purpose

This document is a specification for LuaVela instrumenting profiler usage including usage examples describing its features and limitations.

Target Audience

Lua developers and anyone interested in Lua code performance assessment.

Interface: ujit.iprof

LuaVela currently provides only Lua-level API for instrumenting profiling, however, statistics can be passed via callback either provided by platform, within LuaVela is integrated or written in pure Lua (see some examples below). The resulting table will contain up to four values per each profiled entity:

  • calls – how many times profiled entity has been called.
  • wall – processing time for profiled entity (aka wall clock).
  • lua – total time spent within profiled Lua coroutine.
  • subs – a table with callees of profiled entity. Omitted if no callees within.

ujit.iprof.start

Synopsis

local status, errmsg = ujit.iprof.start(label, mode, limit)

ujit.iprof.start sets up resources for instrumenting profiler and puts interpreter into profiling mode.

Arguments

  1. label (required) – name of profiling entity.
  2. mode (optional) – profile mode from values listed below (default is ujit.iprof.PLAIN).
  • ujit.iprof.PLAIN – counts inclusive time only for profiled lines.
  • ujit.iprof.INCLUSIVE – counts inclusive time for profiled lines and all nested callees up to the limit depth.
  • ujit.iprof.EXCLUSIVE – counts exclusive (self) time for profiled lines and all nested callees up to the limit depth.
  1. limit (optional) – max Lua frame depth to be profiled (default is LJ_MAX_XLEVEL).

Return value

In case of errors ujit.iprof.start returns false with the corresponding error message. Otherwise it returns true.

Errors

  1. "wrong profiling mode, use ujit.iprof.{PLAIN,INCLUSIVE,EXLCUSIVE}" – error raised when profiling mode parameter is out of acceptable set.
  2. "wrong profiling limit, use non-negative number less than <number>" – error raised when profiling limit is out of acceptable range.
  3. "Inappropriate ujit build" – error raised while using profiler interfaces when profiler is not available.

Also there are two errors that can be removed in future:

  1. "JIT is enabled" – error raised while using profiler with enabled jit. The reason of this behaviour is described below.
  2. "Error occured while profiling" – error raised for the case of nested profiling. The reason of this behaviour is described below.

ujit.iprof.stop

Synopsis

local result, errmsg = ujit.iprof.stop()

ujit.iprof.stop releases resources being allocated for profiling and puts interpreter out of profiling mode.

Arguments

No arguments are required for this function.

Return value

If there are no errors, ujit.iprof.stop returns a table with profiler results. Otherwise it returns nil, plus the error message.

Errors

  1. "Inappropriate ujit build" – error raised while using profiler interfaces when profiler is not available.
  2. "Error occured while profiling" – error raised while calling ujit.iprof.stop with no prior ujit.iprof.start call.

ujit.iprof.profile

This subroutine is not provided directly by uJIT considering two following problems we faced while implementing ujit.iprof.profile:

  • Coroutine isn’t able to yield through C function (platform design limit).
  • Today there is no way to implement LuaVela builtin in pure Lua.

Thereby we can just propose an interface and an example of pure Lua implementation you can embed to your platform/profiling suite/etc (see Notes section below).

Synopsis

local wpfunction = ujit.iprof.profile(pfunction, pcallback,
                                      label, mode, limit)

ujit.iprof.profile is a wrapper for function to be profiled with ujit.iprof.start and ujit.iprof.stop underneath.

Arguments

  1. pfunction (required) – function to be profiled
  2. pcallback (required) – function expecting table with profiler results as a first argument.
  3. label (optional) – the same as ujit.iprof.start label argument
  4. mode (optional) – the same as ujit.iprof.start mode argument
  5. limit (optional) – the same as ujit.iprof.start limit argument

Return value

ujit.iprof.profile returns the wrapped function wpfunction. All profiler-related errors will be passed as argument to pcallback with foregoing nil. At the end of wrapped function the profiling is complete with no errors then the table with profiler stats will be passed to the pcallback.

Errors

According to ujit.iprof.profile synopsis all possible errors can be found within corresponding sections for ujit.iprof.start and ujit.iprof.stop.

Notes

Here is the list with limitations for current ujit.iprof implementation:

  1. Nested profiling: ujit.iprof.start call when profiling has been initiated does not spawn another “profiling frame” for now. There are no internal confines related to it in platform, so let me say it’s just another NYI in LuaVela.
  2. Traces profiling: ujit.iprof.start obliges JIT to be disabled while profiling process.

As mentioned before here is an example of proposed wrapper written in pure Lua:

function ujit.iprof.profile(pfunction, pcallback, label, mode, limit)
  return function(...)
     local s, e = ujit.iprof.start(label, mode, limit)
     local t = { pfunction(...) }
     if s then pcallback(ujit.iprof.stop()) else pcallback(nil, e) end
     return table.unpack(t)
  end
end

Examples

Simple example

Let’s try ujit.iprof with the classic program to start anything in programming with.

print('Hello, world!')

Everything we need to profile this snippet is to add a directive to start profiling and a corresponding one to stop it. For a consistency we will also check that both calls succeeded.

local s, se = ujit.iprof.start('HELLO-IPROF')
if not s then error(se) end

print('Hello, world!')

local r, re = ujit.iprof.stop()
if not r then error(re) end

The r variable stores a table with the report produced by profiler, thus let’s dump it with the auxiliary library that can be found here.

local inspect = require 'inspect'

local s, se = ujit.iprof.start('HELLO-IPROF')
if not s then error(se) end

print('Hello, world!')

local r, re = ujit.iprof.stop()
if not r then error(re) end

print(inspect(r, { indent = '\t' }))

Now we a ready to profile this chunk. Run the command below in terminal.

$ ujit -joff hello-world.lua
Hello, world!
{
      ["HELLO-IPROF"] = {
              calls = 1,
              lua = 4.4150016037747e-05,
              wall = 4.4150016037747e-05
      }
}

Here you see the desired output followed by the resulting table with the fields described above.

OK then, we need to go deeper.

Other modes examples

As it was prior mentioned there are three different modes provided for instrumenting profiler: plain, inclusive and exclusive (for more information see description above).

So let’s just add an optional CLI argument to define profiling mode.

local inspect = require 'inspect'

local mode = ujit.iprof[arg[1] or 'PLAIN']

local s, se = ujit.iprof.start('HELLO-IPROF', mode)
if not s then error(se) end

print('Hello, world!')

local r, re = ujit.iprof.stop()
if not r then error(re) end

print(inspect(r, { indent = '\t' }))

Let’s see what we’ll get for the previous command.

$ ujit -joff hello-iprof.lua
Hello, world!
{
      ["HELLO-IPROF"] = {
              calls = 1,
              lua = 4.2922969441861e-05,
              wall = 4.2922969441861e-05
      }
}

Exactly the same structure, since ujit.iprof.start second argument defaults to ujit.iprof.PLAIN if omitted.

Let’s try an inclusive one (mind the uppercase for the mode names).

$ ujit -joff hello-iprof.lua INCLUSIVE
Hello, world!
{
      ["HELLO-IPROF"] = {
              calls = 1,
              lua = 7.5099989771843e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 9.0009998530149e-06,
                              wall = 9.0009998530149e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 2.0881998352706e-05,
                              wall = 2.0881998352706e-05
                      },
                      ["builtin #30"] = {
                              calls = 1,
                              lua = 4.0316954255104e-05,
                              wall = 4.0316954255104e-05
                      }
              },
              wall = 7.5099989771843e-05
      }
}

Now we see slightly different structure: there is a subs field now for "HELLO-IPROF" entity containing three builtin functions:

  • builtin #223ujit.iprof.start internal identifier
  • builtin #224ujit.iprof.start internal identifier
  • builtin #30print internal identifier

Another detail need to be mentioned: lua and processing times for "HELLO-IPROF" entity include the timings of all its callees. If you want to see "HELLO-IPROF" “self” time just change mode to exclusive.

$ ujit -joff hello-iprof.lua EXCLUSIVE
Hello, world!
{
      ["HELLO-IPROF"] = {
              calls = 1,
              lua = 4.6360655687749e-06,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 9.4409915618598e-06,
                              wall = 9.4409915618598e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.5364959836006e-05,
                              wall = 1.5364959836006e-05
                      },
                      ["builtin #30"] = {
                              calls = 1,
                              lua = 4.0235987398773e-05,
                              wall = 4.0235987398773e-05
                      }
              },
              wall = 4.6360655687749e-06
      }
}

Now all reported timings corresponds directly to each entity.

Function profiling

The more convenient way to instrument the function itself instead all places it is used. As mentioned above there is no such functions wrapper provided by platform but we propose an interface and pure Lua implementation you can find in this document.

Let’s change a bit more our snippet to try ujit.iprof.profile wrapper.

local inspect = require 'inspect'

function ujit.iprof.profile(pfunction, pcallback, name, mode, level)
  return function(...)
    local s, e = ujit.iprof.start(name, mode, level)
    local t = { pfunction(...) }
    if s then pcallback(ujit.iprof.stop()) else pcb(nil, e) end
    return table.unpack(t)
  end
end

local function stats(t, e)
  if not t then error(e) end
  print(inspect(t, { indent = '\t' }))
end

local mode = ujit.iprof[arg[1] or 'PLAIN']
local wprint = ujit.iprof.profile(print, stats,
                                  'HELLO-IPROF-PROFILE', mode)

local function echo(s) return s end

wprint(echo('Hello, world!'))

Except ujit.iprof.profile we added two additional functions. The first one – stats – to be invoked when profiler stops and is obliged to have a special signature considering ujit.iprof.stop return values. stats callback throws an error whether profiling was failed and dumps the report table otherwise. The purpose of the other function – echo – is described below.

So let’s test this chunk the way we used to in the latter section.

$ ujit -joff hello-iprof-profile.lua
Hello, world!
{
      ["HELLO-IPROF-PROFILE"] = {
              calls = 1,
              lua = 4.9892987590283e-05,
              wall = 4.9892987590283e-05
      }
}
$ ujit -joff hello-iprof-profile.lua INCLUSIVE
Hello, world!
{
      ["HELLO-IPROF-PROFILE"] = {
              calls = 1,
              lua = 7.5692019890994e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 9.3870330601931e-06,
                              wall = 9.3870330601931e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.4016986824572e-05,
                              wall = 1.4016986824572e-05
                      },
                      ["builtin #30"] = {
                              calls = 1,
                              lua = 4.0391983930022e-05,
                              wall = 4.0391983930022e-05
                      }
              },
              wall = 7.5692019890994e-05
      }
}
$ ujit -joff hello-iprof-profile.lua EXCLUSIVE
Hello, world!
{
      ["HELLO-IPROF-PROFILE"] = {
              calls = 1,
              lua = 1.798098674044e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.1770986020565e-05,
                              wall = 1.1770986020565e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 2.080702688545e-05,
                              wall = 2.080702688545e-05
                      },
                      ["builtin #30"] = {
                              calls = 1,
                              lua = 5.6924007367343e-05,
                              wall = 5.6924007367343e-05
                      }
              },
              wall = 1.798098674044e-05
      }
}

And here is the purpose of the echo function: as you can see there is no entry for it within all three reports because its execution was finished before ujit.iprof.start invocation. Reports comparison for profiling approach provided in previous section with the echo proxy is left for the reader.

Profiling tail calls

There are some difference in Lua stack layout for plain call (CALL bytecode) and tail call (CALLT bytecode) while execution and this also reflects on profiling reports. For advance please read Tutorial: Lua-Lua Calls in LuaVela Interpreter.

Here is a code snippet emitting both plain call bytecode and tail call bytecode and initializing q according to the arguments passed with CLI.

local inspect = require 'inspect'

function ujit.iprof.profile(pfunction, pcallback, name, mode, level)
  return function(...)
    local s, e = ujit.iprof.start(name, mode, level)
    local t = { pfunction(...) }
    if s then pcallback(ujit.iprof.stop()) else pcb(nil, e) end
    return table.unpack(t)
  end
end

local function stats(t, e)
  if not t then error(e) end
  print(inspect(t, { indent = '\t' }))
end

local function qq(str) print(str) end

local mode = ujit.iprof[arg[1] or 'PLAIN']

local q = tostring(arg[2]) == 'CALLT'
  and ujit.iprof.profile(function(str) return qq(str) end, stats,
                         'HELLO-IPROF-CALLT', mode)
  or  ujit.iprof.profile(function(str) qq(str) end, stats,
                         'HELLO-IPROF-CALL', mode)

q('Hello, world!')

As a result of the command below q will be initialized with the function containing CALL + RET0 bytecode combination. For a plain call the nesting you will see in report is natural: function originated for q and defined at hello-iprof-callt.lua:24 includes qq defined at hello-iprof-callt.lua:17.

$ ujit -joff hello-iprof-callt.lua EXCLUSIVE
Hello, world!
{
      ["HELLO-IPROF-CALL"] = {
              calls = 1,
              lua = 1.150497701019e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.0108051355928e-05,
                              wall = 1.0108051355928e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.6201985999942e-05,
                              wall = 1.6201985999942e-05
                      },
                      ["function @hello-iprof-callt.lua:24"] = {
                              calls = 1,
                              lua = 3.4548982512206e-05,
                              subs = {
                                      ["function @hello-iprof-callt.lua:17"] = {
                                              calls = 1,
                                              lua = 8.3340564742684e-06,
                                              subs = {
                                                      ["builtin #30"] = {
                                                              calls = 1,
                                                              lua = 4.7281966544688e-05,
                                                              wall = 4.7281966544688e-05
                                                      }
                                              },
                                              wall = 8.3340564742684e-06
                                      }
                              },
                              wall = 3.4548982512206e-05
                      }
              },
              wall = 1.150497701019e-05
      }
}

The following command will result to q being initialized with a function terminating with CALLT bytecode. Hence according to manipulations with Lua stack for this bytecode you will see the report containing the origin function for q defined at hello-iprof-callt.lua:22 at the same nesting level as its callee defined at hello-iprof-callt.lua:17.

$ ujit -joff hello-iprof-callt.lua EXCLUSIVE CALLT
Hello, world!
{
      ["HELLO-IPROF-CALLT"] = {
              calls = 1,
              lua = 1.5244004316628e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.1546013411134e-05,
                              wall = 1.1546013411134e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.5365018043667e-05,
                              wall = 1.5365018043667e-05
                      },
                      ["function @hello-iprof-callt.lua:17"] = {
                              calls = 1,
                              lua = 1.3436016160995e-05,
                              subs = {
                                      ["builtin #30"] = {
                                              calls = 1,
                                              lua = 8.5713982116431e-05,
                                              wall = 8.5713982116431e-05
                                      }
                              },
                              wall = 1.3436016160995e-05
                      },
                      ["function @hello-iprof-callt.lua:22"] = {
                              calls = 1,
                              lua = 4.2876985389739e-05,
                              wall = 4.2876985389739e-05
                      }
              },
              wall = 1.5244004316628e-05
      }
}

This is another acknowledgment for the remark that profiler describes the way code is actually executed.

Limiting the profiling depth

An attentive reader could notice that ujit.iprof.start have three arguments but we still use only two. Let’s figure out what limit argument is for, considering the following snippet.

local inspect = require 'inspect'

function ujit.iprof.profile(pfunction, pcallback, name, mode, level)
  return function(...)
    local s, e = ujit.iprof.start(name, mode, level)
    local t = { pfunction(...) }
    if s then pcallback(ujit.iprof.stop()) else pcb(nil, e) end
    return table.unpack(t)
  end
end

local function stats(t, e)
  if not t then error(e) end
  print(inspect(t, { indent = '\t' }))
end

function qr(message) print(message) end
function qx(message) qr(message) end
function qw(message) qx(message) end
function qq(message) qw(message) end

local mode = ujit.iprof[arg[1] or 'PLAIN']
local limit = tonumber(arg[2] or 42)

local q = ujit.iprof.profile(function (message) return qq(message) end, stats,
                             'HELLO-IPROF-LIMITS', mode, limit)

q('Hello, world!')

We add four more functions forming a chain of calls with print in the most inner one and wrapped a function with tail call to the most outer one. After running the following commands you will see reports like those below.

$ ujit -joff hello-iprof-limits.lua INCLUSIVE
Hello, world!
{
      ["HELLO-IPROF-LIMITS"] = {
              calls = 1,
              lua = 0.00013722601579502,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 3.6281999200583e-05,
                              wall = 3.6281999200583e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.7224054317921e-05,
                              wall = 1.7224054317921e-05
                      },
                      ["function @hello-iprof-limits.lua:20"] = {
                              calls = 1,
                              lua = 6.3598970882595e-05,
                              subs = {
                                      ["function @hello-iprof-limits.lua:19"] = {
                                              calls = 1,
                                              lua = 5.5114971473813e-05,
                                              subs = {
                                                      ["function @hello-iprof-limits.lua:18"] = {
                                                              calls = 1,
                                                              lua = 4.9913011025637e-05,
                                                              subs = {
                                                                      ["function @hello-iprof-limits.lua:17"] = {
                                                                              calls = 1,
                                                                              lua = 4.4531014282256e-05,
                                                                              subs = {
                                                                                      ["builtin #30"] = {
                                                                                              calls = 1,
                                                                                              lua = 3.8045982364565e-05,
                                                                                              wall = 3.8045982364565e-05
                                                                                      }
                                                                              },
                                                                              wall = 4.4531014282256e-05
                                                                      }
                                                              },
                                                              wall = 4.9913011025637e-05
                                                      }
                                              },
                                              wall = 5.5114971473813e-05
                                      }
                              },
                              wall = 6.3598970882595e-05
                      },
                      ["function @hello-iprof-limits.lua:25"] = {
                              calls = 1,
                              lua = 5.9989979490638e-06,
                              wall = 5.9989979490638e-06
                      }
              },
              wall = 0.00013722601579502
      }
}
$ ujit -joff hello-iprof-limits.lua EXCLUSIVE
Hello, world!
{
      ["HELLO-IPROF-LIMITS"] = {
              calls = 1,
              lua = 1.368101220578e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 2.4234992451966e-05,
                              wall = 2.4234992451966e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.3143988326192e-05,
                              wall = 1.3143988326192e-05
                      },
                      ["function @hello-iprof-limits.lua:20"] = {
                              calls = 1,
                              lua = 8.6590298451483e-06,
                              subs = {
                                      ["function @hello-iprof-limits.lua:19"] = {
                                              calls = 1,
                                              lua = 5.8080186136067e-06,
                                              subs = {
                                                      ["function @hello-iprof-limits.lua:18"] = {
                                                              calls = 1,
                                                              lua = 5.1020178943872e-06,
                                                              subs = {
                                                                      ["function @hello-iprof-limits.lua:17"] = {
                                                                              calls = 1,
                                                                              lua = 5.7529541663826e-06,
                                                                              subs = {
                                                                                      ["builtin #30"] = {
                                                                                              calls = 1,
                                                                                              lua = 3.9995997212827e-05,
                                                                                              wall = 3.9995997212827e-05
                                                                                      }
                                                                              },
                                                                              wall = 5.7529541663826e-06
                                                                      }
                                                              },
                                                              wall = 5.1020178943872e-06
                                                      }
                                              },
                                              wall = 5.8080186136067e-06
                                      }
                              },
                              wall = 8.6590298451483e-06
                      },
                      ["function @hello-iprof-limits.lua:25"] = {
                              calls = 1,
                              lua = 6.1789760366082e-06,
                              wall = 6.1789760366082e-06
                      }
              },
              wall = 1.368101220578e-05
      }
}

No one will argue this report is too complicated for human analysis. Thus you can limit the profiling depth. Run both previous commands with the second parameter set to 1.

$ ujit -joff hello-iprof-limits.lua INCLUSIVE 1
Hello, world!
{
      ["HELLO-IPROF-LIMITS"] = {
              calls = 1,
              lua = 7.4664014391601e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 8.6250365711749e-06,
                              wall = 8.6250365711749e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 8.2789920270443e-06,
                              wall = 8.2789920270443e-06
                      },
                      ["function @hello-iprof-limits.lua:20"] = {
                              calls = 1,
                              lua = 4.3603999074548e-05,
                              wall = 4.3603999074548e-05
                      },
                      ["function @hello-iprof-limits.lua:25"] = {
                              calls = 1,
                              lua = 5.8030127547681e-06,
                              wall = 5.8030127547681e-06
                      }
              },
              wall = 7.4664014391601e-05
      }
}
$ ujit -joff hello-iprof-limits.lua EXCLUSIVE 1
Hello, world!
{
      ["HELLO-IPROF-LIMITS"] = {
              calls = 1,
              lua = 1.579598756507e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.6681966371834e-05,
                              wall = 1.6681966371834e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.3125012628734e-05,
                              wall = 1.3125012628734e-05
                      },
                      ["function @hello-iprof-limits.lua:20"] = {
                              calls = 1,
                              lua = 0.00010146002750844,
                              wall = 0.00010146002750844
                      },
                      ["function @hello-iprof-limits.lua:25"] = {
                              calls = 1,
                              lua = 6.9909729063511e-06,
                              wall = 6.9909729063511e-06
                      }
              },
              wall = 1.579598756507e-05
      }
}

Now, you see only the first-level callees of q function. Much more pretty report for your eyes, isn’t it?

Note

OK let’s take another look into the second report. Mention the fact that though mode is exclusive all stats for callees’ of qq defined at hello-iprof-limits.lua:20 are aggregated under its label as if all callees bodies are placed directly within qq.

Another one limit parameter benefit is as follows: there are no internal limits in platform for profiler memory usage, only system ones. Thereby proper usage of this parameter also allows to reduce memory overhead while profiling.

Lua stack unwinding while profiling

This section is proposed to shed some light on exception handling while profiling. Profiling process doesn’t spawn a “safe” scope similar to one pcall does hence this responsibility is shifted to Lua developers.

Let’s see the following snippet calling a wrapped function in a safe scope.

local inspect = require 'inspect'

function ujit.iprof.profile(pfunction, pcallback, name, mode, level)
  return function(...)
    local s, e = ujit.iprof.start(name, mode, level)
    local t = { pfunction(...) }
    if s then pcallback(ujit.iprof.stop()) else pcb(nil, e) end
    return table.unpack(t)
  end
end

local function dead(t, e)
  assert(false, 'This message is not being printed')
  print(inspect(t, { indent = '\t' }))
end

local function croak(message) error(message) end

local mode = ujit.iprof[arg[1] or 'PLAIN']

local die = ujit.iprof.profile(function(iter)
  local __ = 0
  for _ = 1, iter do
    __ = __ + _
    if _ == iter then
      croak('Hello, world!')
    end
  end
end, dead, 'SUB-NOT-BEING-PROFILED', ujit.iprof.PLAIN)

local ok, err = pcall(die, tonumber(arg[2] or 42))
if not ok then print(err) end

It throws an error on the last loop iteration and this error unwinds Lua stack up to the most outer frame related to the main chunk. Thus you don’t see a report produced by profiler in the output below.

$ ujit -joff hello-iprof-die.lua INCLUSIVE
hello-iprof-die.lua:17: Hello, world!

Now let’s change a bit our example and move a safe scope into the origin function.

local inspect = require 'inspect'

function ujit.iprof.profile(pfunction, pcallback, name, mode, level)
  return function(...)
    local s, e = ujit.iprof.start(name, mode, level)
    local t = { pfunction(...) }
    if s then pcallback(ujit.iprof.stop()) else pcb(nil, e) end
    return table.unpack(t)
  end
end

local function stats(t, e)
  if not t then error(e) end
  print(inspect(t, { indent = '\t' }))
end

local function croak(message) error(message) end

local mode = ujit.iprof[arg[1] or 'PLAIN']

local unwind = ujit.iprof.profile(function(iter)
  local __ = 0
  for _ = 1, iter do
    __ = __ + _
    if _ == iter then
      local ok, err = pcall(croak, 'Hello, world!')
      if not ok then print(err) end
    end
  end
end, stats, 'HELLO-IPROF-UNWIND', mode)

unwind(tonumber(arg[2] or 42))

This time we see the report but mention the fact that timings are misrepresented considering Lua stack unwinding.

$ ujit -joff hello-iprof-unwind.lua INCLUSIVE
hello-iprof-unwind.lua:17: Hello, world!
{
      ["HELLO-IPROF-UNWIND"] = {
              calls = 1,
              lua = 0.00025242200354114,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 8.8699744082987e-06,
                              wall = 8.8699744082987e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.317000715062e-05,
                              wall = 1.317000715062e-05
                      },
                      ["function @hello-iprof-unwind.lua:21"] = {
                              calls = 1,
                              lua = 0.0002179789589718,
                              wall = 0.0002179789589718
                      }
              },
              wall = 0.00025242200354114
      }
}

Coroutine profiling

Now let’s puzzle out the difference between lua and wall time values in report. The code below create a new child coroutine to be profiled with a single yield within it. After the child coroutine yields, the main coroutine sleeps for a few seconds, prints a message and resumes the child coroutine execution.

local inspect = require 'inspect'

local ffi = require 'ffi'
assert(ffi, 'Please ensure that you have an analogue for sleep(3)')
ffi.cdef('unsigned int sleep(unsigned int seconds);')
local sleep = ffi.C.sleep;

function ujit.iprof.profile(pfunction, pcallback, label, mode, limit)
  return function(...)
     local s, e = ujit.iprof.start(label, mode, limit)
     local t = { pfunction(...) }
     if s then pcallback(ujit.iprof.stop()) else pcallback(nil, e) end
     return table.unpack(t)
  end
end

local function stats(t, e)
  if not t then error(e) end
  print(inspect(t, { indent = '\t' }))
end

function qr(seconds) coroutine.yield(seconds) end
function qx(seconds) qr(seconds) end
function qw(seconds) qx(seconds) end
function qq(seconds) qw(seconds) end

local mode = ujit.iprof[arg[1] or 'PLAIN']
local sleep = tonumber(arg[2] or 16)
local limit = tonumber(arg[3] or 9)

local coro = coroutine.create(ujit.iprof.profile(function(seconds)
  return qq(seconds)
end, stats, 'HELLO-IPROF-CORO', mode, limit))

local ok, result = coroutine.resume(coro, sleep)
if not ok then error(result) end

-- Some function that sleeps for `result` seconds
sleep(result)
print('Hello, world!')

ok, result = coroutine.resume(coro)

assert(ok and coroutine.status(coro) == 'dead', 'Unexpected behaviour')

The difference between the time the child coroutine being executed and the profiling time equals exactly the time between its yield and resume.

$ ujit -joff hello-iprof-coro.lua
Hello, world!
{
      ["HELLO-IPROF-CORO"] = {
              calls = 1,
              lua = 1.7445010598749e-05,
              wall = 16.000266182993
      }
}

As you can see the main contribution in this difference is produced by the sleep the main coroutine does. Let’s decrease the time to sleep and look at the report for inclusive mode.

$ ujit -joff hello-iprof-coro.lua INCLUSIVE 5
Hello, world!
{
      ["HELLO-IPROF-CORO"] = {
              calls = 1,
              lua = 7.5089978054166e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 8.7990192696452e-06,
                              wall = 8.7990192696452e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.8060964066535e-05,
                              wall = 1.8060964066535e-05
                      },
                      ["function @hello-iprof-coro.lua:23"] = {
                              calls = 1,
                              lua = 3.6311976145953e-05,
                              subs = {
                                      ["function @hello-iprof-coro.lua:22"] = {
                                              calls = 1,
                                              lua = 2.7927977498621e-05,
                                              subs = {
                                                      ["function @hello-iprof-coro.lua:21"] = {
                                                              calls = 1,
                                                              lua = 2.1678977645934e-05,
                                                              subs = {
                                                                      ["function @hello-iprof-coro.lua:20"] = {
                                                                              calls = 1,
                                                                              lua = 1.5911005903035e-05,
                                                                              subs = {
                                                                                      ["builtin #34"] = {
                                                                                              calls = 1,
                                                                                              lua = 8.3139748312533e-06,
                                                                                              wall = 5.0002951360075
                                                                                      }
                                                                              },
                                                                              wall = 5.0003027330386
                                                                      }
                                                              },
                                                              wall = 5.0003085010103
                                                      }
                                              },
                                              wall = 5.0003147500101
                                      }
                              },
                              wall = 5.0003231340088
                      },
                      ["function @hello-iprof-coro.lua:29"] = {
                              calls = 1,
                              lua = 5.3319963626564e-06,
                              wall = 5.3319963626564e-06
                      }
              },
              wall = 5.0003619120107
      }
}

Mention the time attributed to builtin #34 – it is a coroutine.yield call in the child coroutine, that lasts for a jiffy in Lua interpreter and spent 5 seconds according to the wall clock. Let’s limit the profiling depth and see how stats change.

$ ujit -joff hello-iprof-coro.lua INCLUSIVE 5 2
Hello, world!
{
      ["HELLO-IPROF-CORO"] = {
              calls = 1,
              lua = 6.4426043536514e-05,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 8.9500099420547e-06,
                              wall = 8.9500099420547e-06
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.9173021428287e-05,
                              wall = 1.9173021428287e-05
                      },
                      ["function @hello-iprof-coro.lua:23"] = {
                              calls = 1,
                              lua = 2.3994012735784e-05,
                              subs = {
                                      ["function @hello-iprof-coro.lua:22"] = {
                                              calls = 1,
                                              lua = 1.2719014193863e-05,
                                              wall = 5.0002876989893
                                      }
                              },
                              wall = 5.0002989739878
                      },
                      ["function @hello-iprof-coro.lua:29"] = {
                              calls = 1,
                              lua = 5.6329881772399e-06,
                              wall = 5.6329881772399e-06
                      }
              },
              wall = 5.0003394060186
      }
}

All time the child coroutine waits to resume its execution is now attributed to the most inner function containing a coroutine.yield call.

Please mention, that inclusive mode accumulates the time coroutine waits to all callers. If we change mode to exclusive one the report differs.

$ ujit -joff hello-iprof-coro.lua EXCLUSIVE 5
Hello, world!
{
      ["HELLO-IPROF-CORO"] = {
              calls = 1,
              lua = 6.9399829953909e-06,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.2408010661602e-05,
                              wall = 1.2408010661602e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.490896102041e-05,
                              wall = 1.490896102041e-05
                      },
                      ["function @hello-iprof-coro.lua:23"] = {
                              calls = 1,
                              lua = 9.115959983319e-06,
                              subs = {
                                      ["function @hello-iprof-coro.lua:22"] = {
                                              calls = 1,
                                              lua = 6.5400381572545e-06,
                                              subs = {
                                                      ["function @hello-iprof-coro.lua:21"] = {
                                                              calls = 1,
                                                              lua = 6.0539459809661e-06,
                                                              subs = {
                                                                      ["function @hello-iprof-coro.lua:20"] = {
                                                                              calls = 1,
                                                                              lua = 7.9820165410638e-06,
                                                                              subs = {
                                                                                      ["builtin #34"] = {
                                                                                              calls = 1,
                                                                                              lua = 9.3470443971455e-06,
                                                                                              wall = 5.0001770540257
                                                                                      }
                                                                              },
                                                                              wall = 7.9820165410638e-06
                                                                      }
                                                              },
                                                              wall = 6.0539459809661e-06
                                                      }
                                              },
                                              wall = 6.5400381572545e-06
                                      }
                              },
                              wall = 9.115959983319e-06
                      },
                      ["function @hello-iprof-coro.lua:29"] = {
                              calls = 1,
                              lua = 5.7680299505591e-06,
                              wall = 5.7680299505591e-06
                      }
              },
              wall = 6.9399829953909e-06
      }
}

Now the time coroutine waits its execution is attributed only to the yielding functions or its most inner caller if the limit depth covers yielding call in report.

$ ujit -joff hello-iprof-coro.lua EXCLUSIVE 5 2
Hello, world!
{
      ["HELLO-IPROF-CORO"] = {
              calls = 1,
              lua = 8.0890022218227e-06,
              subs = {
                      ["builtin #223"] = {
                              calls = 1,
                              lua = 1.1270982213318e-05,
                              wall = 1.1270982213318e-05
                      },
                      ["builtin #224"] = {
                              calls = 1,
                              lua = 1.4292018022388e-05,
                              wall = 1.4292018022388e-05
                      },
                      ["function @hello-iprof-coro.lua:23"] = {
                              calls = 1,
                              lua = 1.3369950465858e-05,
                              subs = {
                                      ["function @hello-iprof-coro.lua:22"] = {
                                              calls = 1,
                                              lua = 1.2389034964144e-05,
                                              wall = 5.0002167590428
                                      }
                              },
                              wall = 1.3369950465858e-05
                      },
                      ["function @hello-iprof-coro.lua:29"] = {
                              calls = 1,
                              lua = 6.5490021370351e-06,
                              wall = 6.5490021370351e-06
                      }
              },
              wall = 8.0890022218227e-06
      }
}