C API to check VM is running?

Xin Zhao
Hello,
I'm currently writing a Lua profiling tool similar to gperftools. It uses a signal callback to set a Lua hook with lua_sethook, and the hook function then gets the Lua call stack with lua_getinfo and saves it. I do it this way because calling some Lua C API functions directly inside a signal callback can cause a crash.
In a pure Lua program it runs well.
But sometimes the program looks like this:

while (true)
{
     do_something_1();
     do_something_2();
     do_something_3();
     ......
     call_lua_func("lua_main");
}

When the signal is triggered, the program may be running native C code such as "do_something_2()", but the hook function then reports "lua_main" as the Lua info, which is not correct.
So I used some ugly code like this:

static void SignalHandler(int sig, siginfo_t *sinfo, void *ucontext)
{
    // L->nny == 0 && L->nCcalls == 0
    unsigned short nny = *(unsigned short *)((char*)L+196);
    unsigned short nCcalls = *(unsigned short *)((char*)L+198);
    if (nny == 0 && nCcalls == 0)
    {
        return;
    }
    lua_sethook(gL, SignalHandlerHook, LUA_MASKCOUNT, 1);
}

So is there a C API that can check whether the VM is running?
P.S. My code is at https://github.com/esrrhs/pLua

Regards
Zhao Xin

Re: C API to check VM is running?

Sean Conner
It was thus said that the Great Xin Zhao once stated:
>  Hello,
> I'm currently writing a Lua profiling tool similar to gperftools. It uses a
> signal callback to set a Lua hook with lua_sethook, and the hook function
> then gets the Lua call stack with lua_getinfo and saves it. I do it this way
> because calling some Lua C API functions directly inside a signal callback
> can cause a crash.
> In a pure Lua program it runs well.

  I would avoid the use of signal() (or sigaction()) entirely.  It is too
hard to write sane signal handlers as there are nearly no standard C
functions [1] that can be safely called, and a limited number of POSIX
functions that can be called [2].  Just avoid any use of signals.
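
For readers who do keep a signal handler, the usual way to stay inside those limits is to have the handler do nothing except record that the signal arrived, and to do the real work later from normal code. A minimal sketch of that pattern follows; the names on_signal, install_handler and take_pending_signal are made up for illustration and do not come from this thread or from pLua.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: the only state the handler is allowed to touch. */
static volatile sig_atomic_t g_signal_seen = 0;

/* The handler only writes a volatile sig_atomic_t, which is permitted
   inside a signal handler. No I/O, no malloc, no Lua calls. */
static void on_signal(int sig)
{
    (void)sig;
    g_signal_seen = 1;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Called from ordinary (non-handler) code: consume the flag and report
   whether a signal arrived since the last call. */
static int take_pending_signal(void)
{
    if (!g_signal_seen)
        return 0;
    g_signal_seen = 0;
    return 1;
}
```

The main loop would call take_pending_signal() at a point where it is safe to do arbitrary work. The cost is latency: nothing happens until the program next polls the flag.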

  I recently had to profile an application that is mostly written in Lua
(with some C code).  I first profiled the code at the C level (this under
Linux---easy enough to do by using the "-pg" linking flag when making the
final executable, running it, and checking the output afterwards with
'gprof'):

        http://boston.conman.org/2019/08/20.1

  This revealed that LPEG was the hot spot.  To profile the Lua code, I
wrote some code *in Lua* to collect the profile data [3]:

        http://boston.conman.org/2019/08/21.1

  That wasn't that surprising---the code does a ton of LPEG processing (SIP
messaging).  What was surprising was the top hotspot.  It took a few days of
thinking, but I did clean up that hot spot:

        http://boston.conman.org/2019/08/29.1

as well as only parsing the SIP headers we actually care about (instead of
the nearly 100, all in alphabetical order---that code was written when I
wasn't sure what we needed) and we did get a decent increase in the
performance.

> But sometimes the program looks like this:
>
> while (true)
> {
>      do_something_1();
>      do_something_2();
>      do_something_3();
>      ......
>      call_lua_func("lua_main");
> }
>
> When the signal is triggered, the program may be running native C code
> such as "do_something_2()", but the hook function then reports "lua_main"
> as the Lua info, which is not correct.
> So I used some ugly code like this:
>
> static void SignalHandler(int sig, siginfo_t *sinfo, void *ucontext)
> {
>     // L->nny == 0 && L->nCcalls == 0
>     unsigned short nny = *(unsigned short *)((char*)L+196);
>     unsigned short nCcalls = *(unsigned short *)((char*)L+198);
>     if (nny == 0 && nCcalls == 0)
>     {
>         return;
>     }
>     lua_sethook(gL, SignalHandlerHook, LUA_MASKCOUNT, 1);
> }
>
> So is there a C API that can check whether the VM is running?
> P.S. My code is at https://github.com/esrrhs/pLua

  I don't think there's any meaningful way to profile both C and Lua code in
the same application, and that signal handler is ... well, I would reject
that signal handler outright in any code review (even as a hack).  How did
you determine that the nny field is at offset 196?  Or the nCcalls field is
at 198?  I would at the very least include lstate.h into that source file
and rewrite the function as:

        static void SignalHandler(int sig,siginfo_t *sinfo,void *ucontext)
        {
          (void)sig;
          (void)sinfo;
          (void)ucontext;

          /* shouldn't this be gL?  Where's L defined? */
          if ((L->nny == 0) && (L->nCcalls == 0))
          {
            return;
          }
          lua_sethook(gL,SignalHandlerHook,LUA_MASKCOUNT,1);
        }

  But overall, I think your approach with signals is not the way to go.

  -spc

[1] memset() (and memmove()/memcpy() functions from string.h) were only
        added to the "safe to call from signal handler" list in 2016!  See
        the discussion here for just how crazy this stuff can get:

        https://news.ycombinator.com/item?id=13313563

[2] There's a list of safe functions, about half way down the page, just
        prior to section 2.5.

        https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

[3] The code runs lots of coroutines.  When I created a coroutine, I added
        a call to the sample2() function below to record its execution (it
        saves the filename, plus the function name if one is available or
        the line number if it isn't).

        At the end of the program run, I called sample2_dump() to dump the
        information.  Then I ran "sort -rn sample2.txt" to see the resulting
        profile.

        local PROFINFO = {}
       
        function sample2(fun,freq)
          local function hook()
            local info = debug.getinfo(2,"nSl")
            local key
            local val
       
            if not info.name or info.name == "" or info.name == "?" then
              key = string.format("%s:%d",info.source,info.currentline)
            else
              key = string.format("%s:%s",info.source,info.name)
            end
       
            val = PROFINFO[key] or 0  -- start at 0 so the first sample counts as 1
            PROFINFO[key] = val + 1
          end
       
          debug.sethook(fun,hook,"",freq or 97)
        end
       
        function sample2_dump()
          local f = io.open("sample2.txt","w") or io.stderr
          for key,val in pairs(PROFINFO) do
            f:write(string.format("%8d\t%s\n",val,key))
          end
          f:close()
        end


Re: C API to check VM is running?

Xin Zhao
Thank you for your reply.
I read your blog and code carefully and learned a lot from them.

Yes, using signals is very dangerous, but according to the source
code, lua_sethook is safe to call from a signal handler, so I used it.

/*
** This function can be called asynchronously (e.g. during a signal).
** Fields 'oldpc', 'basehookcount', and 'hookcount' (set by
** 'resethookcount') are for debug only, and it is no problem if they
** get arbitrary values (causes at most one wrong hook call). 'hookmask'
** is an atomic value. We assume that pointers are atomic too (e.g., gcc
** ensures that for all platforms where it runs). Moreover, 'hook' is
** always checked before being called (see 'luaD_hook').
*/
LUA_API void lua_sethook (lua_State *L, lua_Hook func, int mask, int count) {

My goal is to analyze only Lua code hotspots.
The reason I did not use Lua debug sampling is that 90% of
the code in our program is Lua, and the Lua code also calls many bound C++
functions.
The problem with Lua debug sampling is that it is slow and inaccurate,
because the execution time of each instruction is different. One
instruction may take more time to run than 100 instructions, for example
when it calls a bound C++ function that writes a file.
So I use signals to sample and find the real hotspots. When the signal
handler is called and finds that the Lua VM is not executing, that
sample is simply ignored.
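
To make the time-based sampling concrete, here is a rough sketch of a gperftools-style sampler built on setitimer(ITIMER_PROF) and SIGPROF. It is an assumption about how such a sampler can be set up, not a copy of pLua's code, and the names (on_prof_tick, start_sampling, stop_sampling, spin_until_ticks) are hypothetical. The handler here only counts ticks; in the profiler described above it would instead decide whether to call lua_sethook.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical tick counter, written only by the handler. */
static volatile sig_atomic_t g_ticks = 0;

static void on_prof_tick(int sig)
{
    (void)sig;
    g_ticks = g_ticks + 1;  /* a real profiler would request a stack sample here */
}

/* Deliver SIGPROF every `usec` microseconds of CPU time used by the process.
   Returns 0 on success, -1 on error. */
static int start_sampling(long usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_prof_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = usec / 1000000;
    tv.it_interval.tv_usec = usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer by setting both fields to zero. */
static int stop_sampling(void)
{
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Demo helper: burn CPU until `want` ticks arrive or the budget runs out. */
static long spin_until_ticks(long want, unsigned long budget)
{
    volatile unsigned long i;
    for (i = 0; i < budget && g_ticks < want; i++) { }
    return (long)g_ticks;
}
```

Because the timer follows CPU time rather than VM instruction counts, one slow bound function collects proportionally more ticks than a hundred cheap instructions. Note that ITIMER_PROF counts CPU time only; time spent blocked on I/O would need ITIMER_REAL and SIGALRM instead.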

Currently this profiling tool is running well. Of course, as you saw,
the code below is bad, and it may only work on my machine.

unsigned short nny = *(unsigned short *)((char*)L+196);
unsigned short nCcalls = *(unsigned short *)((char*)L+198);

So maybe it would be better if there were a C API here to check whether
the Lua VM is running.
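
Without such an API, one workaround that avoids reading lua_State internals is for the host to keep its own "inside Lua" flag, assuming every entry into Lua goes through a single wrapper. The sketch below shows the idea only. The names g_in_lua, g_sample_wanted, call_lua_guarded, install_sampler and demo_lua_entry are hypothetical, and demo_lua_entry is just a stand-in for something like call_lua_func("lua_main").

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: nonzero only while the host is inside a call into Lua. */
static volatile sig_atomic_t g_in_lua = 0;
/* Hypothetical flag: set by the handler when a sample should be taken. */
static volatile sig_atomic_t g_sample_wanted = 0;

/* Wrap every entry into Lua with this function. */
static void call_lua_guarded(void (*enter)(void *), void *arg)
{
    sig_atomic_t prev = g_in_lua;  /* keep the old value so nested calls work */
    g_in_lua = 1;
    enter(arg);
    g_in_lua = prev;
}

static void SignalHandler(int sig)
{
    (void)sig;
    if (!g_in_lua)
        return;                    /* native C code is running: skip this sample */
    g_sample_wanted = 1;           /* the real profiler would call lua_sethook here */
}

/* Install SignalHandler for SIGPROF; returns 0 on success, -1 on error. */
static int install_sampler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SignalHandler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}

/* Demo stand-in for a Lua entry point: raises the signal synchronously so
   the handler runs while the flag is set. */
static void demo_lua_entry(void *arg)
{
    (void)arg;
    raise(SIGPROF);
}
```

Two caveats. Bound C++ functions called from Lua still count as "inside Lua" here, which matches the goal of attributing their time to the calling Lua code. And if a Lua error unwinds past the wrapper with longjmp, the flag stays set, so the wrapped call should be a protected one (for example via lua_pcall).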


Regards


Re: C API to check VM is running?

Sean Conner
It was thus said that the Great Xin Zhao once stated:
> Thank you for your reply.
> I read your blog and code carefully and learned a lot from them.
>
> Yes, using signals is very dangerous, but according to the source
> code, lua_sethook is safe to call from a signal handler, so I used it.

  I'm wondering why you are using signals for this.  I know on Linux, you
can link a program with the "-pg" option to generate an executable that can
be run to generate profile information.  On the Mac and Solaris (and I
believe FreeBSD), you don't even need to do that as you can use dtrace to
profile any arbitrary program.  And I'm sure there are ways of profiling
under Windows (although I am not familiar with Windows).

  In my experience, signal handlers are not worth the effort.

> /*
> ** This function can be called asynchronously (e.g. during a signal).
> ** Fields 'oldpc', 'basehookcount', and 'hookcount' (set by
> ** 'resethookcount') are for debug only, and it is no problem if they
> ** get arbitrary values (causes at most one wrong hook call). 'hookmask'
> ** is an atomic value. We assume that pointers are atomic too (e.g., gcc
> ** ensures that for all platforms where it runs). Moreover, 'hook' is
> ** always checked before being called (see 'luaD_hook').
> */
> LUA_API void lua_sethook (lua_State *L, lua_Hook func, int mask, int count) {
>
> My goal is to analyze only Lua code hotspots.
> The reason I did not use Lua debug sampling is that 90% of
> the code in our program is Lua, and the Lua code also calls many bound C++
> functions.
> The problem with Lua debug sampling is that it is slow and inaccurate,
> because the execution time of each instruction is different.

  A hot spot will show up regardless of frequency of sample.  Yes, profiling
code may slow it down, but the profiling overhead isn't counted when
profiling.  As I mentioned in my post

        http://boston.conman.org/2019/08/21.1

I used a rather large timing interval at first, and the hot spot jumped out.
When I ran it with an interval that was 1/10th the size (10 times more
frequently) the hot spot didn't change.

> One instruction may take more time
> to run than 100 instructions, for example when it calls a bound
> C++ function that writes a file.

  I only wrote the output at the end of the run, not as it was running.

> Currently this profiling tool is running well. Of course, as you saw,
> the code below is bad, and it may only work on my machine.
>
> unsigned short nny = *(unsigned short *)((char*)L+196);
> unsigned short nCcalls = *(unsigned short *)((char*)L+198);
>
> So maybe it would be better if there were a C API here to check whether
> the Lua VM is running.

  Include lstate.h so you don't have to manually calculate the offsets.
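
To illustrate why hard-coded offsets such as 196 and 198 are fragile, here is a small self-contained sketch. The two structs are hypothetical stand-ins, NOT the real lua_State; they only show that adding one field earlier in a struct moves every later field, so an offset measured on one build reads the wrong bytes on another.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "build A" layout (not the real lua_State). */
struct state_build_a {
    void *stack;
    int status;
    unsigned short nny;
    unsigned short nCcalls;
};

/* Hypothetical "build B": same fields plus one extra pointer, as might
   happen with a different version or compile-time option. */
struct state_build_b {
    void *stack;
    void *extra;
    int status;
    unsigned short nny;
    unsigned short nCcalls;
};

/* Read an unsigned short through a raw byte offset, the way the original
   handler does; memcpy avoids alignment and aliasing problems. */
static unsigned short read_u16_at(const void *base, size_t offset)
{
    unsigned short v;
    memcpy(&v, (const char *)base + offset, sizeof v);
    return v;
}

/* Returns 1 if reading a build-B object through build A's offset for nny
   yields something other than the real nny value. */
static int stale_offset_misreads(void)
{
    struct state_build_b s = { NULL, NULL, 0, 7, 9 };
    size_t stale = offsetof(struct state_build_a, nny);
    return read_u16_at(&s, stale) != s.nny;
}
```

With the header included, the compiler works out the offset from the actual struct definition, so L->nny keeps pointing at the right field when the layout shifts. Keep in mind that lstate.h is an internal header rather than part of the public API, so the field names themselves can change between Lua versions.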

  -spc