OP_HALT is really useful


OP_HALT is really useful

Sven Olsen
I've just spent some time playing with VS Code, which has a couple of very nice community-written Lua debuggers.  However, as Seungjae Lee notes in his README.md, the performance hit from using one of these debuggers is reduced dramatically if one registers breakpoints using Dan Tull's OP_HALT patch in place of debug.sethook().

I don't fully understand the reasons /why/ OP_HALT is so useful, but I can confirm that, in my own use case at least, implementing OP_HALT changes the overhead of running with a debugger from 'severe' to 'unnoticeable'.

Unfortunately, Dan's original patch is now 7+ years old, and Seungjae Lee's 5.1.5 version is written against a version of the codebase that's only slightly newer.  I have managed to apply Seungjae Lee's version of OP_HALT to my own modded copy of Lua 5.2, and it appears to work well.  So, if there's a need for a 5.2 version, interested parties should let me know, and perhaps we can create an official-ish entry for OP_HALT on the power patches page.  (This is not an entirely altruistic suggestion -- I suspect there may be some quirks in the interaction of OP_HALT with 5.2-era features like GOTOs that could use a bit more investigation.)

Beyond that, I would submit to Roberto et al. that adding something OP_HALT-ish to the official Lua sources might not be a bad idea -- I was shocked by what a difference it makes.

-Sven

Re: OP_HALT is really useful

Russell Haley


On Sun, May 13, 2018 at 5:09 PM, Sven Olsen <[hidden email]> wrote:
I've just spent some time playing with VS Code, which has a couple very nice community written Lua debuggers.  However, as Seungjae Lee notes in his README.md, the performance hit that one sees from using one of these debuggers is reduced dramatically if one registers breakpoints using Dan Tull's OP_HALT patch in place of debug.sethook().

I don't fully understand the reasons /why/ OP_HALT is so useful, but, I can confirm that, in my own use case at least, implementing OP_HALT changes the overhead from running with a debugger from 'severe' to 'unnoticeable'.

Unfortunately, Dan's original patch is now 7+ years old, and Seungjae Lee's 5.1.5 version is written against a version of the codebase that's only slightly newer.  I have managed to apply Seungjae Lee's version of OP_HALT to my own modded copy of Lua 5.2, and it appears to work well. 

Well, I just happen to be blindly thrashing at the Lua 5.4 code as we speak. Sean Conner just mentioned on a different thread the need for a true breakpoint. Thanks for the tip!

Russ

 


Re: OP_HALT is really useful

Dan Tull
In reply to this post by Sven Olsen

> ... implementing OP_HALT changes the overhead from running with a debugger from 'severe' to 'unnoticeable'.

Sweet! That's exactly why I wrote it. Glad it was helpful (if rather ancient at this point). The difference between slowing down every opcode to check if there's a breakpoint and breakpoints that only have overhead if they get hit is like night and day.
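For anyone who hasn't seen the hook-based approach being compared against, here is a minimal sketch of it (the breakpoints table and bp_hook are illustrative names, not from the patch or any particular debugger); the lookup runs on every executed line, whether or not a breakpoint is ever hit:

-- conventional sethook-style breakpoints: pay on every executed line
local breakpoints = {}                  -- e.g. breakpoints["@foo.lua:12"] = true

local function bp_hook(event, line)
  local info = debug.getinfo(2, "S")    -- which chunk is running right now
  if breakpoints[info.source .. ":" .. line] then
    -- stop here and hand control to the debugger front end
  end
end

debug.sethook(bp_hook, "l")             -- the "l" mask fires the hook on every line

With OP_HALT, by contrast, the breakpointed instruction itself is patched, so code without breakpoints runs at full speed.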


> I would submit to Roberto et al. that adding something OP_HALT-ish to the official Lua sources might not be a bad idea...

One other fun experiment I did with OP_HALT was to pepper a whole codebase with self-clearing breakpoints for nearly zero-overhead, instruction-level code coverage (code coverage with the line hook is punishingly slow in some of our apps, whereas I accidentally left my experimental breakpoint-based code coverage on for a couple of weeks because I didn't notice the speed penalty). It can also be used for dtrace-like instrumentation points.


At one point, I did apply the OP_HALT patches to 5.2 and 5.3 internally, but we have never really found a compelling reason to actually move from our 5.1.x derivative, so I never published those patches (porting from 5.1 was fairly straightforward but not automatic, as I recall); they never got enough use for me to be all that confident in their correctness.


DT


There were a couple of other bugs found since that patch was posted as well (mostly quite obscure, but no sense keeping them to myself):

commit e861f4f9cd3440d2710f8f1d66c3db541aae52b5
Author: Dan Tull <[hidden email]>
Date:   Thu Jul 18 18:01:07 2013 -0500

    Fix obscure bug in breakpoint offset adjustment.
    
    These two opcode tests should be done on the same opcode. Otherwise, it could result in the breakpoint landing on the previous opcode and thus the previous line. This would only have a user visible effect in the rare case of a SETLIST opcode with a non-zero C argument as the last argument on its line followed by an opcode whose C argument was 0 on the subsequent line. In the even rarer case where the opcode prior to the SETLIST opcode was also one that should not be patched, it could even cause corruption of the bytecode's validity that could lead to undefined behavior (including crashing).

diff --git a/src/ldebug.c b/src/ldebug.c
index 5a20b7e..c096f62 100644
--- a/src/ldebug.c
+++ b/src/ldebug.c
@@ -991,7 +991,7 @@ static int avoidpseudo(lua_State *L, Proto *p, int offset) {
         default:
           return offset; // bare JMP, which is fine.
       }
-    } else if (GET_OPCODE(pr) == OP_SETLIST && GETARG_C(i) == 0) {
+    } else if (GET_OPCODE(pr) == OP_SETLIST && GETARG_C(pr) == 0) {
       // a SETLIST with a C of 0 treats the next opcode as a raw int
       // to do very large batch settable operations, which means that
       // "opcode" should not be patched, obviously.

commit febf510f9aecc96871bcef411df258bbc297cbad
Author: Dan Tull <[hidden email]>
Date:   Thu Mar 28 14:04:55 2013 -0500

    Fixes to keep OP_HALT from confusing debug.getinfo.
    
    Some of the routines in ldebug.c scan the bytecode to decide how to describe registers and stack frames.
    If there are OP_HALT opcodes (breakpoints) in just the wrong places, it can confuse this logic and cause backtraces to have less name information than expected or incorrect names/types in error messages.
    
    Merged from //lua/tools/halide CL 892871 and 893037.

diff --git a/src/ldebug.c b/src/ldebug.c
index 1dfb1b7..5a20b7e 100644
--- a/src/ldebug.c
+++ b/src/ldebug.c
@@ -314,6 +314,13 @@ LUA_API int lua_getinfo (lua_State *L, const char *what, lua_Debug *ar) {
 
 #define checkreg(pt,reg) check((reg) < (pt)->maxstacksize)
 
+// note: duplicated in lvm.c and ldo.c due to dependency tangle (below requires lopcodes.h and lobject.h)
+#ifdef LUA_HALT_OP
+#define GET_REAL_INSTR(i,p) (GET_OPCODE(i) == OP_HALT ? (p->halts[GETARG_Bx(i)].orig) : (i))
+#else
+#define GET_REAL_INSTR(i,p) (i)
+#endif
+
 
 
 static int precheck (const Proto *pt) {
@@ -323,7 +330,7 @@ static int precheck (const Proto *pt) {
               (pt->is_vararg & VARARG_HASARG));
   check(pt->sizeupvalues <= pt->nups);
   check(pt->sizelineinfo == pt->sizecode || pt->sizelineinfo == 0);
-  check(pt->sizecode > 0 && GET_OPCODE(pt->code[pt->sizecode-1]) == OP_RETURN);
+  check(pt->sizecode > 0 && GET_OPCODE(GET_REAL_INSTR(pt->code[pt->sizecode-1], pt)) == OP_RETURN);
   return 1;
 }
 
@@ -370,7 +377,7 @@ static Instruction symbexec (const Proto *pt, int lastpc, int reg) {
   last = pt->sizecode-1;  /* points to final return (a `neutral' instruction) */
   check(precheck(pt));
   for (pc = 0; pc < lastpc; pc++) {
-    Instruction i = pt->code[pc];
+    Instruction i = GET_REAL_INSTR(pt->code[pc], pt);
     OpCode op = GET_OPCODE(i);
     int a = GETARG_A(i);
     int b = 0;
@@ -403,7 +410,7 @@ static Instruction symbexec (const Proto *pt, int lastpc, int reg) {
                go all the way back to the first of them (if any) */
             for (j = 0; j < dest; j++) {
               Instruction d = pt->code[dest-1-j];
-              if (!(GET_OPCODE(d) == OP_SETLIST && GETARG_C(d) == 0)) break;
+              if (!(GET_OPCODE(GET_REAL_INSTR(d, pt)) == OP_SETLIST && GETARG_C(GET_REAL_INSTR(d, pt)) == 0)) break;
             }
             /* if 'j' is even, previous value is not a setlist (even if
                it looks like one) */
@@ -418,13 +425,13 @@ static Instruction symbexec (const Proto *pt, int lastpc, int reg) {
     }
     if (testTMode(op)) {
       check(pc+2 < pt->sizecode);  /* check skip */
-      check(GET_OPCODE(pt->code[pc+1]) == OP_JMP);
+      check(GET_OPCODE(GET_REAL_INSTR(pt->code[pc+1], pt)) == OP_JMP);
     }
     switch (op) {
       case OP_LOADBOOL: {
         if (c == 1) {  /* does it jump? */
           check(pc+2 < pt->sizecode);  /* check its jump */
-          check(GET_OPCODE(pt->code[pc+1]) != OP_SETLIST ||
+          check(GET_OPCODE(GET_REAL_INSTR(pt->code[pc+1], pt)) != OP_SETLIST ||
                 GETARG_C(pt->code[pc+1]) != 0);
         }
         break;
@@ -503,7 +510,7 @@ static Instruction symbexec (const Proto *pt, int lastpc, int reg) {
         nup = pt->p[b]->nups;
         check(pc + nup < pt->sizecode);
         for (j = 1; j <= nup; j++) {
-          OpCode op1 = GET_OPCODE(pt->code[pc + j]);
+          OpCode op1 = GET_OPCODE(GET_REAL_INSTR(pt->code[pc + j], pt));
           check(op1 == OP_GETUPVAL || op1 == OP_MOVE);
         }
         if (reg != NO_REG)  /* tracing? */
@@ -521,7 +528,7 @@ static Instruction symbexec (const Proto *pt, int lastpc, int reg) {
       default: break;
     }
   }
-  return pt->code[last];
+  return GET_REAL_INSTR(pt->code[last], pt);
 }
 
 #undef check
@@ -596,7 +603,7 @@ static const char *getfuncname (lua_State *L, CallInfo *ci, const char **name) {
   if ((isLua(ci) && ci->tailcalls > 0) || !isLua(ci - 1))
     return NULL;  /* calling function is not Lua (or is unknown) */
   ci--;  /* calling function */
-  i = ci_func(ci)->l.p->code[currentpc(L, ci)];
+  i = GET_REAL_INSTR(ci_func(ci)->l.p->code[currentpc(L, ci)], ci_func(ci)->l.p);
   if (GET_OPCODE(i) == OP_CALL || GET_OPCODE(i) == OP_TAILCALL ||
       GET_OPCODE(i) == OP_TFORLOOP)
     return getobjname(L, ci, GETARG_A(i), name);


commit 761cd7022def90b2953ab71dcc6109235af91eea
Author: Dan Tull <[hidden email]>
Date:   Thu Jul 10 18:00:35 2014 -0500

    Fix for instruction operand truncation issue.
    
    The Bx operand can only hold values in the range [0, 2^18), so at most 2^18 breakpoints can be set.
    Previously, an attempt to set breakpoints beyond that point resulted in application of incorrect original opcodes which could lead to bad behavior.
    
    However, aside from experimentation with programmatically setting many breakpoints this limit is highly unlikely to be reached.

diff --git a/src/ldebug.c b/src/ldebug.c
index c096f62..c6a39ed 100644
--- a/src/ldebug.c
+++ b/src/ldebug.c
@@ -1031,6 +1031,10 @@ static int sethalt(lua_State *L, Proto *p, int offset, lua_Hook hook) {
   existing = findhalt(L, p, offset);
 
   if (existing < 0) {
+    if (p->sizehalts > MAXARG_Bx) {
+      return -1;
+    }
+
     luaM_reallocvector(L, p->halts, p->sizehalts, p->sizehalts + 1, Halt);
     existing = p->sizehalts;
     p->sizehalts = p->sizehalts + 1;






Re: OP_HALT is really useful

Dan Tull

Oh wait, one more (not sure if these are still applicable, but maybe worth a look):


This won't be directly applicable because it has LUA_TRIGGERHOOK code in it (related to a sampling profiler hook we added but never shared), but the principle of the bug may still apply and the description + patch should make it reasonably easy to confirm and apply a corresponding fix.


DT


commit 024bcdb6f71d01534b72fd7ea19ebf26b5a95914
Author: Dan Tull <[hidden email]>
Date:   Thu Sep 27 10:48:48 2012 -0800

    (@852507)
    Fix a very rare but nasty bug in breakpoints in the Lua VM code:
    1. Set a single breakpoint (it helps to pick a common entry point from native code).
    2. Pause the app so that you stop on the same opcode via the trigger hook.
    3. While paused via trigger, clear the breakpoint.
    4. Resume the debuggee.
    
    Result: SEGFAULT
    
    Solution: don't fetch the instruction until after the hook executes so you don't end up with a dangling halt.
    
    Normally, I'd delimit the patch based so the original version is preserved
    It would be overly messy for no behavioral difference in this case, though.
    
    We never saw this because pause didn't used to be as fast as it is now, so there was only a tiny probability of stopping due to a pause request on a line that also had a breakpoint (and clearing that breakpoint).
    
    [git-p4: depot-paths = "//lua/main/third_party/lua_tp/lua_5_1/": change = 852508]

diff --git a/src/lvm.c b/src/lvm.c
index e970cd9..c7f02b7 100644
--- a/src/lvm.c
+++ b/src/lvm.c
@@ -417,8 +417,6 @@ void luaV_execute (lua_State *L, int nexeccalls) {
   k = cl->p->k;
   /* main loop of interpreter */
   for (;;) {
-    Instruction i = *pc++;
-    StkId ra;
 #ifdef LUA_TRIGGERHOOK
     if ((L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT | LUA_MASKTRIGGER)) &&
         (--L->hookcount == 0 || L->hookmask & LUA_MASKLINE || G(L)->triggers != 0)) {
@@ -426,13 +424,15 @@ void luaV_execute (lua_State *L, int nexeccalls) {
     if ((L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT)) &&
         (--L->hookcount == 0 || L->hookmask & LUA_MASKLINE)) {
 #endif
-      traceexec(L, pc);
+      traceexec(L, pc + 1);
       if (L->status == LUA_YIELD) {  /* did hook yield? */
-        L->savedpc = pc - 1;
+        L->savedpc = pc;
         return;
       }
       base = L->base;
     }
+    Instruction i = *pc++;
+    StkId ra;
     /* warning!! several calls may realloc the stack and invalidate `ra' */
 #ifdef LUA_HALT_OP
    resume:






Re: OP_HALT is really useful

Sven Olsen
In reply to this post by Dan Tull
> It can also be used for dtrace-like instrumentation points.

Interesting.  After mucking around in the VM for the purposes of applying your patch, I've started daydreaming about writing some instrumentation hooks of my own.  What I'd like to do is generate a call stack trace + runtime info for a given entry point, so that I can more easily track down runtime bottlenecks.  I'm beginning to believe that this may be pretty easy; my current plan is more or less:

1) Hack into OP_CALL and OP_TAILCALL to update some lightweight C structures with timing info in the case that we're running with instrumentation switched on.
2) Maybe reverse engineer Visual Studio's .vsp or .vspx formats so I can turn my raw data into something viewable in a nice GUI.

However, after reading your posts, I'm guessing that you've implemented some instrumentation and profiling hooks that are quite a bit more sophisticated.  

Do you have any words of wisdom for someone just starting down this path? (It sounds like, maybe, implementing some sort of sampling-based instrumentation using OP_HALT-like hooks turns out to be faster than hacking a new switch into the core VM?)

> There were a couple of other bugs found since that patch was posted as well (mostly quite obscure, but no sense keeping them to myself):

Many thanks for these -- some of those issues look pretty nasty.  I'll ping Seungjae Lee and point him at this thread as well; his debugger appears to be quite widely used, and I'd bet he'd be interested in updating his own VM patch.

-Sven


Re: OP_HALT is really useful

Sean Conner
It was thus said that the Great Sven Olsen once stated:

> >
> > It can also be used for dtrace-like instrumentation points.
>
> Interesting.  After mucking around in the VM for the purposes of applying
> your patch, I've started daydreaming about writing some instrumentation
> hooks of my own.  What I'd like to do is generate a call stack trace +
> runtime info for a given entry point, so that I can more easily track down
> runtime bottlenecks.  I'm beginning to believe that this may be pretty
> easy, my current plan is more or less:
>
> 1) Hack into OP_CALL and OP_TAILCALL to update some lightweight C
> structures with timing info in the case that we're running with
> instrumentation switched on.

  You might want to look at what Lua already provides before spending time
on this.  A simple way to profile is to sample where the program is at
regular intervals, and that can be done with a function like:

function profile(f)
  local data = {}

  local function hook()
    local info = debug.getinfo(2,'Sl')
    local key  = string.format("%s:%s",info.source,info.currentline)
    if not data[key] then data[key] = 0 end
    data[key] = data[key] + 1
  end

  debug.sethook(hook,"",7) -- check location after 7 VM cycles
  f()
  debug.sethook()

  local res = {}
  for key,value in pairs(data) do
    table.insert(res,{ where = key , hits = value })
  end

  table.sort(res,function(a,b)
    if a.hits > b.hits then return true end
    if a.hits < b.hits then return false end
    return a.where < b.where
  end)

  return res
end

I ran this with luacheck (because that does significant work and is CPU
bound) and got the following results:

     29057 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:130
     16605 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:98
     16604 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:97
     12451 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:107
      9648 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:108
      8303 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua:103

Here, line 130 of standards.lua was most often seen running when we checked.
To do what you want to do, hooking OP_CALL and OP_TAILCALL to get function
time overhead, I would first do this in Lua to see if it gives you the
information you want.  You'll want the 'c' and 'r' hooks so you get the
entry and exit points for functions.  Something like:

function clock() end -- return current time (preferably a monotonic
                        -- clock and not a wall clock, which can change time
                        -- on you at least twice a year) at required
                        -- precision.

function profile_time(f)
  local stack = { clock() }
  local data  = {}

  local function hook(what)
    local info = debug.getinfo(2,"Sn")
    local key  = string.format("%s %s",info.source,info.name)
   
    if not data[key] then
      data[key] = 0
    end

    if what == 'call' then
      table.insert(stack,clock())

    elseif what == 'return' then
      local now  = clock()
      data[key] = data[key] + (now - table.remove(stack))

    elseif what == 'tail return' then
      local now = clock()
      data[key] = data[key] + (now - table.remove(stack))
    end
  end
 
  debug.sethook(hook,"cr")
  f()
  debug.sethook()
 
  local res = {}
  for key,value in pairs(data) do
    table.insert(res,{ where = key , time = value })
  end
 
  table.sort(res,function(a,b)
    if a.time > b.time then return true end
    if a.time < b.time then return false end
    return a.where < b.where
  end)

  return res
end

I ran this over luacheck and got:

   3942314 @/home/spc/.luarocks/share/lua/5.1/luacheck/check.lua
   3185576 @/home/spc/.luarocks/share/lua/5.1/luacheck/main.lua
   2877918 =[C] require
   2763300 =[C] xpcall
   2716661 @/home/spc/.luarocks/share/lua/5.1/luacheck/utils.lua try
   2210802 @/home/spc/.luarocks/lib/luarocks/rocks/luacheck/0.21.2-1/bin/luacheck f
   1603145 =[C] pcall
   1376311 @/home/spc/.luarocks/share/lua/5.1/luacheck/parser.lua
   1312949 @/home/spc/.luarocks/share/lua/5.1/luacheck/parser.lua parse_block
   1310442 @/home/spc/.luarocks/share/lua/5.1/luacheck/parser.lua parse_statement
   1301092 @/home/spc/.luarocks/share/lua/5.1/luacheck/standards.lua add_fields
   1132164 @/home/spc/.luarocks/share/lua/5.1/luacheck/utils.lua map

  Note---this is total accumulated time, including subroutine calls.  From the
first output, showing that standards.lua:130 was most often seen, that line
is in the function add_fields(), which here we can see took 1301092
microseconds (that's the unit here) to run.  

  It should be easy to combine the two to get a stack dump and timing
information to produce those flame graphs
(http://www.brendangregg.com/flamegraphs.html) everybody seems to be
producing nowadays.
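For instance, a rough, untested sketch that records whole stacks in the "folded"
one-line-per-stack format flamegraph.pl consumes (profile_stacks is just an
illustrative name):

function profile_stacks(f, interval)
  local folded = {}

  local function hook()
    local frames, level = {}, 2
    while true do
      local info = debug.getinfo(level, "Sn")
      if not info then break end
      -- build root-first, leaf-last, which is the order flamegraph.pl expects
      table.insert(frames, 1, string.format("%s:%s", info.source, info.name or "?"))
      level = level + 1
    end
    local key = table.concat(frames, ";")
    folded[key] = (folded[key] or 0) + 1
  end

  debug.sethook(hook, "", interval or 1000)  -- sample every N VM instructions
  f()
  debug.sethook()

  for stack, hits in pairs(folded) do
    print(stack .. " " .. hits)   -- pipe these lines into flamegraph.pl
  end
end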

  Do it in Lua, profile and then decide if you need to modify the Lua
interpreter.

  -spc (Profile and enjoy ... )



Re: OP_HALT is really useful

Sven Olsen
> It should be easy to combine the two to get a stack dump and timing
> information to produce those flame graphs
> (http://www.brendangregg.com/flamegraphs.html) everybody seems to be
> producing nowadays.

Many thanks for this link -- that's exactly the sort of reference I was fishing for when I mentioned "reverse engineering the .vsp format".  I assumed *someone* must have generated a full-featured tool for inspecting generic instrumentation data; and it looks like Brendan has done exactly that (in Perl / SVG).  I will steal his code happily :)

> Do it in Lua, profile and then decide if you need to modify the Lua interpreter.

Your advice is very reasonable, of course.  But, I'm afraid I have already started down this dark path -- and it looks like writing a data collector that hooks in via luaD_poscall and luaD_precall is going to suit my own purposes nicely ;)  (Functionally, there's really not much difference between this approach and the debug.sethook(hook,"cr") template code you posted -- but working in C++ lets me keep the instrumentation overhead a bit lower.)

Re: OP_HALT is really useful

Sean Conner
It was thus said that the Great Sven Olsen once stated:
>
> Your advice is very reasonable, of course.  But, I'm afraid I have already
> started down this dark path -- and it looks like writing a data collector
> that hooks in via luaD_poscall and luaD_precall is going to suit my own
> purposes nicely ;)  (Functionally, there's really not much difference
> between this approach and the debug.sethook(hook,"cr") template code you
> posted -- but working in C++ lets me keep the instrumentation overhead a
> bit lower.)

  You can also do what I did via the Lua C API.  Then there's the advice:
make it work, make it right, make it fast, in that order.  By doing the work
in Lua, you can save time long term by finding out what exactly you want to
measure (quickly in development time) and only then drop down to a lower
level (to do it quickly at runtime).

  -spc




Re: OP_HALT is really useful

nobody
In reply to this post by Sven Olsen
On 2018-05-27 21:22, Sven Olsen wrote:

> Interesting.  After mucking around in the VM for the purposes of
> applying your patch, I've started daydreaming about writing some
> instrumentation hooks of my own.
>
> *snip*
>
> Do you have any words of wisdom for someone just starting down this
> path? (It sounds like, maybe, implementing some sort of
> sampling-based instrumentation using OP_HALT-like hooks turns out to
> be faster than hacking a new switch into the core VM?

A while ago I hacked a stupid-simple profiler into Lua - just constantly
and unconditionally barfs tons of info into fd3 (or /dev/null, if you
don't assign that from the shell (3>somewhere))… It's counting all
calls, instructions (one counter per instruction), allocations
(increases counters for _all_ functions on the call stack), loads
(load,loadstring,require,… incl. dumping the loaded code), … and the
counters get dumped whenever the thing is GCed/freed.  All of that
causes it to run roughly 1.5-1.7x as long.  (Additionally dumping a full
stack trace every Nth call makes it… 1.7x (every 107th), 2.8x (every
11th), 13x (trace _ALL_ the calls!) slower in total for a very
call-heavy program (~70M calls, otherwise just summing values) with
mostly <=5 stack depth.)

So if you have some idea of what info you need, you can probably afford
to have that unconditionally enabled in a profiling build.

I don't have the time to clean up the changes & turn them into a patch,
but here's a bunch of notes that may be useful:

  *  lobject.h/ClosureHeader: nice place for counters (ncalls,nbytes,…)
     (initialize in lfunc.c/luaF_new[CL]closure)
  *  ldo.c/luaD_precall: all calls thru here -> ncalls++ / dump stack
  *  lapi.c/pushcclosure: Kill the `if (n == 0)` branch to disable light
     C functions / force C closures, so that you have the counter fields

  *  lobject.h/Proto: add instruction counters?
     (NULL-init in lfunc.c/luaF_newproto, alloc in lparser.c/close_func
     using luaM_newvector(L, fs->pc, size_t) and in lundump.c/LoadCode
     using luaM_newvector(S->L, n, size_t), then zero-init all counts)
     (change size_t to whatever counter size you're using)
  *  lvm.c/vmfetch, lvm.c/donextjump: increase instr.-counter:
     cl->p->prof_icounts[(ci->u.l.savedpc)-(cl->p->code)]++;
     (prof_icounts is whatever you're calling the instr. counter field)

  *  lstate.h/global_State: per Lua state, and
     lstate.h/lua_State: per thread within a Lua state
     (init in lstate.c: lua_newstate, preinit_thread (no allocs) or
     f_luaopen, lua_newthread (allocs ok))

  *  lmem.c/luaM_realloc_: all allocations go through here
  *  do whatever you do BEFORE the realloc call, as it might be moving
     the stuff that you wanted to touch
  *  if tracking allocations, blame (nsize-realosize) bytes if that's >0
  *  if block == NULL, osize may be != 0 but a type hint (LUA_TFOO),
     may want to count those to see who's slowing down the GC by creating
     lots of objects (tables, strings, …)
  *  may also want to walk a few stack levels & track indirect counts,
     just blaming your low-level constructors (Object.new, map, …)
     doesn't tell you what parts of the code are actually causing this

  *  when you want to touch the stack, guard:
     if (!G(L)->version || !L->ci)  return; /* still building state */
     CallInfo *ci = L->ci;
     if (ci->previous == ci->next)  return; /* setting up first func */
     (this *seems* to take care of every wonky stack state?)
  *  stack traversal: just walk the ci->previous chain until NULL

  *  dumping accumulated info from the lua*_free* functions works well
     if you properly close the state at the end (so no os.exit(foo), but
     os.exit(foo,true) is ok – or patch os.exit)

That's done against 5.3.4, but only used a couple of times so far, so
the above may be incomplete / missing critical things / contain bugs.

(For stack traces, you may want to make lgc.c/freeobj (cases LUA_TLCL,
LUA_TCCL) and lfunc.c/luaF_freeproto report the closure kind (C/Lua) /
closure->cfunc (gco2ccl(o)->f) / closure->proto (gco2lcl(o)->p) /
proto->source (f->source) mapping so your stack traces can simply be a
list of closure pointers, no need to constantly translate those when you
can do that later.  Then just keep a counter or timer in the state &
increment/check in ldo.c/luaD_precall whether you should dump a trace…
should be good enough, and fast.)

Have fun!
-- nobody


Re: OP_HALT is really useful

Dirk Laurie-2
In reply to this post by Sean Conner
2018-05-28 6:04 GMT+02:00 Sean Conner <[hidden email]>:

> It was thus said that the Great Sven Olsen once stated:
>>
>> Your advice is very reasonable, of course.  But, I'm afraid I have already
>> started down this dark path -- and it looks like writing a data collector
>> that hooks in via luaD_poscall and luaD_precall is going to suit my own
>> purposes nicely ;)  (Functionally, there's really not much difference
>> between this approach and the debug.sethook(hook,"cr") template code you
>> posted -- but working in C++ lets me keep the instrumentation overhead a
>> bit lower.)
>
>   You can also do what I did via the Lua C API.  Then there's the advice:
> make it work, make it right, make it fast, in that order.  By doing the work
> in Lua, you can save time long term by finding out what exactly you want to
> measure (quickly in development time) and only then drop down to a lower
> level (to do it quickly at runtime).

In a TeX package I use[1], the following comment has survived regular
updates for over 20 years:

% why simple, when it can be done complicated ?

[1] musixtex


Re: OP_HALT is really useful

Dan Tull
In reply to this post by Sean Conner

> > It can also be used for dtrace-like instrumentation points.

>
> Interesting.  After mucking around in the VM for the purposes of applying
> your patch, I've started daydreaming about writing some instrumentation
> hooks of my own...

> However, after reading your posts, I'm guessing that you've implemented
> some instrumentation and profiling hooks...
Our profiler is fairly simple; I think I described it in another thread on this list a while back, which inspired some "but Lua already handles that" responses that were partially but not completely accurate given our full context. I don't recall in how much detail I responded to those. I don't actually follow this list much in the last few years, just scan every few months or so to see if any topics of interest have come up.

>   Do it in Lua, profile and then decide if you need to modify the Lua interpreter.
Fair points. I did (profile) and I did (have to modify the interpreter).

Our internal profiler came out of finding various ways the traditional count and line hooks produced misleading data.

Similarly, the OP_HALT patch came out of finding that it was worth modifying the interpreter to change the fundamental performance characteristics* of the usual Lua way of implementing breakpoints with the debug hooks (even in the most optimized case).


All that said, this above response is on point in the sense that it is very easy to create something sophisticated when something simpler will suffice and that it is very important to prototype an answer to confirm the utility of the data and to profile the result to understand its characteristics before going too far down a rabbit hole.

I have two other main pieces of feedback:

- In any solution to a profiling problem, be aware of what biases it will produce in the data due to its own overhead/measurement methods.

For example, using the opcode count hook will disproportionately punish opcode-heavy code even if it isn't particularly slow, and miss expensive opcodes that are causing slowness. If you have a loop that is doing computation, it might have lots of math opcodes, but they might be very fast and low overhead. On the other hand, a single slow native call might just be a couple of opcodes, so the count hook will run right over it and blame something a few opcodes down the line. Honestly, this might still be a good enough clue, but depending on what it blames it might take you a while to realize the real culprit.

So you're stuck between a high frequency (low number) count hook lying to you about opcode-heavy code that's not really slow and a low frequency (high number) count hook potentially pointing to completely irrelevant code very far from the real culprit.

Similarly, a hook that gathers a full stack trace can disproportionately punish code that tends to be called in a deep call stack even if it isn't slow (because gathering the stack itself is slow when there are lots of frames). The posted Lua-based profiler code smartly avoids this by just sampling one frame, not the whole stack. That does mean it can be harder to reconstruct the execution to understand the context, but it does avoid recursion-depth distortions. In practice, if the call tree is a real problem, you'll see multiple frames high in the results and probably piece it together anyway. You can also pick a middle ground where you walk some fixed number of frames upward in order to get a little more context, but less distortion in the data.
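A rough, untested sketch of that middle ground (MAX_FRAMES and the key format are arbitrary choices, not anything from our profiler):

-- sample a bounded number of frames per hit instead of the whole stack
local MAX_FRAMES = 4
local samples = {}

local function hook()
  local frames = {}
  for level = 2, 1 + MAX_FRAMES do
    local info = debug.getinfo(level, "Sl")
    if not info then break end
    frames[#frames + 1] = string.format("%s:%s", info.source, info.currentline)
  end
  local key = table.concat(frames, " <- ")
  samples[key] = (samples[key] or 0) + 1
end

debug.sethook(hook, "", 1000)   -- count hook; tune the sampling interval to taste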

Also be careful what you measure. For example, the os.clock() API measures CPU time. Sometimes this is what you want to measure, but it won't help you find slowness that affects wall time but is not CPU time intensive (some blocking APIs will leave the CPU quite idle, but will still take a lot of time).
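A quick way to see that difference with nothing but the standard library (a real profiler would want a finer-grained monotonic clock from C or a library, since os.time() only has one-second resolution):

local cpu0, wall0 = os.clock(), os.time()
-- ... run something that sleeps or blocks on I/O ...
print(string.format("cpu: %.3fs  wall: %ds", os.clock() - cpu0, os.time() - wall0))
-- blocking work barely moves the cpu number even though wall time keeps passing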

- I eventually discarded my experiments with dtrace-like profiling, despite it showing some significant promise for certain situations.

"Trace points" were starting to prove very handy as a quick/dynamic (and very low overhead) way to measure time/paths taken in certain segments of code that were executing. Combined with our low overhead sampling profiler's more holistic view of which code was running when pausing periodically, it could save a lot of time quantifying the costs/distributions of different code paths of particular interest. The two approaches (top down and bottom up) were proving complementary.

The best part of it was that it only imposed overhead in proportion to the number of opcodes instrumented and their call frequency, whereas the debug-hook answers impose a system-wide load that distorts unrelated operations as well.

That said, doing that instrumentation more manually wasn't that difficult/costly, so I decided not to pursue it further.

The other thing that made it less useful than hoped was that some code I wanted to instrument this way could be re-entered from multiple co-routines** and/or involved async calling conventions, so it could be difficult to match up which start/stop points go together. Usually there was a place to grab some context to match them up, but not always, and it could get complicated.

Anyway, because we had low overhead breakpoints that could run code, the technique I used instead of further investment in specialized instrumentation was to use breakpoints that ran code to log/collect information. It was higher overhead and more manual, but also much more flexible, so it was a better answer.

So the answer I picked _was_ to do more of it in Lua, and OP_HALT was what made that viable, whereas the debug-hook-based solutions imposed too much overhead to get accurate results. It is actually still dtrace-like, but with our regular code/logging breakpoints instead of a new type of tracing breakpoint.

The code coverage with self-clearing breakpoints wound up getting shelved as well, but only because it wasn't until later that I pinned down the bug that had been causing it to sporadically crash (one of the fixes I posted in my earlier replies on this thread). I may still resurrect that experiment.

DT

* OP_HALT has no overhead until the opcode is hit; the line/opcode hook imposes a (non-trivial) cost on every opcode.
** IIRC the call/return/tail return hooks only implicitly inform you about coroutine switches via coroutine.running() changing, so be careful of that when assembling call stacks using those hooks.


Re: OP_HALT is really useful

Sven Olsen
> Anyway, because we had low overhead breakpoints that could run code, the technique I used instead of further investment in specialized instrumentation was to use breakpoints that ran code to log/collect information. It was higher overhead and more manual, but also much more flexible, so it was a better answer.
>
> So the answer I picked _was_ to do more of it in Lua and OP_HALT was what made that viable whereas the debug.hook based solutions imposed too much overhead to get accurate results. It is actually still dtrace-like, but our regular code/logging breakpoints instead of a new type of tracing breakpoint.

Interesting.  Though I'll admit, I'm having a hard time reading between the lines here to figure out just what you did to implement this "manual instrumentation" method.  It sounds roughly like a sample gathering technique in which you peppered OP_HALTs into the bytecode for specific parts of the code base you wanted more information on?
 
"Trace points" were starting to prove very handy as a quick/dynamic (and very low overhead) way to measure time/paths taken in certain segments of code that were executing. Combined with our low overhead sampling profiler's more holistic view of which code was running when pausing periodically, it could save a lot of time quantifying the costs/distributions of different code paths of particular interest. The two approaches (top down and bottom up) were proving complementary.

In my own (far less sophisticated) experiments with instrumentation, I've quickly come to a similar high-level conclusion.  What one seems to need are multiple, complementary tools rather than one single technique.

The technique I started with was to gather runtimes for a complete call stack.  That proved dramatically expensive; even after modding the VM with an eye towards minimizing data collection overhead, collecting my call stack data increased the runtime of the function I was profiling from 2.5 seconds to 6 seconds (somewhat better than the 13x cost nobody reported, but nonetheless very expensive).  The high cost of complete call trace instrumentation makes the data I'm collecting suspect, as true bottlenecks (particularly C-function calls) won't suffer nearly as much runtime inflation as other areas of the code.  However, even so, those traces did give me a rough sense of the "lay of the land", and I could then follow up on that with hand-targeted instrumentation.  And of course, when I'm just instrumenting a handful of key function calls, instead of hooking into each and every OP_CALL, it's much easier to get undistorted runtime information.  So both techniques have their drawbacks, but they're also synergistic.

It seems to me like there's a lot to be said for having a couple of different tools in one's instrumentation toolbox.

Thanks again for your replies, Dan; it's been fascinating to learn a bit about how these issues played out in the context of a bigger project.

-Sven

Re: OP_HALT is really useful

Sven Olsen
In reply to this post by Sean Conner
    elseif what == 'tail return' then
      local now = clock()
      data[key] = data[key] + (now - table.remove(stack))
    end

Is this a bug?  In Lua 5.2 at least, it looks like the 'what' value when reporting a tail call/return is 'tail call' not 'tail return'.

My understanding is that tail calls are a bit tricky to instrument -- because a 'tail call' is actually one event that's simultaneously both a call and a return.

I think that, when instrumenting for inclusive call time, ignoring tail call events entirely is actually a sensible policy.  Effectively, I think that's what your code is doing (as the 'tail return' case will never be hit).

Re: OP_HALT is really useful

Sean Conner
It was thus said that the Great Sven Olsen once stated:

> >
> >     elseif what == 'tail return' then
> >       local now = clock()
> >       data[key] = data[key] + (now - table.remove(stack))
> >     end
> >
>
> Is this a bug?  In Lua 5.2 <https://www.lua.org/manual/5.2/manual.html#6.10>
> at least, it looks like the 'what' value when reporting a tail call/return
> is 'tail call' not 'tail return'.

  I still use Lua 5.1 (primarily).  I checked, it changed from 'tail return'
to 'tail call' between 5.1 and 5.2.

> My understanding is that tail calls are a bit tricky to instrument --
> because a 'tail call' is actually one event that's simultaneously both a
> call and a return.

  It's a GOTO made pretty.

> I think that, when instrumenting for inclusive call time, ignoring tail
> call events entirely is actually a sensible policy.  Effectively, I think
> that's what your code is doing (as the 'tail return' case will never be
> hit).

  I did not intend to ignore tail calls.

  -spc


 


Re: OP_HALT is really useful

Sven Olsen
>   I still use Lua 5.1 (primarily).  I checked, it changed from 'tail return'
> to 'tail call' between 5.1 and 5.2.

Ah, that makes sense.  It looks like the guts of how debug.sethook(hook,"cr") interacts with tail calls changed quite a bit between 5.1 and 5.2.  As best as I can tell, in 5.1, a tail call will actually emit two hook events: a 'call' event when we hit OP_TAILCALL, and then a matching set of "simulated returns" when we eventually hit an OP_RETURN.  In 5.2, it looks like Lua got rid of the "simulated return" idea, and instead just emits a single hook for an OP_TAILCALL, one that uses the 'tail call' name.

If I'm understanding all this right, this semantic change doesn't actually break profiling code like yours, because it effectively turns both the 'call' and 'tail return' cases that 5.1-era code would use to trace OP_TAILCALLs into no-ops.  That was cleverly done.
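A tiny, untested check makes the change visible; run it under both versions and see which event name comes out:

local function callee() return 1 end
local function caller() return callee() end   -- a proper tail call

debug.sethook(function(event)
  if event == "tail call" or event == "tail return" then print(event) end
end, "cr")
caller()
debug.sethook()
-- 5.1 prints 'tail return' (the simulated return as the frame unwinds);
-- 5.2 prints 'tail call' (at the OP_TAILCALL itself)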

-Sven

Re: OP_HALT is really useful

Dan Tull
In reply to this post by Sven Olsen
> > So the answer I picked _was_ to do more of it in Lua and OP_HALT was what made that viable...
> Interesting.  Though I'll admit, I'm having a hard time reading between the lines here to figure out
> just what you did to implement this "manual instrumentation" method. 
The breakpoints we set (via OP_HALT) call into our debugging mechanism when hit, but they can have a Lua callback associated with them which determines whether to stop (to support conditional breakpoints) and can run arbitrary code (to log, take measurements, etc). We already had debugger UI for configuring these kinds of breakpoints, so I just used that instead of defining a new low level breakpoint type.

The truly manual way to do this is basically "printf profiling", where actual code is inserted. Using code/conditional/logging breakpoints also allows it to be done more dynamically, but still provides full manual control over the nature of the instrumentation.

> In my own (far less sophisticated) experiments with instrumentation, I've quickly come to similar
> high-level conclusion.  What one seems to need are multiple, complimentary tools ...
> ...both techniques have their drawbacks, but, they're also synergistic.
Yeah, definitely sounds like we arrived at a similar perspective.

> Thanks again for your replies Dan, it's been fascinating to learn a bit about how these issues played out in the context of a bigger project.
Sure thing. Happy to share.

DT