profiling problem

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

profiling problem

Thijs Schreijer
Hi all,

Triggered by a message on this list I picked up an old todo item still lingering around. I tried to test the impact of corowatch [1] (that is using a debug hook that is set to run every 10000 vm instructions) over a coroutine without it.

When I run the tests, the outcome differs per run, too much imo, so I probably am doing something wrong. Quite regularly the debug version runs faster...
So I updated the functions with everything I could think of that might have caused it, but to no avail;
- wrote two identical test functions, 1 with, 1 without and measured them
- then added a 'warm up' section
- then disabled GC
- then removed a table to store the results to further minimize GC impact
- then added a 10 run average
- then... posted this message because I ran out of ideas

So any ideas as to what (silliness) I'm missing?

Thijs

[1] https://github.com/Tieske/corowatch 



The code:

package.path = "C:\\Users\\Thijs\\Dropbox\\Lua projects\\corowatch\\src\\?.lua;"..package.path
require("corowatch")

local res = 1

local testfunc = function()
  for n = 1,2000000 do
    res=math.sin(n/1000)
  end
end

local function test1()
  collectgarbage()
  collectgarbage()
  collectgarbage("stop")
  local t1 = coroutine.gettime()
  testfunc()
  t1 = coroutine.gettime() - t1
  collectgarbage("restart")
  collectgarbage()
  collectgarbage()
  return t1
end

local function test2()
  collectgarbage()
  collectgarbage()
  collectgarbage("stop")
  local t2 = coroutine.gettime()
  coroutine.wrapf(testfunc, 10001, 10000, function() end)()
  t2 = coroutine.gettime() - t2
  collectgarbage("restart")
  collectgarbage()
  collectgarbage()
  return t2
end

-- warm up
for n = 1,3 do
  test1()
  test2()
end

-- run test
local t1, t2, iter = 0,0,10
for n = 1,iter do
  t1=test1() + t1
  t2=test2() + t2
end
t1=t1/iter
t2=t2/iter

print(t1)
print(t2)
print("corowatch is " .. math.floor((t2-t1)/t1 *100) .. "% slower")


My results (in this set the debug version consistently runs faster);

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.23556060791016
0.22308025360107
corowatch is -6% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24024028778076
0.2324405670166
corowatch is -4% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24336032867432
0.23556041717529
corowatch is -4% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24336013793945
0.23400058746338
corowatch is -4% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24960060119629
0.23244037628174
corowatch is -7% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.25272045135498
0.23400039672852
corowatch is -8% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24180030822754
0.22308044433594
corowatch is -8% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>
C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.25116081237793
0.23087997436523
corowatch is -9% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.24804058074951
0.22620010375977
corowatch is -9% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
0.25272026062012
0.22776069641113
corowatch is -10% slower

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>


Reply | Threaded
Open this post in threaded view
|

Re: profiling problem

Florian Nücke
> Hi all,
>
> Triggered by a message on this list I picked up an old todo item still lingering around. I tried to test the impact of corowatch [1] (that is using a debug hook that is set to run every 10000 vm instructions) over a coroutine without it.
>
> When I run the tests, the outcome differs per run, too much imo, so I probably am doing something wrong. Quite regularly the debug version runs faster...
> So I updated the functions with everything I could think of that might have caused it, but to no avail;
> - wrote two identical test functions, 1 with, 1 without and measured them
> - then added a 'warm up' section
> - then disabled GC
> - then removed a table to store the results to further minimize GC impact
> - then added a 10 run average
> - then... posted this message because I ran out of ideas
>
> So any ideas as to what (silliness) I'm missing?

This may not help much, but: I didn't have luasocket handy, so I used
os.clock() for coroutine.gettime (on Windows, so it's not the CPU time,
but time in floating point seconds since program start, and since in
this case we only need relative times it should do). Running it in Lua
5.2.2, over 10 runs the delta has been consistently in [-1,1]%, mostly
0%. So a wild guess: maybe the results you see have something to do with
how luasocket measures time?

> Thijs

Florian

> [1] https://github.com/Tieske/corowatch

Reply | Threaded
Open this post in threaded view
|

Re: profiling problem

Peng Zhicheng
In reply to this post by Thijs Schreijer
于 2013-11-3 15:31, Thijs Schreijer 写道:

> Hi all,
>
> Triggered by a message on this list I picked up an old todo item still lingering around. I tried to test the impact of corowatch [1] (that is using a debug hook that is set to run every 10000 vm instructions) over a coroutine without it.
>
> When I run the tests, the outcome differs per run, too much imo, so I probably am doing something wrong. Quite regularly the debug version runs faster...
> So I updated the functions with everything I could think of that might have caused it, but to no avail;
> - wrote two identical test functions, 1 with, 1 without and measured them
> - then added a 'warm up' section
> - then disabled GC
> - then removed a table to store the results to further minimize GC impact
> - then added a 10 run average
> - then... posted this message because I ran out of ideas
>
> So any ideas as to what (silliness) I'm missing?
>
> Thijs
>
> [1] https://github.com/Tieske/corowatch
>
>
>
> The code:
>
> package.path = "C:\\Users\\Thijs\\Dropbox\\Lua projects\\corowatch\\src\\?.lua;"..package.path
> require("corowatch")
>
> local res = 1
>
> local testfunc = function()
>    for n = 1,2000000 do
>      res=math.sin(n/1000)
>    end
> end
>
> local function test1()
>    collectgarbage()
>    collectgarbage()
>    collectgarbage("stop")
>    local t1 = coroutine.gettime()
>    testfunc()
>    t1 = coroutine.gettime() - t1
>    collectgarbage("restart")
>    collectgarbage()
>    collectgarbage()
>    return t1
> end
>
> local function test2()
>    collectgarbage()
>    collectgarbage()
>    collectgarbage("stop")
>    local t2 = coroutine.gettime()
>    coroutine.wrapf(testfunc, 10001, 10000, function() end)()
>    t2 = coroutine.gettime() - t2
>    collectgarbage("restart")
>    collectgarbage()
>    collectgarbage()
>    return t2
> end
>
> -- warm up
> for n = 1,3 do
>    test1()
>    test2()
> end
>
> -- run test
> local t1, t2, iter = 0,0,10
> for n = 1,iter do
>    t1=test1() + t1
>    t2=test2() + t2
> end
> t1=t1/iter
> t2=t2/iter
>
> print(t1)
> print(t2)
> print("corowatch is " .. math.floor((t2-t1)/t1 *100) .. "% slower")
>
>
> My results (in this set the debug version consistently runs faster);
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.23556060791016
> 0.22308025360107
> corowatch is -6% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24024028778076
> 0.2324405670166
> corowatch is -4% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24336032867432
> 0.23556041717529
> corowatch is -4% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24336013793945
> 0.23400058746338
> corowatch is -4% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24960060119629
> 0.23244037628174
> corowatch is -7% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.25272045135498
> 0.23400039672852
> corowatch is -8% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24180030822754
> 0.22308044433594
> corowatch is -8% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.25116081237793
> 0.23087997436523
> corowatch is -9% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.24804058074951
> 0.22620010375977
> corowatch is -9% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
> 0.25272026062012
> 0.22776069641113
> corowatch is -10% slower
>
> C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>
>
>

It is really weird. I also got similar result running your code.
After some exploring, I think I have got some points, but not quit clear.

Well, although I don't know why, some experiments shows that the *main thread* runs
slower than *user created thread(coroutines)*.

Please someone would tell me the deep reason for that?

I modified your code as follow:
==================================================
--- perf.lua 2013-11-03 20:10:11.937500000 +0800
+++ perf2.lua 2013-11-03 20:42:44.937500000 +0800
@@ -1,3 +1,5 @@
+local old_wrap = coroutine.wrap
+
require("corowatch")

local res = 1
@@ -13,7 +15,7 @@
collectgarbage()
collectgarbage("stop")
local t1 = coroutine.gettime()
- testfunc()
+ old_wrap(testfunc)()
t1 = coroutine.gettime() - t1
collectgarbage("restart")
collectgarbage()
===================================================

then the result is:
===================================================
E:\playground>lua perf2.lua
0.465625
0.478125
corowatch is 2% slower

E:\playground>lua perf2.lua
0.471875
0.48125
corowatch is 1% slower

E:\playground>lua perf2.lua
0.46875
0.471875
corowatch is 0% slower

E:\playground>lua perf2.lua
0.4671875
0.4734375
corowatch is 1% slower

E:\playground>lua perf2.lua
0.4671875
0.475
corowatch is 1% slower

E:\playground>lua perf2.lua
0.46875
0.48125
corowatch is 2% slower

E:\playground>lua perf2.lua
0.465625
0.478125
corowatch is 2% slower

E:\playground>lua perf2.lua
0.465625
0.475
corowatch is 2% slower

E:\playground>lua perf2.lua
0.4703125
0.4734375
corowatch is 0% slower
===================================================

As seen from the experiments, the coroutine with a debughook set *IS* slower than
the one not set, but the difference is not significant.
This result is what I first expected, because the Lua VM would always check
for the debug hook at each VM instruction boundary, and as long as the hook is
not triggered, the difference between the two coroutines(with debug hook set and
not set) are quit small.

But still, could anyone give me some hints to figure out why the main thread is slower,
because this seems really not understandable to me.
Any suggestion is appreciated.



Reply | Threaded
Open this post in threaded view
|

RE: profiling problem

Thijs Schreijer
>
> It is really weird. I also got similar result running your code.
> After some exploring, I think I have got some points, but not quit clear.
>
> Well, although I don't know why, some experiments shows that the *main
> thread* runs
> slower than *user created thread(coroutines)*.
>
> Please someone would tell me the deep reason for that?
>

Thanks for trying that and the hint, I get different results though;
- coroutines consistently slower than the mainloop
- corowatch (using debug hooks) consistently runs fastest

Attached my code. Here's the output;
C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28079986572266
Coroutine:      0.27924041748047
Corowatch:      0.2574010848999
corowatch is -8% slower than unprotected
coroutine is 0% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28079986572266
Coroutine:      0.28704032897949
Corowatch:      0.26520118713379
corowatch is -8% slower than unprotected
coroutine is -3% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.26832046508789
Coroutine:      0.28080024719238
Corowatch:      0.26052074432373
corowatch is -8% slower than unprotected
coroutine is -5% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27924003601074
Coroutine:      0.28391990661621
Corowatch:      0.26988162994385
corowatch is -5% slower than unprotected
coroutine is -2% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27923984527588
Coroutine:      0.29015998840332
Corowatch:      0.27144165039063
corowatch is -7% slower than unprotected
coroutine is -4% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.2683198928833
Coroutine:      0.2808012008667
Corowatch:      0.26520023345947
corowatch is -6% slower than unprotected
coroutine is -5% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.2776798248291
Coroutine:      0.2854808807373
Corowatch:      0.26988067626953
corowatch is -6% slower than unprotected
coroutine is -3% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27923984527588
Coroutine:      0.28860111236572
Corowatch:      0.26988048553467
corowatch is -7% slower than unprotected
coroutine is -4% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28080158233643
Coroutine:      0.28547992706299
Corowatch:      0.26987991333008
corowatch is -6% slower than unprotected
coroutine is -2% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>

performance.lua (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: profiling problem

Peng Zhicheng
于 2013-11-4 22:08, Thijs Schreijer 写道:
It is really weird. I also got similar result running your code.
After some exploring, I think I have got some points, but not quit clear.

Well, although I don't know why, some experiments shows that the *main
thread* runs
slower than *user created thread(coroutines)*.

Please someone would tell me the deep reason for that?

Thanks for trying that and the hint, I get different results though;
- coroutines consistently slower than the mainloop
- corowatch (using debug hooks) consistently runs fastest

Attached my code. Here's the output;
C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28079986572266
Coroutine:      0.27924041748047
Corowatch:      0.2574010848999
corowatch is -8% slower than unprotected
coroutine is 0% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28079986572266
Coroutine:      0.28704032897949
Corowatch:      0.26520118713379
corowatch is -8% slower than unprotected
coroutine is -3% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.26832046508789
Coroutine:      0.28080024719238
Corowatch:      0.26052074432373
corowatch is -8% slower than unprotected
coroutine is -5% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27924003601074
Coroutine:      0.28391990661621
Corowatch:      0.26988162994385
corowatch is -5% slower than unprotected
coroutine is -2% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27923984527588
Coroutine:      0.29015998840332
Corowatch:      0.27144165039063
corowatch is -7% slower than unprotected
coroutine is -4% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.2683198928833
Coroutine:      0.2808012008667
Corowatch:      0.26520023345947
corowatch is -6% slower than unprotected
coroutine is -5% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.2776798248291
Coroutine:      0.2854808807373
Corowatch:      0.26988067626953
corowatch is -6% slower than unprotected
coroutine is -3% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.27923984527588
Coroutine:      0.28860111236572
Corowatch:      0.26988048553467
corowatch is -7% slower than unprotected
coroutine is -4% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.28080158233643
Coroutine:      0.28547992706299
Corowatch:      0.26987991333008
corowatch is -6% slower than unprotected
coroutine is -2% faster than the main loop

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>

Well, forget about it. I bet we both went the wong direction.

I ran that script from your original post this evening,
just to find the result is slight different from yesterday's.

Then I added some unrelated code at the beginning. I say it
unrelated because it just prints some text to the screen.

Surprisingly I found the result changed again. This time the
execution time of the main loop and the "watched" coroutine
became almost identical. Among several runs, the result showed
some randomness - tha latter might be either slower or faster,
but the average execution time diffence is near 0(and this is
what I had expected at the very beginning).

More experiments showed that if I insert unrelated codes randomly,
the results could all be affected in somewhat random manner.

You may try something similar to see whether it'll make some
difference if you want.

So I suppose the diffence is not due to whether it is the main
thread or a coroutine, nor to whether it is "watched"(hooked).
As I stated earlier, the perfomance *should* have little difference
as long as the debug hook is installed but not triggered,
according to my understanding of the source code of Lua.

Probably the diffence in the experiments was kind of butterfly effect,
caused by some unknown, subtle factors. I don't know.


Reply | Threaded
Open this post in threaded view
|

RE: profiling problem

Thijs Schreijer
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Peng Zhicheng
> Sent: maandag 4 november 2013 16:42
> To: Lua mailing list
> Subject: Re: profiling problem

 

> Well, forget about it. I bet we both went the wong direction.
>
> I ran that script from your original post this evening,
> just to find the result is slight different from yesterday's.
>
> Then I added some unrelated code at the beginning. I say it
> unrelated because it just prints some text to the screen.
>
> Surprisingly I found the result changed again. This time the
> execution time of the main loop and the "watched" coroutine
> became almost identical. Among several runs, the result showed
> some randomness - tha latter might be either slower or faster,
> but the average execution time diffence is near 0(and this is
> what I had expected at the very beginning).
>
> More experiments showed that if I insert unrelated codes randomly,
> the results could all be affected in somewhat random manner.
>
> You may try something similar to see whether it'll make some
> difference if you want.
>
> So I suppose the diffence is not due to whether it is the main
> thread or a coroutine, nor to whether it is "watched"(hooked).
> As I stated earlier, the perfomance *should* have little difference
> as long as the debug hook is installed but not triggered,
> according to my understanding of the source code of Lua.

I did a check on that, by keeping a counter in a slightly modified corowatch version;

C:\Users\Thijs\Dropbox\Lua projects\corowatch\examples>performance.lua
Mainloop :      0.25116024017334
Coroutine:      0.23868064880371
Corowatch:      0.24492034912109        Debughook invokements:  1203
corowatch is 2% slower than unprotected
coroutine is 4% faster than the main loop

The debughook gets invoked 1203 times, so there should be an impact. In this version the watched coroutine consistenly is slightly (2% to 8%) slower than the non-watched version, which is expected, but differs for each experiment so far.

But the coroutine remains faster than the main loop, which I can't explain.
This would imply that any Lua program can get a 2-5% performance increase by simply wrapping it in a coroutine. Something like;

coroutine.resume(coroutine.wrap(function()
  <original code here>
end))

Thijs


>
> Probably the diffence in the experiments was kind of butterfly effect,
> caused by some unknown, subtle factors. I don't know.

Reply | Threaded
Open this post in threaded view
|

Re: profiling problem

Dirk Laurie-2
In reply to this post by Peng Zhicheng
2013/11/4 Peng Zhicheng <[hidden email]>:

> I ran that script from your original post this evening,
> just to find the result is slight different from yesterday's.

My experience with timings of any program in any kind of multi-user
graphics-windowed operating environment is that you need a program
to take several seconds before effects become well and truly
reproducible. It's more reliable if you boot into a single-user
command-line only mode.

Reply | Threaded
Open this post in threaded view
|

Re: profiling problem

Kaj Eijlers
I agree with what Dirk says. I don't know your testing environment, but if you're on Windows (don't know about other OS's) you're running in a pre-emptive multithreading environment with a ton of processes running - for example, moving your mouse while profiling can skew your results) You could try elevating the priority of your process, but plain DOS should give you the cleanest (tho not real world anymore I guess) results.


On Sun, Nov 10, 2013 at 12:30 AM, Dirk Laurie <[hidden email]> wrote:
2013/11/4 Peng Zhicheng <[hidden email]>:

> I ran that script from your original post this evening,
> just to find the result is slight different from yesterday's.

My experience with timings of any program in any kind of multi-user
graphics-windowed operating environment is that you need a program
to take several seconds before effects become well and truly
reproducible. It's more reliable if you boot into a single-user
command-line only mode.