Hi Ben, thanks for your reply. I was a little worried because my initial 
web searches did not find obvious reports of this issue by others (too 
much noise, perhaps).
 

> For V8 3.14 / node.js v0.10, I fixed most of the overhead by means of 
> PR [0], what I think you call a poor man's hack in your email.  Not 
> that I disagree, but it's remarkably effective. :-) 

 
Oops, I didn't mean to imply it is _necessarily_ an incorrect or bad 
solution.
I was really thinking in the context of my own implementation, which was 
"tactical" and intended purely to give me confidence that I was on the 
right track while debugging.
My concern was that the downside would be a potential delay in processing 
the next batch of work (samples or code events), leading to an increase 
in the queue length.

If that worry is unfounded, it could be quite a pragmatic solution, and it 
sounds like you were able to get an acceptable reduction in CPU 
utilisation with a very small sleep period.
It would be interesting to compare the behaviour of the 1 ns sleep 
implementation against a semaphore implementation.
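
What I have in mind for the semaphore side is roughly the following, in 
portable C++11 rather than the V8 base classes (the class and method 
names are illustrative only, not anything in the tree):

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    // Illustrative queue: the producer (VM/sampler thread) signals the
    // consumer (profiler processing thread), which blocks instead of
    // spinning or sleeping when there is nothing to do.
    class EventQueue {
     public:
      void Enqueue(int event) {
        {
          std::lock_guard<std::mutex> lock(mutex_);
          queue_.push(event);
        }
        cv_.notify_one();  // wake the processor only when work arrives
      }

      int Dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });  // no busy-wait
        int event = queue_.front();
        queue_.pop();
        return event;
      }

     private:
      std::mutex mutex_;
      std::condition_variable cv_;
      std::queue<int> queue_;
    };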

> sched_yield() only gives up a time slice when there is another process 
> scheduled on the same CPU.  I changed that to a nanosleep() call with 
> a 1 ns timeout that forcibly puts the thread to sleep, with the 
> timeout getting rounded up to operating system granularity (50 us on 
> most Linux systems, it's even coarser on OS X.)

I also found a sleep quite effective at reducing the CPU usage, but I 
haven't examined the effects in any depth (at least, not yet).
I tried 1, 10 and 100 microsecond sleeps. 1 and 10 seemed to give a similar 
reduction in utilisation, which makes sense if the minimum scheduler 
granularity was >10 microseconds on my box. A 100 microsecond sleep 
appeared to give a better utilisation reduction.
To give ballpark figures, the profiler processing thread clocked ~100% 
normally, ~20% with a 1 or 10 us sleep and ~15% with a 100 us sleep.
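
The rough shape of what I tried is below; the helper function and flag 
names are placeholders rather than the real V8 ones:

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> running{true};  // placeholder shutdown flag

    void ProcessPendingWork() {
      // Placeholder for draining queued code events and tick samples.
    }

    void ProcessingLoop() {
      while (running.load()) {
        ProcessPendingWork();
        // Sleep instead of spinning; the OS rounds the interval up to its
        // timer granularity, which presumably explains why 1 us and 10 us
        // gave roughly the same reduction on my machine.
        std::this_thread::sleep_for(std::chrono::microseconds(100));
      }
    }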

----

I ran a test on a recent V8 version (a late December cut of master from 
v8-git-mirror) while capturing a perf profile, and it looks like the 
samples are mostly spent reading the current time:
-  97.41%  [kernel]             [k] acpi_pm_read
   - acpi_pm_read
      - 99.98% ktime_get_ts
         - posix_ktime_get_ts
         - sys_clock_gettime                        
         - system_call_fastpath                     
         - 0x7fffc5bfe7c2                           
         - __clock_gettime                          
            - 95.34% v8::base::ElapsedTimer::Now()
               - 97.86% v8::base::ElapsedTimer::Elapsed() const
                     v8::base::ElapsedTimer::HasExpired(v8::base::TimeDelta) const
                    v8::internal::ProfilerEventsProcessor::Run()
                    v8::base::Thread::NotifyStartedAndRun()
                    v8::base::ThreadEntry(void*)
                    start_thread
               + 2.14% v8::base::ElapsedTimer::Start()

So it looks like the thread won't yield time slices as it did in 3.14.
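
For anyone not looking at the source, the shape of the loop that the 
profile points at is roughly the following (a simplified illustration, 
not the actual V8 code):

    #include <atomic>
    #include <chrono>

    std::atomic<bool> keep_running{true};  // placeholder shutdown flag

    void RunLoop() {
      using Clock = std::chrono::steady_clock;
      const auto kPeriod = std::chrono::milliseconds(1);  // illustrative period
      while (keep_running.load()) {
        const auto start = Clock::now();  // ElapsedTimer::Start()
        // Re-reads the clock on every iteration (the HasExpired()-style
        // check), so an idle thread spends nearly all of its time in
        // clock_gettime().
        while (Clock::now() - start < kPeriod) {
          // ProcessOneCodeEvent();  // returns immediately when queue is empty
        }
        // ProcessTickSampleEvents();
      }
    }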

Regards
Michael
