Re: ULE vs. 4BSD in RELENG_7

2007-12-02 Thread Julian Elischer
Josh Carroll wrote: is bigger better or worse? For sysbench bigger is better (more transactions per second). For ffmpeg, lower is better, i.e. the time to transcode the first 120 seconds of the selected video is less, so it ran faster. Don't forget to give this info when giving numbers :-)

Re: ULE vs. 4BSD in RELENG_7

2007-12-02 Thread Josh Carroll
> is bigger better or worse? For sysbench bigger is better (more transactions per second). For ffmpeg, lower is better, i.e. the time to transcode the first 120 seconds of the selected video is less, so it ran faster. Josh

Re: ULE vs. 4BSD in RELENG_7

2007-12-02 Thread Julian Elischer
Josh Carroll wrote: I just ran through some of my benchmarks on a kernel build from sources as of today, and I've noticed an improvement for the ffmpeg workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3). This is vanilla source with no patches applied: Sorry, the ministat output

Re: ULE vs. 4BSD in RELENG_7

2007-12-01 Thread Josh Carroll
> I just ran through some of my benchmarks on a kernel build from > sources as of today, and I've noticed an improvement for the ffmpeg > workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3). > This is vanilla source with no patches applied: Sorry, the ministat output was mangled. I'

Re: ULE vs. 4BSD in RELENG_7

2007-12-01 Thread Josh Carroll
Jeff, I just ran through some of my benchmarks on a kernel build from sources as of today, and I've noticed an improvement for the ffmpeg workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3). This is vanilla source with no patches applied: x 4bsd + ule * uleb3 +-

Re: ULE vs. 4BSD in RELENG_7

2007-11-11 Thread Josh Carroll
> These ministat results show that the latest patch (alone) results in > slightly worse performance for ffmpeg and buildworld, but slightly > better results for sysbench. Please disregard those conclusions, I misread the ffmpeg results and didn't look at all thread counts for the sysbench runs. L

Re: ULE vs. 4BSD in RELENG_7

2007-11-11 Thread Josh Carroll
> Try the /usr/src/tools/tools/ministat utility for a simple and effective > way to compare these kinds of noisy measurements and extract reliable > comparisons. Thanks again for the suggestion, Kris! I compiled results for 5 runs of the three benchmarks I've been using (ffmpeg, sysbench(mysql),

Re: ULE vs. 4BSD in RELENG_7

2007-11-10 Thread Josh Carroll
> BTW, it doesn't make much sense to be measuring to millisecond precision > a quantity which has variation that is unknown but probably much larger > :) When trying to make comparisons to identify performance changes, a > careful statistical approach is necessary. > > Try the /usr/src/tools/tools

Re: ULE vs. 4BSD in RELENG_7

2007-11-10 Thread Kris Kennaway
Josh Carroll wrote: ffmpeg: 1:38.885 sysbench: (4,8,12,16 threads respectively): 2221.93 2327.87 2292.49 2269.29 And buildworld: 13m47.052s I ran these after changing the slice value to 7 as well with this patch. ffmpeg: 1:38.547 BTW, it doesn't make much sense to be measurin
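Kris's advice is to compare sets of repeated runs rather than single timings. ministat reports whether two samples differ at a chosen confidence level using Student's t with a pooled standard deviation. The following is a minimal Python sketch of that kind of comparison, not ministat's code; the function name `compare`, the fixed 95% level and the small hand-copied table of t critical values are assumptions made for illustration.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only small sample sizes are covered in this sketch.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def compare(baseline, candidate):
    """Return (difference of means, 95% half-width, significant?) using a
    pooled-variance two-sample Student's t test."""
    n1, n2 = len(baseline), len(candidate)
    df = n1 + n2 - 2
    if df not in T95:
        raise ValueError("no critical value tabulated for df=%d" % df)
    m1, m2 = statistics.mean(baseline), statistics.mean(candidate)
    v1, v2 = statistics.variance(baseline), statistics.variance(candidate)
    # Pooled standard deviation of the two samples.
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)
    # Half-width of the confidence interval for the difference of means.
    half_width = T95[df] * pooled * math.sqrt(1 / n1 + 1 / n2)
    diff = m2 - m1
    return diff, half_width, abs(diff) > half_width
```

Pass the per-run measurements of each configuration (for example five ffmpeg times under 4BSD and five under ULE). A difference is only reported as significant when it exceeds the confidence half-width; for times lower is better, while for sysbench transactions per second higher is better.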

Re: ULE vs. 4BSD in RELENG_7

2007-11-10 Thread Josh Carroll
> ffmpeg: 1:38.885
>
> sysbench: (4,8,12,16 threads respectively):
> 2221.93
> 2327.87
> 2292.49
> 2269.29
>
> And buildworld: 13m47.052s
I ran these after changing the slice value to 7 as well with this patch.
ffmpeg: 1:38.547
sysbench: 2236.55 2321.02 2271.76 2254.85

Re: ULE vs 4BSD in RELENG_7

2007-11-10 Thread Gelsema, P (Patrick) - FreeBSD
On Mon, November 5, 2007 00:50, Gelsema, P (Patrick) - FreeBSD wrote: > On Sun, November 4, 2007 22:27, Jeff Roberson wrote: >> On Sun, 4 Nov 2007, Gelsema, P (Patrick) - FreeBSD wrote: >> >>> Hi Jeff, >>> >>> I tried your patch. Ran a buildkernel, timed. Recompiled kernel >>> including >>> your pa

Re: ULE vs. 4BSD in RELENG_7

2007-11-09 Thread Josh Carroll
> Josh, I had an interesting thought today. What if the reason 4BSD is > faster is because it distributes load more evenly across all packages > because it distributes randomly? ULE distributed across cores evenly but > not packages. Can you try the attached patch? This also turns the > defau

Re: ULE vs. 4BSD in RELENG_7

2007-11-09 Thread Jeff Roberson
On Tue, 6 Nov 2007, Josh Carroll wrote: That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set slice_min and slice both equal to 7 and see if the numbers come out better than without the patch but with a slice value of 7? Basically I'm trying to isolate the effects of the diff

Re: ULE vs. 4BSD in RELENG_7

2007-11-06 Thread Josh Carroll
> That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set > slice_min and slice both equal to 7 and see if the numbers come out > better than without the patch but with a slice value of 7? Basically I'm > trying to isolate the effects of the different slice handling in this > patc

Re: ULE vs. 4BSD in RELENG_7

2007-11-06 Thread Josh Carroll
> That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set > slice_min and slice both equal to 7 and see if the numbers come out > better than without the patch but with a slice value of 7? Basically I'm > trying to isolate the effects of the different slice handling in this > patc

Re: ULE vs. 4BSD in RELENG_7

2007-11-06 Thread Jeff Roberson
On Mon, 5 Nov 2007, Josh Carroll wrote: Turns out the last patch I posted had a small compile error because I edited it by hand to remove one section. Here's an updated patch that fixes that and changes the min/max slice values to something more reasonable. Slice min should be around 4 with a

Re: ULE vs. 4BSD in RELENG_7

2007-11-05 Thread Josh Carroll
> Sysbench results:
> # threads   slice=7   slice=13   slice_min=4   slice_min=2
> 4           2265.67   2250.36    2261.71       2297.08
> 8           2300.25   2310.02    2306.79       2313.61
> 12          2269.54   2304.04    2296.54       2279.73

Re: ULE vs 4BSD in RELENG_7

2007-11-05 Thread Bruce Evans
On Sun, 4 Nov 2007, Jeff Roberson wrote: On Sun, 4 Nov 2007, Gelsema, P (Patrick) - FreeBSD wrote:
w/o patch
hulk# time make -j8 buildkernel
837.808u 138.167s 10:28.96 155.1% 6349+1349k 2873+7780io 303pf+0w
w patch
hulk# time make -j8 buildkernel
838.554u 168.316s 10:52.10 154.4%

Re: ULE vs. 4BSD in RELENG_7

2007-11-05 Thread Josh Carroll
> Turns out the last patch I posted had a small compile error because I > edited it by hand to remove one section. Here's an updated patch that > fixes that and changes the min/max slice values to something more > reasonable. Slice min should be around 4 with a max of 12. > > Also looks like 4BSD

Re: ULE vs 4BSD in RELENG_7

2007-11-04 Thread Gelsema, P (Patrick) - FreeBSD
On Sun, November 4, 2007 22:27, Jeff Roberson wrote: > On Sun, 4 Nov 2007, Gelsema, P (Patrick) - FreeBSD wrote: > >> Hi Jeff, >> >> I tried your patch. Ran a buildkernel, timed. Recompiled kernel >> including >> your patch, rebooted and reran. Please find results below. >> >> w/o patch >> hulk# ti

Re: ULE vs. 4BSD in RELENG_7

2007-11-04 Thread Jeff Roberson
On Sun, 4 Nov 2007, Josh Carroll wrote: Josh, I included one too many changes in the diff and it made the results ambiguous. I've scaled it back slightly by removing the changes to sched_pickcpu() and included the patch in this email again. Can you run through your tests once more? I'd like t

ULE vs 4BSD in RELENG_7

2007-11-04 Thread Gelsema, P (Patrick) - FreeBSD
Hi Jeff, I tried your patch. Ran a buildkernel, timed. Recompiled kernel including your patch, rebooted and reran. Please find results below.
w/o patch
hulk# time make -j8 buildkernel
837.808u 138.167s 10:28.96 155.1% 6349+1349k 2873+7780io 303pf+0w
w patch
hulk# time make -j8 buildkernel

Re: ULE vs 4BSD in RELENG_7

2007-11-04 Thread Jeff Roberson
On Sun, 4 Nov 2007, Gelsema, P (Patrick) - FreeBSD wrote: Hi Jeff, I tried your patch. Ran a buildkernel, timed. Recompiled kernel including your patch, rebooted and reran. Please find results below.
w/o patch
hulk# time make -j8 buildkernel
837.808u 138.167s 10:28.96 155.1% 6349+1349k 2

Re: ULE vs. 4BSD in RELENG_7

2007-11-04 Thread Josh Carroll
> Josh, I included one too many changes in the diff and it made the results > ambiguous. I've scaled it back slightly by removing the changes to > sched_pickcpu() and included the patch in this email again. Can you run > through your tests once more? I'd like to commit this part soon as it > hel

Re: ULE vs. 4BSD in RELENG_7

2007-11-04 Thread Jeff Roberson
On Sun, 4 Nov 2007, Josh Carroll wrote: Josh, thanks for your help so far. This has been very useful. You're welcome, glad to help! Thanks for the effort and the patch. Any testing you can run this through is appreciated. Anyone else lurking in this thread who would like to is also welcome

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> Josh, thanks for your help so far. This has been very useful. You're welcome, glad to help! Thanks for the effort and the patch. > Any testing you can run this through is appreciated. Anyone else lurking > in this thread who would like to is also welcome to report back findings. Here are a f

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Jeff Roberson
On Sat, 3 Nov 2007, Josh Carroll wrote: What would be interesting to know is if the sum of the temperatures is any different. 4BSD gets a much more random distribution of load because a thread is run on whatever cpu context switches next. ULE will have specific load patterns since it scans lis
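Jeff's explanation contrasts two placement styles: 4BSD effectively runs a thread on whichever CPU context switches next, while ULE scans CPUs in a fixed order. The toy model below is purely illustrative and is not code from either scheduler; the busy probability, the tick loop and the `pick_counts` name are all hypothetical. It only shows why always taking the first idle CPU in a fixed scan order concentrates work on low-numbered CPUs, while a random choice spreads it evenly.

```python
import random


def pick_counts(num_cpus, ticks, busy_prob, policy, seed=0):
    """Toy model: each tick, every CPU is independently busy with probability
    busy_prob, and one woken thread is placed on an idle CPU. Returns how many
    placements each CPU received."""
    rng = random.Random(seed)
    counts = [0] * num_cpus
    for _ in range(ticks):
        idle = [cpu for cpu in range(num_cpus) if rng.random() >= busy_prob]
        if not idle:
            continue  # no idle CPU this tick, so nothing is placed
        if policy == "fixed":
            cpu = idle[0]           # scan in a fixed order, take the first idle CPU
        elif policy == "random":
            cpu = rng.choice(idle)  # every idle CPU is equally likely
        else:
            raise ValueError("unknown policy: %r" % policy)
        counts[cpu] += 1
    return counts
```

With 4 CPUs and busy_prob=0.5, the fixed-order policy gives CPU 0 about half of all ticks in expectation (then a quarter, an eighth and a sixteenth for CPUs 1 to 3), whereas the random policy gives each CPU roughly the same share. Real schedulers also weigh load and affinity, so this says nothing about actual throughput.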

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> What would be interesting to know is if the sum of the temperatures is any > different. 4BSD gets a much more random distribution of load because a > thread is run on whatever cpu context switches next. ULE will have > specific load patterns since it scans lists of cpus in a fixed order to > as

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> What was the -j value and number of processors? -j 8. I did the following (one warm up, 3 times in a row after that, averaged):
cd /usr/src
rm -rf /usr/obj/*
make clean
time make -j8 -DNOCLEAN buildworld
The system is a Q6600, so 4 cores. Thanks, Josh
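Josh's procedure (one untimed warm-up build, then three timed builds whose times are averaged) can be scripted so that every scheduler configuration is measured the same way. This is a hypothetical Python helper, not something used in the thread; it assumes the cleanup step, if any, is passed as a separate command and run before each build, which the original description leaves open.

```python
import subprocess
import time


def timed_runs(cmd, runs=3, warmup=1, prepare=None, cwd=None):
    """Run `cmd` `warmup` times without timing it, then `runs` more times,
    returning the wall-clock duration of each timed run in seconds.
    If `prepare` is given, it is run (untimed) before every invocation."""
    def run(command):
        subprocess.run(command, check=True, cwd=cwd)

    for _ in range(warmup):
        if prepare:
            run(prepare)
        run(cmd)

    durations = []
    for _ in range(runs):
        if prepare:
            run(prepare)
        start = time.perf_counter()
        run(cmd)
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `timed_runs(["make", "-j8", "-DNOCLEAN", "buildworld"], prepare=["make", "clean"], cwd="/usr/src")` followed by `statistics.mean()` on the result mirrors the averaged figure. The `rm -rf /usr/obj/*` step relies on shell globbing, so it would have to be wrapped, e.g. as `["sh", "-c", "rm -rf /usr/obj/*"]`. Keeping the individual durations rather than only the mean also makes them usable with ministat.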

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Jeff Roberson
On Sat, 3 Nov 2007, Josh Carroll wrote: What was the -j value and number of processors? -j 8. I did the following (one warm up, 3 times in a row after that, averaged):
cd /usr/src
rm -rf /usr/obj/*
make clean
time make -j8 -DNOCLEAN buildworld
The system is a Q6600, so 4 cores. Josh, than

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> buildworld isn't cooperating for me, but once I iron that out, I'll > post some results there as well :) I was able to get buildworld compiling ok and here are the results:
4BSD       ULE.13     ULE.7
13:24.73   13:44.28   13:38.85
Only a 1.75% difference when the slice value is set to 7
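The percentages in this message are relative to the 4BSD time, with lower being better for wall-clock times. A minimal Python check of that arithmetic, using the three buildworld times quoted here; the helper names are illustrative only.

```python
def to_seconds(mmss):
    """Convert a 'M:SS.ss' time, as quoted in the thread, to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_slower(baseline, candidate):
    """Percent by which `candidate` took longer than `baseline`.
    For wall-clock times lower is better, so a positive value means slower."""
    base, cand = to_seconds(baseline), to_seconds(candidate)
    return (cand - base) / base * 100.0


# buildworld times quoted above: 4BSD, ULE with slice=13, ULE with slice=7
print("ULE.13: %.2f%% slower" % pct_slower("13:24.73", "13:44.28"))  # ULE.13: 2.43% slower
print("ULE.7:  %.2f%% slower" % pct_slower("13:24.73", "13:38.85"))  # ULE.7:  1.75% slower
```

The same formula applied to the ffmpeg times from the slice test (1:39.09 with slice=13, 1:37.01 with slice=7) gives about 2.1% less time with the smaller slice.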

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Jeff Roberson
On Sat, 3 Nov 2007, Josh Carroll wrote: buildworld isn't cooperating for me, but once I iron that out, I'll post some results there as well :) I was able to get buildworld compiling ok and here are the results:
4BSD       ULE.13     ULE.7
13:24.73   13:44.28   13:38.85
Only a 1.75% dif

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Josh Carroll
> Thank you, that was very useful. I may have something to test very soon. Sounds great Jeff, just say the word when you need someone to do the testing. I'll be glad to help! > What would be interesting to know is if the sum of the temperatures is any > different. 4BSD gets a much more random d

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Jeff Roberson
On Fri, 2 Nov 2007, Josh Carroll wrote: Could you try spot checking a couple of tests with kern.sched.slice set to half its present value? 4BSD on average will use half the slice that ULE will by default. The initial value was 13, and I changed it to 7. Here is the time result for the ffmpeg

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Josh Carroll
> Could you try spot checking a couple of tests with kern.sched.slice set to > half its present value? 4BSD on average will use half the slice that ULE > will by default. The initial value was 13, and I changed it to 7. Here is the time result for the ffmpeg run:
13: 1:39.09
7: 1:37.01
I al

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Nick Evans
> This is interesting. I have had a couple of laptop users report success > in using lower power saving modes with ULE. Are these core temp > observations repeatable? > > Thanks, > Jeff > > > > > Thanks again for all your help! Please let me know if/when I can do > > anything else to help out

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Jeff Roberson
On Thu, 25 Oct 2007, Josh Carroll wrote: I'm confident that we can improve things. It will probably not make the cut for 7.0 since it will be too disruptive. I'm sure it can be backported before 7.1 when ULE is likely to become the default. That sounds great! I figured it was something that

Re: ULE vs. 4BSD in RELENG_7

2007-10-25 Thread Josh Carroll
> I'm confident that we can improve things. It will probably not make the > cut for 7.0 since it will be too disruptive. I'm sure it can be > backported before 7.1 when ULE is likely to become the default. That sounds great! I figured it was something that would have to wait until 7.0 was released.

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Jeff Roberson
On Wed, 24 Oct 2007, Josh Carroll wrote: Your tests with ffmpeg threads vs processes probably is triggering more context switches due to lock contention in the kernel in the threads case. This is also likely the problem with some super-smack tests. On each context switch 4BSD has an opportunity

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> Your tests with ffmpeg threads vs processes probably is triggering more > context switches due to lock contention in the kernel in the threads case. > This is also likely the problem with some super-smack tests. On each > context switch 4BSD has an opportunity to perfectly balance the CPUs. ULE

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Jeff Roberson
On Tue, 23 Oct 2007, Josh Carroll wrote: Hello, I posted this to the stable mailing list, as I thought it was pertinent there, but I think it will get better attention here. So I apologize in advance for cross-posting if this is a faux pas. :) Anyway, in summary, ULE is about 5-6 % slower than

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Manjunath R Gowda
On 10/24/07, Josh Carroll <[EMAIL PROTECTED]> wrote: > > > Yes, that's the proper default. You could try setting steal_thresh to 1. > I > > noticed a problem with building ports on an 8 core Xeon system while 8 > > distributed.net crunchers were running. The port build would proceed > > incredibly

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Bruce Evans
On Tue, 23 Oct 2007, Kris Kennaway wrote: Josh Carroll wrote: Anyway, in summary, ULE is about 5-6 % slower than 4BSD for two workloads that I am sensitive to: building world with -j X, and ffmpeg -threads X. Other benchmarks seem to indicate relatively equal performance between the two. MySQL,

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> Yes, that's the proper default. You could try setting steal_thresh to 1. I > noticed a problem with building ports on an 8 core Xeon system while 8 > distributed.net crunchers were running. The port build would proceed > incredibly slowly, steal_thresh=1 helped a little bit. It might not make up

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Nick Evans
On Wed, 24 Oct 2007 09:39:29 -0400 "Josh Carroll" <[EMAIL PROTECTED]> wrote: > > 5-6% is a lot. ULE has some tuning for makeworld in -current, which > > for me reduced it to less than 1% slower than 4BSD (down from 5-10% > > slower), for the case of makeworld -j4 over nfs on a 2-CPU system with >

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Nick Evans
On Wed, 24 Oct 2007 11:39:52 -0400 "Josh Carroll" <[EMAIL PROTECTED]> wrote: > > kern.sched.steal_thresh is/was one of the more effective tuning sysctls. > > rev 1.205 of sched_ule had a change that was supposed to automatically > > adjust it based on the number of cores. Is this the same 8 core s

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> kern.sched.steal_thresh is/was one of the more effective tuning sysctls. rev > 1.205 of sched_ule had a change that was supposed to automatically adjust it > based on the number of cores. Is this the same 8 core system as the > other thread? In that case the commit dictates steal_thresh should be

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread gnn
At Tue, 23 Oct 2007 21:06:39 -0400, Josh Carroll wrote: > > I decided to do some testing of concurrent processes (rather than a > single process that's multi-threaded). Specifically, I ran 4 ffmpeg > (without the -threads option) commands at the same time. The > difference was less than a percent:

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> 5-6% is a lot. ULE has some tuning for makeworld in -current, which > for me reduced it to less than 1% slower than 4BSD (down from 5-10% > slower), for the case of makeworld -j4 over nfs on a 2-CPU system with > the sources pre-cached on the server and objects on a local file system, > and exte

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> We can not ignore this performance bug, also I had found that ULE is > slower than 4BSD when testing super-smack's update benchmark on my > dual-core machine. I actually saw improved performance with ULE over 4BSD for super-smack. What were the parameters you used for your testing? These were mi

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread David Xu
Kris Kennaway wrote: One major difference is that your workload is 100% user. Also you were reporting ULE had more idle time, which looks like a bug since I would expect it to be basically 0% idle on such a workload. Kris We can not ignore this performance bug, also I had found that ULE is sl

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
I decided to do some testing of concurrent processes (rather than a single process that's multi-threaded). Specifically, I ran 4 ffmpeg (without the -threads option) commands at the same time. The difference was less than a percent: 4bsd: 439.92 real 1755.91 user 1.08 sys ule:

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> My next step is to run some transcodes with mencoder to see if it has > similar performance between the two schedulers. When I have those > results, I'll post them to this thread. mencoder is linked against the same libx264 library that ffmpeg uses for h.264 encoding, so I was expecting similar

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Paetzel
On Tuesday 23 October 2007, Josh Carroll wrote: > > ULE is tuned towards providing cpu affinity; compilation and > > evidently encoding are workloads that do not benefit from > > affinity. Before we conclude that it is slower, try building with > > -j5, -j6, -j7. > > Here are the results of running f

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> Just curious, but are these results obtained while you are > overclocking your 2.4GHz CPU to 3.4GHz? That might be a useful > datapoint. Yes, they are with the CPU overclocked. I have verified the results when not overclocked as well (running at stock). > It also might be useful to know what s

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> ULE is tuned towards providing cpu affinity; compilation and evidently > encoding are workloads that do not benefit from affinity. Before we > conclude that it is slower, try building with -j5, -j6, -j7. Here are the results of running ffmpeg with 4 through 8 threads on both schedulers: 4 threads

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Kip Macy
On 10/23/07, Josh Carroll <[EMAIL PROTECTED]> wrote: > Hello, > > I posted this to the stable mailing list, as I thought it was > pertinent there, but I think it will get better attention here. So I > apologize in advance for cross-posting if this is a faux pas. :) > > Anyway, in summary, ULE is ab

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Kris Kennaway
Josh Carroll wrote: Hello, I posted this to the stable mailing list, as I thought it was pertinent there, but I think it will get better attention here. So I apologize in advance for cross-posting if this is a faux pas. :) Anyway, in summary, ULE is about 5-6 % slower than 4BSD for two workload

ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
Hello, I posted this to the stable mailing list, as I thought it was pertinent there, but I think it will get better attention here. So I apologize in advance for cross-posting if this is a faux pas. :) Anyway, in summary, ULE is about 5-6 % slower than 4BSD for two workloads that I am sensitive