Josh Carroll wrote:
> > is bigger better or worse?
> For sysbench bigger is better (more transactions per second). For
> ffmpeg, lower is better - e.g. the time to transcode the first 120
> seconds of the selected video is less, so it ran faster.
don't forget to give this info when giving numbers :-)
> is bigger better or worse?
For sysbench bigger is better (more transactions per second). For
ffmpeg, lower is better - e.g. the time to transcode the first 120
seconds of the selected video is less, so it ran faster.
Josh
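For reference, a run of that sort looks roughly like the following; the
input file and codec options are placeholders, not the exact ones used in
this thread:
    # Transcode only the first 120 seconds and time it; lower is better.
    time ffmpeg -i input.avi -t 120 -vcodec libx264 -b 1000k -an out.mp4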
> I just ran through some of my benchmarks on a kernel build from
> sources as of today, and I've noticed an improvement for the ffmpeg
> workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3).
> This is vanilla source with no patches applied:
Sorry, the ministat output was mangled. I'
Jeff,
I just ran through some of my benchmarks on a kernel build from
sources as of today, and I've noticed an improvement for the ffmpeg
workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3).
This is vanilla source with no patches applied:
x 4bsd
+ ule
* uleb3
[the ministat histogram and statistics that followed were mangled in the
archive; see the follow-up message above]
> These ministat results show that the latest patch (alone) results in
> slightly worse performance for ffmpeg and buildworld, but slightly
> better results for sysbench.
Please disregard those conclusions; I misread the ffmpeg results and
didn't look at all thread counts for the sysbench runs.
> Try the /usr/src/tools/tools/ministat utility for a simple and effective
> way to compare these kinds of noisy measurements and extract reliable
> comparisons.
Thanks again for the suggestion, Kris!
I compiled results for 5 runs of the three benchmarks I've been using
(ffmpeg, sysbench(mysql),
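ministat takes one measurement per line per input file, so a comparison
like the ones in this thread would look roughly like this (the file names
are made up for illustration):
    # Build the utility from the source tree, then compare two samples.
    cd /usr/src/tools/tools/ministat && make
    # 4bsd.txt and ule.txt each hold e.g. five wall-clock times, one per line.
    ./ministat 4bsd.txt ule.txt
It prints min/max/median/avg/stddev for each sample and states whether the
difference between them is significant at the 95% confidence level.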
> BTW, it doesn't make much sense to be measuring to millisecond precision
> a quantity which has variation that is unknown but probably much larger
> :) When trying to make comparisons to identify performance changes, a
> careful statistical approach is necessary.
>
> Try the /usr/src/tools/tools
> ffmpeg: 1:38.885
>
> sysbench: (4,8,12,16 threads respectively):
>2221.93
>2327.87
>2292.49
>2269.29
>
> And buildworld: 13m47.052s
I ran these after changing the slice value to 7 as well with this patch.
ffmpeg: 1:38.547
sysbench:
2236.55
2321.02
2271.76
2254.85
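The sysbench figures above are transactions per second from its OLTP test
against MySQL; a sketch of how such runs are typically driven, assuming
sysbench 0.4 syntax with illustrative credentials and table size:
    # Populate the test table once, then sweep the thread counts.
    sysbench --test=oltp --mysql-user=root --mysql-db=test \
        --oltp-table-size=1000000 prepare
    for n in 4 8 12 16; do
        sysbench --test=oltp --mysql-user=root --mysql-db=test \
            --num-threads=$n --max-requests=100000 run
    done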
> Josh, I had an interesting thought today. What if the reason 4BSD is
> faster is that it distributes load more evenly across all packages,
> because it distributes randomly? ULE distributes across cores evenly but
> not across packages. Can you try the attached patch? This also turns the
> defau
> That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set
> slice_min and slice both equal to 7 and see if the numbers come out
> better than without the patch but with a slice value of 7? Basically I'm
> trying to isolate the effects of the different slice handling in this
> patc
> Sysbench results:
> # threads   slice=7   slice=13   slice_min=4   slice_min=2
>  4          2265.67   2250.36    2261.71       2297.08
>  8          2300.25   2310.02    2306.79       2313.61
> 12          2269.54   2304.04    2296.54       2279.73
>
On Sun, 4 Nov 2007, Jeff Roberson wrote:
On Sun, 4 Nov 2007, Gelsema, P (Patrick) - FreeBSD wrote:
w/o patch
hulk# time make -j8 buildkernel
837.808u 138.167s 10:28.96 155.1% 6349+1349k 2873+7780io 303pf+0w
w patch
hulk# time make -j8 buildkernel
838.554u 168.316s 10:52.10 154.4%
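For anyone parsing those numbers, that is csh's time output; my reading of
the first four fields (worth double-checking against csh(1)):
    # 838.554u 168.316s 10:52.10 154.4%
    #  user      system   wall    CPU utilisation:
    #  CPU sec   CPU sec  clock   (user+sys)/wall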
> Turns out the last patch I posted had a small compile error because I
> edited it by hand to remove one section. Here's an updated patch that
> fixes that and changes the min/max slice values to something more
> reasonable. Slice min should be around 4 with a max of 12.
>
> Also looks like 4BSD
Hi Jeff,
I tried your patch. Ran a buildkernel, timed. Recompiled kernel including
your patch, rebooted and reran. Please find results below.
w/o patch
hulk# time make -j8 buildkernel
837.808u 138.167s 10:28.96 155.1% 6349+1349k 2873+7780io 303pf+0w
w patch
hulk# time make -j8 buildkernel
> Josh, I included one too many changes in the diff and it made the results
> ambiguous. I've scaled it back slightly by removing the changes to
> sched_pickcpu() and included the patch in this email again. Can you run
> through your tests once more? I'd like to commit this part soon as it
> hel
> Josh, thanks for your help so far. This has been very useful.
You're welcome, glad to help! Thanks for the effort and the patch.
> Any testing you can run this through is appreciated. Anyone else lurking
> in this thread who would like to is also welcome to report back findings.
Here are a f
> What would be interesting to know is if the sum of the temperatures is any
> different. 4BSD gets a much more random distribution of load because a
> thread is run on whatever cpu context switches next. ULE will have
> specific load patterns since it scans lists of cpus in a fixed order to
> as
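For anyone wanting to collect those temperatures: on a Core 2 box such as
the Q6600 used here they should be readable via coretemp(4), assuming a
7.x kernel (module and sysctl names as I recall them):
    kldload coretemp
    sysctl dev.cpu | grep temperature   # one dev.cpu.N.temperature per core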
> What was the -j value and number of processors?
-j 8.
I did the following (one warm up, 3 times in a row after that, averaged):
cd /usr/src
rm -rf /usr/obj/*
make clean
time make -j8 -DNOCLEAN buildworld
The system is a Q6600, so 4 cores.
Thanks,
Josh
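Scripted end to end, that procedure is roughly the following - one warm-up
pass plus three timed passes, with /usr/bin/time's -o flag (assumed here)
saving each result to a file for later averaging or for ministat:
    cd /usr/src
    for i in 0 1 2 3; do                # pass 0 is the warm-up
        rm -rf /usr/obj/*
        make clean > /dev/null
        /usr/bin/time -o bw.$i make -j8 -DNOCLEAN buildworld > /dev/null
    done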
> buildworld isn't cooperating for me, but once I iron that out, I'll
> post some results there as well :)
I was able to get buildworld compiling ok and here are the results:
4BSD       ULE.13     ULE.7
13:24.73   13:44.28   13:38.85
Only a 1.75% difference when the slice value is set to 7
> Thank you, that was very useful. I may have something to test very soon.
Sounds great Jeff, just say the word when you need someone to do the
testing. I'll be glad to help!
> What would be interesting to know is if the sum of the temperatures is any
> different. 4BSD gets a much more random d
> Could you try spot checking a couple of tests with kern.sched.slice set to
> half its present value? 4BSD on average will use half the slice that ULE
> will by default.
The initial value was 13, and I changed it to 7. Here is the time
result for the ffmpeg run:
13: 1:39.09
 7: 1:37.01
I al
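For anyone repeating this, the knob is a plain sysctl; the default of 13
is what this system showed and may differ elsewhere:
    sysctl kern.sched.slice      # read the current value
    sysctl kern.sched.slice=7    # halve it; takes effect immediately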
> This is interesting. I have had a couple of laptop users report success
> in using lower power saving modes with ULE. Are these core temp
> observations repeatable?
>
> Thanks,
> Jeff
>
> >
> > Thanks again for all your help! Please let me know if/when I can do
> > anything else to help out
> I'm confident that we can improve things. It will probably not make the
> cut for 7.0 since it will be too disruptive. I'm sure it can be
> backported before 7.1 when ULE is likely to become the default.
That sounds great! I figured it was something that would have to wait
until 7.0 released.
> Your tests with ffmpeg threads vs processes probably is triggering more
> context switches due to lock contention in the kernel in the threads case.
> This is also likely the problem with some super-smack tests. On each
> context switch 4BSD has an opportunity to perfectly balance the CPUs. ULE
On Tue, 23 Oct 2007, Kris Kennaway wrote:
Josh Carroll wrote:
> Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
> workloads that I am sensitive to: building world with -j X, and ffmpeg
> -threads X. Other benchmarks seem to indicate relatively equal
> performance between the two. MySQL,
> Yes, that's the proper default. You could try setting steal_thresh to 1. I
> noticed a problem with building ports on an 8 core Xeon system while 8
> distributed.net crunchers were running. The port build would proceed
> incredibly slowly, steal_thresh=1 helped a little bit. It might not make up
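Trying that suggestion is a one-liner (add the same line to
/etc/sysctl.conf to keep it across reboots):
    sysctl kern.sched.steal_thresh=1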
> kern.sched.steal_thresh is/was one of the more effective tuning sysctls. rev
> 1.205 of sched_ule had a change that was supposed to automatically adjust it
> based on the number of cores. Is this the same 8 core system as the
> other thread? In that case the commit dictates steal_thresh should be
> 5-6% is a lot. ULE has some tuning for makeworld in -current, which
> for me reduced it to less than 1% slower than 4BSD (down from 5-10%
> slower), for the case of makeworld -j4 over nfs on a 2-CPU system with
> the sources pre-cached on the server and objects on a local file system,
> and exte
> We cannot ignore this performance bug; I also found that ULE is
> slower than 4BSD when testing super-smack's update benchmark on my
> dual-core machine.
I actually saw improved performance with ULE over 4BSD for
super-smack. What were the parameters you used for your testing? These
were mi
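For reference, super-smack is usually driven as below; the smack file path
depends on how the port installs it, and the client/iteration counts here
are only illustrative:
    # 10 concurrent clients, 1000 query rounds each.
    super-smack select-key.smack 10 1000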
Kris Kennaway wrote:
> One major difference is that your workload is 100% user. Also you were
> reporting ULE had more idle time, which looks like a bug since I would
> expect it to be basically 0% idle on such a workload.
> Kris
I decided to do some testing of concurrent processes (rather than a
single process that's multi-threaded). Specifically, I ran 4 ffmpeg
(without the -threads option) commands at the same time. The
difference was less than a percent:
4bsd: 439.92 real 1755.91 user 1.08 sys
ule:
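A sketch of that concurrent-process setup, with the same placeholder input
and options as before:
    # Four independent single-threaded transcodes; the wall-clock time
    # for the whole batch is what is being compared.
    time sh -c 'for i in 1 2 3 4; do
        ffmpeg -i input.avi -vcodec libx264 -b 1000k -an out$i.mp4 \
            < /dev/null &
    done; wait'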
> My next step is to run some transcodes with mencoder to see if it has
> similar performance between the two schedulers. When I have those
> results, I'll post them to this thread.
mencoder is linked against the same libx264 library that ffmpeg uses
for h.264 encoding, so I was expecting similar
> Just curious, but are these results obtained while you are
> overclocking your 2.4ghz CPU to 3.4ghz? That might be a useful
> datapoint.
Yes they are with the CPU overclocked. I have verified the results
when not overclocked as well (running at stock).
> It also might be useful to know what s
> ULE is tuned towards providing cpu affinity; compilation and evidently
> encoding are workloads that do not benefit from affinity. Before we
> conclude that it is slower, try building with -j5, -j6, -j7.
Here are the results of running ffmpeg with 4 through 8 threads on
both schedulers:
4 threads
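And the threaded counterpart, sweeping the thread count the same way
(placeholder input again):
    for n in 4 5 6 7 8; do
        time ffmpeg -i input.avi -threads $n -vcodec libx264 -b 1000k \
            -an out$n.mp4
    done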
Hello,
I posted this to the stable mailing list, as I thought it was
pertinent there, but I think it will get better attention here. So I
apologize in advance for cross-posting if this is a faux pas. :)
Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
workloads that I am sensitive to: building world with -j X, and ffmpeg
-threads X.