Re: mysql scaling questions

2008-01-01 Thread Josh Carroll
> Does anyone have a theory why syscalls are so expensive in FreeBSD? Here > are the results of unixbench 4.1 on two machines. First is the machine > running FreeBSD HEAD (debugging disabled) on a dual-core Athlon 64 (i386 > mode), 2 GHz: I ran the syscall benchmark from UnixBench on the same hard

Re: ULE vs. 4BSD in RELENG_7

2007-12-02 Thread Josh Carroll
> is bigger better or worse? For sysbench, bigger is better (more transactions per second). For ffmpeg, lower is better: a shorter time to transcode the first 120 seconds of the selected video means it ran faster. Josh

Re: ULE vs. 4BSD in RELENG_7

2007-12-01 Thread Josh Carroll
> I just ran through some of my benchmarks on a kernel build from > sources as of today, and I've noticed an improvement for the ffmpeg > workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3). > This is vanilla source with no patches applied: Sorry, the ministat output was mangled. I'

Re: ULE vs. 4BSD in RELENG_7

2007-12-01 Thread Josh Carroll
Jeff, I just ran through some of my benchmarks on a kernel build from sources as of today, and I've noticed an improvement for the ffmpeg workload. Here's a comparison of 4bsd, ule (BETA1) and ule (BETA3). This is vanilla source with no patches applied: x 4bsd + ule * uleb3 +-

Re: ULE vs. 4BSD in RELENG_7

2007-11-11 Thread Josh Carroll
> These ministat results show that the latest patch (alone) results in > slightly worse performance for ffmpeg and buildworld, but slightly > better results for sysbench. Please disregard those conclusions, I misread the ffmpeg results and didn't look at all thread counts for the sysbench runs. L

Re: ULE vs. 4BSD in RELENG_7

2007-11-11 Thread Josh Carroll
> Try the /usr/src/tools/tools/ministat utility for a simple and effective > way to compare these kinds of noisy measurements and extract reliable > comparisons. Thanks again for the suggestion, Kris! I compiled results for 5 runs of the three benchmarks I've been using (ffmpeg, sysbench(mysql),

Re: ULE vs. 4BSD in RELENG_7

2007-11-10 Thread Josh Carroll
> BTW, it doesn't make much sense to be measuring to millisecond precision > a quantity which has variation that is unknown but probably much larger > :) When trying to make comparisons to identify performance changes, a > careful statistical approach is necessary. > > Try the /usr/src/tools/tools

Re: ULE vs. 4BSD in RELENG_7

2007-11-10 Thread Josh Carroll
> ffmpeg: 1:38.885
>
> sysbench: (4,8,12,16 threads respectively):
> 2221.93
> 2327.87
> 2292.49
> 2269.29
>
> And buildworld: 13m47.052s

I ran these after changing the slice value to 7 as well with this patch.

ffmpeg: 1:38.547
sysbench: 2236.55 2321.02 2271.76 2254.85
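To put the two sets of sysbench figures side by side, the per-thread-count change can be computed directly. This is only a sketch: the variable and function names are illustrative, and it assumes the four values in each set correspond to 4, 8, 12 and 16 threads in that order, as the quoted message states.

```python
# Transactions per second, in the order 4, 8, 12, 16 threads.
threads = (4, 8, 12, 16)
quoted = (2221.93, 2327.87, 2292.49, 2269.29)   # figures from the quoted message
slice_7 = (2236.55, 2321.02, 2271.76, 2254.85)  # figures after setting slice to 7


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Map each thread count to its relative change.
changes = {
    n: percent_change(before, after)
    for n, before, after in zip(threads, quoted, slice_7)
}

for n, delta in changes.items():
    print(f"{n:>2} threads: {delta:+.2f}%")
```

Every change is smaller than 1% in either direction, so single runs like these probably cannot separate a real effect from run-to-run noise.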

Re: ULE vs. 4BSD in RELENG_7

2007-11-09 Thread Josh Carroll
> Josh, I had an interesting thought today. What if the reason 4BSD is > faster is because it distributes load more evenly across all packages > because it distributes randomly? ULE distributed across cores evenly but > not packages. Can you try the attached patch? This also turns the > defau

Re: ULE vs. 4BSD in RELENG_7

2007-11-06 Thread Josh Carroll
> That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set > slice_min and slice both equal to 7 and see if the numbers come out > better than without the patch but with a slice value of 7? Basically I'm > trying to isolate the effects of the different slice handling in this > patc

Re: ULE vs. 4BSD in RELENG_7

2007-11-06 Thread Josh Carroll
> That's expected due to the fuzzy rounding of 128 / 10, etc. Can you set > slice_min and slice both equal to 7 and see if the numbers come out > better than without the patch but with a slice value of 7? Basically I'm > trying to isolate the effects of the different slice handling in this > patc

Re: ULE vs. 4BSD in RELENG_7

2007-11-05 Thread Josh Carroll
> Sysbench results:
> # threads  slice=7   slice=13  slice_min=4  slice_min=2
> 4          2265.67   2250.36   2261.71      2297.08
> 8          2300.25   2310.02   2306.79      2313.61
> 12         2269.54   2304.04   2296.54      2279.73
>
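The quoted sysbench figures can be compared per thread count with a short script. This is a sketch under one assumption: the fused digits in the archived snippet are read as a thread count followed by four dddd.dd values, in the column order slice=7, slice=13, slice_min=4, slice_min=2. Only the rows visible in the snippet are included, and the helper names are illustrative.

```python
# Column order as given in the quoted header line.
columns = ("slice=7", "slice=13", "slice_min=4", "slice_min=2")

# Transactions per second, keyed by thread count (rows visible in the snippet).
results = {
    4: (2265.67, 2250.36, 2261.71, 2297.08),
    8: (2300.25, 2310.02, 2306.79, 2313.61),
    12: (2269.54, 2304.04, 2296.54, 2279.73),
}


def best_setting(row):
    """Return (column name, value) for the highest throughput in a row."""
    value = max(row)
    return columns[row.index(value)], value


def spread_percent(row):
    """Gap between the best and worst setting, relative to the worst, in percent."""
    return (max(row) - min(row)) / min(row) * 100.0


for n, row in results.items():
    name, value = best_setting(row)
    print(f"{n:>2} threads: best {name} ({value}), spread {spread_percent(row):.2f}%")
```

No single setting wins at every thread count in these rows, and the spread stays within a few percent.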

Re: ULE vs. 4BSD in RELENG_7

2007-11-05 Thread Josh Carroll
> Turns out the last patch I posted had a small compile error because I > edited it by hand to remove one section. Here's an updated patch that > fixes that and changes the min/max slice values to something more > reasonable. Slice min should be around 4 with a max of 12. > > Also looks like 4BSD

Re: ULE vs. 4BSD in RELENG_7

2007-11-04 Thread Josh Carroll
> Josh, I included one too many changes in the diff and it made the results > ambiguous. I've scaled it back slightly by removing the changes to > sched_pickcpu() and included the patch in this email again. Can you run > through your tests once more? I'd like to commit this part soon as it > hel

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> Josh, thanks for your help so far. This has been very useful. You're welcome, glad to help! Thanks for the effort and the patch. > Any testing you can run this through is appreciated. Anyone else lurking > in this thread who would like to is also welcome to report back findings. Here are a f

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> What would be interesting to know is if the sum of the temperatures is any > different. 4BSD gets a much more random distribution of load because a > thread is run on whatever cpu context switches next. ULE will have > specific load patterns since it scans lists of cpus in a fixed order to > as

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> What was the -j value and number of processors?

-j 8. I did the following (one warm up, 3 times in a row after that, averaged):

    cd /usr/src
    rm -rf /usr/obj/*
    make clean
    time make -j8 -DNOCLEAN buildworld

The system is a Q6600, so 4 cores. Thanks, Josh

Re: ULE vs. 4BSD in RELENG_7

2007-11-03 Thread Josh Carroll
> buildworld isn't cooperating for me, but once I iron that out, I'll
> post some results there as well :)

I was able to get buildworld compiling ok and here are the results:

4BSD      ULE.13    ULE.7
13:24.73  13:44.28  13:38.85

Only a 1.75% difference when the slice value is set to 7
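The 1.75% figure can be checked by converting the mm:ss.ss buildworld times to seconds. This is a minimal sketch: the helper names are illustrative, and it assumes the three times belong to 4BSD, ULE with slice 13, and ULE with slice 7, in that order.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'mm:ss.ss' wall-clock string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_slower(baseline: str, candidate: str) -> float:
    """How much longer candidate took than baseline, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100.0


# buildworld times from the message above
times = {"4BSD": "13:24.73", "ULE.13": "13:44.28", "ULE.7": "13:38.85"}

for name in ("ULE.13", "ULE.7"):
    delta = percent_slower(times["4BSD"], times[name])
    print(f"{name}: {delta:.2f}% slower than 4BSD")
```

With these inputs, ULE.7 comes out about 1.75% slower than 4BSD, matching the message, and ULE.13 about 2.43% slower.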

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Josh Carroll
> Thank you, that was very useful. I may have something to test very soon. Sounds great Jeff, just say the word when you need someone to do the testing. I'll be glad to help! > What would be interesting to know is if the sum of the temperatures is any > different. 4BSD gets a much more random d

Re: ULE vs. 4BSD in RELENG_7

2007-11-02 Thread Josh Carroll
> Could you try spot checking a couple of tests with kern.sched.slice set to
> half its present value? 4BSD on average will use half the slice that ULE
> will by default.

The initial value was 13, and I changed it to 7. Here is the time result for the ffmpeg run:

13: 1:39.09
7:  1:37.01

I al

Re: ULE vs. 4BSD in RELENG_7

2007-10-25 Thread Josh Carroll
> I'm confident that we can improve things. It will probably not make the > cut for 7.0 since it will be too disruptive. I'm sure it can be > backported before 7.1 when ULE is likely to become the default. That sounds great! I figured it was something that would have to wait until 7.0 is released.

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> Your tests with ffmpeg threads vs processes probably is triggering more > context switches due to lock contention in the kernel in the threads case. > This is also likely the problem with some super-smack tests. On each > context switch 4BSD has an opportunity to perfectly balance the CPUs. ULE

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> Yes, that's the proper default. You could try setting steal_thresh to 1. I > noticed a problem with building ports on an 8 core Xeon system while 8 > distributed.net crunchers were running. The port build would proceed > incredibly slowly, steal_thresh=1 helped a little bit. It might not make up

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> kern.sched.steal_thresh is/was one of the more effective tuning sysctls. rev > 1.205 of sched_ule had a change that was supposed to automatically adjust it > based on the number of cores. Is this the same 8 core system as the > other thread? In that case the commit dictates steal_thresh should be

Re: ULE vs. 4BSD in RELENG_7

2007-10-24 Thread Josh Carroll
> 5-6% is a lot. ULE has some tuning for makeworld in -current, which > for me reduced it to less than 1% slower than 4BSD (down from 5-10% > slower), for the case of makeworld -j4 over nfs on a 2-CPU system with > the sources pre-cached on the server and objects on a local file system, > and exte

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> We cannot ignore this performance bug. I also found that ULE is > slower than 4BSD when testing super-smack's update benchmark on my > dual-core machine. I actually saw improved performance with ULE over 4BSD for super-smack. What were the parameters you used for your testing? These were mi

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
I decided to do some testing of concurrent processes (rather than a single process that's multi-threaded). Specifically, I ran 4 ffmpeg (without the -threads option) commands at the same time. The difference was less than a percent: 4bsd: 439.92 real 1755.91 user 1.08 sys ule:

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> My next step is to run some transcodes with mencoder to see if it has > similar performance between the two schedulers. When I have those > results, I'll post them to this thread. mencoder is linked against the same libx264 library that ffmpeg uses for h.264 encoding, so I was expecting similar

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> Just curious, but are these results obtained while you are > overclocking your 2.4ghz CPU to 3.4ghz? That might be a useful > datapoint. Yes they are with the CPU overclocked. I have verified the results when not overclocked as well (running at stock). > It also might be useful to know what s

Re: ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
> ULE is tuned towards providing cpu affinity; compilation and, evidently, > encoding are workloads that do not benefit from affinity. Before we > conclude that it is slower, try building with -j5, -j6, -j7. Here are the results of running ffmpeg with 4 through 8 threads on both schedulers: 4 threads

ULE vs. 4BSD in RELENG_7

2007-10-23 Thread Josh Carroll
Hello, I posted this to the stable mailing list, as I thought it was pertinent there, but I think it will get better attention here. So I apologize in advance for cross-posting if this is a faux pas. :) Anyway, in summary, ULE is about 5-6% slower than 4BSD for two workloads that I am sensitive

disk read vs. write performance in 5.4-RELEASE-p6

2005-08-07 Thread Josh Carroll
Hello, After reading man tuning, I began poking around at my IDE drives to see how their performance was in FreeBSD. I noticed that writes are quite slow (on the order of 15MB/s) compared to reads (55MB/s). In some initial googling, I saw a thread from early 2005 about 5.3 and performance problem