[Bug 176902]
yeah let's not fix this before it's a decade old. not much longer to wait!

--
You received this bug notification because you are a member of Kubuntu Bugs, which is subscribed to kdegraphics in ubuntu.
https://bugs.launchpad.net/bugs/176902
Title: kpdf locks sound output

--
kubuntu-bugs mailing list
kubuntu-b...@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kubuntu-bugs
[Bug 102408]
yeah let's not fix this before it's a decade old. not much longer to wait!

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/102408
Title: Helper apps inherit open file descriptors

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 159258]
yeah let's not fix this before it's a decade old. not much longer to wait!

--
You received this bug notification because you are a direct subscriber.
https://bugs.launchpad.net/bugs/159258
Title: Helper applications launched by Firefox inherit ALL file descriptors
Bug#506707: me too
this is a fairly serious regression.

-dean

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#495820: FTBFS: make[1]: *** No rule to make target `txt'. Stop.
Package: iproute
Version: 20080725-2

i did:

  sudo apt-get build-dep iproute
  apt-get source iproute
  cd iproute-20080725-2
  fakeroot ./debian/rules binary

and it fails:

  ...
  /usr/share/texmf-texlive/dvips/base/texps.pro
  /usr/share/texmf-texlive/dvips/base/special.pro
  /usr/share/texmf-texlive/dvips/base/color.pro.
  /usr/share/texmf-texlive/fonts/type1/bluesky/cm/cmsy10.pfb[1]
  make[1]: *** No rule to make target `txt'.  Stop.
  make[1]: Leaving directory `/var/src/iproute2/iproute-20080725/doc'
  make: *** [stamp-doc] Error 2

if i remove the txt from the "make -C doc" line in debian/rules the build completes successfully. is there some other missing build-dep which makes that work?

thanks
-dean
Bug#493635: really ignore /etc/network/options
package: netbase
version: 4.33

spot the bug in /etc/init.d/networking:

  process_options() {
      [ -e /etc/network/options ] || return 0
      log_warning_msg "/etc/network/options still exists and it will be IGNORED! Read README.Debian of netbase."
  }

there should be a return 0 after the log_warning_msg... without it /etc/init.d/networking aborts if there is a /etc/network/options file and all hell breaks loose.

-dean
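for reference, the one-line fix looks like this (a runnable sketch, not the actual netbase script: log_warning_msg is stubbed out here, a temp file stands in for /etc/network/options, and the assumption that the real log_warning_msg returns nonzero is mine, inferred from the abort described above):

```shell
#!/bin/sh
# Sketch of the fix to process_options(). A temp file stands in for
# /etc/network/options; log_warning_msg is a stub (the real one comes
# from the LSB init functions and, per the bug, returns nonzero).
OPTIONS_FILE=$(mktemp)   # stand-in for /etc/network/options

log_warning_msg() { echo "W: $*" >&2; return 1; }   # assumed nonzero return

process_options() {
    [ -e "$OPTIONS_FILE" ] || return 0
    log_warning_msg "$OPTIONS_FILE still exists and it will be IGNORED! Read README.Debian of netbase."
    return 0    # the missing line: without it the function returns log_warning_msg's status
}

process_options
echo "process_options returned $?"
```

with the final `return 0` in place, the function succeeds even when the options file exists, so the init script no longer aborts.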
Re: valgrind and openssl
On Tue, 20 May 2008, Richard Salz wrote:
> > on the other hand it may be a known plaintext attack.
>
> Using those words in this context makes it sound that you not only don't
> understand what is being discussed right here and now, but also that you
> don't understand the term you just used. Are you sure you understood,
> e.g., Ted Tso's postings in this thread? Perhaps I'm missing something,
> but can you show me something that talks about known plaintext attacks
> in the context of hashing/digests?

yes i abused the term.

the so-called uninitialized data is actually from the stack right? an attacker generally controls that (i.e. earlier use of the stack probably includes char buf[] which is controllable). i don't know in what order the entropy is added to the PRNG, but if all the useful entropy goes in first then an attacker might get to control the last 1KiB passed through the SHA1.

yes it's unlikely given what we know today that an attacker could manipulate the state down to a sufficiently small number of outputs, but i really don't see the point of letting an attacker have that sort of control.

-dean
__
OpenSSL Project                         http://www.openssl.org
Development Mailing List                openssl-dev@openssl.org
Automated List Manager                  [EMAIL PROTECTED]
Re: valgrind and openssl
On Thu, 15 May 2008, Geoff Thorpe wrote:
> I forgot to mention something;
>
> On Thursday 15 May 2008 12:38:24 John Parker wrote:
> > > It is already possible to use openssl and valgrind - just build
> > > OpenSSL with -DPURIFY, and it is quite clean.
> >
> > Actually on my system, just -DPURIFY doesn't satisfy valgrind. What
> > I'm asking for is something that both satisfies valgrind and doesn't
> > reduce the keyspace.
>
> If you're using an up-to-date version of openssl when you see this
> (ie. a recent CVS snapshot from our website, even if it's from a stable
> branch for compatibility reasons), then please post details. -DPURIFY
> exists to facilitate debuggers that don't like reading uninitialised
> data, so if that's not the case then please provide details. Note
> however that there are a variety of gotchas that allow you to create
> little leaks if you're not careful, and valgrind could well be
> complaining about those instead.
>
> Note that you should always build with no-asm if you're doing this kind
> of debug analysis. The assembly optimisations are likely to operate at
> granularities and in ways that valgrind could easily complain about. I
> don't know that this is the case, but it would certainly make sense to
> compare before posting a bug report.

you know, this is sheer stupidity. you're suggesting that testing the no-asm code is a valid way of testing the assembly code?

additionally the suggestion of -DPURIFY as a way of testing the code is also completely broken software engineering practice. any special case changes for testing mean you're not testing the REAL CODE. for example if you build -DPURIFY then you also won't get notified of problems with other PRNG seeds which are supposed to be providing random *initialized* data. not to mention that a system compiled that way is insecure -- so you either have to link your binaries static (to avoid the danger of an insecure shared lib), or set up a chroot for testing. in any event YOU'RE NOT TESTING THE REAL CODE.

which is to say you're wasting your time if you test under any of these conditions. openssl should not be relying on uninitialized data for anything. even if it doesn't matter from the point of view of the PRNG, it should be pretty damn clear it's horrible software engineering practice.

-dean
Re: valgrind and openssl
On Thu, 15 May 2008, Bodo Moeller wrote:
> On Thu, May 15, 2008 at 11:41 PM, Erik de Castro Lopo [EMAIL PROTECTED] wrote:
> > Goetz Babin-Ebell wrote:
> > > But here the use of this uninitialized data is intentional and the
> > > programmers are very well aware of what they did.
> >
> > The use of uninitialized data in this case is stupid because the
> > entropy of this random data is close to zero.
>
> It may be zero, but it may be more, depending on what happened earlier
> in the program if the same memory locations have been in use before.
> This may very well include data that would be unpredictable to
> adversaries -- i.e., entropy; that's the point here.

on the other hand it may be a known plaintext attack. what are you guys smoking?

-dean
RE: valgrind and openssl
On Mon, 19 May 2008, David Schwartz wrote:
> > any special case changes for testing means you're not testing the
> > REAL CODE.
>
> You mean you're not testing *all* of the real code. That's fine, you
> can't debug everything at once.

if you haven't tested your final production binary then you haven't tested anything at all.

> Good luck finding people who agree with you. I've been a professional
> software developer for about 18 years and I've worked on debugging with

i've been a professional for longer than you. big whoop.

-dean
Bug#481754: no option for specifying syslog facility
Package: fail2ban
Version: 0.8.2-3

fail2ban 0.6 supported a syslog-facility config option which controlled the facility for syslog messages... 0.8.2-3 does not support this. i had to edit /usr/share/fail2ban/server/server.py in order to change LOG_DAEMON to LOG_AUTH.

-dean
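the manual workaround described above amounts to a one-token substitution in server.py. here is a sketch of it, run against a tiny stand-in file rather than the real /usr/share/fail2ban/server/server.py (the sample line is an illustration, not the actual file contents):

```shell
# Demonstrate the LOG_DAEMON -> LOG_AUTH edit on a stand-in file.
# On a real system the target would be /usr/share/fail2ban/server/server.py
# (back it up first; sed's -i.bak does that here).
target=$(mktemp)
printf "syslog.openlog('fail2ban', 0, syslog.LOG_DAEMON)\n" > "$target"

sed -i.bak 's/LOG_DAEMON/LOG_AUTH/g' "$target"

grep LOG_AUTH "$target"
```

note the edit is lost on every package upgrade, which is exactly why a syslog-facility config option would be preferable.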
Bug#481760: Failed none causes false triggers
Package: fail2ban
Version: 0.8.2-3

when connecting with ssh keys, no password, sshd logs:

  May 18 05:08:45 twinlark sshd[5681]: Failed none for dean from 10.1.1.1 port 37262 ssh2
  May 18 05:08:45 twinlark sshd[5681]: Found matching RSA key:
  May 18 05:08:45 twinlark sshd[5681]: Found matching RSA key:
  May 18 05:08:45 twinlark sshd[5681]: Accepted publickey for dean from 10.1.1.1 port 37262 ssh2

and fail2ban considers the "Failed none" to be an attack... enough successful logins like this and the IP is banned. this is broken.

best fix i can see is to be more explicit about the /etc/fail2ban/filter.d/sshd.conf filters, such as:

  ^%(__prefix_line)sFailed password for .* from <HOST>(?: port \d*)?(?: ssh\d*)?$
  ^%(__prefix_line)sFailed publickey for .* from <HOST>(?: port \d*)?(?: ssh\d*)?$

-dean
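the effect of the stricter patterns can be checked with grep -E. fail2ban expands %(__prefix_line)s and <HOST> itself, so those are approximated below with a plain substring match and an IP pattern; this is an illustration of the idea, not the exact filter:

```shell
# Simplified stand-in for the stricter failregex: only "Failed password"
# and "Failed publickey" count, never "Failed none".
pat='Failed (password|publickey) for .* from [0-9.]+( port [0-9]*)?( ssh[0-9]*)?$'

line_none='May 18 05:08:45 twinlark sshd[5681]: Failed none for dean from 10.1.1.1 port 37262 ssh2'
line_pass='May 18 05:08:45 twinlark sshd[5681]: Failed password for dean from 10.1.1.1 port 37262 ssh2'

echo "$line_none" | grep -Eq "$pat" && echo "none: matches" || echo "none: no match"
echo "$line_pass" | grep -Eq "$pat" && echo "pass: matches" || echo "pass: no match"
```

the "Failed none" line from a successful key login no longer matches, while a genuine password failure still does.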
Bug#479530: confirm on error
Package: apt-listchanges
Version: 2.82

when apt-listchanges encounters an error (such as the now infamous "database /var/lib/apt/listchanges.db failed to load" error) it continues without confirmation even if confirm=1 is in the etc file. i think apt-listchanges should always ask for confirmation when confirm=1 is set.

-dean
Re: [PATCH -mm crypto] AES: x86_64 asm implementation optimization
one of the more important details in evaluating these changes would be the family/model/stepping of the processors being microbenchmarked... could you folks include /proc/cpuinfo with the results?

also -- please drop the #define for R16 to %rsp ... it obfuscates more than it helps anything.

thanks
-dean

On Wed, 30 Apr 2008, Sebastian Siewior wrote:
> * Huang, Ying | 2008-04-25 11:11:17 [+0800]:
> > Hi, Sebastian,
>
> Hi Huang, sorry for the delay.
>
> > I changed the patches to group the read or write together instead of
> > interleaving. Can you help me to test these new patches? The new
> > patches is attached with the mail.
>
> The new results are attached.
>
> > Best Regards,
> > Huang Ying
>
> Sebastian

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: system without RAM on node0 boot fail
actually yeah i've seen this... in a bizarre failure situation in a system which physically had RAM in the boot node but it was never enumerated for the kernel (other nodes had RAM which was enumerated). so technically there was boot node RAM but the kernel never saw it.

-dean

On Wed, 30 Jan 2008, Christoph Lameter wrote:
> x86 supports booting from a node without RAM?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module
why do we need another kernel cpuid reading method when sched_setaffinity exists and cpuid is available in ring3?

-dean
Re: [PATCH] x86: add PCI IDs to k8topology_64.c II
On Tue, 29 Jan 2008, Andi Kleen wrote:
> > SRAT is essentially just a two dimensional table with node distances.
>
> Sorry, that was actually SLIT. SRAT is not two dimensional, but also
> relatively simple. SLIT you don't really need to implement.

yeah but i'd heartily recommend implementing SLIT too. mind you, its almost universal non-existence means i've had to resort to userland measurements to determine node distances and that won't change. i guess i just wanted to grumble somewhere.

-dean
Re: [rdiff-backup-users] can rdiff-backup be stopped / paused / restarted? - HOWTO?
On Mon, 14 Jan 2008, Dave Kempe wrote:
> Lexje wrote:
> > I'm completely new to rdiff-backup. I'm trying to backup a complete
> > server over the internet. Is it possible to pause, stop / restart
> > rdiff-backup? (To free up / respect bandwidth limitations)
>
> You could do a Ctrl-Z and then start it again with fg
> you could use screen as well

or use kill -STOP and kill -CONT ... and pray the ssh connection isn't dropped.

-dean
___
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
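the kill -STOP / kill -CONT approach looks like this, demonstrated here on a stand-in long-running process ("sleep") rather than an actual rdiff-backup-over-ssh session:

```shell
# Suspend and resume a long-running process with job-control signals.
# SIGSTOP cannot be caught or ignored, so this works on any process.
sleep 60 &
pid=$!

kill -STOP "$pid"                        # suspend: no CPU or bandwidth used
echo "state: $(ps -o stat= -p "$pid")"   # "T" indicates a stopped process

kill -CONT "$pid"                        # resume exactly where it left off

kill "$pid" 2>/dev/null                  # clean up the demo process
wait "$pid" 2>/dev/null || true
```

as noted above, this only pauses the local process; an idle ssh connection underneath may still time out while the process is stopped.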
Re: Fast network file copy; "recvfile()" ?
On Thu, 17 Jan 2008, Patrick J. LoPresti wrote:
> I need to copy large (> 100GB) files between machines on a fast
> network. Both machines have reasonably fast disk subsystems, with
> read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
> and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
> throughput better than 600 MB/sec.
>
> My question is how best to move the actual file. NFS writes appear to
> max out at a little over 100 MB/sec on this configuration.

did your "usual tweaks" include mounting with -o tcp,rsize=262144,wsize=262144 ?

i should have kept better notes last time i was experimenting with this, but from memory here's what i found:

- if i used three NFS clients and was reading from page cache on the server i hit 1.2GB/s total throughput from the server. the client NFS code was maxing out one CPU on each of the client machines.

- disk subsystem (sw raid10 far2) was capable of 600MB/s+ when read locally on the NFS server, but topped out around ~250MB/s when read remotely (no matter how many clients).

my workload was read-intensive so i didn't experiment with writes...

-dean
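for concreteness, the mount options suggested above would be applied like this ("server:/export" and "/mnt/nfs" are placeholders; shown as a config fragment, not run here):

```shell
# Mount with TCP transport and 256KiB read/write transfer sizes:
#
#   mount -t nfs -o tcp,rsize=262144,wsize=262144 server:/export /mnt/nfs
#
# or the equivalent /etc/fstab entry:
#
#   server:/export  /mnt/nfs  nfs  tcp,rsize=262144,wsize=262144  0  0
```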
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Tue, 15 Jan 2008, Andrew Morton wrote:
> On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> wrote:
> > On Mon, 14 Jan 2008, NeilBrown wrote:
> > > raid5's 'make_request' function calls generic_make_request on
> > > underlying devices and if we run out of stripe heads, it could end up
> > > waiting for one of those requests to complete.
> > > This is bad as recursive calls to generic_make_request go on a queue
> > > and are not even attempted until make_request completes.
> > >
> > > So: don't make any generic_make_request calls in raid5 make_request
> > > until all waiting has been done. We do this by simply setting
> > > STRIPE_HANDLE instead of calling handle_stripe().
> > >
> > > If we need more stripe_heads, raid5d will get called to process the
> > > pending stripe_heads which will call generic_make_request from a
> > > different thread where no deadlock will happen.
> > >
> > > This change by itself causes a performance hit. So add a change so
> > > that raid5_activate_delayed is only called at unplug time, never in
> > > raid5. This seems to bring back the performance numbers. Calling it
> > > in raid5d was sometimes too soon...
> > >
> > > Cc: "Dan Williams" <[EMAIL PROTECTED]>
> > > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> >
> > probably doesn't matter, but for the record:
> >
> > Tested-by: dean gaudet <[EMAIL PROTECTED]>
> >
> > this time i tested with internal and external bitmaps and it survived
> > 8h and 14h resp. under the parallel tar workload i used to reproduce
> > the hang.
> >
> > btw this should probably be a candidate for 2.6.22 and .23 stable.
>
> hm, Neil said
>
> > The first fixes a bug which could make it a candidate for 24-final.
> > However it is a deadlock that seems to occur very rarely, and has been
> > in mainline since 2.6.22. So letting it into one more release
> > shouldn't be a big problem. While the fix is fairly simple, it could
> > have some unexpected consequences, so I'd rather go for the next
> > cycle.
>
> food fight!

heheh.

it's really easy to reproduce the hang without the patch -- i could hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB. i'll try with ext3... Dan's experiences suggest it won't happen with ext3 (or is even more rare), which would explain why this is overall a rare problem.

but it doesn't result in dataloss or permanent system hangups as long as you can become root and raise the size of the stripe cache... so OK i agree with Neil, let's test more... food fight over! :)

-dean
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Mon, 14 Jan 2008, NeilBrown wrote:
> raid5's 'make_request' function calls generic_make_request on
> underlying devices and if we run out of stripe heads, it could end up
> waiting for one of those requests to complete.
> This is bad as recursive calls to generic_make_request go on a queue
> and are not even attempted until make_request completes.
>
> So: don't make any generic_make_request calls in raid5 make_request
> until all waiting has been done. We do this by simply setting
> STRIPE_HANDLE instead of calling handle_stripe().
>
> If we need more stripe_heads, raid5d will get called to process the
> pending stripe_heads which will call generic_make_request from a
> different thread where no deadlock will happen.
>
> This change by itself causes a performance hit. So add a change so
> that raid5_activate_delayed is only called at unplug time, never in
> raid5. This seems to bring back the performance numbers. Calling it
> in raid5d was sometimes too soon...
>
> Cc: "Dan Williams" <[EMAIL PROTECTED]>
> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

probably doesn't matter, but for the record:

Tested-by: dean gaudet <[EMAIL PROTECTED]>

this time i tested with internal and external bitmaps and it survived 8h and 14h resp. under the parallel tar workload i used to reproduce the hang.

btw this should probably be a candidate for 2.6.22 and .23 stable.

thanks
-dean
nosmp/maxcpus=0 or 1 -> TSC unstable
if i boot an x86 64-bit 2.6.24-rc7 kernel with nosmp, maxcpus=0 or 1 it still disables TSC :)

  Marking TSC unstable due to TSCs unsynchronized

this is an opteron 2xx box which does have two cpus and no clock-divide in halt or cpufreq enabled, so TSC should be fine with only one cpu.

pretty sure the culprit is that num_possible_cpus() > 1, which would mean cpu_possible_map contains the second cpu... but i'm not quite sure what the right fix is... or perhaps this is all intended.

-dean
Re: CPA patchset
On Fri, 11 Jan 2008, dean gaudet wrote: > On Fri, 11 Jan 2008, Ingo Molnar wrote: > > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > > Cached requires the cache line to be read first before you can write > > > it. > > > > nonsense, and you should know it. It is perfectly possible to construct > > fully written cachelines, without reading the cacheline first. MOVDQ is > > SSE1 so on basically in every CPU today - and it is 16 byte aligned and > > can generate full cacheline writes, _without_ filling in the cacheline > > first. > > did you mean to write MOVNTPS above? btw in case you were thinking a normal store to WB rather than a non-temporal store... i ran a microbenchmark streaming stores to every 16 bytes of a 16MiB region aligned to 4096 bytes on a xeon 53xx series CPU (4MiB L2) + 5000X northbridge and the avg latency of MOVNTPS is 12 cycles whereas the avg latency of MOVAPS is 20 cycles. the inner loop is unrolled 16 times so there are literally 4 cache lines worth of stores being stuffed into the store queue as fast as possible... and there's no coalescing for normal stores even on this modern CPU. i'm certain i'll see the same thing on AMD... it's a very hard thing to do in hardware without the non-temporal hint. -dean -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPA patchset
On Fri, 11 Jan 2008, Ingo Molnar wrote: > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > Cached requires the cache line to be read first before you can write > > it. > > nonsense, and you should know it. It is perfectly possible to construct > fully written cachelines, without reading the cacheline first. MOVDQ is > SSE1 so on basically in every CPU today - and it is 16 byte aligned and > can generate full cacheline writes, _without_ filling in the cacheline > first. did you mean to write MOVNTPS above? > Bulk ops (string ops, etc.) will do full cacheline writes too, > without filling in the cacheline. on intel with fast strings enabled yes. mind you intel gives hints in the documentation these operations don't respect coherence... and i asked about this when they posted their memory ordering paper but got no response. -dean -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc6 reproducible raid5 hang
On Thu, 10 Jan 2008, Neil Brown wrote: On Wednesday January 9, [EMAIL PROTECTED] wrote: On Sun, 2007-12-30 at 10:58 -0700, dean gaudet wrote: i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 which was Neil's change in 2.6.22 for deferring generic_make_request until there's enough stack space for it. Commit d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 reduced stack utilization by preventing recursive calls to generic_make_request. However the following conditions can cause raid5 to hang until 'stripe_cache_size' is increased: Thanks for pursuing this guys. That explanation certainly sounds very credible. The generic_make_request_immed is a good way to confirm that we have found the bug, but I don't like it as a long term solution, as it just reintroduced the problem that we were trying to solve with the problematic commit. As you say, we could arrange that all request submission happens in raid5d and I think this is the right way to proceed. However we can still take some of the work into the thread that is submitting the IO by calling raid5d() at the end of make_request, like this. Can you test it please? Does it seem reasonable? Thanks, NeilBrown Signed-off-by: Neil Brown [EMAIL PROTECTED] it has passed 11h of the untar/diff/rm linux.tar.gz workload... that's pretty good evidence it works for me. thanks! 
Tested-by: dean gaudet [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c    |    2 +-
 ./drivers/md/raid5.c |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2008-01-07 13:32:10.000000000 +1100
+++ ./drivers/md/md.c	2008-01-10 11:08:02.000000000 +1100
@@ -5774,7 +5774,7 @@ void md_check_recovery(mddev_t *mddev)
 	if (mddev->ro)
 		return;
 
-	if (signal_pending(current)) {
+	if (current == mddev->thread->tsk && signal_pending(current)) {
 		if (mddev->pers->sync_request) {
 			printk(KERN_INFO "md: %s in immediate safe mode\n",
 			       mdname(mddev));

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-07 13:32:10.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-10 11:06:54.000000000 +1100
@@ -3432,6 +3432,7 @@ static int chunk_aligned_read(struct req
 	}
 }
 
+static void raid5d (mddev_t *mddev);
 
 static int make_request(struct request_queue *q, struct bio * bi)
 {
@@ -3547,7 +3548,7 @@ static int make_request(struct request_q
 				goto retry;
 			}
 			finish_wait(&conf->wait_for_overlap, &w);
-			handle_stripe(sh, NULL);
+			set_bit(STRIPE_HANDLE, &sh->state);
 			release_stripe(sh);
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
@@ -3569,6 +3570,7 @@ static int make_request(struct request_q
 			      test_bit(BIO_UPTODATE, &bi->bi_flags) ? 0 : -EIO);
 	}
+	raid5d(mddev);
 	return 0;
 }

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc6 reproducible raid5 hang
On Fri, 11 Jan 2008, Neil Brown wrote: Thanks. But I suspect you didn't test it with a bitmap :-) I ran the mdadm test suite and it hit a problem - easy enough to fix. damn -- i lost my bitmap 'cause it was external and i didn't have things set up properly to pick it up after a reboot :) if you send an updated patch i'll give it another spin... -dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid 1, can't get the second disk added back in.
On Tue, 8 Jan 2008, Bill Davidsen wrote: Neil Brown wrote: On Monday January 7, [EMAIL PROTECTED] wrote: Problem is not raid, or at least not obviously raid related. The problem is that the whole disk, /dev/hdb is unavailable. Maybe check /sys/block/hdb/holders ? lsof /dev/hdb ? good luck :-) losetup -a may help, lsof doesn't seem to show files used in loop mounts. Yes, long shot... and don't forget dmsetup ls... (followed immediately by apt-get remove evms if you're on an unfortunate version of ubuntu which helpfully installed that partition-stealing service for you.) -dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc6 reproducible raid5 hang
On Sat, 29 Dec 2007, Dan Williams wrote: On Dec 29, 2007 1:58 PM, dean gaudet [EMAIL PROTECTED] wrote: On Sat, 29 Dec 2007, Dan Williams wrote: On Dec 29, 2007 9:48 AM, dean gaudet [EMAIL PROTECTED] wrote: hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on the same 64k chunk array and had raised the stripe_cache_size to 1024... and got a hang. this time i grabbed stripe_cache_active before bumping the size again -- it was only 905 active. as i recall the bug we were debugging a year+ ago the active was at the size when it would hang. so this is probably something new. I believe I am seeing the same issue and am trying to track down whether XFS is doing something unexpected, i.e. I have not been able to reproduce the problem with EXT3. MD tries to increase throughput by letting some stripe work build up in batches. It looks like every time your system has hung it has been in the 'inactive_blocked' state i.e. 3/4 of stripes active. This state should automatically clear... cool, glad you can reproduce it :) i have a bit more data... i'm seeing the same problem on debian's 2.6.22-3-amd64 kernel, so it's not new in 2.6.24. This is just brainstorming at this point, but it looks like xfs can submit more requests in the bi_end_io path such that it can lock itself out of the RAID array. The sequence that concerns me is: return_io-xfs_buf_end_io-xfs_buf_io_end-xfs_buf_iodone_work-xfs_buf_iorequest-make_request-hang I need verify whether this path is actually triggering, but if we are in an inactive_blocked condition this new request will be put on a wait queue and we'll never get to the release_stripe() call after return_io(). It would be interesting to see if this is new XFS behavior in recent kernels. 
i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 which was Neil's change in 2.6.22 for deferring generic_make_request until there's enough stack space for it.

with my git tree sync'd to that commit my test cases fail in under 20 minutes uptime (i rebooted and tested 3x). sync'd to the commit previous to it i've got 8h of run-time now without the problem. this isn't definitive of course since it does seem to be timing dependent, but since all failures have occurred much earlier than that for me so far i think this indicates this change is either the cause of the problem or exacerbates an existing raid5 problem.

given that this problem looks like a very rare problem i saw with 2.6.18 (raid5+xfs there too) i'm thinking Neil's commit may just exacerbate an existing problem... not that i have evidence either way.

i've attached a new kernel log with a hang at d89d87965d... and the reduced config file i was using for the bisect. hopefully the hang looks the same as what we were seeing at 2.6.24-rc6. let me know.

-dean

[attachments: kern.log.d89d87965d.bz2, config-2.6.21-b1.bz2]
Re: [patch] improve stripe_cache_size documentation
On Sun, 30 Dec 2007, Thiemo Nagel wrote:
> >   stripe_cache_size  (currently raid5 only)
>
> As far as I have understood, it applies to raid6, too.

good point... and raid4.  here's an updated patch.

-dean

Signed-off-by: dean gaudet [EMAIL PROTECTED]

Index: linux/Documentation/md.txt
===================================================================
--- linux.orig/Documentation/md.txt	2007-12-29 13:01:25.000000000 -0800
+++ linux/Documentation/md.txt	2007-12-30 10:16:58.000000000 -0800
@@ -435,8 +435,14 @@
 These currently include
 
-  stripe_cache_size  (currently raid5 only)
+  stripe_cache_size  (raid4, raid5 and raid6)
       number of entries in the stripe cache.  This is writable,
       but there are upper and lower limits (32768, 16).  Default is 128.
-  strip_cache_active (currently raid5 only)
+
+      The stripe cache memory is locked down and not available for
+      other uses.  The total size of the stripe cache is determined
+      by this formula:
+
+           PAGE_SIZE * raid_disks * stripe_cache_size
+
+  strip_cache_active (raid4, raid5 and raid6)
       number of active entries in the stripe cache

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] improve stripe_cache_size documentation
On Sun, 30 Dec 2007, dean gaudet wrote:
> On Sun, 30 Dec 2007, Thiemo Nagel wrote:
> > >   stripe_cache_size  (currently raid5 only)
> >
> > As far as I have understood, it applies to raid6, too.
>
> good point... and raid4.  here's an updated patch.

and once again with a typo fix.  oops.

-dean

Signed-off-by: dean gaudet [EMAIL PROTECTED]

Index: linux/Documentation/md.txt
===================================================================
--- linux.orig/Documentation/md.txt	2007-12-29 13:01:25.000000000 -0800
+++ linux/Documentation/md.txt	2007-12-30 14:30:40.000000000 -0800
@@ -435,8 +435,14 @@
 These currently include
 
-  stripe_cache_size  (currently raid5 only)
+  stripe_cache_size  (raid4, raid5 and raid6)
       number of entries in the stripe cache.  This is writable,
       but there are upper and lower limits (32768, 16).  Default is 128.
-  strip_cache_active (currently raid5 only)
+
+      The stripe cache memory is locked down and not available for
+      other uses.  The total size of the stripe cache is determined
+      by this formula:
+
+           PAGE_SIZE * raid_disks * stripe_cache_size
+
+  stripe_cache_active (raid4, raid5 and raid6)
       number of active entries in the stripe cache

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: permit link(2) to work across --bind mounts ?
On Sat, 29 Dec 2007, [EMAIL PROTECTED] wrote: > On Sat, 29 Dec 2007 12:40:47 PST, dean gaudet said: > > > the main worry i have is some user maliciously hardlinks everything > > under /var/log somewhere else and slowly fills up the file system with > > old rotated logs. > > "Doctor, it hurts when I do this.." "Well, don't do that then". actually it doesn't hurt. i have other mechanisms which would pick this up fairly quickly. -dean -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: permit link(2) to work across --bind mounts ?
On Sun, 30 Dec 2007, David Newall wrote: > dean gaudet wrote: > > > Pffuff. That's what volume managers are for! You do have (at least) two > > > independent spindles in your RAID1 array, which give you less need to > > > worry > > > about head-stack contention. > > > > > > > this system is write intensive and writes go to all spindles, so you're > > assertion is wrong. > > I don't know what you think I was asserting, but you were wrong. Of course > I/O is distributed across both spindles. You would expect no less. THAT is > what I was telling you. are you on crack? it's a raid1. writes go to all spindles. they have to. by definition. reads can be spread around, but writes are mirrored. > > > the main worry i have is some user maliciously hardlinks everything > > under /var/log somewhere else and slowly fills up the file system with > > old rotated logs. the users otherwise have quotas so they can't fill > > things up on their own. i could probably set up XFS quota trees (aka > > "projects") but haven't gone to this effort yet. > > > > See, this is where you show that you don't understand the system. I'll > explain it, just once. /var/home contains home directories. /var/log and > /var/home are on the same filesystem. So /var/log/* can be linked to > /var/home/malicious, and that's just one of your basic misunderstandings. yes you are on crack. i told you i understand this exactly. it's right there in the message sent. > No. Look, you obviously haven't read what I've told you. I mean, it's very > obvious you haven't. I'm wasting my time on you and I'm now out of > generosity. Good luck to you. I think you need it. you're the idiot not actually reading my messages. -dean -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: permit link(2) to work across --bind mounts ?
On Sat, 29 Dec 2007, David Newall wrote:
> dean gaudet wrote:
> > On Wed, 19 Dec 2007, David Newall wrote:
> > > Mark Lord wrote:
> > > > But.. pity there's no mount flag override for smaller systems,
> > > > where bind mounts might be more useful with link(2) actually working.
> > >
> > > I don't see it. You always can make hard link on the underlying
> > > filesystem. If you need to make it on the bound mount, that is, if you
> > > can't locate the underlying filesystem to make the hard link, you can
> > > use a symbolic link.
> >
> > i run into it on a system where /home is a bind mount of /var/home ...
> > i did this because:
> >
> > - i prefer /home to be nosuid,nodev (multi-user system)
>
> Whatever security /home has, /var/home is the one that restricts because
> users can still access their files that way.

yep. and /var is nosuid,nodev as well.

> > - i prefer /home to not be on same fs as /
> > - the system has only one raid1 array, and i can't stand having two
> >   writable filesystems competing on the same set of spindles (i like to
> >   imagine that one fs competing for the spindles can potentially result
> >   in better seek patterns)
> > ...
> > - i didn't want to try to balance disk space between /var and /home
> > - i didn't want to use a volume mgr just to handle disk space balance...
>
> Pffuff. That's what volume managers are for! You do have (at least) two
> independent spindles in your RAID1 array, which give you less need to
> worry about head-stack contention.

this system is write intensive and writes go to all spindles, so your assertion is wrong. a quick look at iostat shows the system has averaged 50/50 reads/writes over 34 days. that means 50% of the IO is going to both spindles.

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        1.96    2.24  33.65  33.16  755.50  465.45    36.55     0.56   8.43   5.98  39.96

> You probably want different mount restrictions on /home than /var, so
> you really must use separate filesystems.

not sure why you think i want different restrictions... i'm running fine with nosuid,nodev for /var.

the main worry i have is some user maliciously hardlinks everything under /var/log somewhere else and slowly fills up the file system with old rotated logs. the users otherwise have quotas so they can't fill things up on their own. i could probably set up XFS quota trees (aka "projects") but haven't gone to this effort yet.

> LVM is your friend.

i disagree. but this is getting into personal taste -- i find volume managers to be an unnecessary layer of complexity. given i need quotas for the users anyhow i don't see why i should both manage my disk space via quotas and via an extra block layer.

> But with regards to bind mounts and hard links: If you want to be able to
> hard-link /home/me/log to /var/tmp/my-log, then I see nothing to prevent
> hard-linking /var/home/me/log to /var/tmp/my-log.

you probably missed the point where i said that i was surprised i couldn't hardlink across the bind mount and actually wanted it to work.

-dean
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc6 reproducible raid5 hang
hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on the same 64k chunk array and had raised the stripe_cache_size to 1024... and got a hang. this time i grabbed stripe_cache_active before bumping the size again -- it was only 905 active. as i recall the bug we were debugging a year+ ago the active was at the size when it would hang. so this is probably something new. anyhow raising it to 2048 got it unstuck, but i'm guessing i'll be able to hit that limit too if i try harder :) btw what units are stripe_cache_size/active in? is the memory consumed equal to (chunk_size * raid_disks * stripe_cache_size) or (chunk_size * raid_disks * stripe_cache_active)? -dean On Thu, 27 Dec 2007, dean gaudet wrote: hmm this seems more serious... i just ran into it with chunksize 64KiB and while just untarring a bunch of linux kernels in parallel... increasing stripe_cache_size did the trick again. -dean On Thu, 27 Dec 2007, dean gaudet wrote: hey neil -- remember that raid5 hang which me and only one or two others ever experienced and which was hard to reproduce? we were debugging it well over a year ago (that box has 400+ day uptime now so at least that long ago :) the workaround was to increase stripe_cache_size... i seem to have a way to reproduce something which looks much the same. setup: - 2.6.24-rc6 - system has 8GiB RAM but no swap - 8x750GB in a raid5 with one spare, chunksize 1024KiB. - mkfs.xfs default options - mount -o noatime - dd if=/dev/zero of=/mnt/foo bs=4k count=2621440 that sequence hangs for me within 10 seconds... and i can unhang / rehang it by toggling between stripe_cache_size 256 and 1024. i detect the hang by watching iostat -kx /dev/sd? 5. i've attached the kernel log where i dumped task and timer state while it was hung... note that you'll see at some point i did an xfs mount with external journal but it happens with internal journal as well. looks like it's using the raid456 module and async api. 
anyhow let me know if you need more info / have any suggestions. -dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Tue, 25 Dec 2007, Bill Davidsen wrote:
> The issue I'm thinking about is hardware sector size, which on modern
> drives may be larger than 512b and therefore entail a read-alter-rewrite
> (RAR) cycle when writing a 512b block.

i'm not sure any shipping SATA disks have larger than 512B sectors yet... do you know of any? (or is this thread about SCSI which i don't pay attention to...)

on a brand new WDC WD7500AAKS-00RBA0 with this partition layout:

255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

so sda1 starts at a non-multiple of 4096 into the disk. i ran some random seek+write experiments using http://arctic.org/~dean/randomio/, here are the results using 512 byte and 4096 byte writes (fsync after each write), 8 threads, on sda1:

# ./randomio /dev/sda1 8 1 1 512 10 6
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  148.5 |    0.0   inf    nan    0.0    nan |  148.5   0.2   53.7   89.3   19.5
  129.2 |    0.0   inf    nan    0.0    nan |  129.2  37.2   61.9   96.7    9.3
  131.2 |    0.0   inf    nan    0.0    nan |  131.2  40.3   61.0   90.4    9.3
  132.0 |    0.0   inf    nan    0.0    nan |  132.0  39.6   60.6   89.3    9.1
  130.7 |    0.0   inf    nan    0.0    nan |  130.7  39.8   61.3   98.1    8.9
  131.4 |    0.0   inf    nan    0.0    nan |  131.4  40.0   60.8  101.0    9.6

# ./randomio /dev/sda1 8 1 1 4096 10 6
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  141.7 |    0.0   inf    nan    0.0    nan |  141.7   0.3   56.3   99.3   21.1
  132.4 |    0.0   inf    nan    0.0    nan |  132.4  43.3   60.4   91.8    8.5
  131.6 |    0.0   inf    nan    0.0    nan |  131.6  41.4   60.9  111.0    9.6
  131.8 |    0.0   inf    nan    0.0    nan |  131.8  41.4   60.7   85.3    8.6
  130.6 |    0.0   inf    nan    0.0    nan |  130.6  41.7   61.3   95.0    9.4
  131.4 |    0.0   inf    nan    0.0    nan |  131.4  42.2   60.8   90.5    8.4

i think the anomalous results in the first 10s samples are perhaps the drive coming out of a standby state.
and here are the results aligned using the sda raw device itself:

# ./randomio /dev/sda 8 1 1 512 10 6
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  147.3 |    0.0   inf    nan    0.0    nan |  147.3   0.3   54.1   93.7   20.1
  132.4 |    0.0   inf    nan    0.0    nan |  132.4  37.4   60.6   91.8    9.2
  132.5 |    0.0   inf    nan    0.0    nan |  132.5  37.7   60.3   93.7    9.3
  131.8 |    0.0   inf    nan    0.0    nan |  131.8  39.4   60.7   92.7    9.0
  133.9 |    0.0   inf    nan    0.0    nan |  133.9  41.7   59.8   90.7    8.5
  130.2 |    0.0   inf    nan    0.0    nan |  130.2  40.8   61.5   88.6    8.9

# ./randomio /dev/sda 8 1 1 4096 10 6
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  145.4 |    0.0   inf    nan    0.0    nan |  145.4   0.3   54.9   94.0   20.1
  130.3 |    0.0   inf    nan    0.0    nan |  130.3  36.0   61.4   92.7    9.6
  130.6 |    0.0   inf    nan    0.0    nan |  130.6  38.2   61.2   96.7    9.2
  132.1 |    0.0   inf    nan    0.0    nan |  132.1  39.0   60.5   93.5    9.2
  131.8 |    0.0   inf    nan    0.0    nan |  131.8  43.1   60.8   93.8    9.1
  129.0 |    0.0   inf    nan    0.0    nan |  129.0  40.2   62.0   96.4    8.8

it looks pretty much the same to me...

-dean
- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc6 reproducible raid5 hang
On Sat, 29 Dec 2007, Dan Williams wrote: On Dec 29, 2007 9:48 AM, dean gaudet [EMAIL PROTECTED] wrote: hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on the same 64k chunk array and had raised the stripe_cache_size to 1024... and got a hang. this time i grabbed stripe_cache_active before bumping the size again -- it was only 905 active. as i recall the bug we were debugging a year+ ago the active was at the size when it would hang. so this is probably something new. I believe I am seeing the same issue and am trying to track down whether XFS is doing something unexpected, i.e. I have not been able to reproduce the problem with EXT3. MD tries to increase throughput by letting some stripe work build up in batches. It looks like every time your system has hung it has been in the 'inactive_blocked' state i.e. 3/4 of stripes active. This state should automatically clear... cool, glad you can reproduce it :) i have a bit more data... i'm seeing the same problem on debian's 2.6.22-3-amd64 kernel, so it's not new in 2.6.24. i'm doing some more isolation but just grabbing kernels i have precompiled so far -- a 2.6.19.7 kernel doesn't show the problem, and early indications are a 2.6.21.7 kernel also doesn't have the problem but i'm giving it longer to show its head. i'll try a stock 2.6.22 next depending on how the 2.6.21 test goes, just so we get the debian patches out of the way. i was tempted to blame async api because it's newish :) but according to the dmesg output it doesn't appear the 2.6.22-3-amd64 kernel used async API, and it still hung, so async is probably not to blame. anyhow the test case i'm using is the dma_thrasher script i attached... it takes about an hour to give me confidence there's no problems so this will take a while. -dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] improve stripe_cache_size documentation
Document the amount of memory used by the stripe cache and the fact that it's tied down and unavailable for other purposes (right?). thanks to Dan Williams for the formula.

-dean

Signed-off-by: dean gaudet [EMAIL PROTECTED]

Index: linux/Documentation/md.txt
===================================================================
--- linux.orig/Documentation/md.txt	2007-12-29 13:01:25.000000000 -0800
+++ linux/Documentation/md.txt	2007-12-29 13:04:17.000000000 -0800
@@ -438,5 +438,11 @@
   stripe_cache_size  (currently raid5 only)
       number of entries in the stripe cache.  This is writable,
       but there are upper and lower limits (32768, 16).  Default is 128.
+
+      The stripe cache memory is locked down and not available for
+      other uses.  The total size of the stripe cache is determined
+      by this formula:
+
+           PAGE_SIZE * raid_disks * stripe_cache_size
+
   strip_cache_active (currently raid5 only)
       number of active entries in the stripe cache

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc6 reproducible raid5 hang
On Sat, 29 Dec 2007, Justin Piszcz wrote:
> Curious btw what kind of filesystem size/raid type (5, but defaults I
> assume, nothing special right? (right-symmetric vs. left-symmetric,
> etc?)/cache size/chunk size(s) are you using/testing with?

mdadm --create --level=5 --chunk=64 -n7 -x1 /dev/md2 /dev/sd[a-h]1
mkfs.xfs -f /dev/md2

otherwise defaults

> The script you sent out earlier, you are able to reproduce it easily
> with 31 or so kernel tar decompressions?

not sure, the point of the script is to untar more than there is RAM. it happened with a single rsync running though -- 3.5M inodes from a remote box. it also happens with the single 10GB dd write... although i've been using the tar method for testing different kernel revs.

-dean
- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc6 reproducible raid5 hang
On Sat, 29 Dec 2007, dean gaudet wrote: On Sat, 29 Dec 2007, Justin Piszcz wrote: Curious btw what kind of filesystem size/raid type (5, but defaults I assume, nothing special right? (right-symmetric vs. left-symmetric, etc?)/cache size/chunk size(s) are you using/testing with? mdadm --create --level=5 --chunk=64 -n7 -x1 /dev/md2 /dev/sd[a-h]1 mkfs.xfs -f /dev/md2 otherwise defaults hmm i missed a few things, here's exactly how i created the array: mdadm --create --level=5 --chunk=64 -n7 -x1 --assume-clean /dev/md2 /dev/sd[a-h]1 it's reassembled automagically each reboot, but i do this each reboot: mkfs.xfs -f /dev/md2 mount -o noatime /dev/md2 /mnt/new ./dma_thrasher linux.tar.gz /mnt/new the --assume-clean and noatime probably make no difference though... on the bisection front it looks like it's new behaviour between 2.6.21.7 and 2.6.22.15 (stock kernels now, not debian). i've got to step out for a while, but i'll go at it again later, probably with git bisect unless someone has some cherry picked changes to suggest. -dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: permit link(2) to work across --bind mounts ?
On Sat, 29 Dec 2007, David Newall wrote:

> dean gaudet wrote:
> > On Wed, 19 Dec 2007, David Newall wrote:
> > > Mark Lord wrote:
> > > > But.. pity there's no mount flag override for smaller systems,
> > > > where bind mounts might be more useful with link(2) actually working.
> > >
> > > I don't see it. You always can make hard link on the underlying
> > > filesystem. If you need to make it on the bound mount, that is, if you
> > > can't locate the underlying filesystem to make the hard link, you can
> > > use a symbolic link.
> >
> > i run into it on a system where /home is a bind mount of /var/home ...
> > i did this because:
> >
> > - i prefer /home to be nosuid,nodev (multi-user system)
>
> Whatever security /home has, /var/home is the one that restricts because
> users can still access their files that way.

yep. and /var is nosuid,nodev as well.

> > - i prefer /home to not be on same fs as /
> > - the system has only one raid1 array, and i can't stand having two
> >   writable filesystems competing on the same set of spindles (i like to
> >   imagine that one fs competing for the spindles can potentially result
> >   in better seek patterns) ...
> > - i didn't want to try to balance disk space between /var and /home
> > - i didn't want to use a volume mgr just to handle disk space balance...
>
> Pffuff. That's what volume managers are for! You do have (at least) two
> independent spindles in your RAID1 array, which give you less need to
> worry about head-stack contention.

this system is write intensive and writes go to all spindles, so your assertion is wrong. a quick look at iostat shows the system has averaged 50/50 reads/writes over 34 days. that means 50% of the IO is going to both spindles.

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        1.96    2.24  33.65  33.16  755.50  465.45    36.55     0.56   8.43   5.98  39.96

> You probably want different mount restrictions on /home than /var, so
> you really must use separate filesystems.

not sure why you think i want different restrictions... i'm running fine with nosuid,nodev for /var.

the main worry i have is some user maliciously hardlinks everything under /var/log somewhere else and slowly fills up the file system with old rotated logs. the users otherwise have quotas so they can't fill things up on their own. i could probably set up XFS quota trees (aka projects) but haven't gone to this effort yet.

> LVM is your friend.

i disagree. but this is getting into personal taste -- i find volume managers to be an unnecessary layer of complexity. given i need quotas for the users anyhow i don't see why i should both manage my disk space via quotas and via an extra block layer.

> But with regards to bind mounts and hard links: If you want to be able
> to hard-link /home/me/log to /var/tmp/my-log, then I see nothing to
> prevent hard-linking /var/home/me/log to /var/tmp/my-log.

you probably missed the point where i said that i was surprised i couldn't hardlink across the bind mount and actually wanted it to work.

-dean
Re: RFC: permit link(2) to work across --bind mounts ?
On Sun, 30 Dec 2007, David Newall wrote:

> dean gaudet wrote:
> > > Pffuff. That's what volume managers are for! You do have (at least)
> > > two independent spindles in your RAID1 array, which give you less
> > > need to worry about head-stack contention.
> >
> > this system is write intensive and writes go to all spindles, so your
> > assertion is wrong.
>
> I don't know what you think I was asserting, but you were wrong. Of
> course I/O is distributed across both spindles. You would expect no
> less. THAT is what I was telling you.

are you on crack? it's a raid1. writes go to all spindles. they have to. by definition. reads can be spread around, but writes are mirrored.

> > the main worry i have is some user maliciously hardlinks everything
> > under /var/log somewhere else and slowly fills up the file system with
> > old rotated logs. the users otherwise have quotas so they can't fill
> > things up on their own. i could probably set up XFS quota trees (aka
> > projects) but haven't gone to this effort yet.
>
> See, this is where you show that you don't understand the system. I'll
> explain it, just once. /var/home contains home directories. /var/log
> and /var/home are on the same filesystem. So /var/log/* can be linked
> to /var/home/malicious, and that's just one of your basic
> misunderstandings.

yes you are on crack. i told you i understand this exactly. it's right there in the message sent.

> No. Look, you obviously haven't read what I've told you. I mean, it's
> very obvious you haven't. I'm wasting my time on you and I'm now out of
> generosity. Good luck to you. I think you need it.

you're the idiot not actually reading my messages.

-dean
Re: RFC: permit link(2) to work across --bind mounts ?
On Sat, 29 Dec 2007, [EMAIL PROTECTED] wrote:

> On Sat, 29 Dec 2007 12:40:47 PST, dean gaudet said:
> > the main worry i have is some user maliciously hardlinks everything
> > under /var/log somewhere else and slowly fills up the file system with
> > old rotated logs.
>
> Doctor, it hurts when I do this.. Well, don't do that then.

actually it doesn't hurt. i have other mechanisms which would pick this up fairly quickly.

-dean
Re: RFC: permit link(2) to work across --bind mounts ?
On Sat, 29 Dec 2007, Jan Engelhardt wrote: > > On Dec 28 2007 18:53, dean gaudet wrote: > >p.s. in retrospect i probably could have arranged it more like this: > > > > mount /dev/md1 $tmpmntpoint > > mount --bind $tmpmntpoint/var /var > > mount --bind $tmpmntpoint/home /home > > umount $tmpmntpoint > > > >except i can't easily specify that in fstab... and neither of the bind > >mounts would show up in df(1). seems like it wouldn't be hard to support > >this type of subtree mount though. mount(8) could support a single > >subtree mount using this technique but the second subtree mount attempt > >would fail because you can't temporarily remount the device because the > >mount point is gone. > > Why is it gone? > > mount /dev/md1 /tmpmnt > mount --bind /tmpmnt/var /var > mount --bind /tmpmnt/home /home > > Is perfectly fine, and /tmpmnt is still alive and mounted. Additionally, > you can > > umount /tmpmnt > > now, which leaves only /var and /home. i was trying to come up with a userland-only change in mount(8) which would behave like so: # mount --subtree var /dev/md1 /var internally mount does: - mount /dev/md1 /tmpmnt - mount --bind /tmpmnt/var /var - umount /tmpmnt # mount --subtree home /dev/md1 /home internally mount does: - mount /dev/md1 /tmpmnt - mount --bind /tmpmnt/home /home - umount /tmpmnt but that second mount would fail because /dev/md1 is already mounted (but the mount point is gone)... it certainly works if i issue the commands individually as i described -- but a change within mount(8) would have the benefit of working with /etc/fstab too. -dean -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: permit link(2) to work across --bind mounts ?
On Wed, 19 Dec 2007, David Newall wrote: > Mark Lord wrote: > > But.. pity there's no mount flag override for smaller systems, > > where bind mounts might be more useful with link(2) actually working. > > I don't see it. You always can make hard link on the underlying filesystem. > If you need to make it on the bound mount, that is, if you can't locate the > underlying filesystem to make the hard link, you can use a symbolic link. i run into it on a system where /home is a bind mount of /var/home ... i did this because: - i prefer /home to be nosuid,nodev (multi-user system) - i prefer /home to not be on same fs as / - the system has only one raid1 array, and i can't stand having two writable filesystems competing on the same set of spindles (i like to imagine that one fs competing for the spindles can potentially result in better seek patterns) - i didn't want to do /var -> /home/var or vice versa ... because i don't like seeing "/var/home/dean" when i'm in my home dir and such. - i didn't want to try to balance disk space between /var and /home - i didn't want to use a volume mgr just to handle disk space balance... so i gave a bind mount a try. i was surprised to see that mv(1) between /var and /home causes the file to be copied due to the link(1) failing... it does seem like something which should be configurable per mount point... maybe that can be done with the patches i've seen going around supporting per-bind mount read-only/etc options? -dean p.s. in retrospect i probably could have arranged it more like this: mount /dev/md1 $tmpmntpoint mount --bind $tmpmntpoint/var /var mount --bind $tmpmntpoint/home /home umount $tmpmntpoint except i can't easily specify that in fstab... and neither of the bind mounts would show up in df(1). seems like it wouldn't be hard to support this type of subtree mount though. 
mount(8) could support a single subtree mount using this technique but the second subtree mount attempt would fail because you can't temporarily remount the device because the mount point is gone. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc6 reproducible raid5 hang
hmm this seems more serious... i just ran into it with chunksize 64KiB and while just untarring a bunch of linux kernels in parallel... increasing stripe_cache_size did the trick again.

-dean

On Thu, 27 Dec 2007, dean gaudet wrote:

> hey neil -- remember that raid5 hang which me and only one or two others
> ever experienced and which was hard to reproduce? we were debugging it
> well over a year ago (that box has 400+ day uptime now so at least that
> long ago :) the workaround was to increase stripe_cache_size...
>
> i seem to have a way to reproduce something which looks much the same.
>
> setup:
>
> - 2.6.24-rc6
> - system has 8GiB RAM but no swap
> - 8x750GB in a raid5 with one spare, chunksize 1024KiB.
> - mkfs.xfs default options
> - mount -o noatime
> - dd if=/dev/zero of=/mnt/foo bs=4k count=2621440
>
> that sequence hangs for me within 10 seconds... and i can unhang /
> rehang it by toggling between stripe_cache_size 256 and 1024. i detect
> the hang by watching iostat -kx /dev/sd? 5.
>
> i've attached the kernel log where i dumped task and timer state while
> it was hung... note that you'll see at some point i did an xfs mount
> with external journal but it happens with internal journal as well.
> looks like it's using the raid456 module and async api.
>
> anyhow let me know if you need more info / have any suggestions.
>
> -dean
Re: 2.6.24-rc6 reproducible raid5 hang
On Thu, 27 Dec 2007, Justin Piszcz wrote:

> With that high of a stripe size the stripe_cache_size needs to be
> greater than the default to handle it.

i'd argue that any deadlock is a bug... regardless i'm still seeing deadlocks with the default chunk_size of 64k and stripe_cache_size of 256... in this case it's with a workload which is untarring 34 copies of the linux kernel at the same time. it's a variant of doug ledford's memtest, and i've attached it.

-dean

[attachment: dma_thrasher]

#!/usr/bin/perl
# Copyright (c) 2007 dean gaudet [EMAIL PROTECTED]
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# this idea shamelessly stolen from doug ledford

use warnings;
use strict;

# ensure stdout is not buffered
select(STDOUT); $| = 1;

my $usage = "usage: $0 linux.tar.gz /path1 [/path2 ...]\n";
defined(my $tarball = shift) or die $usage;
-f $tarball or die "$tarball does not exist or is not a file\n";
my @paths = @ARGV;
$#paths >= 0 or die $usage;

# determine size of uncompressed tarball
open(GZIP, "-|") || exec "gzip", "--quiet", "--list", $tarball;
my $line = <GZIP>;
my ($tarball_size) = $line =~ m#^\s*\d+\s*(\d+)#;
defined($tarball_size) or die "unexpected result from gzip --quiet --list $tarball\n";
close(GZIP);

# determine amount of memory
open(MEMINFO, "/proc/meminfo") or die "unable to open /proc/meminfo for read: $!\n";
my $total_mem;
while (<MEMINFO>) {
    if (/^MemTotal:\s*(\d+)\s*kB/) {
        $total_mem = $1;
        last;
    }
}
defined($total_mem) or die "did not find MemTotal line in /proc/meminfo\n";
close(MEMINFO);
$total_mem *= 1024;

print "total memory: $total_mem\n";
print "uncompressed tarball: $tarball_size\n";
my $nr_simultaneous = int(1.2 * $total_mem / $tarball_size);
print "nr simultaneous processes: $nr_simultaneous\n";

sub system_or_die {
    my @args = @_;
    system(@args);
    if ($? == -1) {
        die sprintf("%s failed to exec %s: $!\n", scalar(localtime), $args[0]);
    }
    elsif ($? & 127) {
        die sprintf("%s %s died with signal %d, %s coredump\n",
            scalar(localtime), $args[0], ($? & 127),
            ($? & 128) ? "with" : "without");
    }
    elsif (($? >> 8) != 0) {
        die sprintf("%s %s exited with non-zero exit code %d\n",
            scalar(localtime), $args[0], $? >> 8);
    }
}

sub untar($) {
    mkdir($_[0]) or die localtime()." unable to mkdir($_[0]): $!\n";
    system_or_die("tar", "-xzf", $tarball, "-C", $_[0]);
}

print localtime()." untarring golden copy\n";
my $golden = $paths[0]."/dma_tmp.$$.gold";
untar($golden);

my $pass_no = 0;
while (1) {
    print localtime()." pass $pass_no: extracting\n";
    my @outputs;
    foreach my $n (1..$nr_simultaneous) {
        # treat paths in a round-robin manner
        my $dir = shift(@paths);
        push(@paths, $dir);
        $dir .= "/dma_tmp.$$.$n";
        push(@outputs, $dir);
        my $pid = fork;
        defined($pid) or die localtime()." unable to fork: $!\n";
        if ($pid == 0) {
            untar($dir);
            exit(0);
        }
    }
    # wait for the children
    while (wait != -1) {}

    print localtime()." pass $pass_no: diffing\n";
    foreach my $dir (@outputs) {
        my $pid = fork;
        defined($pid) or die localtime()." unable to fork: $!\n";
        if ($pid == 0) {
            system_or_die("diff", "-U", "3", "-rN", $golden, $dir);
            system_or_die("rm", "-fr", $dir);
            exit(0);
        }
    }
    # wait for the children
    while (wait != -1) {}
    ++$pass_no;
}
Re: external bitmaps.. and more
On Thu, 6 Dec 2007, Michael Tokarev wrote:

> I come across a situation where external MD bitmaps aren't usable on
> any standard linux distribution unless special (non-trivial) actions
> are taken.
>
> First is a small buglet in mdadm, or two. It's not possible to specify
> --bitmap= in assemble command line - the option seems to be ignored.
> But it's honored when specified in config file.

i think neil fixed this at some point -- i ran into it / reported essentially the same problems here a while ago.

> The thing is that when a external bitmap is being used for an array,
> and that bitmap resides on another filesystem, all common distributions
> fails to start/mount and to shutdown/umount arrays/filesystems
> properly, because all starts/stops is done in one script, and all
> mounts/umounts in another, but for bitmaps to work the two should be
> intermixed with each other.

so i've got a debian unstable box which has uptime 402 days (to give you an idea how long ago i last tested the reboot sequence). it has raid1 root and raid5 /home. /home has an external bitmap on the root partition.

i have /etc/default/mdadm set with INITRDSTART to start only the root raid1 during initrd... this manages to work out later when the external bitmap is required.

but it is fragile... and i think it's only possible to get things to work with an initrd and the external bitmap on the root fs or by having custom initrd and/or rc.d scripts.

-dean
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007, Arne Georg Gleditsch wrote: > dean gaudet <[EMAIL PROTECTED]> writes: > > on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 > > bytes. the penalty is a mere 3 cycles if an access crosses the specified > > boundary. > > Worth noting though, is that atomic accesses that cross cache lines on > an Opteron system is going to lock down the Hypertransport fabric for > you during the operation -- which is obviously not so nice. ooh awesome, i hadn't measured that before. on a 2 node sockF / revF with a random pointer chase running on cpu 0 / node 0 i see the avg load-to-load cache miss latency jump from 77ns to 109ns when i add an unaligned lock-intensive workload on one core of node 1. the worst i can get the pointer chase latency to is 273ns when i add two threads on node 1 fighting over an unaligned lock. on a 4 node (square) the worst case i can get seems to be an increase from 98ns with no antagonist to 385ns with 6 antagonists fighting over an unaligned lock on the other 3 nodes. cool. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007, Alan Cox wrote: > Its usually faster if you don't misalign on x86 as well. i'm not sure if i agree with "usually"... but i know you (alan) are probably aware of the exact requirements of the hw. for everyone else: on intel x86 processors an access is unaligned only if it crosses a cacheline boundary (64 bytes). otherwise it's aligned. the penalty for crossing a cacheline boundary varies from ~12 cycles (core2) to many dozens of cycles (p4). on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 bytes. the penalty is a mere 3 cycles if an access crosses the specified boundary. if you're making <= 4 byte accesses i recommend not worrying about alignment on x86. it's pretty hard to beat the hardware support. i curse all the RISC and embedded processor designers who pretend unaligned accesses are something evil and to be avoided. in case you're worried, MIPS patent 4,814,976 expired in december 2006 :) -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)
On Tue, 20 Nov 2007, dean gaudet wrote: > On Tue, 20 Nov 2007, Metzger, Markus T wrote: > > > +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c) > > +{ > > + switch (c->x86) { > > + case 0x6: > > + switch (c->x86_model) { > > +#ifdef __i386__ > > + case 0xD: > > + case 0xE: /* Pentium M */ > > + ptrace_bts_ops = ptrace_bts_ops_pentium_m; > > + break; > > +#endif /* _i386_ */ > > + case 0xF: /* Core2 */ > > + ptrace_bts_ops = ptrace_bts_ops_core2; > > + break; > > + default: > > + /* sorry, don't know about them */ > > + break; > > + } > > + break; > > + case 0xF: > > + switch (c->x86_model) { > > +#ifdef __i386__ > > + case 0x0: > > + case 0x1: > > + case 0x2: > > + case 0x3: /* Netburst */ > > + ptrace_bts_ops = ptrace_bts_ops_netburst; > > + break; > > +#endif /* _i386_ */ > > + default: > > + /* sorry, don't know about them */ > > + break; > > + } > > + break; > > is this right? i thought intel family 15 models 3 and 4 supported amd64 > mode... actually... why aren't you using cpuid level 1 edx bit 21 to enable/disable this feature? isn't that the bit defined to indicate whether this feature is supported or not? and it seems like this patch and perfmon2 are going to have to live with each other... since they both require the use of the DS save area... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)
On Tue, 20 Nov 2007, Metzger, Markus T wrote: > +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c) > +{ > + switch (c->x86) { > + case 0x6: > + switch (c->x86_model) { > +#ifdef __i386__ > + case 0xD: > + case 0xE: /* Pentium M */ > + ptrace_bts_ops = ptrace_bts_ops_pentium_m; > + break; > +#endif /* _i386_ */ > + case 0xF: /* Core2 */ > + ptrace_bts_ops = ptrace_bts_ops_core2; > + break; > + default: > + /* sorry, don't know about them */ > + break; > + } > + break; > + case 0xF: > + switch (c->x86_model) { > +#ifdef __i386__ > + case 0x0: > + case 0x1: > + case 0x2: > + case 0x3: /* Netburst */ > + ptrace_bts_ops = ptrace_bts_ops_netburst; > + break; > +#endif /* _i386_ */ > + default: > + /* sorry, don't know about them */ > + break; > + } > + break; is this right? i thought intel family 15 models 3 and 4 supported amd64 mode... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv3 0/4] sys_indirect system call
On Mon, 19 Nov 2007, Ingo Molnar wrote: > > * Eric Dumazet <[EMAIL PROTECTED]> wrote: > > > I do see a problem, because some readers will take your example as a > > reference, as it will probably sit in a page that > > google^Wsearch_engines will bring at the top of search results for > > next ten years or so. > > > > (I bet for "sys_indirect syscall" -> http://lwn.net/Articles/258708/ ) > > > > Next time you post it, please warn users that it will break in some > > years, or state clearly this should only be used internally by glibc. > > dont be silly, next time Ulrich should also warn everyone that running > attachments and applying patches from untrusted sources is dangerous? > > any code that includes: > > fd = syscall (__NR_indirect, , , sizeof (i)); > > is by definition broken and unportable in every sense of the word. Apps > will use the proper glibc interfaces (if it's exposed). as an application writer how do i access accept(2) with FD_CLOEXEC functionality? will glibc expose an accept2() with a flags param? if so... why don't we just have an accept2() syscall? -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)
On Tue, 20 Nov 2007, dean gaudet wrote: On Tue, 20 Nov 2007, Metzger, Markus T wrote: +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c) +{ + switch (c->x86) { + case 0x6: + switch (c->x86_model) { +#ifdef __i386__ + case 0xD: + case 0xE: /* Pentium M */ + ptrace_bts_ops = ptrace_bts_ops_pentium_m; + break; +#endif /* __i386__ */ + case 0xF: /* Core2 */ + ptrace_bts_ops = ptrace_bts_ops_core2; + break; + default: + /* sorry, don't know about them */ + break; + } + break; + case 0xF: + switch (c->x86_model) { +#ifdef __i386__ + case 0x0: + case 0x1: + case 0x2: + case 0x3: /* Netburst */ + ptrace_bts_ops = ptrace_bts_ops_netburst; + break; +#endif /* __i386__ */ + default: + /* sorry, don't know about them */ + break; + } + break; is this right? i thought intel family 15 models 3 and 4 supported amd64 mode... actually... why aren't you using cpuid level 1 edx bit 21 to enable/disable this feature? isn't that the bit defined to indicate whether this feature is supported or not? and it seems like this patch and perfmon2 are going to have to live with each other... since they both require the use of the DS save area... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)
On Tue, 20 Nov 2007, Metzger, Markus T wrote: +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c) +{ + switch (c->x86) { + case 0x6: + switch (c->x86_model) { +#ifdef __i386__ + case 0xD: + case 0xE: /* Pentium M */ + ptrace_bts_ops = ptrace_bts_ops_pentium_m; + break; +#endif /* __i386__ */ + case 0xF: /* Core2 */ + ptrace_bts_ops = ptrace_bts_ops_core2; + break; + default: + /* sorry, don't know about them */ + break; + } + break; + case 0xF: + switch (c->x86_model) { +#ifdef __i386__ + case 0x0: + case 0x1: + case 0x2: + case 0x3: /* Netburst */ + ptrace_bts_ops = ptrace_bts_ops_netburst; + break; +#endif /* __i386__ */ + default: + /* sorry, don't know about them */ + break; + } + break; is this right? i thought intel family 15 models 3 and 4 supported amd64 mode... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2 4/4] first use of sys_indirect system call
On Fri, 16 Nov 2007, Ulrich Drepper wrote: > dean gaudet wrote: > > honestly i think there should be a per-task flag which indicates whether > > fds are by default F_CLOEXEC or not. my reason: third party libraries. > > Only somebody who thinks exclusively about applications as opposed to > runtimes/libraries can say something like that. Library writers don't > have the luxury of being able to modify any global state. This has all > been discussed here before. only someone who thinks about writing libraries can say something like that. you've solved the problem for yourself, and for well written applications, but not for the other 99.% of libraries out there. i'm not suggesting the library set the global flag. i'm suggesting that me as an app writer will do so. it seems like both methods are useful. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2 4/4] first use of sys_indirect system call
you know... i understand the need for FD_CLOEXEC -- in fact i tried petitioning for CLOEXEC options to all the fd creating syscalls something like 7 years ago when i was banging my head against the wall trying to figure out how to thread apache... but even still i'm not convinced that extending every system call which creates an fd is the way to do this. honestly i think there should be a per-task flag which indicates whether fds are by default F_CLOEXEC or not. my reason: third party libraries. i can control all my own code in a threaded program, but i can't control all the code which is linked in. fds are going to leak. if i set a per task flag then the only thing which would break are third party libraries which use fork/exec and aren't aware they may need to unset F_CLOEXEC. personally i'd rather break that than leak fds to another program. but hey i'm happy to see this sort of thing is finally being fixed, thanks. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perfmon2 merge news
On Fri, 16 Nov 2007, Andi Kleen wrote: > I didn't see a clear list.

- cross platform extensible API for configuring perf counters
- support for multiplexed counters
- support for virtualized 64-bit counters
- support for PC and call graph sampling at specific intervals
- support for reading counters not necessarily with sampling
- taskswitch support for counters
- API available from userland
- ability to self-monitor: need select/poll/etc interface
- support for PEBS, IBS and whatever other new perf monitoring infrastructure the vendors throw at us in the future
- low overhead: must minimize the "probe effect" of monitoring
- low noise in measurements: cannot achieve this in userland

perfmon2 has all of this and more i've probably neglected... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Thu, 15 Nov 2007, Paul Mackerras wrote: > dean gaudet writes: > > > actually multiplexing is the main feature i am in need of. there are an > > insufficient number of counters (even on k8 with 4 counters) to do > > complete stall accounting or to get a general overview of L1d/L1i/L2 cache > > hit rates, average miss latency, time spent in various stalls, and the > > memory system utilization (or HT bus utilization). this runs out to > > something like 30 events which are interesting... and re-running a > > benchmark over and over just to get around the lack of multiplexing is a > > royal pain in the ass. > > So by "multiplexing" do you mean the ability to have multiple event > sets associated with a context and have the kernel switch between them > automatically? yep. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wed, 14 Nov 2007, Andi Kleen wrote: > Later a syscall might be needed with event multiplexing, but that seems > more like a far away non essential feature. actually multiplexing is the main feature i am in need of. there are an insufficient number of counters (even on k8 with 4 counters) to do complete stall accounting or to get a general overview of L1d/L1i/L2 cache hit rates, average miss latency, time spent in various stalls, and the memory system utilization (or HT bus utilization). this runs out to something like 30 events which are interesting... and re-running a benchmark over and over just to get around the lack of multiplexing is a royal pain in the ass. it's not a "far away non-essential feature" to me. it's something i would use daily if i had all the pieces together now (and i'm constrained because i cannot add an out-of-tree patch which adds unofficial syscalls to the kernel i use). -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: TCP_DEFER_ACCEPT issues
fwiw i also brought the TCP_DEFER_ACCEPT problems up the end of last year: http://www.mail-archive.com/netdev@vger.kernel.org/msg28916.html it's possible the final message in that thread is how we should define the behaviour, i haven't tried the TCP_SYNCNT idea though. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Interaction between Xen and XFS: stray RW mappings
On Sun, 21 Oct 2007, Jeremy Fitzhardinge wrote: > dean gaudet wrote: > > On Mon, 15 Oct 2007, Nick Piggin wrote: > > > > > >> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive > >> because it generally has to invalidate TLBs on all CPUs. > >> > > > > why is that? ignoring 32-bit archs we have heaps of address space > > available... couldn't the kernel just burn address space and delay global > > TLB invalidate by some relatively long time (say 1 second)? > > > > Yes, that's precisely the problem. xfs does delay the unmap, leaving > stray mappings, which upsets Xen. sounds like a bug in xen to me :) -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Interaction between Xen and XFS: stray RW mappings
On Mon, 15 Oct 2007, Nick Piggin wrote: > Yes, as Dave said, vmap (more specifically: vunmap) is very expensive > because it generally has to invalidate TLBs on all CPUs. why is that? ignoring 32-bit archs we have heaps of address space available... couldn't the kernel just burn address space and delay global TLB invalidate by some relatively long time (say 1 second)? -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Bug#447493: zsh missing in /etc/shells
Package: zsh Version: 4.3.4-23 upgrading from 4.3.4-19 to 4.3.4-23 caused zsh to be removed from /etc/shells... i have a nightly cron job which looks for users with invalid shells and it picked up this change last night after i did the aforementioned upgrade yesterday. -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#447497: pipe viewer does not wrap long lines
Package: alpine Version: 0.+dfsg-1 this is a pine 4.64 - alpine 0. regression. when a message with long lines is piped through an external command the lines are truncated. i see no options for scrolling the display or avoiding the truncation. note that regular message viewing wraps the lines... by way of an example i've provided a line hopefully long enough to wrap on your display. try comparing this message unpiped and piped through cat. contrast with pine 4.64 -- the piped results are wrapped. a b c d e f g h i j k l m n o p q r s t -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#446988: Acknowledgement (must compile -fno-strict-aliasing)
i rebuilt 0.11.7-1 from source (fetched from snapshot.debian.org) and it seems not to be crashing (crashes were occurring in under a day before and i've had 0.11.7-1 going for 2 days)... so this really is a 0.11.7-1 -> 0.11.8-1 regression. i'm going to upgrade my gcc/etc to latest bleeding edge and see if that changes anything. i'm also going to upgrade an i686 box from .7 to .8 to see if this is amd64 specific. -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#446988: Acknowledgement (must compile -fno-strict-aliasing)
damn... -fno-strict-aliasing isn't enough to fix the crash i started seeing in 0.11.8. i built my own package, but saw a crash within 24h. -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#446988: must compile -fno-strict-aliasing
Package: libtorrent10 Version: 0.11.8-1 between 0.11.7 and 0.11.8-1 i started getting regular crashes starting with: *** glibc detected *** /usr/bin/rtorrent: double free or corruption (!prev): 0x0b0952b0 *** this is on amd64. i looked at the known issues page and it requires -fno-strict-aliasing, but that's not set in the debian/rules. http://libtorrent.rakshasa.no/wiki/LibTorrentKnownIssues -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#444364: please stop rewriting all the initrds
On Fri, 28 Sep 2007, martin f krafft wrote: also sprach dean gaudet [EMAIL PROTECTED] [2007.09.28.0230 +0100]: it is EXCEPTIONALLY DANGEROUS to replace EVERY SINGLE initrd when mdadm is installed/upgraded. Please STOP SCREAMING and look at the existing bugs before you reply new ones. 2.6.3-1 will not do this anymore. You could help testing: i did search but obviously didn't search for the right things, alas. but no this time i'll have to resort to a recovery CD. There are backups of the initrds. Plus, I tend to make sure your initrd will not get corrupted. unfortunately it happened on a box where i upgrade on unstable frequently but reboot infrequently... so the .bak had already been overwritten. (in the end it was my own configuration problem which resulted in the initrd being unbootable). i think i might make some @reboot cron job which saves away a copy of /boot/initrd-`uname -r` after a successful boot, so i always have something to fall back on. thanks -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#444364: please stop rewriting all the initrds
Package: mdadm Version: 2.6.2-2 it is EXCEPTIONALLY DANGEROUS to replace EVERY SINGLE initrd when mdadm is installed/upgraded. you pretty much guarantee that any problem will produce an unbootable system -- especially if root is on md. as has just occurred to me. in the past in this situation i could easily go back to an old kernel version which could still boot my system fine *because its initrd hadn't been broken as well*. but no this time i'll have to resort to a recovery CD. -dean -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Intel Memory Ordering White Paper
On Sat, 8 Sep 2007, Petr Vandrovec wrote: > dean gaudet wrote: > > On Sun, 9 Sep 2007, Nick Piggin wrote: > > > > > I've also heard that string operations do not follow the normal ordering, > > > but > > > that's just with respect to individual loads/stores in the one operation, > > > I > > > hope? And they will still follow ordering rules WRT surrounding loads and > > > stores? > > > > see section 7.2.3 of intel volume 3A... > > > > "Code dependent upon sequential store ordering should not use the string > > operations for the entire data structure to be stored. Data and semaphores > > should be separated. Order dependent code should use a discrete semaphore > > uniquely stored to after any string operations to allow correctly ordered > > data to be seen by all processors." > > > > i think we need sfence after things like copy_page, clear_page, and possibly > > copy_user... at least on intel processors with fast strings option enabled. > > I do not think. I believe that authors are trying to say that > > struct { uint8 lock; uint8 data; } x; > > lea (x.data),%edi > mov $2,%ecx > std > rep movsb > > to set both data and lock does not guarantee that x.lock will be set after > x.data and that you should do > > lea (x.data),%edi > std > movsb > movsb # or mov (%esi),%al; mov %al,(%edi), but movsb looks discrete enough to > me > > instead (and yes, I know that my example is silly). no it's worse than that -- intel fast string stores can become globally visible in any order at all w.r.t. normal loads or stores... so take all those great examples in their recent whitepaper and throw out all the ordering guarantees for addresses on different cachelines if any of the stores are rep string. for example transitive store ordering for locations on multiple cachelines is not guaranteed at all. the kernel could return a zero page and one core could see the zeroes out of order with another core performing some sort of lockless data structure operation. 
fast strings don't break ordering from the point of view of the core performing the rep string operation, but externally there are no guarantees (it's right there in the docs). -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Intel Memory Ordering White Paper
On Sun, 9 Sep 2007, Nick Piggin wrote: > I've also heard that string operations do not follow the normal ordering, but > that's just with respect to individual loads/stores in the one operation, I > hope? And they will still follow ordering rules WRT surrounding loads and > stores? see section 7.2.3 of intel volume 3A... "Code dependent upon sequential store ordering should not use the string operations for the entire data structure to be stored. Data and semaphores should be separated. Order dependent code should use a discrete semaphore uniquely stored to after any string operations to allow correctly ordered data to be seen by all processors." i think we need sfence after things like copy_page, clear_page, and possibly copy_user... at least on intel processors with fast strings option enabled. -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFT][PATCH v7] sata_mv: convert to new EH
On Fri, 13 Jul 2007, greg wrote: dean gaudet dean at arctic.org writes: if you've got any other workload you'd like me to throw at it, let me know. I've had a few problems with the driver in 2.6.20 (fc6xen x86_64). The machine tended to lock up after a random period of time (from a few minutes upwards), without any messages. Performing a smartctl on all the disks, or leaving smartd running, seemed to speed up the rate at which the crash occurred. What I found was that by moving the sata_mv device onto it's own bus (or a bus with two sata_mv devices), the crashes went away. Are you doing tests with the controller sharing a bus with other devices? Is there an merit to my observation that it might be an issue with devices sharing a PCI-X bus? Cards: Supermicro 5081 (SAT-MV8), Supermicro 6081 (SAT2-MV8), Highpoint 5081 (RocketRaid 1820A v1.1). Motherboards: Tyan S2882, AMD 8131 chipset; IBM x206, Intel 6300ESB. hmm! i don't seem to have replied to this. you know, i've seen this problem. the first time it happened was with a promise ultra tx/100 or tx/133 (on a dual k7 box, two controllers on the same bus certainly)... a 5 minute cronjob logging HD temperatures via smart would occasionally cause one of the disks to just disappear, return errors on every request, and required a reboot to rediscover it. eliminating the cronjob stopped the problem. i switched to 3ware 750x and the problem went away even with the cronjob going. forward a few years and i ran into the same problem with a 3ware 9550sx (only card on the bus) -- and a firmware upgrade to the controller eventually fixed the problem. but yeah, i've been meaning to add a smartctl -a once every 10 seconds to my burn-in process because of these experiences... but haven't built a new server in a while. the particular box i was testing sata_mv on (tyan s2881) has every pci-x slot filled with one thing or another, but i only have one sata_mv device. if i get around to testing again i'll throw smartctl into the mix. 
-dean - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/5] x86: Set PCI config space size to extended for AMD Barcelona
it's so very unfortunate the PCI standard has no feature bit to indicate the presence of ECS. FWIW in my testing on a range of machines spanning 7 or 8 years i could read config space reg 256... and get 0x when the device didn't support ECS, and get valid data when the device did support ECS... granted there may be some system out there which behaves really badly when you do this. perhaps someone could write a userspace program and test that concept on a far wider range of machines. -dean On Mon, 3 Sep 2007, Robert Richter wrote: > This patch sets the config space size for AMD Barcelona PCI devices to > 4096. > > Signed-off-by: Robert Richter <[EMAIL PROTECTED]> > > --- > arch/i386/pci/fixup.c | 14 ++ > 1 file changed, 14 insertions(+) > > Index: linux-2.6/arch/i386/pci/fixup.c > === > --- linux-2.6.orig/arch/i386/pci/fixup.c > +++ linux-2.6/arch/i386/pci/fixup.c > @@ -8,6 +8,7 @@ > #include > #include "pci.h" > > +#define PCI_CFG_SPACE_EXP_SIZE 4096 > > static void __devinit pci_fixup_i450nx(struct pci_dev *d) > { > @@ -444,3 +445,16 @@ static void __devinit pci_siemens_interr > } > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SIEMENS, 0x0015, > pci_siemens_interrupt_controller); > + > +/* > + * Extend size of PCI configuration space for AMD CPUs > + */ > +static void __devinit pci_ext_cfg_space_access(struct pci_dev *dev) > +{ > + dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE; > +} > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_HT, > pci_ext_cfg_space_access); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_MAP, > pci_ext_cfg_space_access); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_DRAM, > pci_ext_cfg_space_access); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_MISC, > pci_ext_cfg_space_access); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_LINK, > pci_ext_cfg_space_access); > > -- > AMD Saxony, Dresden, Germany > Operating System Research Center 
> email: [EMAIL PROTECTED] > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/