from:"Christian Kujau"

Re: [PATCH] torture: Correctly fetch CPUs for kvm-build.sh with all native language

2021-04-05 Thread Christian Kujau

On Thu, 1 Apr 2021, Paul E. McKenney wrote:
> +# This script knows only English.
> +LANG=en_US.UTF-8; export LANG

This, too, will only work if en_US.UTF-8 is installed. Check with 
"locale -a" whether it is. Also, Perl will complain loudly if the locale is 
not installed (try: "LANG=en_US.UTF-9 perl"), which is a nice way to test 
whether LANG works as expected.

So, wouldn't LANG=C be a more conservative fallback here?
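The same reasoning applies to any program that switches locales. A locale that is not installed simply fails to load, while ISO C guarantees that "C" always exists. Below is a minimal sketch of that fallback in C. The helper name is made up for illustration. setlocale() returns NULL when the requested locale is unavailable.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try the preferred locale first and fall back to the
 * "C" locale, which ISO C guarantees to exist. Returns the name of the
 * locale that actually got selected. */
static const char *set_locale_with_fallback(const char *preferred)
{
    const char *got = setlocale(LC_ALL, preferred);

    if (got != NULL)
        return got;                 /* preferred locale is installed */
    return setlocale(LC_ALL, "C");  /* always available */
}
```

For example, set_locale_with_fallback("en_US.UTF-8") still leaves the program with a usable locale on a system where only "C" is installed.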



Christian.
-- 
BOFH excuse #58:

high pressure system failure

acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910

2020-11-03 Thread Christian Kujau

Hi,

while looking through boot messages I came across the following on a 
Lenovo T470 laptop with Linux 5.8:

  acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
  acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)

Searching the interwebs brought me to an old patch proposal:

 > https://lkml.org/lkml/2017/12/8/914
 > Fri, 8 Dec 2017 20:34:21 -0600
 > [PATCH 2/2] platform/x86: wmi: Allow creating WMI devices with the same GUID

The patch was proposed, but never made it into mainline. It's not really a 
big deal, booting continues and all devices appear to work, only these two 
messages get logged during boot. I'm just wondering if this needs to be 
fixed or if it's really just a cosmetic issue.

Full dmesg: https://pastebin.ubuntu.com/p/2pPv3hywPF/

Thanks,
Christian.
-- 
BOFH excuse #451:

astropneumatic oscillations in the water-cooling

Re: [PATCH] CREDITS: remove link http://www.dementia.org/~shadow

2020-07-15 Thread Christian Kujau

On Tue, 14 Jul 2020, Jonathan Corbet wrote:
> >  N: Derrick J. Brashear
> >  E: sha...@dementia.org
> > -W: http://www.dementia.org/~shadow

That particular entry moved to:

 W: http://www.contrib.andrew.cmu.edu/~shadow/

(The https version only supports TLSv1, and Firefox balks)

Otherwise, what Jon said:

> So thanks for addressing these.  That said, I do wonder if this is quite
> the right thing to do.  I'm assuming that the old sites still exist in the
> wayback machine somewhere, and somebody might actually want to find them.
> Pity the poor anthropologist researching the origins of the
> billion-line, free-software kernels widely used in the 2500's...
> 
> So maybe we should either mark it as "[BROKEN]" or make a direct link into
> the wayback machine instead?  That would enable the suitably motivated to
> go after the content that once existed.

As an innocent bystander, I'd opt for [BROKEN] tags, or Wayback machine 
substitutes, instead of just removing those entries.

My 2 cents,
Christian.
-- 
BOFH excuse #128:

Power Company having EMP problems with their reactor

Re: process '/usr/bin/rsync' started with executable stack

2020-06-23 Thread Christian Kujau

On Tue, 23 Jun 2020, Kees Cook wrote:
> > If you run something with exec stack after the message
> > you shouldn't get it second time.
> 
> If you want to reset this flag, you can do:
>  # echo 1 > /sys/kernel/debug/clear_warn_once

Thanks. However, I tend not to mount /sys/kernel/{config,debug,tracing} 
and similar filesystems. I always thought they were not needed, and 
leaving them unmounted might lower the attack surface. Or maybe my 
tinfoil hat needs some adjustment...
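The following sketch models why the message showed up only once, and what the debugfs knob resets. As far as I understand it, the kernel's *_once() printk helpers keep one static flag per call site in a dedicated section. Writing to clear_warn_once zeroes those flags. This is a userspace imitation, not kernel code, and all names in it are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical call sites; in the kernel each pr_warn_once() has its own flag. */
enum { SITE_EXEC_STACK, SITE_OTHER, NR_SITES };

/* All once-flags live in one table so they can be reset together. */
static bool once_flags[NR_SITES];

/* Print msg the first time a given site is hit; return true if it printed. */
static bool warn_once(int site, const char *msg)
{
    if (once_flags[site])
        return false;
    once_flags[site] = true;
    fprintf(stderr, "%s\n", msg);
    return true;
}

/* Rough analogue of "echo 1 > /sys/kernel/debug/clear_warn_once". */
static void clear_warn_once(void)
{
    memset(once_flags, 0, sizeof(once_flags));
}
```

If debugfs is normally left unmounted, it can be mounted just for the reset (mount -t debugfs none /sys/kernel/debug) and unmounted again afterwards.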

Christian.
-- 
BOFH excuse #279:

The static electricity routing is acting up...

Re: [PATCH] Re: filesystem being remounted supports timestamps until 2038

2020-06-23 Thread Christian Kujau

On Sat, 4 Jan 2020, Christian Kujau wrote:
> On Sun, 29 Dec 2019, Linus Torvalds wrote:
> > > When file systems are remounted a couple of times per day (e.g. rw/ro
> > > for backup purposes), dmesg gets flooded with these messages. Change
> > > pr_warn into pr_debug to make it stop.
> > 
> > How about just doing it once per mount?
> 
> Yes, once per mount would work, and maybe not print a warning on remounts 
> at all.

Is there any chance that this could be revisited? This is still 
flooding my dmesg just because I have that (crude?) mechanism in place to 
remount the backup device read-only after the hourly backup run. Sure, 
I could omit that ("Doc, it hurts when I do that", as Al would comment), 
but this is really the only message that keeps repeating: 1067 messages 
in ~60 days of uptime :-|

Does the patch below make any sense, would that work?

Please reconsider,
Christian.

> Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp 
> expiry") introduced:
> 
>Mounted %s file system at %s supports timestamps until [...]
> 
> in mnt_warn_timestamp_expiry(), but then 0ecee6699064 ("fs/namespace.c: 
> fix use-after-free of mount in mnt_warn_timestamp_expiry") changed this to
> 
>   %s filesystem being %s at %s supports timestamps until [...]
> 
> in order to fix a use-after-free.
> 
> > Of course, if you actually unmount and completely re-mount a
> > filesystem, then that would still warn multiple times, but at that
> > point I think it's reasonable to do.
> 
> Yes, of course. Umount/remount cycles should still issue a warning, but 
> "-o remount" should not, IMHO.
> 
> Thanks,
> Christian.

commit c9a5338b4930cdf99073042de0717db43d7b75be
Author: Christian Kujau 
Date:   Thu Dec 26 17:39:57 2019 -0800

Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp 
expiry") and its follow-up 0ecee6699064 ("fix use-after-free of mount in 
mnt_warn_timestamp_expiry()") introduced a pr_warn message, and the 
following now gets sent to dmesg on every remount:

 [...] filesystem being remounted at /mnt supports timestamps until 2038 (0x7fff)

When file systems are remounted a couple of times per day (e.g. rw/ro for 
backup purposes), dmesg gets flooded with these messages. Change pr_warn 
into pr_debug to make it stop.

Signed-off-by: Christian Kujau 

diff --git a/fs/namespace.c b/fs/namespace.c
index be601d3a8008..afc6a13e7316 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2478,7 +2478,7 @@ static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *

time64_to_tm(sb->s_time_max, 0, );

-   pr_warn("%s filesystem being %s at %s supports timestamps until %04ld (0x%llx)\n",
+   pr_debug("%s filesystem being %s at %s supports timestamps until %04ld (0x%llx)\n",
sb->s_type->name,
is_mounted(mnt) ? "remounted" : "mounted",
mntpath,

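Linus' "once per mount" suggestion quoted above could be expressed as a flag that lives as long as the mount does. The first mount warns, later "-o remount" calls on the same mount stay quiet, and a real umount/mount cycle starts over with a fresh flag. Below is a userspace sketch of that behaviour. It is not the fs/namespace.c implementation, and struct fake_mount and its ts_warned field are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a mounted filesystem; ts_warned is made up. */
struct fake_mount {
    const char *path;
    bool ts_warned;     /* set once the expiry warning was printed */
};

/* Warn about timestamp expiry at most once per mount instance.
 * Returns true if a warning was emitted on this call. */
static bool warn_timestamp_expiry(struct fake_mount *m, bool remount, long max_year)
{
    if (m->ts_warned)
        return false;   /* already warned: stay quiet on -o remount */
    m->ts_warned = true;
    fprintf(stderr, "filesystem being %s at %s supports timestamps until %04ld\n",
            remount ? "remounted" : "mounted", m->path, max_year);
    return true;
}
```

With an hourly rw/ro remount, this would log one line per mount instead of one per remount.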
-- 
BOFH excuse #66:

bit bucket overflow

Re: process '/usr/bin/rsync' started with executable stack

2020-06-23 Thread Christian Kujau

On Wed, 24 Jun 2020, Alexey Dobriyan wrote:
> BTW this bug was exactly the one described in the changelog:
> compiling assembly brings executable stack by default:

Great, thanks for the pointer, will wait until this lands in Arch. My 
search engine brought up the lkml discussion, though, not the thread[0] on 
rsync-cvs ;-)

Christian.

[0] https://lists.samba.org/archive/rsync-cvs/2020-June/007661.html
-- 
BOFH excuse #211:

Lightning strikes.

Re: process '/usr/bin/rsync' started with executable stack

2020-06-23 Thread Christian Kujau

On Wed, 24 Jun 2020, Alexey Dobriyan wrote:
> > >   process '/usr/bin/rsync' started with executable stack
> > > But I can't reproduce this message,
> 
> This message is once-per-reboot.

Interesting, thanks. Now I know why I cannot reproduce this. I still 
wondered what made rsync trigger this message today. The machine has been 
running for some weeks, and rsync runs regularly and automatically a few 
times an hour, all day, always with the same parameters. But oh, now I 
see, rsync had been upgraded (automatically) overnight:

 > [ALPM] upgraded rsync (3.1.3-3 -> 3.2.0-1)

And indeed, the _older_ version had NX enabled:

$ wget https://archive.archlinux.org/packages/.all/rsync-3.1.3-3-x86_64.pkg.tar.zst
$ zstd -dc rsync-3.1.3-3-x86_64.pkg.tar.zst | tar -xf - usr/bin/rsync
$ checksec --format=json --extended --file=usr/bin/rsync | jq
{
  "usr/bin/rsync": {
    "relro": "full",
    "canary": "yes",
    "nx": "yes",
    "pie": "yes",
    "clangcfi": "no",
    "safestack": "no",
    "rpath": "no",
    "runpath": "no",
    "symbols": "no",
    "fortify_source": "yes",
    "fortified": "10",
    "fortify-able": "19"
  }
}

So, while I still think a PID would have been nice, now I know that it's 
pr_warn_once and won't be printed again until after the next reboot. Going 
to ask the Arch folks why NX has been disabled...
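As far as I can tell, checksec derives its "nx" verdict from the PT_GNU_STACK program header. If that header carries the PF_X flag, the process gets an executable stack, which is the condition the kernel message reports. The sketch below performs that check over an in-memory 64-bit ELF image using the <elf.h> definitions. The function name is made up, and the sketch does not handle 32-bit images or images whose byte order differs from the host's.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if PT_GNU_STACK requests an executable stack (PF_X), 0 if it
 * does not, and -1 if buf is not a usable 64-bit ELF image or has no
 * PT_GNU_STACK header at all (then the architecture's default applies). */
static int elf64_stack_is_exec(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return -1;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;

        if (off > len || len - off < sizeof(ph))
            return -1;      /* program header table runs past the buffer */
        memcpy(&ph, buf + off, sizeof(ph));
        if (ph.p_type == PT_GNU_STACK)
            return (ph.p_flags & PF_X) ? 1 : 0;
    }
    return -1;
}
```

To apply it, read the whole binary (e.g. the extracted usr/bin/rsync) into a buffer and pass it in. Going by the checksec output, one would expect 0 for the 3.1.3-3 package and 1 for 3.2.0-1.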

Thanks,
Christian.
-- 
BOFH excuse #211:

Lightning strikes.

Re: process '/usr/bin/rsync' started with executable stack

2020-06-23 Thread Christian Kujau

On Tue, 23 Jun 2020, Kees Cook wrote:
> > $ checksec --format=json --extended --file=`which rsync` | jq
> > {
> >   "/usr/bin/rsync": {
> > "relro": "full",
> > "canary": "yes",
> > "nx": "no",
> ^^
> 
> It is, indeed, marked executable, it seems. What distro is this?

Arch Linux (x86-64) with 5.6.5.a-1-hardened[0], running in a Xen DomU.

Christian.

[0] https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux-hardened
-- 
BOFH excuse #211:

Lightning strikes.

process '/usr/bin/rsync' started with executable stack

2020-06-23 Thread Christian Kujau

Hi,

exactly this[0] happened today, on a 5.6.5 kernel:

  process '/usr/bin/rsync' started with executable stack

But I can't reproduce this message, and rsync (v3.2.0, not exactly 
abandonware) runs several times a day, so to repeat Andrew's questions[0] 
from last year:

  > What are poor users supposed to do if this message comes out? 
  > Hopefully google the message and end up at this thread.  What do you
  > want to tell them?

Also, the PID is missing from that message. I had some long-running rsync 
processes earlier; maybe the RWE status would have been visible in 
/proc/$PID/maps, or somewhere else?
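For a process that is still running, the stack's permissions do show up in /proc/<pid>/maps. The line tagged [stack] reads "rwxp" instead of "rw-p" when the stack is executable. Below is a sketch of a per-line check. The function name is made up, and only the main thread's [stack] mapping is considered.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Given one line of /proc/<pid>/maps
 * ("start-end perms offset dev inode [pathname]"), return true if it is
 * the main [stack] mapping and that mapping is executable. */
static bool maps_line_is_exec_stack(const char *line)
{
    char perms[5] = "";

    if (sscanf(line, "%*lx-%*lx %4s", perms) != 1)
        return false;
    if (strstr(line, "[stack]") == NULL)
        return false;
    return perms[2] == 'x';     /* perms is e.g. "rwxp" or "rw-p" */
}
```

Feeding it each line read with fgets() from the maps file of a long-running rsync would answer the question while the process is still alive.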

Please advise? :-)

Thanks,
Christian.

[0] https://lore.kernel.org/patchwork/patch/1164047/#1362722


$ checksec --format=json --extended --file=`which rsync` | jq
{
  "/usr/bin/rsync": {
    "relro": "full",
    "canary": "yes",
    "nx": "no",
    "pie": "yes",
    "clangcfi": "no",
    "safestack": "no",
    "rpath": "no",
    "runpath": "no",
    "symbols": "no",
    "fortify_source": "yes",
    "fortified": "10",
    "fortify-able": "19"
  }
}

-- 
BOFH excuse #244:

Your cat tried to eat the mouse.

Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher

2020-06-05 Thread Christian Kujau

On Fri, 5 Jun 2020, Andrew Cooper wrote:
> PVH domains don't have the emulated platform device, so Linux will be
> finding ~0 when it goes looking in config space.
> 
> The diagnostic should be skipped in that case, to avoid giving the false
> impression that something is wrong.

Understood, thanks for explaining that. I won't be able to edit 
arch/x86/xen/platform-pci-unplug.c to fix that though :-\
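Here is Andrew's point in code form. An all-ones read means nothing answered, so it should be treated as "no device" rather than as a wrong magic value. The sketch below assumes the probe is a 16-bit read, so ~0 shows up as 0xffff. XEN_MAGIC_ASSUMED is set to 0x49d2, which I believe matches the constant in arch/x86/xen/platform-pci-unplug.c, but treat it as an assumption. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed value; check arch/x86/xen/platform-pci-unplug.c for the real one. */
#define XEN_MAGIC_ASSUMED 0x49d2u

enum magic_result { MAGIC_OK, MAGIC_ABSENT, MAGIC_UNRECOGNISED };

/* Classify the value read from the platform device's magic port. */
static enum magic_result classify_magic(uint16_t v)
{
    if (v == XEN_MAGIC_ASSUMED)
        return MAGIC_OK;
    if (v == 0xffff)            /* all ones: no emulated device, e.g. PVH */
        return MAGIC_ABSENT;    /* skip the diagnostic */
    return MAGIC_UNRECOGNISED;  /* only this case deserves a warning */
}
```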

Christian.
-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch

Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher

2020-06-05 Thread Christian Kujau

On Fri, 5 Jun 2020, Jürgen Groß wrote:
> Do you happen to start the guest with vcpus < maxvcpus?

Indeed, I was booting with vcpus=2, maxvcpus=4. Setting both to the same 
value made the domU boot.

> If yes there is already a patch pending for 5.8:
> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-5.8=c54b071c192dfe8061336f650ceaf358e6386e0b

Applied that manually and now the system boots even with vcpus < maxvcpus. 
So, if this still matters:

   Tested-by: Christian Kujau 

Thank you for your response, and the fix!

Christian.
-- 
BOFH excuse #146:

Communications satellite used by the military for star wars.

5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher

2020-06-05 Thread Christian Kujau

Hi,

I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to 
5.7.0 caused the splat below, really early during boot. The configuration 
has not changed, all new "make oldconfig" prompts have been answered with 
"N". Old and new config, dmesg are here:

  http://nerdbynature.de/bits/5.7.0/

Searching the interwebs for similar reports didn't return much:

 * drm_sched_get_cleanup_job: BUG: kernel NULL pointer dereference
   https://bugzilla.redhat.com/show_bug.cgi?id=1822984  -- but this 
   appears to be really DRM related. - https://lkml.org/lkml/2020/4/10/545

 * A recent mm/vmstat patch, mentioning "device_offline" in its output
   https://patchwork.kernel.org/patch/11563009/

But other than a few overlapping strings, I guess all of that is totally 
unrelated :(

Thanks,
Christian.


Note: the "Xen Platform PCI: unrecognised magic value" message at the top 
appears in 5.6 kernels as well, with no ill effects so far.

---
Xen Platform PCI: unrecognised magic value
ACPI: No IOAPIC entries present
BUG: kernel NULL pointer dereference, address: 02d0
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0 
Oops:  [#1] SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0 #2
RIP: 0010:device_offline+0x8/0xf0
Code: 48 89 e7 e8 3a ee f3 ff 4c 89 e0 48 83 c4 10 5b 41 5c c3 45 31 e4 48 83 
c4 10 4c 89 e0 5b 41 5c c3 90 41 54 55 53 48 83 ec 10  87 d0 02 00 00 01 0f 
85 ca 00 00 00 48 89 fb 48 8b 7f 48 48 85
RSP: :bd9100013e78 EFLAGS: 00010286
RAX:  RBX:  RCX: 820001fa
RDX: 9c9c3dd0 RSI: 820001fa RDI: 
RBP: 0002 R08: 0001 R09: 
R10: 9c9c3d5072a8 R11:  R12: 9c9c3d594720
R13: 8a57e5a8 R14:  R15: 
FS:  () GS:9c9c3dc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 02d0 CR3: 6b00a001 CR4: 001606b0
Call Trace:
 setup_cpu_watcher+0x44/0x60
 ? plt_clk_driver_init+0xe/0xe
 setup_vcpu_hotplug_event+0x23/0x26
 do_one_initcall+0x47/0x180
 kernel_init_freeable+0x13b/0x19d
 ? rest_init+0x95/0x95
 kernel_init+0x5/0xeb
 ret_from_fork+0x35/0x40
Modules linked in:
CR2: 02d0
---[ end trace b0cc587db609787f ]---

-- 
BOFH excuse #440:

Cache miss - please take better aim next time

Re: file system permissions regression affecting root

2020-05-16 Thread Christian Kujau

On Wed, 13 May 2020, Patrick Donnelly wrote:
> However, it seems odd that this depends on the owner of the directory.
> i.e. this protection only seems to be enforced if the sticky directory
> is owned by root. That's expected?

According to the documentation[0] this appears to be intentional:

 protected_regular:
   [...]
   When set to "1" don't allow O_CREAT open on regular files that we
   don't own in world writable sticky directories, unless they are
   owned by the owner of the directory.

C.

[0] https://www.kernel.org/doc/Documentation/sysctl/fs.txt
-- 
BOFH excuse #263:

It's stuck in the Web.

Re: [Jfs-discussion] [fs] 05c5a0273b: netperf.Throughput_total_tps -71.8% regression

2020-05-13 Thread Christian Kujau

On Tue, 12 May 2020, kernel test robot wrote:
> FYI, we noticed a -71.8% regression of netperf.Throughput_total_tps due to 
> commit:

As noted in this report, netperf is used to "measure various aspect of 
networking performance". Are we sure the bisect is correct? JFS is a 
filesystem and does not touch net/ in any way. So, without having tried 
to reproduce this: maybe the JFS commit is a red herring?

C.
-- 
BOFH excuse #50:

Change in Earth's rotational speed

ptrace.c:202:6: warning: this statement may fall through

2019-07-30 Thread Christian Kujau

While compiling mainline with gcc-9.1.1 the following warning is emitted:

===
../arch/x86/kernel/ptrace.c: In function ‘set_segment_reg’: 
../arch/x86/kernel/ptrace.c:202:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
  202 |   if (unlikely(value == 0))
  |  ^
../arch/x86/kernel/ptrace.c:205:2: note: here
  205 |  default:
  |  ^~~
===

The patch below silences the warning, but I don't know if this is actually 
the intended behaviour.

Christian.

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0fdbe89d0754..0030456d6e5c 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -201,6 +201,7 @@ static int set_segment_reg(struct task_struct *task,
case offsetof(struct user_regs_struct, ss):
if (unlikely(value == 0))
return -EIO;
+   /* fall through */

default:
*pt_regs_access(task_pt_regs(task), offset) = value;
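The warning only asks for an explicit marker that the fall-through is deliberate. In set_segment_reg(), the ss case adds one extra check (a zero selector returns -EIO) before sharing the store in default:, so the fall-through appears to be intended. The sketch below reduces that shape to userspace, using GCC's fallthrough statement attribute (GCC 7 and later) instead of the comment. The enum and struct are hypothetical stand-ins for the ptrace register offsets.

```c
#include <errno.h>

/* Hypothetical stand-ins for the user_regs_struct offsets. */
enum seg_reg { REG_SS, REG_DS, NR_SEG_REGS };
struct fake_regs { unsigned long regs[NR_SEG_REGS]; };

static int set_segment_reg_sketch(struct fake_regs *r, enum seg_reg which,
                                  unsigned long value)
{
    switch (which) {
    case REG_SS:
        if (value == 0)
            return -EIO;                /* reject a null SS selector */
        __attribute__((__fallthrough__));  /* intentional: share the store */
    case REG_DS:
        r->regs[which] = value;
        break;
    default:
        return -EINVAL;
    }
    return 0;
}
```

Later kernels wrap this attribute in a "fallthrough;" pseudo-keyword, which serves the same purpose as the "/* fall through */" comment in the patch.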
-- 
BOFH excuse #326:

We need a licensed electrician to replace the light bulbs in the computer room.

Re: FS-Cache: Duplicate cookie detected

2019-03-31 Thread Christian Kujau

Hi David,

On Tue, 12 Mar 2019, David Howells wrote:
> > My /usr/local/src mount was mounted with vers=4.2 (default), while 
> > nfstest_cache was mounting its test-mount with vers=4.1! Apart from the 
> > different rsize/wsize values, the version number stood out. And indeed, 
> > when I mount my regular NFS mount /usr/local/src with vers=4.1, the 
> > "duplicate cookie" is no longer printed.
> 
> Yeah - NFS superblocks are differentiated by a whole host of parameters,
> including protocol version number, and caches aren't shared between
> superblocks because this introduces a tricky coherency problem.
> 
> The issue is that NFS superblocks to the same place do not currently manage
> coherency (inode attributes, data) between themselves, except via the server.
> 
> However, if "fsc" isn't given on the mount commandline, the superblock
> probably shouldn't get a server-level cookie if we can avoid it.

Just checking - are you waiting for new results from me, should I test 
something that I missed? Or are new patches in the works? :-D

Thanks,
Christian.
-- 
BOFH excuse #139:

UBNC (user brain not connected)

Re: FS-Cache: Duplicate cookie detected

2019-03-12 Thread Christian Kujau

On Mon, 11 Mar 2019, David Howells wrote:
> I've a couple more patches for you - one a bugfix and one that will print more
> information.  They don't actually affect the problem you're seeing.  I'll post
> them as replies to this message.

Thanks for the patches. I've applied all three to v5.0 and ran 
"nfstest_cache" and was able to reproduce the messages. Please note that 
I'm only running "nfstest_cache" because it's somehow able to reproduce 
the message reliably - otherwise the message just shows up once or twice 
in syslog, but I didn't know how to reproduce it.

But this time I noticed something I had not noticed before: while 
running nfstest_cache, the "duplicate cookie" messages were only 
triggered when my other, non-test mount was also mounted during the test. 
Let me describe my F29 test VM again:

* VM boots, and /usr/local/src gets mounted via NFS, read-only, and
  without the fsc option. cachefilesd isn't even installed here.

* I run nfstest_cache and apparently it's mounting the same NFS export
  from the server to /mnt/t, as a readonly mount.

So two mounts, one in /usr/local/src, the other in /mnt/t, both read-only 
and both w/o "fsc", but the "duplicate cookie" message is only printed 
when /usr/local/src was mounted. If /usr/local/src wasn't mounted, the 
test would complete[0] and no "duplicate cookie" message was printed. And 
then I noticed:

--
$ mount | tail -2 | fold
horus:/usr/local/src on /usr/local/src type nfs4 
(ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255, 
hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139, 
local_lock=none,addr=192.168.0.115)

horus:/ on /mnt/t type nfs4 
(rw,relatime,vers=4.1,rsize=4096,wsize=4096,namlen=255, 
hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139, 
local_lock=none,addr=192.168.0.115) 
--

My /usr/local/src mount was mounted with vers=4.2 (default), while 
nfstest_cache was mounting its test-mount with vers=4.1! Apart from the 
different rsize/wsize values, the version number stood out. And indeed, 
when I mount my regular NFS mount /usr/local/src with vers=4.1, the 
"duplicate cookie" is no longer printed.

For simplicity, I've attached two logs to this email:

* nfs_no-mount.txt.xz - showing /proc/fs/nfsfs/volumes and
  /proc/fs/fscache/stats every 0.01 seconds, while running nfstest_cache
  in another terminal. Note that no "duplicate cookie" messages were 
  triggered, as /usr/local/src was not mounted.

* nfs_with-mount.txt.xz - same, but here /usr/local/src was mounted (and 
  defaulted to vers=4.2), and thus "duplicate cookie" messages were 
  printed.

I fear that all this may complicate matters, and now we're examining 
NFS mount versions too, but I only noticed it now, not earlier :-\

I can't comment on the patches much, as you mentioned they won't make the 
message go away, but I hope it printed more details now.

Thanks,
Christian.

[0] Again, I'm using nfstest_cache only to trigger the message. Every time
I execute it, the test fails, because I think it expects a rw-mount:

 $ nfstest_cache --server horus --client fedora0 --runtest=acregmin_attr
 *** Verify consistency of attribute caching with NFSv4.1 on a file 
 acregmin = 10
 TEST: Running test 'acregmin_attr'
 FAIL: Traceback (most recent call last):
File "/usr/bin/nfstest_cache", line 199, in do_file_test
  fdw = open(self.absfile, "w")
IOError: [Errno 30] Read-only file system: 
 '/mnt/t/nfstest_cache_20190311223404_f_1'
TIME: 4.497078s
 1 tests (0 passed, 1 failed)
 Total time: 5.529826s

-- 
BOFH excuse #209:

Only people with names beginning with 'A' are getting mail this week (a la 
Microsoft)

nfs_with-mount.txt.xz
Description: application/xz

nfs_no-mount.txt.xz
Description: application/xz

Re: FS-Cache: Duplicate cookie detected

2019-03-08 Thread Christian Kujau

On Fri, 8 Mar 2019, Christian Kujau wrote:
> Running Linux v5.0 with this patch applied does indeed still produce the 
> "Duplicate cookie detected" messages, but I only ever see wrq=0 when 
> running nfstest_cache: 
> 
>https://paste.fedoraproject.org/paste/dkav0FQzYZxE9-V7GphjAQ

And again with the whole /proc/fs/fscache/stats output and better time 
stamps: https://paste.fedoraproject.org/paste/hZtCPStJlqB1d9JXnTFndQ

C
-- 
BOFH excuse #5:

static from plastic slide rules

Re: FS-Cache: Duplicate cookie detected

2019-03-08 Thread Christian Kujau

On Fri, 8 Mar 2019, David Howells wrote:
> See the attached for a patch that helps with certain kinds of collision,
> though I can't see that it should help with what you're seeing since the
> RELINQUISHED flag isn't set on the old cookie (fl=222, but 0x10 isn't in
> there).  You can monitor the number of waits by looking in
> /proc/fs/fscache/stats for the:
> 
>   Acquire: n=289166 nul=0 noc=0 ok=286331 nbf=2 oom=0 wrq=23748

Running Linux v5.0 with this patch applied does indeed still produce the 
"Duplicate cookie detected" messages, but I only ever see wrq=0 when 
running nfstest_cache: 

   https://paste.fedoraproject.org/paste/dkav0FQzYZxE9-V7GphjAQ

(Scroll down until the messages start to appear again)

Only the n= field seems to change during that test:


fedora0# grep wrq n2.log  | sort | uniq -c | sort -n
 28 Acquire: n=8 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
 29 Acquire: n=7 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
 34 Acquire: n=6 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
 82 Acquire: n=9 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
 93 Acquire: n=5 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
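To follow a single counter such as wrq= without eyeballing the whole line, a small parser for the "Acquire:" line of /proc/fs/fscache/stats could look like the sketch below. The field names are taken from the lines above, and the function name is made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up "name=" in a stats line such as
 * "Acquire: n=8 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0" and store its value.
 * Returns 0 on success, -1 if the field is absent. */
static int stats_field(const char *line, const char *name, unsigned long *out)
{
    size_t nlen = strlen(name);
    const char *p = line;

    while ((p = strstr(p, name)) != NULL) {
        /* must start a token and be followed directly by '=' */
        if ((p == line || p[-1] == ' ') && p[nlen] == '=') {
            *out = strtoul(p + nlen + 1, NULL, 10);
            return 0;
        }
        p += nlen;
    }
    return -1;
}
```

Calling it on each sample of the stats file makes it easy to log only the moments when wrq changes.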

HTH,
Christian.
-- 
BOFH excuse #5:

static from plastic slide rules

Re: FS-Cache: Duplicate cookie detected

2019-03-08 Thread Christian Kujau

On Fri, 8 Mar 2019, David Howells wrote:
> > $ mount | grep nfs4
> > nfs:/usr/local/src on /usr/local/src type nfs4 
> > (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139,local_lock=none,addr=192.168.0.115)
> > 
> > ...so FS-Cache ("fsc") isn't even used here.
> 
> Interesting.  Can you do:
> 
>   cat /proc/fs/nfsfs/volumes

That seems to confirm the mount options; fsc is disabled:

# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID  FSC
v4 c0a80073  801 0:46 1cfd45bf1921474d:a795870ea80f5ff7 no

> See the attached for a patch that helps with certain kinds of collision,
> though I can't see that it should help with what you're seeing since the
> RELINQUISHED flag isn't set on the old cookie (fl=222, but 0x10 isn't in
> there).  You can monitor the number of waits by looking in
> /proc/fs/fscache/stats for the:
> 
>   Acquire: n=289166 nul=0 noc=0 ok=286331 nbf=2 oom=0 wrq=23748

Ah, the wrq= field is only introduced by this patch. OK, I'll see if I 
can build a test kernel with that and will report back.

Thanks for looking into this,
Christian.
-- 
BOFH excuse #290:

The CPU has shifted, and become decentralized.

Re: FS-Cache: Duplicate cookie detected

2019-03-06 Thread Christian Kujau

On Wed, 6 Mar 2019, David Howells wrote:
> I can reproduce a slightly different problem by setting off ~6000 parallel
> processes, each reading its own individual directory of files.

Usually I only see it shortly after mount, and only once, but I too can 
reproduce it with NFStest ([0], and there's a Fedora package too) via 
"nfstest_cache --server $SERVER --client `hostname`", which then produces 
a couple of these messages:

FS-Cache: Duplicate cookie detected
FS-Cache: O-cookie c=2fcc866b [p=c10c6e18 fl=222 nc=0 na=1]
FS-Cache: O-cookie d=d5ed73bb n=076c9150
FS-Cache: O-key=[10] '040002000801c0a80073'
FS-Cache: N-cookie c=e8d5dcd4 [p=c10c6e18 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=d5ed73bb n=a54e9705
FS-Cache: N-key=[10] '040002000801c0a80073'


...and the O-key does indeed seem to resemble the server address, 
somewhat:

 >>> s = "040002000801c0a80073"
 >>> octets = [int(s[i:i + 2], 16) for i in range(0, len(s), 2)]
 >>> print(".".join(str(x) for x in reversed(octets)))
 115.0.168.192.1.8.0.2.0.4

Mount options on that client are:

$ mount | grep nfs4
nfs:/usr/local/src on /usr/local/src type nfs4 
(ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139,local_lock=none,addr=192.168.0.115)

...so FS-Cache ("fsc") isn't even used here. The server in that scenario 
is a current Fedora 29 installation.
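Reading the key field by field, instead of reversing it, makes the match exact. This assumes the v5.0-era layout of struct nfs_server_key in fs/nfs/fscache.c: a 16-bit NFS version and a 16-bit address family in host byte order, then the port and the IPv4 address in network byte order, all packed. That is exactly the 10 bytes shown as O-key=[10]. On a little-endian client, the key then decodes to NFS version 4, family 2 (AF_INET), port 2049 and 192.168.0.115, i.e. the addr= in the mount options above. Below is a sketch of such a decoder. The names are made up.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decoded form of the assumed 10-byte IPv4 server key. */
struct nfs_key_v4 {
    unsigned nfsversion, family, port;
    char addr[16];              /* dotted quad */
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a key given as 20 hex digits. Returns 0 on success, -1 otherwise. */
static int decode_nfs_key(const char *hex, struct nfs_key_v4 *k)
{
    uint8_t b[10];

    if (strlen(hex) != 2 * sizeof(b))
        return -1;
    for (size_t i = 0; i < sizeof(b); i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        b[i] = (uint8_t)(hi << 4 | lo);
    }
    k->nfsversion = b[0] | b[1] << 8;   /* host order, assumed little-endian */
    k->family     = b[2] | b[3] << 8;   /* host order, assumed little-endian */
    k->port       = b[4] << 8 | b[5];   /* network order (big-endian) */
    snprintf(k->addr, sizeof(k->addr), "%u.%u.%u.%u",
             (unsigned)b[6], (unsigned)b[7], (unsigned)b[8], (unsigned)b[9]);
    return 0;
}
```

If that layout is right, the key carries only the major version (4), so it cannot tell a vers=4.1 mount from a vers=4.2 mount of the same server.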

HTH,
Christian.

[0] http://wiki.linux-nfs.org/wiki/index.php/NFStest

> 
> I also see reports like this:
> 
>FS-Cache: Duplicate cookie detected
>FS-Cache: O-cookie c=db33ad59 [p=4bc53500 fl=218 nc=0 na=0]
>FS-Cache: O-cookie d=  (null) n=  (null)
>FS-Cache: O-cookie o=6cf6db4f
>FS-Cache: O-key=[16] '010001010100e51fc6000323ae02'
>FS-Cache: N-cookie c=791c49d0 [p=4bc53500 fl=2 nc=0 na=1]
>FS-Cache: N-cookie d=e220fe14 n=d4484489
>FS-Cache: N-key=[16] '010001010100e51fc6000323ae02'
> 
> with no cookie def or netfs data and flags ACQUIRED, RELINQUISHED and
> INVALIDATING - which I can insert a wait for.
> 
> David

-- 
BOFH excuse #420:

Feature was not beta tested

FS-Cache: Duplicate cookie detected

2019-03-05 Thread Christian Kujau

Hi,

ever since ec0328e46d6e ("fscache: Maintain a catalogue of allocated 
cookies") was committed, people are seeing[0] those "Duplicate cookie 
detected" messages in syslog, see below. NFS and CIFS mounts appear to 
continue to work, but these messages are new and I too am wondering if 
this is something to worry about.

They _are_ logged with pr_err in fs/fscache/cookie.c, but maybe this needs 
to be changed to a different loglevel?

Thanks,
Christian.

 FS-Cache: Duplicate cookie detected
 FS-Cache: O-cookie c=9da9dbf0 [p=1593f904 fl=222 nc=0 na=1]
 FS-Cache: O-cookie d=287febd9 n=980c9e8a
 FS-Cache: O-key=[8] '020001bdc0a80064'
 FS-Cache: N-cookie c=bfe3f869 [p=1593f904 fl=2 nc=0 na=1]
 FS-Cache: N-cookie d=287febd9 n=e153f178
 FS-Cache: N-key=[8] '020001bdc0a80064'


[0] https://bugzilla.kernel.org/show_bug.cgi?id=200145
-- 
BOFH excuse #318:

Your EMAIL is now being delivered by the USPS.

Re: [PATCH] x86/uaccess: Remove unused __addr_ok() macro

2019-03-03 Thread Christian Kujau

On Mon, 25 Feb 2019, Joe Perches wrote:
> Looks like it's not used in several arches
> 
> $ git grep -w __addr_ok
> arch/arm/include/asm/uaccess.h:#define __addr_ok(addr)  
> ((void)(addr), 1)
> arch/csky/include/asm/uaccess.h:#define __addr_ok(addr) (access_ok(addr, 0))
> arch/openrisc/include/asm/uaccess.h:#define __addr_ok(addr) ((unsigned long) 
> addr < get_fs())
> arch/sh/include/asm/uaccess.h:#define __addr_ok(addr) \
> arch/sh/include/asm/uaccess.h:  __ao_end >= __ao_a && __addr_ok(__ao_end); })
> arch/x86/include/asm/uaccess.h:#define __addr_ok(addr)  \

If so, would simply removing it do the trick or is there more magic 
involved? I don't have that many cross-compilers though and it's not even 
build-tested:


commit f899653c64cce05fde426d0298cd67670f8ab8e2
Author: Christian Kujau 
Date:   Sun Mar 3 22:43:09 2019 -0800

Remove unused __addr_ok() macro.

 arch/arm/include/asm/uaccess.h  | 1 -
 arch/csky/include/asm/uaccess.h | 2 --
 arch/openrisc/include/asm/uaccess.h | 3 ---
 arch/sh/include/asm/uaccess.h   | 5 +----
 arch/x86/include/asm/uaccess.h  | 2 --
 5 files changed, 1 insertion(+), 12 deletions(-)

Signed-off-by: Christian Kujau 

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 42aa4a22803c..16411c76076d 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -266,7 +266,6 @@ extern int __put_user_8(void *, unsigned long long);
 #define USER_DS		KERNEL_DS
 
 #define segment_eq(a, b)   (1)
-#define __addr_ok(addr)((void)(addr), 1)
 #define __range_ok(addr, size) ((void)(addr), 0)
 #define get_fs()   (KERNEL_DS)
 
diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a42..c02b243fecaa 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -24,8 +24,6 @@ static inline int access_ok(const void *addr, unsigned long size)
((unsigned long)(addr + size) < limit));
 }
 
-#define __addr_ok(addr) (access_ok(addr, 0))
-
 extern int __put_user_bad(void);
 
 /*
diff --git a/arch/openrisc/include/asm/uaccess.h b/arch/openrisc/include/asm/uaccess.h
index a44682c8adc3..9198371e30c2 100644
--- a/arch/openrisc/include/asm/uaccess.h
+++ b/arch/openrisc/include/asm/uaccess.h
@@ -55,9 +55,6 @@
  */
 #define __range_ok(addr, size) (size <= get_fs() && addr <= (get_fs()-size))
 
-/* Ensure that addr is below task's addr_limit */
-#define __addr_ok(addr) ((unsigned long) addr < get_fs())
-
 #define access_ok(addr, size)  \
 ({ \
unsigned long __ao_addr = (unsigned long)(addr);\
diff --git a/arch/sh/include/asm/uaccess.h b/arch/sh/include/asm/uaccess.h
index 5fe751ad7582..b41f6a011474 100644
--- a/arch/sh/include/asm/uaccess.h
+++ b/arch/sh/include/asm/uaccess.h
@@ -5,9 +5,6 @@
 #include 
 #include 
 
-#define __addr_ok(addr) \
-   ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)
-
 /*
  * __access_ok: Check if address with size is OK or not.
  *
@@ -19,7 +16,7 @@
 #define __access_ok(addr, size)({  \
unsigned long __ao_a = (addr), __ao_b = (size); \
unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;\
-   __ao_end >= __ao_a && __addr_ok(__ao_end); })
+   __ao_end >= __ao_a; })
 
 #define access_ok(addr, size)  \
(__chk_user_ptr(addr),  \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index c133478d..d630978738dc 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -37,8 +37,6 @@ static inline void set_fs(mm_segment_t fs)
 #define segment_eq(a, b)   ((a).seg == (b).seg)
 
 #define user_addr_max() (current->thread.addr_limit.seg)
-#define __addr_ok(addr)\
-   ((unsigned long __force)(addr) < user_addr_max())
 
 /*
  * Test whether a block of memory is a valid user space address.


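In the sh hunk of the diff, __access_ok() computes the last byte of the range as addr + size - !!size and uses end >= addr to reject ranges that wrap around. The old __addr_ok(__ao_end) term then compared that last byte against the addr_limit. The userspace sketch below keeps that limit comparison as an explicit parameter to show what the removed term contributed. The function name and the limit parameter are illustrative, not kernel API.

```c
#include <stdbool.h>

/* 'end' is the last byte of [addr, addr + size); end >= addr catches
 * wrap-around, end < limit is what __addr_ok() used to check. */
static bool range_ok(unsigned long addr, unsigned long size, unsigned long limit)
{
    unsigned long end = addr + size - !!size;

    return end >= addr && end < limit;
}
```

Since the sh hunk drops __addr_ok(__ao_end) without an equivalent, only the wrap-around test would remain there. A range crossing the limit, such as range_ok(0x7ff0, 0x20, 0x8000) in this model, would then be accepted, so that hunk may deserve a second look before build-testing.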
-- 
BOFH excuse #123:

user to computer ratio too high.

RIP: e030:move_page_tables+0xaa3/0xb80

2019-02-01 Thread Christian Kujau

Hi,

I'm running an Ubuntu "mainline" kernel[0] as a Xen 4.11.1 DomU (PV) and 
ever since upgrading to Linux 5.0-rcX I get the WARNING messages shown 
below. Going back in my logs[1] I can see that I got similar messages 
for v4.20 too, but with v5.0 they appear more often, and upgrading from 
v5.0-rc3 to -rc4 made it even worse: now the messages show up quickly 
after boot, some commands (w, ps, top) get stuck, and shutdown hangs 
too.

I found an email thread[2] from earlier this month (hence the CC list)
about this, but could not find out whether the discussion was concluded
or the issue even fixed.

I've gone back to v4.20 now and the message hasn't appeared yet, but it
will probably show up again in a few days. Let me know if you need more
details; with v5.0-rc4 it should be easier for me to reproduce.

Thanks,
Christian.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline
https://wiki.ubuntu.com/Kernel/MainlineBuilds
[1] http://nerdbynature.de/bits/5.0.0-rc4/kern_msg.txt
[2] https://www.spinics.net/lists/stable/msg279001.html


WARNING: CPU: 1 PID: 386 at arch/x86/xen/multicalls.c:102 
xen_mc_flush+0x196/0x1f0
Modules linked in: rpcsec_gss_krb5 auth_rpcgss lz4 lz4_compress 
crct10dif_pclmul xen_kbdfront(-) ghash_clmulni_intel xen_fbfront 
fb_sys_fops syscopyarea sysfillrect sysimgblt aesni_intel aes_x86_64 
crypto_simd cryptd glue_helper intel_rapl_perf sch_fq_codel 
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
nf_conntrack zram nf_defrag_ipv6 nf_defrag_ipv4 reiserfs nfsv4 nfs 
nf_tables_set lockd grace fscache dm_crypt sunrpc btrfs nf_tables 
nfnetlink zstd_compress ip_tables x_tables autofs4 raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear [last unloaded: mac_hid]
CPU: 1 PID: 386 Comm: systemd-udevd Not tainted 5.0.0-05rc4-generic 
#201901272036
RIP: e030:xen_mc_flush+0x196/0x1f0
Code: 48 c1 e0 05 4d 8b 55 38 4d 8b 45 40 48 05 00 10 00 81 e8 5d ae dd 00 
49 89 45 18 48 c1 e8 3f 48 89 c6 85 f6 0f 84 05 ff ff ff <0f> 0b 65 8b 0d 
b1 92 fe 7e 41 8b 55 00 48 c7 c7 30 b0 2d 82 e8 f4
RSP: e02b:c900410dbd00 EFLAGS: 00010002
RAX: 888175c946d8 RBX: 777f8000 RCX: 0001
RDX: 888175c946d8 RSI: 0001 RDI: 888175c94310
RBP: c900410dbd30 R08: 0001 R09: 88816fe5b000
R10: 7ff0 R11: 0011 R12: 000f
R13: 888175c94300 R14: 0200 R15: 
FS:  7fca75421680() GS:888175c8() 
knlGS:
CS:  1e030 DS:  ES:  CR0: 80050033
CR2: 5606a0d8bfb8 CR3: 00016d51a000 CR4: 00042660
Call Trace:
 __xen_pgd_pin+0x10c/0x300
 xen_activate_mm+0x28/0x40
 xen_dup_mmap+0xe/0x10
 copy_process.part.37+0x1e7a/0x1f70
 _do_fork+0xe8/0x3a0
 __x64_sys_clone+0x27/0x30
 do_syscall_64+0x5a/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9



-- 
BOFH excuse #160:

non-redundant fan failure

Re: [PATCH] Revert "scripts/setlocalversion: git: Make -dirty check more robust"

2018-11-06 Thread Christian Kujau

On Tue, 6 Nov 2018, Brian Norris wrote:
> > Perhaps both scenarios could be satisfied by having
> > scripts/setlocalversion first check if .git has write permissions, and
> > acting accordingly. Looking into history, this actually used to be
> > done, but cdf2bc632ebc ("scripts/setlocalversion on write-protected
> > source tree", 2013-06-14) removed the updating of the index.
> 
> A "writeable" check (e.g., [ -w . ]) would be sufficient for our case.
> But I'm not so sure about that older NFS report, and I'm also not sure
> that we should be writing to the source tree at all in this case. Maybe
> we can also check whether there's a build output directory specified?

FWIW, the issue I reported back in 2013[0] was not an ill-configured NFS
export, but a read-only NFS export (and, later, a read-write NFS export
where the user compiling the kernel did not have write permission), so
"test -w .git" did not help in determining whether the source tree can
actually be written to. And depending on the user's shell[1], this may or
may not still be the case.

So I'm all for the $(touch .git/some-file-here) test to decide whether the
kernel source tree can be modified during the build.
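
To illustrate the difference: depending on the shell, "test -w" may only inspect permission bits (or ask access(2)), which need not reflect what a read-only NFS export will actually allow, whereas a touch-style probe asks the file system to really create a file. The C sketch below is an assumption for illustration only; the helper dir_is_really_writable() and the ".write-probe.<pid>" file name are hypothetical and not part of scripts/setlocalversion. Note that, as Brian points out below, any such write attempt would still trip their build sandbox.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: decide whether dir is writable by actually
 * creating (and then removing) a uniquely named file in it, instead of
 * trusting permission bits the way "test -w" may.
 */
static bool dir_is_really_writable(const char *dir)
{
	char probe[4096];
	int len, fd;

	len = snprintf(probe, sizeof(probe), "%s/.write-probe.%ld",
		       dir, (long)getpid());
	if (len < 0 || (size_t)len >= sizeof(probe))
		return false;	/* path too long: treat as not writable */

	/* O_EXCL: never clobber a file that already exists. */
	fd = open(probe, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return false;	/* e.g. EROFS, EACCES, ENOENT */

	close(fd);
	unlink(probe);		/* clean up the probe file again */
	return true;
}
```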

Christian. 

[0] https://lkml.org/lkml/2013/6/14/574
[1] https://manpages.debian.org/unstable/dash/dash.1.en.html

> > However, I admit I don't understand the justification in that commit
> > from 2013. I'm no NFS expert, but perhaps the real problem there is an
> > incorrectly configured NFS setup (uid/gid mismatch between NFS
> > client/server, or permissions mismatch between mount options and NFS
> > server?). Christian Kujau: can you speak to that?
> > 
> > Well, we could also make our check $(touch .git/some-file-here
> > 2>/dev/null && ...) instead of $(test -w .git) to handle misconfigured
> > NFS setups. But not sure if that has its own problems.
> 
> Trying to 'touch' the source tree will also break us. No matter whether
> you redirect stderr, our sandbox will still notice the build is doing
> something fishy and complain.

-- 
BOFH excuse #192:

runaway cat on system.

Re: [Jfs-discussion] [PATCH] jfs: Expand usercopy whitelist for inline inode data

2018-08-20 Thread Christian Kujau

On Fri, 17 Aug 2018, Kees Cook wrote:
> On Thu, Aug 16, 2018 at 11:56 PM, Christian Kujau  
> wrote:
> > On Fri, 3 Aug 2018, Kees Cook via Jfs-discussion wrote:
> >> Bart Massey reported what turned out to be a usercopy whitelist false
> >> positive in JFS when symlink contents exceeded 128 bytes. The inline
> >> inode data (i_inline) is actually designed to overflow into the "extended
> >
> > So, this may be a stupid question, but: is there a way to disable this
> > hardened usercopy thing with a boot option maybe?
> >
> > Apparently, CONFIG_HARDENED_USERCOPY_FALLBACK was disabled in Debian's
> > 4.16.0-0.bpo.2-amd64 (4.16.16) kernels[0] and I have a VMware guest here
> > that prints a BUG message (below) whenever a certain directory is being
> > accessed. ls(1) is fine, but "ls -l" (i.e. with stat()) produces the splat
> > below. And indeed, the target of one of the symlinks inside is 129
> > characters long, and every attempt to stat it prints the splat below.
> >
> > Going back to 4.16.0-0.bpo.1-amd64 (4.16.5) helps, but I was wondering if
> > there was a magic boot option to disable it while I wait for 4.18 to land
> > in Debian? I booted with hardened_usercopy=off, but it doesn't seem to
> > have an effect and the directory is still inaccessible.
> 
> Precisely this was just added upstream[1] for 4.19 but isn't available
> in 4.16. It should be trivial to backport it, though, if Ben wants to
> do that? (The JFS fix is in the 4.17 and 4.18 -stable trees now, too,
> BTW.)

Ah, OK. While the patch does apply (almost) cleanly to 4.16, I think I'll 
just wait until it makes its way into the Debian (backports) kernel, as 
nobody else seems to be annoyed by this :-)

Thanks!
Christian.
-- 
BOFH excuse #53:

Little hamster in running wheel had coronary; waiting for replacement to be 
Fedexed from Wyoming

Re: [Jfs-discussion] [PATCH] jfs: Expand usercopy whitelist for inline inode data

2018-08-17 Thread Christian Kujau

On Fri, 3 Aug 2018, Kees Cook via Jfs-discussion wrote:
> Bart Massey reported what turned out to be a usercopy whitelist false
> positive in JFS when symlink contents exceeded 128 bytes. The inline
> inode data (i_inline) is actually designed to overflow into the "extended

So, this may be a stupid question, but: is there a way to disable this 
hardened usercopy thing with a boot option maybe?

Apparently, CONFIG_HARDENED_USERCOPY_FALLBACK was disabled in Debian's 
4.16.0-0.bpo.2-amd64 (4.16.16) kernels[0] and I have a VMware guest here 
that prints a BUG message (below) whenever a certain directory is being 
accessed. ls(1) is fine, but "ls -l" (i.e. with stat()) produces the splat
below. And indeed, the target of one of the symlinks inside is 129 
characters long, and every attempt to stat it prints the splat below.

Going back to 4.16.0-0.bpo.1-amd64 (4.16.5) helps, but I was wondering if 
there was a magic boot option to disable it while I wait for 4.18 to land 
in Debian? I booted with hardened_usercopy=off, but it doesn't seem to 
have an effect and the directory is still inaccessible.

Thanks,
Christian.

[0] https://salsa.debian.org/kernel-team/linux/tree/stretch-backports/debian/config/


---[ end trace dbb1a6dfa1411526 ]---
usercopy: Kernel memory exposure attempt detected from SLUB object 
'jfs_ip' (offset 288, size 129)!
[ cut here ]
kernel BUG at /build/linux-hvYKKE/linux-4.17.8/mm/usercopy.c:100!
invalid opcode:  [#2] SMP PTI
Modules linked in: xt_tcpudp iptable_filter binfmt_misc zram zsmalloc 
vmw_vsock_vmci_transport vsock ip_tables x_tables xts twofish_x86_64_3way 
twofish_x86_64 twofish_common lrw jfs glue_helper gf128mul dm_crypt dm_mod 
sd_mod evdev vmxnet3 mptsas scsi_transport_sas mptscsih mptbase vmw_vmci 
ata_piix libata scsi_mod button
CPU: 0 PID: 1349 Comm: ls Tainted: G  D   4.17.0-0.bpo.1-amd64 
#1 Debian 4.17.8-1~bpo9+1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop 
Reference Platform, BIOS 6.00 09/21/2015
RIP: 0010:usercopy_abort+0x69/0x80
RSP: 0018:b84e40e2fe18 EFLAGS: 00010286
RAX: 0063 RBX: 0081 RCX: 
RDX:  RSI: 9786ffc16738 RDI: 9786ffc16738
RBP: 0081 R08:  R09: 042e
R10: 9c68af71 R11: 323120657a697320 R12: 0001
R13: 9786f93146a1 R14: 0082 R15: 559dd2edb170
FS:  7fe8f13733c0() GS:9786ffc0() 
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 559dd2edb088 CR3: 3d104002 CR4: 003606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 __check_heap_object+0xeb/0x120
 __check_object_size+0xb8/0x1a0
 readlink_copy+0x3e/0x60
 vfs_readlink+0x60/0x120
 do_readlinkat+0xf9/0x120
 __x64_sys_readlink+0x1b/0x20
 do_syscall_64+0x55/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe8f0c6fe47
RSP: 002b:7ffe94d04528 EFLAGS: 0202 ORIG_RAX: 0059
RAX: ffda RBX: 0082 RCX: 7fe8f0c6fe47
RDX: 0082 RSI: 559dd2edb170 RDI: 7ffe94d04570
RBP: 559dd2edb170 R08: 0003 R09: 0090
R10:  R11: 0202 R12: 7ffe94d04570
R13: 7ffe94d04570 R14: 3fff R15: 7ffe
Code: 0f 44 d0 53 48 c7 c0 58 05 65 9c 51 48 c7 c6 12 f9 63 9c 41 53 48 89 
f9 48 0f 45 f0 4c 89 d2 48 c7 c7 40 06 65 9c e8 05 97 e9 ff <0f> 0b 49 c7 
c1 03 09 66 9c 4d 89 cb 4d 89 c8 eb a5 66 0f 1f 44 
RIP: usercopy_abort+0x69/0x80 RSP: b84e40e2fe18
---[ end trace dbb1a6dfa1411527 ]---
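
The splat reports a 129-byte copy at offset 288 of a 'jfs_ip' object, one byte more than the 128 bytes mentioned in the patch description. Hardened usercopy rejects a copy out of a slab object unless it lies entirely inside the cache's whitelisted window. The C sketch below models that containment test in user space; the struct and function names are hypothetical, and the window sizes (128 and 256 bytes starting at offset 288) are illustrative assumptions, not the actual JFS values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slab cache's usercopy whitelist window. */
struct usercopy_window {
	size_t useroffset;	/* start of the region that may be copied */
	size_t usersize;	/* length of that region */
};

/*
 * Return true if copying n bytes starting at offset (both relative to
 * the start of the object) stays inside the whitelisted window.
 * Written so that no intermediate expression can overflow.
 */
static bool usercopy_allowed(const struct usercopy_window *w,
			     size_t offset, size_t n)
{
	if (offset < w->useroffset)
		return false;	/* starts before the window */
	if (offset - w->useroffset > w->usersize)
		return false;	/* starts after the window */
	return n <= w->usersize - (offset - w->useroffset);
}
```

With the illustrative 128-byte window, the 129-byte copy from the splat is rejected while a 128-byte copy passes; widening the window, which is roughly what expanding the whitelist amounts to, lets the same copy through.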


-- 
BOFH excuse #404:

Sysadmin accidentally destroyed pager with a large hammer.

Re: [PATCH 00/25] staging: erofs: introduce erofs file system

2018-07-26 Thread Christian Kujau

On Thu, 26 Jul 2018, Gao Xiang wrote:
> EROFS file system is a read-only file system with compression
> support designed for certain devices (especially embeded
> devices) with very limited physical memory and lots of memory

Out of curiosity, and as Richard already asked[0]: what about existing
file systems? Why can't they be used or extended instead of introducing
yet another file system into the kernel? JFFS2? UBIFS? CramFs? SquashFS?
ROMFS? F2FS? YAFFS?

Christian.

[0] https://marc.info/?l=linux-kernel=152783930418348=2
-- 
BOFH excuse #247:

Due to Federal Budget problems we have been forced to cut back on the number of 
users able to access the system at one time. (namely none allowed)

Re: 4.15-rc6+ hang

2018-01-07 Thread Christian Kujau

On Thu, 4 Jan 2018, Tom Hromatka wrote:
> > > [0.00] [ cut here ]
> > > [0.00] XSAVE consistency problem, dumping leaves
> > I think this is a vbox issue, with virtualbox not exposing all the
> > xsave state, so that when the kernel adds up the xsave areas, the end
> > result doesn't match what the total size is reported to be.
> 
> It seems probable that this is a VirtualBox issue.  I was
> able to boot my exact 4.15-rc6+ kernel in qemu-kvm v1.5.3
> just fine.

This was discussed on vbox-dev back in May 2017 (see the whole thread for 
more details):

 https://www.virtualbox.org/pipermail/vbox-dev/2017-May/014466.html

Does that help?

Christian.
-- 
BOFH excuse #9:

doppler effect

WARNING: CPU: 1 PID: 1384 at lib/iov_iter.c:695 copy_page_to_iter+0x240/0x3b0

2017-12-24 Thread Christian Kujau

Hi,

this just happened on an i686 machine of mine:

[ cut here ]
WARNING: CPU: 1 PID: 1384 at lib/iov_iter.c:695 copy_page_to_iter+0x240/0x3b0
Modules linked in: xfs algif_skcipher af_alg uas nfsv4 dns_resolver nfs 
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_meta 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 lz4 nf_defrag_ipv4 
lz4_compress nft_ct nf_conntrack libcrc32c crc32c_generic nft_set_hash 
nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables nfnetlink 
cpufreq_conservative sch_fq_codel zram evdev tg3 ptp pps_core lpc_ich 
libphy input_leds ideapad_laptop sparse_keymap wmi serpent_sse2_i586 
thermal serpent_generic lrw glue_helper ablk_helper cryptd video xts 
dm_crypt acpi_cpufreq arc4 iTCO_wdt iTCO_vendor_support i2c_i801 fscache 
loop coretemp battery b43 bcma mac80211 cfg80211 dm_mod dax ssb mmc_core 
rfkill led_class rng_core pcmcia pcmcia_core nfsd auth_rpcgss oid_registry 
ac nfs_acl lockd grace
 sunrpc usb_storage sd_mod atkbd libps2 uhci_hcd ata_piix libata scsi_mod 
ehci_pci ehci_hcd usbcore usb_common i8042 serio jfs [last unloaded: 
soundcore]
CPU: 1 PID: 1384 Comm: java Not tainted 4.14.4-1.0-ARCH #1
Hardware name: LENOVO   Lenovo  /Mariana , BIOS 
14CN94WW   06/29/2009
task: f27c1380 task.stack: f1c64000
EIP: copy_page_to_iter+0x240/0x3b0
EFLAGS: 00010286 CPU: 1
EAX: 1000 EBX: ffb48000 ECX: 02c0 EDX: 8001006c
ESI: f67ecb60 EDI: 0d40 EBP: f1c65e30 ESP: f1c65e08
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: 09f060cc CR3: 3260f000 CR4: 06d0
Call Trace:
 ? touch_atime+0x2b/0xb0
 generic_file_read_iter+0x458/0x8c0
 ? xfs_ilock+0x10d/0x150 [xfs]
 ? xfs_file_buffered_aio_read+0xed/0x100 [xfs]
 xfs_file_buffered_aio_read+0x4e/0x100 [xfs]
 ? set_next_entity+0x13f/0x8b0
 xfs_file_read_iter+0x54/0xc0 [xfs]
 __vfs_read+0xe7/0x140
 vfs_read+0x7b/0x130
 SyS_pread64+0x81/0xb0
 do_fast_syscall_32+0x71/0x1d0
 entry_SYSENTER_32+0x4e/0x7c
EIP: 0xb7f69cd9
EFLAGS: 0293 CPU: 1
EAX: ffda EBX: 009b ECX: 23c77f10 EDX: 001a
ESI: 156172c0 EDI:  EBP: b7f50e70 ESP: 1f32ea60
 DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
Code: 75 ec e9 ff fe ff ff 8d 74 26 00 8b 55 ec 8b 45 08 85 d2 8b 58 0c 74 
09 e8 6e d7 ff ff 84 c0 75 1a 31 f6 e9 13 ff ff ff 8d 76 00 <0f> ff 31 f6 
83 c4 1c 5b 89 f0 5e 5f 5d c3 66 90 8b 7d 08 8b 45
---[ end trace 0002deba6d00a28c ]---



This i686 laptop runs 4.14.4-1.0-ARCH [0] and is usually fine, although
memory pressure tends to be quite high due to a Java program running on
that machine. For some reason the system was even busier today; commands
took a long time to complete, so I rebooted the machine. Shortly after
boot (and after starting this Java program again), the warning above
appeared.

I couldn't find this exact message in the archives; the closest thing I
found (which also mentions "EIP:copy_page_to_iter") was:

 > 4879b7ae05 ("Merge tag 'dmaengine-4.12-rc1' of .."): WARNING: kernel 
 > stack regs at bd92bc2e in 01-cpu-hotplug:3811 has bad 'bp' value 01be
 > https://patchwork.kernel.org/patch/9981273/

When the XFS file system is mounted, the kernel logs:

 > XFS (dm-2): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
 > XFS (dm-2): EXPERIMENTAL reflink feature enabled. Use at your own risk!

But I haven't experienced any problems with that yet :)

Full dmesg & .config: http://nerdbynature.de/bits/4.14/

Any pointers?

Thanks,
Christian.

$ mount | grep xfs
/dev/mapper/opt on /opt type xfs (rw,nosuid,nodev,relatime,attr2,inode64,noquota)

$ xfs_info /opt/
meta-data=/dev/mapper/opt        isize=512    agcount=4, agsize=9079797 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=1
         =                       reflink=1
data     =                       bsize=4096   blocks=36319185, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=17733, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


[0] https://mirror.archlinux32.org/i686/core/linux-4.14.4-1.0-i686.pkg.tar.xz
-- 
BOFH excuse #413:

Cow-tippers tipped a cow onto the server.

Re: swap_info_get: Bad swap offset entry 0200f8a7

2017-10-20 Thread Christian Kujau

On Fri, 20 Oct 2017, huang ying wrote:
> >   4 May  < Linux version 4.11.2-1-ARCH
> >   4 Jun  < Linux version 4.11.3-1-ARCH
> >   7 Jul  < Linux version 4.11.9-1-ARCH
> >   4 Aug  < Linux version 4.12.8-2-ARCH
> >  24 Sep  < Linux version 4.12.13-1-ARCH
> > 158 Oct  < Linux version 4.13.5-1-ARCH
> 
> So you have never seen this before 4.11 like 4.10?

Unfortunately the kernel logs for that machine only go back to May
2017, so I cannot tell whether this happened before then. I've seen these
messages since then but didn't pay much attention to them. But as they
now appear more frequently, I thought I should mention this to the list.

> Which operations will trigger this error messages?

I'm not able to reproduce it at will, but I suspect that memory pressure
triggers these messages. The machine in question is a Lenovo Ideapad S10
notebook running 24x7 and equipped with 1 GB of RAM. Two Java processes
are using up basically all the memory, so it usually looks like this:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:            994         866          67           1          60          20
Swap:           760         437         322

$ zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4         248.7M  247M 92.3M 97.4M       2 [SWAP]

I just assumed the message is triggered when the system is really low on 
memory and maybe zram is too slow to provide the memory requested. But 
that's just my layman's assumption :-) For example, today's message was 
emitted during the night:

Oct 20 01:26:18 len kernel: [638973.207849] \
   swap_info_get: Bad swap offset entry 0200f8a7

And here are the sysstat numbers for that time frame:

$ sar -r -s 00:00 -e 02:00
Linux 4.13.5-1-ARCH (len)   10/20/2017  _i686_  (2 CPU)
12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
12:10:06 AM     70076    948404     93.12         4     19004   1556176     86.58    376608    379408       220
12:20:02 AM     80488    937992     92.10         4    180404   1563952     87.01    380184    327736      5568
12:30:03 AM     83296    935184     91.82         4    137260   1569776     87.34    329512    330000       280
12:40:03 AM     65188    953292     93.60         4     21156   1571048     87.41    386644    389820      1144
12:50:03 AM     67512    950968     93.37         4     33452   1570628     87.38    378936    381580      1304
01:00:07 AM     65520    952960     93.57         4     24996   1573180     87.53    385396    386152       904
01:10:03 AM     66956    951524     93.43         4     35520   1572696     87.50    379548    379364       172
01:20:02 AM     67440    951040     93.38         4     88736   1569864     87.34    381764    370472      7080
01:30:03 AM     70048    948432     93.12         4     29212   1572504     87.49    383516    381900      1832
01:40:04 AM     71532    946948     92.98         4     29220   1570096     87.35    380120    380284      1000
01:50:03 AM     65828    952652     93.54         4     34408   1570604     87.38    381040    381028      1604
Average:        70353    948127     93.09         4     57579   1569139     87.30    376661    371613      1919

 == If that is unreadable, here it is again: https://paste.debian.net/991927/

> Is it possible for you to check
> whether the error exists for normal swap device (not ZRAM)?

I also have "normal" (but encrypted) swap configured, with a lower priority:

$ cat /proc/swaps
Filename      Type        Size    Used    Priority
/dev/dm-0     partition   524284  194348  0
/dev/zram0    partition   254616  253536  32767
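
A minimal C sketch for reading these numbers, assuming only the five-column /proc/swaps layout shown above (sizes in KiB). The kernel allocates from swap areas in priority order, higher first, so with 32767 vs. 0 the zram device receives pages until it is full, and only then does the encrypted partition start taking them. The struct and helper names are illustrative, not a kernel or util-linux API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative record for one data line of /proc/swaps (sizes in KiB). */
struct swap_area {
	char name[64];
	unsigned long size_kib;
	unsigned long used_kib;
	int priority;
};

/* Parse "Filename Type Size Used Priority"; returns 1 on success. */
static int parse_swaps_line(const char *line, struct swap_area *out)
{
	char type[32];

	return sscanf(line, "%63s %31s %lu %lu %d", out->name, type,
		      &out->size_kib, &out->used_kib, &out->priority) == 5;
}

/* Index of the area the kernel allocates from first (highest priority). */
static size_t preferred_area(const struct swap_area *a, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (a[i].priority > a[best].priority)
			best = i;
	return best;
}

/* Sum the Size (used == 0) or Used (used != 0) column, in whole MiB. */
static unsigned long total_mib(const struct swap_area *a, size_t n, int used)
{
	unsigned long kib = 0;

	for (size_t i = 0; i < n; i++)
		kib += used ? a[i].used_kib : a[i].size_kib;
	return kib / 1024;
}
```

Fed the two data lines above, preferred_area() picks /dev/zram0, and the totals come to 760 MiB size and 437 MiB used, which lines up with the total and used figures on the Swap line of free -m.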

I shall disable the zram device and disable encryption too and will report 
back if the message appears again.

> 32bit or 64bit kernel do you use?

I'm using an i686 kernel for this Atom N270 processor (with HT enabled).

Thanks for your response,
Christian.
-- 
BOFH excuse #403:

Sysadmin didn't hear pager go off due to loud music from bar-room speakers.

swap_info_get: Bad swap offset entry 0200f8a7

2017-10-15 Thread Christian Kujau

Hi,

every now and then (and more frequently now) I receive the following 
message on this Atom N270 netbook:

  swap_info_get: Bad swap offset entry 0200f8a7
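
The value printed in this message is the raw swap entry. A minimal C sketch for splitting it into its two parts, assuming the generic 32-bit swp_entry_t layout of 4.x kernels (include/linux/swapops.h), where SWP_TYPE_SHIFT = 32 - (MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT) = 32 - (5 + 2) = 25: the bits above the shift are the swap type (the index of the swap area), the bits below it are the page offset inside that area. The EX_/ex_ names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed 4.x generic layout on a 32-bit kernel:
 * MAX_SWAPFILES_SHIFT = 5, RADIX_TREE_EXCEPTIONAL_SHIFT = 2.
 */
#define EX_MAX_SWAPFILES_SHIFT		5u
#define EX_RADIX_TREE_EXCEPTIONAL_SHIFT	2u
#define EX_SWP_TYPE_SHIFT \
	(32u - (EX_MAX_SWAPFILES_SHIFT + EX_RADIX_TREE_EXCEPTIONAL_SHIFT))
#define EX_SWP_OFFSET_MASK	((UINT32_C(1) << EX_SWP_TYPE_SHIFT) - 1)

/* Swap type: index of the swap area, from the high-order bits. */
static unsigned int ex_swp_type(uint32_t val)
{
	return (unsigned int)(val >> EX_SWP_TYPE_SHIFT);
}

/* Offset: page index inside that swap area, from the low-order bits. */
static uint32_t ex_swp_offset(uint32_t val)
{
	return val & EX_SWP_OFFSET_MASK;
}
```

Under that assumption, 0200f8a7 decodes to type 1 and offset 0xf8a7 (63655 pages, roughly 249 MiB into the area with 4 KiB pages). Type numbers usually follow the order in which the swap areas were activated, so the offset can then be checked against the matching Size entry in /proc/swaps.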

This started to show up a few months ago but appears to happen more 
frequently now:

  4 May  < Linux version 4.11.2-1-ARCH
  4 Jun  < Linux version 4.11.3-1-ARCH
  7 Jul  < Linux version 4.11.9-1-ARCH
  4 Aug  < Linux version 4.12.8-2-ARCH
 24 Sep  < Linux version 4.12.13-1-ARCH
158 Oct  < Linux version 4.13.5-1-ARCH

I've only found (very) old reports for this[0][2], with either no 
solution[1] or some hints that this may be caused by hardware errors.

In my case, however, no kernel BUG messages or oopses are involved and no
PTE errors are logged. The machine appears to be very stable, although
memory usage is quite high on that machine (but no OOM situations so
far either). As the machine is only equipped with 1GB of RAM, I'm
using ZRAM on this system, which usually looks something like this:

  $ zramctl 
  NAME   ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
  /dev/zram0 lz4 248.7M 195.7M   74M 78.7M   2 [SWAP]

I suspect that, when memory pressure is high, zram may not be quick enough 
to decompress a page, leading to these messages, but then I'd have expected 
a zram error message too.

Can anybody comment on these messages? If they're really indicating a 
hardware error, shouldn't there be other messages too? So far, rasdaemon 
has not logged any errors.

Thanks,
Christian.

[0] http://lkml.iu.edu/hypermail/linux/kernel/0204.3/0165.html
[1] https://bugzilla.redhat.com/show_bug.cgi?id=432337
[2] https://access.redhat.com/solutions/218733
-- 
BOFH excuse #323:

Your processor has processed too many instructions.  Turn it off immediately, 
do not type any commands!!

Re: [Kernel.org Helpdesk #40777] Re: Linux 4.12-rc1 (file locations)

2017-05-24 Thread Christian Kujau via RT

On Mon, 15 May 2017, Konstantin Ryabitsev via RT wrote:
> On 2017-05-15 14:34:56, francoisvalen...@gmail.com wrote:
> > It doesn't work with Firefox-53.0. After quite a long time while firefox
> > uses 100% of CPU, I finally get a text file and not a gzip file of the
> > patch for 4.12-rc1. It was almost instantaneous previously. I don't see
> > this as a progress.
> 
> Firefox will request a gzip version of the patch, download it and then ungzip 
> it for you and display it in the browser. If you'd rather not display 
> that, please use a commandline tool like wget or curl to get the patch.

Yeah, same here: clicking on 4.12-rc2/patch on the kernel.org main 
page makes Firefox 53 freeze for a few minutes, and then display (!) the 
text file (85 MB!) in full. Wow.

> We are trying to identify who are the people who still need to download
> patches as opposed to using git directly, and what their use-case 

I never use the links on the kernel.org main page to download patches, but 
still: can those be changed to say ".gz" or something, so that $browser 
won't _display_ it by default but _download_ it instead?
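
Whether a saved file is still the gzip stream or has already been unpacked by the browser can be told from its first two bytes: RFC 1952 fixes the gzip magic number as 0x1f 0x8b. A small C sketch of that check; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* RFC 1952: every gzip member starts with ID1 = 0x1f, ID2 = 0x8b. */
static int buf_is_gzip(const unsigned char *buf, size_t len)
{
	return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* Returns 1 for gzip, 0 for anything else, -1 if the file can't be opened. */
static int file_is_gzip(const char *path)
{
	unsigned char magic[2];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(magic, 1, sizeof(magic), f);
	fclose(f);
	return buf_is_gzip(magic, n);
}
```

A patch that the browser has already decompressed starts with plain ASCII text, so file_is_gzip() would return 0 for it.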

Thanks,
Christian.
-- 
BOFH excuse #435:

Internet shut down due to maintenance

Re: [PATCH v4 2/2] procfs/tasks: add a simple per-task procfs hidepid= field

2017-02-13 Thread Christian Kujau

On Mon, 13 Feb 2017, Kees Cook wrote:
> Okay, cool. Thanks. (Also, where does "setpriv" live? I must need a
> new set of util-linux or something?)

Indeed, a newer version of util-linux[0] should do, although 
Debian/testing appears to have an extra package just for "setpriv":

  https://packages.debian.org/stretch/setpriv

C.

[0] https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=5600c40
-- 
BOFH excuse #65:

system needs to be rebooted

alg: comp: Compression test 1 failed for lz4

2017-02-11 Thread Christian Kujau

Hi,

while the LZ4 and LZ4HC modules appear to be available on PowerPC 32-bit 
(it's a PowerBook G4), I get the warnings below when loading them. The
lzo module is working just fine and I'm using it extensively for ZRAM on 
that machine.

Is this a configuration[0] error or is LZ4 just not supported for this 
architecture?

Thanks,
Christian.

[0] http://nerdbynature.de/bits/4.10-rc7/config_4.10-rc7.txt - but
I get the same messages with a stock Debian/4.9 configuration too.


$ modprobe lz4
alg: comp: Compression test 1 failed for lz4-generic
00000000: f0 10 4a 6f 69 6e 20 75 73 20 6e 6f 77 20 61 6e
00000010: 64 20 73 68 61 72 65 20 74 68 65 20 73 6f 66 74
00000020: 77 00 0d 0f 00 23 0b 50 77 61 72 65 20
alg: acomp: Compression test 1 failed for lz4-scomp
00000000: f0 10 4a 6f 69 6e 20 75 73 20 6e 6f 77 20 61 6e
00000010: 64 20 73 68 61 72 65 20 74 68 65 20 73 6f 66 74
00000020: 77 00 0d 0f 00 23 0b 50 77 61 72 65 20


$ modprobe lz4hc
alg: comp: Compression test 1 failed for lz4hc-generic
00000000: f0 10 4a 6f 69 6e 20 75 73 20 6e 6f 77 20 61 6e
00000010: 64 20 73 68 61 72 65 20 74 68 65 20 73 6f 66 74
00000020: 77 00 0d 0f 00 23 0b 50 77 61 72 65 20
alg: acomp: Compression test 1 failed for lz4hc-scomp
00000000: f0 10 4a 6f 69 6e 20 75 73 20 6e 6f 77 20 61 6e
00000010: 64 20 73 68 61 72 65 20 74 68 65 20 73 6f 66 74
00000020: 77 00 0d 0f 00 23 0b 50 77 61 72 65 20

-- 
BOFH excuse #174:

Backbone adjustment

Re: [PATCH RFC] powerpc/32: fix handling of stack protector with recent GCC

2017-01-30 Thread Christian Kujau

On Mon, 16 Jan 2017, Christophe Leroy wrote:
> Since 2005, powerpc GCC doesn't manage anymore __stack_chk_guard as
> a global variable but as some value located at -0x7008(r2)

Is this still an "RFC" or is there a chance that this will land in 4.10?

Thanks,
Christian.

> In the Linux kernel, r2 is used as a pointer to current task struct.
> 
> This patch changes the meaning of r2 when stack protector
> is activated:
> - current is taken from thread_info and not kept in r2 anymore
> - r2 is set to current + offset of stack canary + 0x7008 so
> that -0x7008(r2) directly points to current->stack_canary
> 
> current could have been more efficiently calculated from r2
> but some circular inclusion prevent inserting struct task_struct
> into arch/powerpc/include/asm/current.h so it is not possible
> to get offset of stack_canary within current task_struct from there.
> 
> fixes: 6533b7c16ee57 ("powerpc: Initial stack protector
> (-fstack-protector) support")
> Reported-by: Christian Kujau <li...@nerdbynature.de>
> 
> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>
> ---
>  Christian, can you test it ?
> 
>  arch/powerpc/include/asm/current.h        | 12 +++-
>  arch/powerpc/include/asm/stackprotector.h | 13 +
>  arch/powerpc/kernel/entry_32.S            | 19 +++
>  arch/powerpc/kernel/head_32.S             |  7 +++
>  arch/powerpc/kernel/head_40x.S            |  4 
>  arch/powerpc/kernel/head_44x.S            |  4 
>  arch/powerpc/kernel/head_8xx.S            |  4 
>  arch/powerpc/kernel/head_fsl_booke.S      |  7 +++
>  arch/powerpc/kernel/process.c             |  6 --
>  9 files changed, 61 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/current.h 
> b/arch/powerpc/include/asm/current.h
> index e2c7f06..2f67f02 100644
> --- a/arch/powerpc/include/asm/current.h
> +++ b/arch/powerpc/include/asm/current.h
> @@ -27,8 +27,16 @@ static inline struct task_struct *get_current(void)
>  }
>  #define current  get_current()
>  
> -#else
> +#else /* __powerpc64__ */
> +#if defined(CONFIG_CC_STACKPROTECTOR)
> +#include 
>  
> +static inline struct task_struct *get_current(void)
> +{
> + return current_thread_info()->task;
> +}
> +#define current  get_current()
> +#else
>  /*
>   * We keep `current' in r2 for speed.
>   */
> @@ -36,5 +44,7 @@ register struct task_struct *current asm ("r2");
>  
>  #endif
>  
> +#endif /* __powerpc64__ */
> +
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_CURRENT_H */
> diff --git a/arch/powerpc/include/asm/stackprotector.h 
> b/arch/powerpc/include/asm/stackprotector.h
> index 6720190..bf30509 100644
> --- a/arch/powerpc/include/asm/stackprotector.h
> +++ b/arch/powerpc/include/asm/stackprotector.h
> @@ -12,12 +12,18 @@
>  #ifndef _ASM_STACKPROTECTOR_H
>  #define _ASM_STACKPROTECTOR_H
>  
> +#ifdef CONFIG_PPC64
> +#define SSP_OFFSET   0x7010
> +#else
> +#define SSP_OFFSET   0x7008
> +#endif
> +
> +#ifndef __ASSEMBLY__
> +
>  #include 
>  #include 
>  #include 
>  
> -extern unsigned long __stack_chk_guard;
> -
>  /*
>   * Initialize the stackprotector canary value.
>   *
> @@ -34,7 +40,6 @@ static __always_inline void boot_init_stack_canary(void)
>   canary ^= LINUX_VERSION_CODE;
>  
>   current->stack_canary = canary;
> - __stack_chk_guard = current->stack_canary;
>  }
> -
> +#endif /* __ASSEMBLY__ */
>  #endif   /* _ASM_STACKPROTECTOR_H */
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 5742dbd..b3a363c 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * MSR_KERNEL is > 0x1 on 4xx/Book-E since it include MSR_CE.
> @@ -149,6 +150,9 @@ transfer_to_handler:
>   mfspr   r12,SPRN_SPRG_THREAD
>   addi    r2,r12,-THREAD
>   tovirt(r2,r2)   /* set r2 to current */
> +#if defined(CONFIG_CC_STACKPROTECTOR)
> + addi    r2,r2,TSK_STACK_CANARY+SSP_OFFSET
> +#endif
>   beq 2f  /* if from user, fix up THREAD.regs */
>   addi    r11,r1,STACK_FRAME_OVERHEAD
>   stw r11,PT_REGS(r12)
> @@ -385,6 +389,9 @@ syscall_exit_cont:
>   lwz r3,GPR3(r1)
>  1:
>  #endif /* CONFIG_TRACE_IRQFLAGS */
> +#if defined(CONFIG_CC_STACKPROTECTOR)
> + subi    r2,r2,TSK_STACK_CANARY+SSP_OFFSET
> +#endif
>  #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
>   /* If the process has its own DBCR0 value, load it u
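
The r2 trick described in the patch comes down to plain pointer arithmetic. A minimal C sketch, assuming a made-up struct fake_task in place of struct task_struct and the 32-bit SSP_OFFSET of 0x7008 from the quoted stackprotector.h hunk: biasing r2 by offsetof(stack_canary) + 0x7008 makes the address GCC uses, -0x7008(r2), land exactly on the canary, and subtracting the same amount recovers the task pointer, as the subi in the entry_32.S hunk does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct task_struct; only stack_canary matters. */
struct fake_task {
	long other_fields[32];
	unsigned long stack_canary;
};

#define SSP_OFFSET_PPC32 0x7008	/* 32-bit value from the patch */

/* Entry path: r2 = current + offset of stack_canary + SSP_OFFSET. */
static uintptr_t bias_r2(const struct fake_task *cur)
{
	return (uintptr_t)cur + offsetof(struct fake_task, stack_canary) +
	       SSP_OFFSET_PPC32;
}

/* GCC-generated code: load the guard from -0x7008(r2). */
static unsigned long load_guard(uintptr_t r2)
{
	return *(const unsigned long *)(r2 - SSP_OFFSET_PPC32);
}

/* Exit path: remove the bias again to get back the task pointer. */
static struct fake_task *unbias_r2(uintptr_t r2)
{
	return (struct fake_task *)(r2 - offsetof(struct fake_task, stack_canary) -
				    SSP_OFFSET_PPC32);
}
```

The same arithmetic with 0x7010 would apply to the CONFIG_PPC64 value in the hunk; the sketch only models the 32-bit case the patch targets.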

btrfs_alloc_tree_block: Faulting instruction address: 0xc02d4584

2017-01-19 Thread Christian Kujau

Hi,

after upgrading this powerpc32 box from 4.10-rc2 to -rc4, the message 
below occurred a few hours after boot. Full dmesg and .config:

  http://nerdbynature.de/bits/4.10-rc4/

Any ideas?

Thanks,
Christian.


Faulting instruction address: 0xc02d4584
Oops: Kernel access of bad area, sig: 11 [#1]
PowerMac
Modules linked in: ecb xt_tcpudp iptable_filter ip_tables x_tables 
nfnetlink_log nfnetlink sha256_generic twofish_generic twofish_common 
usb_storage therm_adt746x loop i2c_powermac arc4 firewire_sbp2 b43 
rng_core ssb bcma mac80211 cfg80211 ecryptfs [last unloaded: nbd]
CPU: 0 PID: 1395 Comm: btrfs-transacti Tainted: GW   
4.10.0-rc4-1-gab8184b #1
task: ee7162e0 task.stack: ee9cc000
NIP: c02d4584 LR: c02d4574 CTR: c00d0df0
REGS: ee9cdaa0 TRAP: 0300   Tainted: GW
(4.10.0-rc4-1-gab8184b)
MSR: 9032 
  CR: 24422248  XER: 
DAR: 10581054 DSISR: 4200 
GPR00: c02d4574 ee9cdb50 ee71d4b8 10581050 0001 dbc88118  0020 
GPR08: 1205 0004   24422444   93c3 
GPR16: f000 0001    ee260800   
GPR24:  0001 ee9cdc1b eef5c1a0 1000 ee47 dbc88118 ee470170 
NIP [c02d4584] btrfs_alloc_tree_block+0x18c/0x5c4
LR [c02d4574] btrfs_alloc_tree_block+0x17c/0x5c4
Call Trace:
[ee9cdb50] [c02d4574] btrfs_alloc_tree_block+0x17c/0x5c4 (unreliable)
[ee9cdbf0] [c02b86d4] __btrfs_cow_block+0x110/0x638
[ee9cdc70] [c02b8d74] btrfs_cow_block+0xdc/0x1b0
[ee9cdca0] [c02bc48c] btrfs_search_slot+0x1c0/0x904
[ee9cdd10] [c02dc680] btrfs_lookup_inode+0x3c/0x124
[ee9cdd50] [c02ec204] btrfs_update_inode_item+0x4c/0x10c
[ee9cdd80] [c02d05e4] cache_save_setup+0xc0/0x400
[ee9cdde0] [c02d4d54] btrfs_start_dirty_block_groups+0x184/0x47c
[ee9cde50] [c02e7e84] btrfs_commit_transaction+0x148/0xac4
[ee9cdeb0] [c02e313c] transaction_kthread+0x1d0/0x1ec
[ee9cdf00] [c004f1fc] kthread+0xf8/0x124
[ee9cdf40] [c0011480] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at   (null)
LR =   (null)
Instruction dump:
4800b3ed 7f838040 7c7e1b78 419d0430 806300d4 81db 81fb0004 4bdfe2b9 
3924 7ee6bb78 38630050 7fc5f378 <7dc91d2c> 7de01d2c 809501cf 807501cb 
---[ end trace 937683537ecd986b ]---



-- 
BOFH excuse #342:

HTTPD Error 4004 : very old Intel cpu - insufficient processing power

Re: [PATCH RFC] powerpc/32: fix handling of stack protector with recent GCC

2017-01-18 Thread Christian Kujau

On Mon, 16 Jan 2017, Christophe Leroy wrote:
>  Christian, can you test it ?

OK, so with that applied to v4.10-rc4, compilation still fails with GCC 
4.9.2 and CC_STACKPROTECTOR_STRONG=y, see below. But it compiles just fine 
with CC_STACKPROTECTOR_REGULAR=y and boots, too!

Cross-compiling the same with GCC 5.2.0 works, even for 
CC_STACKPROTECTOR_STRONG=y, and the system boots just fine.

So, with that limitation, feel free to add:

 Tested-by: Christian Kujau <li...@nerdbynature.de>


Thanks for the fix!
Christian.



$ gcc --version | head -1
gcc-4.9.real (Debian 4.9.2-10) 4.9.2

$ grep CC_STACKPROTECTOR_STRONG $DIR/.config
CONFIG_CC_STACKPROTECTOR_STRONG=y

$ make O=$DIR V=1 bindeb-pkg
[...]
+ ld -EB -m elf32ppc -Bstatic --build-id -X -o .tmp_vmlinux1 -T 
./arch/powerpc/kernel/vmlinux.lds arch/powerpc/kernel/head_32.o 
arch/powerpc/kernel/fpu.o arch/powerpc/kernel/vector.o 
arch/powerpc/kernel/prom_init.o init/built-in.o --start-group 
usr/built-in.o arch/powerpc/kernel/built-in.o arch/powerpc/mm/built-in.o 
arch/powerpc/lib/built-in.o arch/powerpc/sysdev/built-in.o 
arch/powerpc/platforms/built-in.o arch/powerpc/math-emu/built-in.o 
arch/powerpc/crypto/built-in.o arch/powerpc/net/built-in.o 
kernel/built-in.o certs/built-in.o mm/built-in.o fs/built-in.o 
ipc/built-in.o security/built-in.o crypto/built-in.o block/built-in.o 
lib/lib.a lib/built-in.o drivers/built-in.o sound/built-in.o 
firmware/built-in.o net/built-in.o virt/built-in.o --end-group
arch/powerpc/platforms/built-in.o: In function `bootx_printf':
/usr/local/src/linux-git/arch/powerpc/platforms/powermac/bootx_init.c:88: undefined reference to `__stack_chk_fail_local'
arch/powerpc/platforms/built-in.o: In function `bootx_add_display_props':
/usr/local/src/linux-git/arch/powerpc/platforms/powermac/bootx_init.c:211: undefined reference to `__stack_chk_fail_local'
arch/powerpc/platforms/built-in.o: In function `bootx_scan_dt_build_struct':
/usr/local/src/linux-git/arch/powerpc/platforms/powermac/bootx_init.c:350: undefined reference to `__stack_chk_fail_local'
arch/powerpc/platforms/built-in.o: In function `bootx_init':
/usr/local/src/linux-git/arch/powerpc/platforms/powermac/bootx_init.c:596: undefined reference to `__stack_chk_fail_local'
/usr/bin/ld.bfd.real: .tmp_vmlinux1: hidden symbol `__stack_chk_fail_local' isn't defined
/usr/bin/ld.bfd.real: final link failed: Bad value

-- 
BOFH excuse #66:

bit bucket overflow

DEBUG_LOCKS_WARN_ON(1) / lockdep.c:3134 lockdep_init_map+0x1e8/0x1f0

2017-01-03 Thread Christian Kujau

Hi,

booting v4.10-rc2 on this PowerPC G4 machine prints the following early 
on, but then continues to boot and the machine is running fine so far:


BUG: key ef0ba7d0 not in .data!
DEBUG_LOCKS_WARN_ON(1)
[ cut here ]
WARNING: CPU: 0 PID: 1 at /usr/local/src/linux-git/kernel/locking/lockdep.c:3134 lockdep_init_map+0x1e8/0x1f0
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.10.0-rc2 #4
task: ef04aa60 task.stack: ef042000
NIP: c005eb78 LR: c005eb78 CTR: 
REGS: ef043d70 TRAP: 0700   Not tainted  (4.10.0-rc2)
MSR: 02029032 
  CR: 4822  XER: 2000
GPR00: c005eb78 ef043e20 ef04aa60 0016 0001 c0068b24  0001 
GPR08:   4ead 00d6 2824  c00047f0  
GPR16:   c08b9280 effedfa0 c078d6ac c107 c078eddc c078ee14 
GPR24: c078ed24 c078ee24 0002 ef085a00  c08b ef0ba7d0 ef0ba7b4 
NIP [c005eb78] lockdep_init_map+0x1e8/0x1f0
LR [c005eb78] lockdep_init_map+0x1e8/0x1f0
Call Trace:
[ef043e20] [c005eb78] lockdep_init_map+0x1e8/0x1f0 (unreliable)
[ef043e40] [c083adb4] kw_i2c_add+0xc0/0x134
[ef043e60] [c083b29c] pmac_i2c_init+0x3b8/0x518
[ef043ea0] [c00040c0] do_one_initcall+0x40/0x174
[ef043f00] [c0834064] kernel_init_freeable+0x134/0x1cc
[ef043f30] [c0004808] kernel_init+0x18/0x110
[ef043f40] [c0010ad8] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
4837259d 2f83 41befec0 3d20c08b 812953a0 2f89 409efeb0 3c60c079 
3c80c07b 3884b48c 3863f9e4 4860d52d <0fe0> 4bfffe94 9421ff70 7c0802a6 
---[ end trace 8a79d8041d87d000 ]---


Full dmesg and .config: http://nerdbynature.de/bits/4.10-rc2/


Thanks for listening,
Christian.
-- 
BOFH excuse #409:

The vulcan-death-grip ping has been applied.

Re: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function

2016-12-17 Thread Christian Kujau

On Thu, 15 Dec 2016, Jason A. Donenfeld wrote:
> > I'd still drop the "24" unless you really think we're going to have
> > multiple variants coming into the kernel.
> 
> Okay. I don't have a problem with this, unless anybody has some reason
> to the contrary.

What if the 2/4-round version falls and we need more rounds to withstand
future cryptanalysis? We'd then have siphash_ and siphash48_ functions,
no? My amateurish bike-shedding argument would be "let's keep the 24 then" :-)
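To make the naming argument concrete, here is a rough sketch of SipHash with the round counts as parameters, assuming the c/d structure of the SipHash reference design (c compression rounds per 8-byte block, d finalization rounds). The names siphash_cd(), siphash24() and siphash48() are illustrative only, not the kernel's API; siphash48() just stands in for the hypothetical stronger variant mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit word left by b bits (0 < b < 64). */
static inline uint64_t rotl64(uint64_t x, int b)
{
	return (x << b) | (x >> (64 - b));
}

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* One SipRound over the four-word internal state. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
	v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* SipHash-c-d: c_rounds per message block, d_rounds at finalization. */
static uint64_t siphash_cd(int c_rounds, int d_rounds, const uint8_t key[16],
			   const uint8_t *msg, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
	};
	size_t full = len - (len % 8);
	uint64_t m;

	assert(c_rounds > 0 && d_rounds > 0);

	/* Compression: absorb each full 8-byte block. */
	for (size_t i = 0; i < full; i += 8) {
		m = load_le64(msg + i);
		v[3] ^= m;
		for (int r = 0; r < c_rounds; r++)
			sipround(v);
		v[0] ^= m;
	}

	/* Final block: leftover bytes, message length in the top byte. */
	m = (uint64_t)len << 56;
	for (size_t i = 0; i < len % 8; i++)
		m |= (uint64_t)msg[full + i] << (8 * i);
	v[3] ^= m;
	for (int r = 0; r < c_rounds; r++)
		sipround(v);
	v[0] ^= m;

	/* Finalization. */
	v[2] ^= 0xff;
	for (int r = 0; r < d_rounds; r++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}

/* The "24" records the round counts: 2 compression, 4 finalization. */
static uint64_t siphash24(const uint8_t key[16], const uint8_t *msg, size_t len)
{
	return siphash_cd(2, 4, key, msg, len);
}

/* Hypothetical stronger variant: only the round counts change. */
static uint64_t siphash48(const uint8_t key[16], const uint8_t *msg, size_t len)
{
	return siphash_cd(4, 8, key, msg, len);
}
```

In this sketch a heavier variant differs only in its two round counts, which is why a name that encodes them stays unambiguous if more variants ever appear.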

C.
-- 
BOFH excuse #354:

Chewing gum on /dev/sd3c

Re: Locking API testsuite output mangled

2016-11-23 Thread Christian Kujau

On Wed, 23 Nov 2016, Michael Ellerman wrote:
> That's nothing powerpc specific AFAICS, does this fix it?

Hm, so s/printk/pr_cont/ - but not in all places? But yeah, this fixes it 
for me, at least on x86.
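A rough userspace sketch of why only some calls need converting. This is a toy model, not the kernel's printk: toy_printk(), records[] and toy_reset() are hypothetical names, and it assumes the post-4bcc595ccd80 rule that a message without KERN_CONT closes any unfinished line and starts a new record, while a KERN_CONT message (pr_cont) appends to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RECORDS 64
#define LINE_MAX_LEN 256

static char records[MAX_RECORDS][LINE_MAX_LEN]; /* finished log records */
static int nr_records;
static char pending[LINE_MAX_LEN];              /* line being assembled */
static bool line_open;                          /* pending holds a partial line */

/* Move the pending line into the record list. */
static void emit_record(void)
{
	assert(nr_records < MAX_RECORDS);
	strcpy(records[nr_records++], pending);
	pending[0] = '\0';
	line_open = false;
}

/* cont == false behaves like printk(), cont == true like pr_cont(). */
static void toy_printk(bool cont, const char *s)
{
	size_t len;

	if (!cont && line_open)
		emit_record();	/* no KERN_CONT: the unfinished line is cut here */

	assert(strlen(pending) + strlen(s) < LINE_MAX_LEN);
	strcat(pending, s);
	line_open = true;

	len = strlen(pending);
	if (len > 0 && pending[len - 1] == '\n') {
		pending[len - 1] = '\0';	/* a newline completes the record */
		emit_record();
	}
}

/* Forget all records, e.g. between two experiments. */
static void toy_reset(void)
{
	nr_records = 0;
	pending[0] = '\0';
	line_open = false;
}
```

Under this model the call that starts a line can stay a plain printk(), and only the calls that extend it (the "ok |" cells and the trailing "\n") need pr_cont(). In the quoted patch, print_testname() is indeed not among the changed lines.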

 Tested-by: Christian Kujau <li...@nerdbynature.de>

Thank you!
Christian.

> 
> cheers
> 
> diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
> index 872a15a2a637..f3a217ea0388 100644
> --- a/lib/locking-selftest.c
> +++ b/lib/locking-selftest.c
> @@ -980,23 +980,23 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
>  #ifndef CONFIG_PROVE_LOCKING
>   if (expected == FAILURE && debug_locks) {
>   expected_testcase_failures++;
> - printk("failed|");
> + pr_cont("failed|");
>   }
>   else
>  #endif
>   if (debug_locks != expected) {
>   unexpected_testcase_failures++;
> - printk("FAILED|");
> + pr_cont("FAILED|");
>  
>   dump_stack();
>   } else {
>   testcase_successes++;
> - printk("  ok  |");
> + pr_cont("  ok  |");
>   }
>   testcase_total++;
>  
>   if (debug_locks_verbose)
> - printk(" lockclass mask: %x, debug_locks: %d, expected: %d\n",
> + pr_cont(" lockclass mask: %x, debug_locks: %d, expected: %d\n",
>   lockclass_mask, debug_locks, expected);
>   /*
>* Some tests (e.g. double-unlock) might corrupt the preemption
> @@ -1021,26 +1021,26 @@ static inline void print_testname(const char *testname)
>  #define DO_TESTCASE_1(desc, name, nr)\
>   print_testname(desc"/"#nr); \
>   dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK);  \
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_1B(desc, name, nr)   \
>   print_testname(desc"/"#nr); \
>   dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK);  \
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_3(desc, name, nr)\
>   print_testname(desc"/"#nr); \
>   dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN);   \
>   dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK);\
>   dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK);\
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_3RW(desc, name, nr)  \
>   print_testname(desc"/"#nr); \
>   dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN|LOCKTYPE_RWLOCK);\
>   dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK);\
>   dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK);\
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_6(desc, name)\
>   print_testname(desc);   \
> @@ -1050,7 +1050,7 @@ static inline void print_testname(const char *testname)
>   dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);  \
>   dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);   \
>   dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);   \
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_6_SUCCESS(desc, name)\
>   print_testname(desc);   \
> @@ -1060,7 +1060,7 @@ static inline void print_testname(const char *testname)
>   dotest(name##_mutex, SUCCESS, LOCKTYPE_MUTEX);  \
>   dotest(name##_wsem, SUCCESS, LOCKTYPE_RWSEM);   \
>   dotest(name##_rsem, SUCCESS, LOCKTYPE_RWSEM);   \
> - printk("\n");
> + pr_cont("\n");
>  
>  /*
>   * 'read' variant: rlocks must not trigger.
> @@ -1073,7 +1073,7 @@ static inline void print_testname(const char *testname)
>   dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);  \
>   dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);   \
>   dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);   \
> - printk("\n");
> + pr_cont("\n");
>  
>  #define DO_TESTCASE_2I(desc, name, nr)   \
>   DO_TESTCASE_1("hard-"desc, name##_hard, nr);\
> @@ -1726,25 +1726,25 @@ static void ww_tests(void)
>   dotest(ww_test_fail_acquire, SUCCESS, LOCKTYPE_WW);

Locking API testsuite output mangled

2016-11-22 Thread Christian Kujau

The "Locking API testsuite" output during bootup (with 
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y) on this PowerPC system looks 
mangled, possibly related to the recent printk changes (4bcc595ccd80, 
"printk: reinstate KERN_CONT for printing continuation lines"). Before 
(e.g. with v4.6) it looked like this:

 http://nerdbynature.de/bits/4.6.0-rc7/dmesg.txt

See below for the current output.

Christian.

[0.001417] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[0.001439] ... MAX_LOCKDEP_SUBCLASSES:  8
[0.001453] ... MAX_LOCK_DEPTH:  48
[0.001467] ... MAX_LOCKDEP_KEYS:8191
[0.001482] ... CLASSHASH_SIZE:  4096
[0.001497] ... MAX_LOCKDEP_ENTRIES: 32768
[0.001511] ... MAX_LOCKDEP_CHAINS:  65536
[0.001526] ... CHAINHASH_SIZE:  32768
[0.001541]  memory used by lock dependency info: 5167 kB
[0.001557]  per task-struct memory footprint: 1536 bytes
[0.001574] 
[0.001587] | Locking API testsuite:
[0.001600] 

[0.001622]  | spin |wlock |rlock |mutex | wsem | rsem |
[0.001644]   
--
[0.001681]  A-A deadlock:
[0.001705]   ok  |
[0.003198]   ok  |
[0.004555]   ok  |
[0.005962]   ok  |
[0.007307]   ok  |
[0.008647]   ok  |

[0.010015]  A-B-B-A deadlock:
[0.010045]   ok  |
[0.011401]   ok  |
[0.012736]   ok  |
[0.014116]   ok  |
[0.015458]   ok  |
[0.016812]   ok  |

[0.018175]  A-B-B-C-C-A deadlock:
[0.018212]   ok  |
[0.019575]   ok  |
[0.020916]   ok  |
[0.022304]   ok  |
[0.023654]   ok  |
[0.025017]   ok  |

[0.026382]  A-B-C-A-B-C deadlock:
[0.026419]   ok  |
[0.027781]   ok  |
[0.029122]   ok  |
[0.030510]   ok  |
[0.031860]   ok  |
[0.033223]   ok  |

[0.034587]  A-B-B-C-C-D-D-A deadlock:
[0.034633]   ok  |
[0.036007]   ok  |
[0.037356]   ok  |
[0.038757]   ok  |
[0.040118]   ok  |
[0.041492]   ok  |

[0.042859]  A-B-C-D-B-D-D-A deadlock:
[0.042905]   ok  |
[0.044278]   ok  |
[0.045628]   ok  |
[0.047029]   ok  |
[0.048388]   ok  |
[0.049761]   ok  |

[0.051130]  A-B-C-D-B-C-D-A deadlock:
[0.051176]   ok  |
[0.052551]   ok  |
[0.053901]   ok  |
[0.055303]   ok  |
[0.056665]   ok  |
[0.058040]   ok  |

[0.059408] double unlock:
[0.059429]   ok  |
[0.060774]   ok  |
[0.062103]   ok  |
[0.063469]   ok  |
[0.064800]   ok  |
[0.066145]   ok  |

[0.067508]   initialize held:
[0.067527]   ok  |
[0.068870]   ok  |
[0.070198]   ok  |
[0.071561]   ok  |
[0.072892]   ok  |
[0.074235]   ok  |

[0.075596]  bad unlock order:
[0.075623]   ok  |
[0.076979]   ok  |
[0.078316]   ok  |
[0.079691]   ok  |
[0.081031]   ok  |
[0.082387]   ok  |

[0.083753]   
--
[0.083791]   recursive read-lock:
[0.083804]  |
[0.083830]   ok  |
[0.085157]  |
[0.085183]   ok  |

[0.086526]recursive read-lock #2:
[0.086539]  |
[0.086564]   ok  |
[0.087908]  |
[0.087936]   ok  |

[0.089280] mixed read-write-lock:
[0.089293]  |
[0.089320]   ok  |
[0.090643]  |
[0.090672]   ok  |

[0.092035] mixed write-read-lock:
[0.092048]  |
[0.092075]   ok  |
[0.093399]  |
[0.093428]   ok  |

[0.094771]   
--
[0.094809]  hard-irqs-on + irq-safe-A/12:
[0.094829]   ok  |
[0.096192]   ok  |
[0.097523]   ok  |

[0.098882]  soft-irqs-on + irq-safe-A/12:
[0.098904]   ok  |
[0.100270]   ok  |
[0.101602]   ok  |

[0.102962]  hard-irqs-on + irq-safe-A/21:
[0.102982]   ok  |
[0.104345]   ok  |
[0.105678]   ok  |

[0.107037]  soft-irqs-on + irq-safe-A/21:
[0.107058]   ok  |
[0.108422]   ok  |
[0.109754]   ok  |

[0.12]sirq-safe-A => hirqs-on/12:
[0.33]   ok  |
[0.112498]   ok  |
[0.113830]   ok  |

[0.115189]sirq-safe-A => hirqs-on/21:
[0.115209]   ok  |
[0.116574]   ok  |
[0.117907]   ok  |

[0.119266]  hard-safe-A + irqs-on/12:
[0.119286]   ok  |
[0.120649]   ok  |
[0.121981]   ok  |

[0.123341]  soft-safe-A + irqs-on/12:
[0.123362]   ok  |
[0.124727]   ok  |
[0.126061]   ok  |

[0.127420]

jfs: mangled lockdep splat

2016-11-22 Thread Christian Kujau

For some time now, I always[0] receive a lockdep warning when there's some
disk I/O on the system. But recently the warning looks kinda mangled, and
I suspect the recent printk change (4bcc595ccd80, "printk: reinstate
KERN_CONT for printing continuation lines") is the reason for that.
In previous versions, the warning looked like this:

 http://nerdbynature.de/bits/4.6.0-rc7/dmesg.txt

Below is the new warning, which is barely readable anymore. Of course,
it would be best for the warning to vanish (hehe), but maybe the printout
could be fixed too?

Thanks,
Christian.

[ 2401.254353] =
[ 2401.254410] [ INFO: possible irq lock inversion dependency detected ]
[ 2401.254469] 4.9.0-rc6 #1 Not tainted
[ 2401.254506] -
[ 2401.254560] kswapd0/282 just changed the state of lock:
[ 2401.254620]  (
[ 2401.254647] _ip->rdwrlock
[ 2401.254685] #2
[ 2401.254698] ){-.}
[ 2401.254730] , at: 
[ 2401.254764] [] jfs_get_block+0x50/0x370
[ 2401.254812] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 2401.254868]  (
[ 2401.254890] _ip->commit_mutex
[ 2401.254927] ){+.+.+.}
[ 2401.254945] 

and interrupts could create inverse lock ordering between them.

[ 2401.255041] 
other info that might help us debug this:
[ 2401.255097]  Possible interrupt unsafe locking scenario:

[ 2401.255160]CPU0CPU1
[ 2401.255203]
[ 2401.255243]   lock(
[ 2401.255273] _ip->commit_mutex
[ 2401.255310] );
[ 2401.255334]local_irq_disable();
[ 2401.255381]lock(
[ 2401.255420] _ip->rdwrlock
[ 2401.255454] #2
[ 2401.255467] );
[ 2401.255494]lock(
[ 2401.255536] _ip->commit_mutex
[ 2401.255573] );
[ 2401.255596]   
[ 2401.255623] lock(
[ 2401.255648] _ip->rdwrlock
[ 2401.256059] #2
[ 2401.256071] );
[ 2401.256446] 
 *** DEADLOCK ***

[ 2401.257522] no locks held by kswapd0/282.
[ 2401.257888] 
the shortest dependencies between 2nd lock and 1st lock:
[ 2401.258622]  ->
[ 2401.258645]  (
[ 2401.259014] _ip->commit_mutex
[ 2401.259047] ){+.+.+.}
[ 2401.259418]  ops: 31698
[ 2401.259435]  {
[ 2401.259800] HARDIRQ-ON-W
[ 2401.259829]  at:
[ 2401.260192]   
[ 2401.260236] [] lock_acquire+0x4c/0x68
[ 2401.260619]   
[ 2401.260657] [] mutex_lock_nested+0x38/0x2f8
[ 2401.261048]   
[ 2401.261108] [] jfs_create+0x88/0x2c4
[ 2401.261839]   
[ 2401.261996] [] path_openat+0xc1c/0x100c
[ 2401.262689]   
[ 2401.262860] [] do_filp_open+0xb0/0x100
[ 2401.263639]   
[ 2401.263678] [] do_sys_open+0x154/0x21c
[ 2401.264368]   
[ 2401.264411] [] ret_from_syscall+0x0/0x38
[ 2401.265070] SOFTIRQ-ON-W
[ 2401.265099]  at:
[ 2401.265579]   
[ 2401.265751] [] lock_acquire+0x4c/0x68
[ 2401.266268]   
[ 2401.266437] [] mutex_lock_nested+0x38/0x2f8
[ 2401.266975]   
[ 2401.267135] [] jfs_create+0x88/0x2c4
[ 2401.267679]   
[ 2401.267837] [] path_openat+0xc1c/0x100c
[ 2401.268394]   
[ 2401.268560] [] do_filp_open+0xb0/0x100
[ 2401.269104]   
[ 2401.269273] [] do_sys_open+0x154/0x21c
[ 2401.269969]   
[ 2401.270040] [] ret_from_syscall+0x0/0x38
[ 2401.270727] RECLAIM_FS-ON-W
[ 2401.270761]  at:
[ 2401.271279]  
[ 2401.271418] [] lockdep_trace_alloc+0x8c/0xe4
[ 2401.272017]  
[ 2401.272122] [] __kmalloc+0x40/0x14c
[ 2401.272706]  
[ 2401.272839] [] __jfs_set_acl+0xa0/0x1a4
[ 2401.273428]  
[ 2401.273541] [] jfs_set_acl+0x50/0x9c
[ 2401.274135]  
[ 2401.274267] [] posix_acl_chmod+0xf0/0x130
[ 2401.274824]  
[ 2401.274991] [] notify_change+0x1c4/0x42c
[ 2401.275690]  
[ 2401.275727] [] chmod_common+0x74/0x10c
[ 2401.276382]  
[ 2401.276419] [] SyS_fchmod+0x30/0x64
[ 2401.277090]  
[ 2401.277129] [] ret_from_syscall+0x0/0x38
[ 2401.277805] INITIAL USE
[ 2401.277834]  at:
[ 2401.278352]  
[ 2401.278525] [] lock_acquire+0x4c/0x68
[ 2401.279054]  
[ 2401.279224] [] mutex_lock_nested+0x38/0x2f8
[ 2401.279783]  
[ 2401.279944] [] jfs_create+0x88/0x2c4
[ 2401.280528]  
[ 2401.280645] [] path_openat+0xc1c/0x100c
[ 2401.281287]  
[ 2401.281358] [] do_filp_open+0xb0/0x100
[ 2401.282048]  
[ 2401.282084] [] do_sys_open+0x154/0x21c
[ 2401.282840]  
[ 2401.282880] [] ret_from_syscall+0x0/0x38
[ 2401.283590]   }
[ 2401.284175]   ... key  at: 
[ 2401.284247] []

Re: [4.8-rc1] make bindeb-pkg O= fails

2016-08-09 Thread Christian Kujau

[re-send]

On Mon, 8 Aug 2016, frank paulsen wrote:
> in 4.8-rc1 "make bindeb-pkg O=../debian" fails:
> | find: `scripts/gcc-plugins': No such file or directory
> | /usr/src/linus/scripts/package/Makefile:97: recipe for target
> 'bindeb-pkg' failed
> 
> this is due to a missing directory scripts/gcc-plugins if using O=
> 
> removing line 335 of scripts/package/builddeb helps:
> | (cd $objtree; find scripts/gcc-plugins -name \*.so -o -name
> gcc-common.h) >> "$objtree/debian/hdrobjfiles"
> 
> this clearly isn't the right fix, but i checked it anyway and the
> paket gets built.

This was introduced in 6b90bd4ba40b38dc13c2782469c1c77e4ed79915 ("GCC
plugin infrastructure"). Not failing hard when scripts/gcc-plugins
cannot be found does the trick as well. But that too just papers over
the issue. Hopefully Emese has a better idea on how to solve this :-)

diff --git a/scripts/package/builddeb b/scripts/package/builddeb
index e1c09e2..89757f6 100755
--- a/scripts/package/builddeb
+++ b/scripts/package/builddeb
@@ -332,7 +332,7 @@ if grep -q '^CONFIG_STACK_VALIDATION=y' $KCONFIG_CONFIG ; then
 	(cd $objtree; find tools/objtool -type f -executable) >> "$objtree/debian/hdrobjfiles"
 fi
 (cd $objtree; find arch/$SRCARCH/include Module.symvers include scripts -type f) >> "$objtree/debian/hdrobjfiles"
-(cd $objtree; find scripts/gcc-plugins -name \*.so -o -name gcc-common.h) >> "$objtree/debian/hdrobjfiles"
+(cd $objtree; find scripts/gcc-plugins -name \*.so -o -name gcc-common.h) >> "$objtree/debian/hdrobjfiles" || true
 destdir=$kernel_headers_dir/usr/src/linux-headers-$version
 mkdir -p "$destdir"
 (cd $srctree; tar -c -f - -T -) < "$objtree/debian/hdrsrcfiles" | (cd $destdir; tar -xf -)
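
For comparison, a minimal sketch of a narrower workaround: only run the find when scripts/gcc-plugins actually exists in the object tree, so that other find errors are not silently swallowed the way "|| true" swallows them. This has not been tried inside builddeb itself; the helper name add_gcc_plugin_files and its two arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch only: add_gcc_plugin_files is a hypothetical helper, not part of
# scripts/package/builddeb. It appends GCC plugin files to a file list,
# but skips the step when scripts/gcc-plugins does not exist in the
# object tree (the "make bindeb-pkg O=..." case reported above).
add_gcc_plugin_files() {
	gp_objtree=$1	# object tree, i.e. the O= directory
	gp_list=$2	# list to append to, e.g. debian/hdrobjfiles
	if [ -d "$gp_objtree/scripts/gcc-plugins" ]; then
		# Unlike "|| true", a real find failure still returns non-zero.
		(cd "$gp_objtree" && find scripts/gcc-plugins -name '*.so' -o -name gcc-common.h) >> "$gp_list"
	fi
}

# Possible use in builddeb (assumption):
#   add_gcc_plugin_files "$objtree" "$objtree/debian/hdrobjfiles"
```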


Thanks,
Christian.
-- 
BOFH excuse #269:

Melting hard drives

-- 
make bzImage, not war

Makefile.sphinx:17: The 'sphinx-build' command was not found

2016-07-28 Thread Christian Kujau

Hi,

since 22cba31bae ("Documentation/sphinx: add basic working Sphinx 
configuration and build") the following warning is emitted when running 
"make help":

$ make help > /dev/null 
Documentation/Makefile.sphinx:17: The 'sphinx-build' command was not 
found. Make sure you have Sphinx installed and in PATH, or set the 
SPHINXBUILD make variable to point to the full path of the 'sphinx-build' 
executable.

Indeed, I don't have "sphinx-build" installed (nor do I want to build
documentation), and running "make SPHINXBUILD=/bin/true help" makes the
warning go away. Is there a way to omit the warning when
running "make help"? E.g. by not including Documentation/Makefile.sphinx
for that target?
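
Until then, a small wrapper could pass SPHINXBUILD=/bin/true only when sphinx-build cannot be found in PATH. This is a sketch; sphinx_make_args is a made-up name, and its optional argument exists only so the lookup can be tried with a different command name.

```shell
#!/bin/sh
# Sketch: print the extra make argument needed to silence the
# Makefile.sphinx warning. sphinx_make_args is a hypothetical helper.
# $1 optionally overrides the command to look up (default: sphinx-build).
sphinx_make_args() {
	sm_cmd=${1:-sphinx-build}
	# Only when the command is missing from PATH, print an override that
	# points SPHINXBUILD at a no-op, as done by hand above.
	if ! command -v "$sm_cmd" >/dev/null 2>&1; then
		echo "SPHINXBUILD=/bin/true"
	fi
}

# Usage (left unquoted on purpose, so an empty result adds no argument):
#   make $(sphinx_make_args) help
```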

Thanks,
Christian.
-- 
BOFH excuse #296:

The hardware bus needs a new token.

Re: [PATCH] KERNEL: resource: Fix bug on leakage in /proc/iomem file

2016-04-06 Thread Christian Kujau

On Wed, 6 Apr 2016, e...@abdsec.com wrote:
> First, I wrote your attached patch, but then I thought zeroing other
> /proc/iomem values would be better. So I changed it.

On my systems, /proc/iomem, /proc/ioports and others get their
world-readable bits removed during bootup. I guess that would mitigate
this issue too?
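
How the bits get removed is not spelled out here; one plausible way (an assumption) is running "chmod o-r" on those files as root from an early boot script. A sketch of a check that no file in a list is still readable by others afterwards, using the hypothetical helper world_readable:

```shell
#!/bin/sh
# Sketch: print each argument that still has the "others read" bit set.
# world_readable is a hypothetical helper name.
world_readable() {
	for wr_file in "$@"; do
		# -prune keeps find from descending if the argument is a directory;
		# -perm -o=r matches when the "others read" bit is set.
		if [ -n "$(find "$wr_file" -prune -perm -o=r 2>/dev/null)" ]; then
			echo "$wr_file"
		fi
	done
}

# Assumed boot-time usage, as root:
#   chmod o-r /proc/iomem /proc/ioports
#   world_readable /proc/iomem /proc/ioports   # expect no output
```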

Christian.
-- 
BOFH excuse #184:

loop found in loop in redundant loopback

iwlwifi: Error sending REPLY_ADD_STA

2015-09-25 Thread Christian Kujau

Hello,

sometimes the Wifi adapter (Wireless-N 2230) in this Lenovo Thinkpad E431 
"disappears" and cannot be fixed by re-loading the iwlwifi kernel module
either. Only a reboot will do.

When I was running 3.16.0-4-amd64 from Debian/stable, I noticed the 
following message, but only _once_ and Wifi worked fine even with that 
message:

iwlwifi :04:00.0: Error sending REPLY_TX_LINK_QUALITY_CMD: time out after 2000ms.
iwlwifi :04:00.0: Current CMD queue read_ptr 25 write_ptr 26
iwlwifi :04:00.0: Loaded firmware version: 18.168.6.1
iwlwifi :04:00.0: Microcode SW error detected.  Restarting 0x200.

Now with Linux 4.2 it happens more often. I've attached the error below;
the full kernel logs and .config can be found here:
http://nerdbynature.de/bits/v4.2/

The initial error messages so far:

Linux version 3.16.0-4-amd64, Aug 17
iwlwifi :04:00.0: Error sending REPLY_TX_LINK_QUALITY_CMD: time out after 2000ms.

Linux version 4.2.0-rc7, Aug 30
iwlwifi :04:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.

Linux version 4.2.0, Sep 19
iwlwifi :04:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.

Linux version 4.2.0, Sep 20
iwlwifi :04:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.

Linux version 4.2.0, Sep 25
iwlwifi :04:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
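
To compare runs across kernel versions, the failing host command can be pulled out of each log. A sketch: timed_out_cmds is a made-up helper, and the sed pattern assumes the "Error sending <CMD>: time out" wording shown above.

```shell
#!/bin/sh
# Sketch: count which iwlwifi host commands timed out in a kernel log read
# from stdin, e.g. REPLY_ADD_STA vs. REPLY_TX_LINK_QUALITY_CMD.
# timed_out_cmds is a hypothetical helper name.
timed_out_cmds() {
	# Keep only the command name from "Error sending <CMD>: time out ...",
	# then print one "<CMD> <count>" line per distinct command.
	sed -n 's/.*Error sending \([A-Z_]*\): time out.*/\1/p' |
		sort | uniq -c |
		awk '{ print $2, $1 }'
}

# Usage:
#   dmesg | timed_out_cmds
```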

I found old reports of these messages on the net, but they were either
marked fixed[0], years old[1][2], or not applicable to my card[3].

Does anybody have an idea what's going on here?

Thanks,
Christian.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1029785
[1] https://answers.launchpad.net/ubuntu/+source/gnome-nettool/+question/221076
[2] https://lkml.org/lkml/2012/2/8/247
[3] https://lists.fedoraproject.org/pipermail/users/2011-June/400906.html

=== dmesg
 usb 1-3: USB disconnect, device number 2
 iwlwifi :04:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
 iwlwifi :04:00.0: Current CMD queue read_ptr 192 write_ptr 193
 [ cut here ]
 WARNING: CPU: 2 PID: 25891 at /usr/local/src/linux-git/drivers/net/wireless/iwlwifi/pcie/trans.c:1444 iwl_trans_pcie_grab_nic_access+0x100/0x110 [iwlwifi]()
 Timeout waiting for hardware access (CSR_GP_CNTRL 0x)
 Modules linked in: md4 nls_iso8859_15 cifs auth_rpcgss oid_registry nfsv4 
dns_resolver xfs libcrc32c ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_conntrack 
nf_conntrack ip_tables x_tables ctr ccm pci_stub vboxpci(O) vboxnetadp(O) 
vboxnetflt(O) vboxdrv(O) nfs lockd grace fscache sunrpc uvcvideo 
videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev 
sha256_ssse3 sha256_generic hmac x86_pkg_temp_thermal drbg snd_hda_codec_hdmi 
intel_powerclamp coretemp aesni_intel aes_x86_64 glue_helper arc4 lrw gf128mul 
ablk_helper cryptd snd_hda_codec_conexant snd_hda_codec_generic snd_pcsp 
psmouse iwldvm snd_hda_intel snd_hda_codec mac80211 snd_hwdep iwlwifi 
snd_hda_core cfg80211 lpc_ich i2c_i801 snd_pcm snd_timer snd shpchp soundcore 
battery ac processor loop fuse autofs4 mmc_block hid_logitech_hidpp 
hid_logitech_dj hid_generic usbhid hid btrfs xor raid6_pq rtsx_pci_sdmmc 
mmc_core xhci_pci ehci_pci xhci_hcd e
 hci_hcd sr_mod cdrom sg rtsx_pci usbcore mfd_core usb_common thermal
 CPU: 2 PID: 25891 Comm: kworker/u16:1 Tainted: G   O4.2.0 #2
 Hardware name: LENOVO 6277CTO/6277CTO, BIOS HEET48WW (1.29 ) 03/13/2015
 Workqueue: phy0 ieee80211_ba_session_work [mac80211]
  c0327bd8 baacb049 c0327bd8 8158173a
  8801cffb7aa0 8104ccb7 8800c27f4000 
  8800c27f7ca8 8801cffb7b38  8104cd48
 Call Trace:
  [] ? dump_stack+0x40/0x50
  [] ? warn_slowpath_common+0x77/0xb0
  [] ? warn_slowpath_fmt+0x58/0x80
  [] ? iwl_trans_pcie_grab_nic_access+0x100/0x110 [iwlwifi]
  [] ? iwl_write_prph+0x2e/0x70 [iwlwifi]
  [] ? iwl_force_nmi+0x1d/0x60 [iwlwifi]
  [] ? iwl_trans_pcie_send_hcmd+0x3c0/0x420 [iwlwifi]
  [] ? wait_woken+0x80/0x80
  [] ? iwl_send_add_sta+0x7f/0xd0 [iwldvm]
  [] ? iwl_sta_rx_agg_stop+0xfb/0x150 [iwldvm]
  [] ? iwlagn_mac_ampdu_action+0x103/0x1e0 [iwldvm]
  [] ? ___ieee80211_stop_rx_ba_session+0xaf/0x1b0 [mac80211]
  [] ? ieee80211_ba_session_work+0x100/0x170 [mac80211]
  [] ? process_one_work+0x137/0x360
  [] ? pwq_activate_delayed_work+0x27/0x40
  [] ? worker_thread+0x5d/0x450
  [] ? perf_cgroup_switch+0x1a0/0x1a0
  [] ? rescuer_thread+0x310/0x310
  [] ? kthread+0xda/0xf0
  [] ? kthread_create_on_node+0x1b0/0x1b0
  [] ? ret_from_fork+0x3f/0x70
  [] ? kthread_create_on_node+0x1b0/0x1b0
 ---[ end trace 6435c974dd1d2317 ]---
 iwlwifi :04:00.0: Loaded firmware version: 18.168.6.1
 iwlwifi :04:00.0: Start IWL Error Log Dump:
 iwlwifi :04:00.0: Status: 0x004C, count: -30719
 iwlwifi :04:00.0: 0xBAACB049 | ADVANCED_SYSASSERT  
 iwlwifi :04:00.0: 0x | uPc

Re: [PATCH] drivers/base: fix typo

2015-08-25 Thread Christian Kujau

On Thu, 20 Aug 2015, Junesung Lee wrote:
> The word "filesystem" is being used without the space.

I think both versions are acceptable:

  https://en.wiktionary.org/wiki/file_system

That said, the spelling without the space appears to be more common in the
source tree:

$ git grep file\ system | wc -l
1473

$ git grep filesystem | wc -l
4321
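
Note that "git grep PATTERN | wc -l" counts matching lines, not occurrences, so a line using a spelling twice is counted once. A rough sketch that counts occurrences instead, reading text on stdin; count_spellings is a made-up helper, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: count occurrences (not lines) of both spellings in stdin.
# count_spellings is a hypothetical helper. grep -o prints each match on
# its own line, so wc -l then counts matches; tr strips the padding some
# wc implementations add.
count_spellings() {
	cs_text=$(cat)
	cs_spaced=$(printf '%s\n' "$cs_text" | grep -o 'file system' | wc -l | tr -d ' ')
	cs_joined=$(printf '%s\n' "$cs_text" | grep -o 'filesystem' | wc -l | tr -d ' ')
	echo "file system: $cs_spaced"
	echo "filesystem: $cs_joined"
}

# Usage in a kernel tree (-h drops the file names from git grep's output):
#   git grep -h -e 'file system' -e filesystem | count_spellings
```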

Christian.

> 
> Signed-off-by: Junesung Lee 
> ---
>  drivers/base/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 98504ec..9140666 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -42,7 +42,7 @@ config DEVTMPFS
> rescue systems, and reliably handles dynamic major/minor numbers.
>  
> Notice: if CONFIG_TMPFS isn't enabled, the simpler ramfs
> -   file system will be used instead.
> +   filesystem will be used instead.
>  
>  config DEVTMPFS_MOUNT
>   bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
> @@ -100,7 +100,7 @@ config FIRMWARE_IN_KERNEL
> Enabling this option will build each required firmware blob
> into the kernel directly, where request_firmware() will find
> them without having to call out to userspace. This may be
> -   useful if your root file system requires a device that uses
> +   useful if your root filesystem requires a device that uses
> such firmware and do not wish to use an initrd.
>  
> This single option controls the inclusion of firmware for
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
BOFH excuse #416:

We're out of slots on the server

Re: 4.1-rc6: ATA link is slow to respond, please be patient

2015-08-08 Thread Christian Kujau

On August 8, 2015 1:57:05 AM PDT, Denis Kirjanov  wrote:
>On 8/7/15, Christian Kujau  wrote:
>> Hi,
>>
>> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
>> to latest mainline. However, during bootup the following happens:
>>
>> ===
>> [2.237102] ata1: PATA max UDMA/100 irq 39
>> [2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
>> [2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
>> [2.417633] ata1.00: configured for UDMA/100
>> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> [   44.920452] ata1.00: failed command: READ DMA
>> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
>> [   44.927257] ata1.00: status: { DRDY }
>> [   49.971784] ata1.00: qc timeout (cmd 0xec)
>> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   49.978908] ata1.00: revalidation failed (errno=-5)
>> [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
>> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
>> [   60.012670] ata1: soft resetting link
>> [   60.193638] ata1.00: configured for UDMA/100
>> [   60.196158] ata1.00: device reported invalid CHS sector 0
>> [   60.198610] ata1: EH complete
>> ===
>
>Just tried 4.2.0-rc5+ and haven't hit the issue.
>
>[   17.180034] pata-pci-macio 0002:20:0d.0: enabling device ( -> 0002)
>[   17.185862] adb: starting probe task...
>[   17.196011] pata-pci-macio 0002:20:0d.0: Activating pata-macio chipset UniNorth ATA-6, Apple bus ID 3
>[   17.202312] scsi host0: pata_macio
>[   17.203698] ata1: PATA max UDMA/100 irq 39
>[   17.219397] adb devices: [2]: 2 c4 [7]: 7 1f
>[   17.225400] ADB keyboard at 2, handler 1
>[   17.225560] Detected ADB keyboard, type ISO, swapping keys.
>[   17.226642] input: ADB keyboard as /devices/virtual/input/input0
>[   17.227590] input: ADB Powerbook buttons as /devices/virtual/input/input1
>[   17.227795] adb: finished probe task...
>[   17.368537] ata1.00: ATA-6: TOSHIBA MK8026GAX, PA005B, max UDMA/100
>[   17.368717] ata1.00: 156301488 sectors, multi 16: LBA48
>[   17.376346] ata1.00: configured for UDMA/100
>[   17.377544] scsi 0:0:0:0: Direct-Access ATA  TOSHIBA MK8026GA 5B   PQ: 0 ANSI: 5
>[   17.386989] sd 0:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
>[   17.393144] sd 0:0:0:0: [sda] Write Protect is off
>[   17.397579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>[   17.398215] sd 0:0:0:0: Attached scsi generic sg0 type 0
>[   17.404124] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>[   17.661225]  sda: [mac] sda1 sda2 sda3 sda4
>[   17.672937] sd 0:0:0:0: [sda] Attached SCSI disk
>[   18.223985] pata-macio 0.0002:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
>[   18.233397] scsi host1: pata_macio
>[   18.239172] ata2: PATA max MWDMA2 irq 24
>
>
>>
>> This happens only once, but systemd thinks there's a hard problem and will
>> drop to a recovery shell. I can start sshd and login remotely and then the
>> system appears to be running just fine.
>>
>> This happened in 4.2.0-rc5 so I went back a few versions and found that
>> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
>> and 4.1-rc6 is not.
>>
>> Unfortunately a git-bisect between these two versions went completly off
>> the charts, I don't know what happened here:
>>
>> ==
>> first bad commit:
>>
>> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
>> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
>> Author: Takashi Iwai 
>> Date:   Wed May 27 16:17:19 2015 +0200
>>
>> ALSA: hda - Fix noise on AMD radeon 290x controller
>> ==
>>
>> I don't have this driver (or ALSA) even selected. I can reproduce this
>> error pretty reliably and I'd like to attempt another git-bisect
>> run when I'm more awake. But maybe somebody recognizes this error and
>> has a hint where this could come from?
>>
>> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
>>
>> Thanks,
>> Christian.
>> --
>> BOFH excuse #225:
>>
>> It's those computer people in X {city of world}.  They keep stuffing things up.
>> ___
>> Linuxppc-dev mailing list
>> linuxppc-...@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

Can you send me your .config or did you use my .config, verbatim?

I'll try another git-bisect later today.

Thanks,
Christian.
-- 
make bzImage, not war

4.1-rc6: ATA link is slow to respond, please be patient

2015-08-07 Thread Christian Kujau

Hi,

this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
to latest mainline. However, during bootup the following happens:

===
[2.237102] ata1: PATA max UDMA/100 irq 39
[2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
[2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
[2.417633] ata1.00: configured for UDMA/100
[   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   44.920452] ata1.00: failed command: READ DMA
[   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
[   44.927257] ata1.00: status: { DRDY }
[   49.971784] ata1.00: qc timeout (cmd 0xec)
[   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   49.978908] ata1.00: revalidation failed (errno=-5)
[   55.019662] ata1: link is slow to respond, please be patient (ready=0)
[   60.007677] ata1: device not ready (errno=-16), forcing hardreset
[   60.012670] ata1: soft resetting link
[   60.193638] ata1.00: configured for UDMA/100
[   60.196158] ata1.00: device reported invalid CHS sector 0
[   60.198610] ata1: EH complete
===

This happens only once, but systemd thinks there's a hard problem and will
drop to a recovery shell. I can start sshd and log in remotely, and then the
system appears to be running just fine.

This happened in 4.2.0-rc5, so I went back a few versions and found that
4.1-rc5 was OK (the error does not show up and the system boots just fine),
but 4.1-rc6 is not.

Unfortunately, a git-bisect between these two versions went completely off
the charts; I don't know what happened here:

==
first bad commit:

0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
Author: Takashi Iwai 
Date:   Wed May 27 16:17:19 2015 +0200

ALSA: hda - Fix noise on AMD radeon 290x controller
==

I don't even have this driver (or ALSA) selected. I can reproduce this
error pretty reliably and I'd like to attempt another git-bisect
run when I'm more awake. But maybe somebody recognizes this error and
has a hint about where it could come from?
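
For the next bisect round, a sketch of a check to run after booting each candidate kernel, so that every step is judged by the same messages quoted above. classify_boot_log is a hypothetical helper, and treating these three messages as "bad" is an assumption based on the log in this mail. If the error ever fails to show up on a bad kernel, a single mis-marked step is enough to lead a bisect to an unrelated commit.

```shell
#!/bin/sh
# Sketch: read a kernel log on stdin and print "bad" if it contains any of
# the ATA error messages from this report, otherwise "good".
# classify_boot_log is a hypothetical helper name.
classify_boot_log() {
	if grep -q -e 'link is slow to respond' \
		-e 'failed to IDENTIFY' \
		-e 'qc timeout'; then
		echo bad
	else
		echo good
	fi
}

# Assumed workflow:
#   git bisect start v4.1-rc6 v4.1-rc5   # bad first, then good
#   ...build, install and boot the kernel for the current step...
#   git bisect "$(dmesg | classify_boot_log)"
```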

dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/

Thanks,
Christian.
-- 
BOFH excuse #225:

It's those computer people in X {city of world}.  They keep stuffing things up.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] fallback to hostname in scripts/package/builddeb

2015-08-02 Thread Christian Kujau

Hi,

I happened to build a kernel with "make deb-pkg" on a machine with no 
network connectivity, but this failed with:

[...]
  INSTALL debian/headertmp/usr/include/asm/ (65 files)
hostname: Name or service not known
../scripts/package/Makefile:90: recipe for target 'deb-pkg' failed
make[2]: *** [deb-pkg] Error 1

The scripts/package/builddeb script tries to construct an email address 
(that can be queried in /proc/version later on), but with no network, 
"hostname -f" fails. The following patch falls back to the short 
hostname if we cannot determine our FQDN.

Signed-off-by: Christian Kujau <li...@nerdbynature.de>

diff --git a/scripts/package/builddeb b/scripts/package/builddeb
index 88dbf23..7de1d1c 100755
--- a/scripts/package/builddeb
+++ b/scripts/package/builddeb
@@ -206,7 +206,7 @@ if [ -n "$DEBEMAIL" ]; then
 elif [ -n "$EMAIL" ]; then
 	email=$EMAIL
 else
-	email=$(id -nu)@$(hostname -f)
+	email=$(id -nu)@$(hostname -f 2>/dev/null || hostname)
 fi
 if [ -n "$DEBFULLNAME" ]; then
 	name=$DEBFULLNAME
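A minimal sketch of the fallback the patch relies on, runnable without touching DNS. `fake_hostname_f`, `fake_hostname` and `build_email` are hypothetical stand-ins for `hostname -f`, `hostname` and the builddeb assignment; the host and user names are made up. When the first command fails without printing anything on stdout, the `||` makes the command substitution fall through to the short name.

```shell
#!/bin/sh
# Minimal sketch of the fallback added to builddeb:
#   email=$(id -nu)@$(hostname -f 2>/dev/null || hostname)
# The fake_* functions are hypothetical stand-ins that mimic an offline
# machine, so the behaviour can be tried without touching DNS.

fake_hostname_f() {
	# Like "hostname -f" without name resolution: error on stderr, exit 1.
	echo "hostname: Name or service not known" >&2
	return 1
}

fake_hostname() {
	# Like plain "hostname": just prints the short name.
	echo "powerbook"
}

build_email() {
	# stderr of the FQDN lookup is discarded; if it fails, the command
	# substitution falls through to the short host name instead.
	echo "$1@$(fake_hostname_f 2>/dev/null || fake_hostname)"
}

build_email builder
```

In POSIX sh, a plain assignment such as `email=$(...)@$(...)` takes the exit status of its last command substitution, so a failing `hostname -f` is enough to abort a script running under `set -e`, which would be consistent with the deb-pkg failure above.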


-- 
BOFH excuse #334:

50% of the manual is in .pdf readme files

Re: WARNING: CPU: 6 PID: 79 at fs/proc/generic.c:521 remove_proc_entry+0x170/0x180()

2014-08-19 Thread Christian Kujau

On Tue, 19 Aug 2014 at 20:13, Cong Wang wrote:
> On Tue, Aug 19, 2014 at 7:50 PM, Jiang Liu  wrote:
> > Hi Kujau,
> > It seems like a different issue, something wrong with
> > void nfs_fs_proc_net_exit(struct net *net)
> 
> http://marc.info/?l=linux-nfs&m=140821782107427&w=2

Thanks, that helped!

Christian.
-- 
BOFH excuse #182:

endothermal recalibration

WARNING: CPU: 6 PID: 79 at fs/proc/generic.c:521 remove_proc_entry+0x170/0x180()

2014-08-19 Thread Christian Kujau

Hi,

the warning below appeared while booting 3.17.0-rc1. I haven't seen the 
warning before, but found a recent report on oops.kernel.org:

http://oops.kernel.org/oops/warning-at-fs-proc-generic-c521-remove_proc_entry0x18f-0x1a0/

and also reports from July 2014, where the issue was reported to be fixed:

https://lkml.org/lkml/2014/7/16/9
https://lkml.org/lkml/2014/7/18/116

The fix did make it into 3.17.0-rc1, though, so maybe it's something else 
this time. Details and .config: http://nerdbynature.de/bits/3.17-rc1/

Thanks,
Christian.

[ cut here ]
WARNING: CPU: 6 PID: 79 at /usr/local/src/linux-git/fs/proc/generic.c:521 
remove_proc_entry+0x170/0x180()
remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 
'volumes'
Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_core v4l2_common btusb videodev bluetooth hid_logitech_dj 
sha256_ssse3 sha256_generic twofish_generic twofish_avx_x86_64 
twofish_x86_64_3way twofish_x86_64 twofish_common 
nfs xts lockd sunrpc arc4 coretemp x86_pkg_temp_thermal usbhid 
intel_powerclamp hid iwldvm kvm_intel
 mac80211 kvm snd_hda_codec_hdmi i2c_i801 iwlwifi cfg80211 thinkpad_acpi 
snd_hda_codec_conexant snd_hda_codec_generic nvram hwmon led_class wmi 
rtc_cmos i915 snd_hda_intel 
i2c_algo_bit snd_hda_controller drm_kms_helper snd_hda_codec drm snd_hwdep 
snd_pcm i2ccore snd_timer snd soundcore fuse autofs4 btrfs xor raid6_pq 
aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd sr_mod cdrom 
sg ehci_pci 
ehci_hcd xhci_hcd
CPU: 6 PID: 79 Comm: kworker/u16:6 Not tainted 3.17.0-rc1 #1
Hardware name: LENOVO 6277CTO/6277CTO, BIOS HEET42WW (1.23 ) 01/27/2014
Workqueue: netns cleanup_net
 0009 8149c1e2 880406c8fd18 8104ee6d
 880406701580 880406c8fd68 0005 c0883bae
 c0883bb1 8104eed7 815b1578 88040030
Call Trace:
 [8149c1e2] ? dump_stack+0x41/0x51
 [8104ee6d] ? warn_slowpath_common+0x6d/0x90
 [8104eed7] ? warn_slowpath_fmt+0x47/0x50
 [811955d1] ? proc_entry_rundown+0x41/0x80
 [81199b50] ? remove_proc_entry+0x170/0x180
 [c0873a79] ? nfs_net_exit+0x9/0x20 [nfs]
 [813e5951] ? ops_exit_list.isra.2+0x31/0x60
 [813e6150] ? cleanup_net+0x100/0x1e0
 [8106316b] ? process_one_work+0x16b/0x3b0
 [81063ed3] ? worker_thread+0x63/0x490
 [81063e70] ? rescuer_thread+0x280/0x280
 [8106848a] ? kthread+0xca/0xe0
 [810683c0] ? kthread_create_on_node+0x170/0x170
 [814a1b7c] ? ret_from_fork+0x7c/0xb0
 [810683c0] ? kthread_create_on_node+0x170/0x170
---[ end trace c92165dd3f372cf6 ]---
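When reporting a warning like this one, it can help to cut exactly the block between the `[ cut here ]` marker and the `end trace` line out of a saved log. A small sketch, assuming a POSIX `sed`; `extract_warnings` is a hypothetical helper, not an existing kernel script.

```shell
#!/bin/sh
# Print every kernel WARNING/BUG block from a saved log, from the
# "[ cut here ]" marker through the matching "---[ end trace ... ]---" line.
# extract_warnings is a hypothetical helper name used for illustration.
extract_warnings() {
	# sed range address: start at the first regex, stop after the second.
	sed -n '/\[ cut here ]/,/---\[ end trace/p' "$1"
}
```

For example, `dmesg > boot.log` followed by `extract_warnings boot.log`.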


-- 
BOFH excuse #285:

Telecommunications is upgrading.

[PATCH] remove non-existent files from MAINTAINERS

2014-08-10 Thread Christian Kujau

Inspired by some recent cleanups in MAINTAINERS, I noticed that the
following files (F:) can no longer be found in the tree:

* arch/arm/mach-s5pv210/mach-aquila.c
* arch/arm/mach-s5pv210/mach-goni.c

  Those two got removed in 28c8331 ("ARM: S5PV210: Remove support for board files").
  Cc: Tomasz Figa <t.f...@samsung.com>
  Cc: Kyungmin Park <kyungmin.p...@samsung.com>

* arch/arm/configs/genmai_defconfig

  This one got removed in 3ed27bd9 ("ARM: shmobile: genmai: remove defconfig").
  Cc: Simon Horman <ho...@verge.net.au>
  Cc: Magnus Damm <magnus.d...@gmail.com>

* drivers/mmc/host/sdhci-st.c

  This one was submitted for inclusion in June 2014 but was dropped shortly
  afterwards:
  "mmc: sdhci-st: Intial support for ST SDHCI controller"
  https://lkml.org/lkml/2014/6/4/446
  https://lkml.org/lkml/2014/7/9/340
  Cc: Peter Griffin <peter.grif...@linaro.org>
  Cc: Ulf Hansson <ulf.hans...@linaro.org>

* drivers/rtc/rtc-sec.c

  A MAINTAINERS fix was attempted in November 2012, but dismissed because
  rtc-sec.c was still being worked on. Alas, it's still not there.
  "MAINTAINERS: fix drivers/rtc/rtc-sec.c"
  http://lkml.iu.edu/hypermail/linux/kernel/1211.2/04820.html
  Cc: Sangbeom Kim <sbki...@samsung.com>
  Cc: Cesar Eduardo Barros <ces...@cesarb.eti.br>

Signed-off-by: Christian Kujau <li...@nerdbynature.de>

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e2eb4c..7831e8d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1303,8 +1303,7 @@ ARM/SAMSUNG MOBILE MACHINE SUPPORT
 M: Kyungmin Park <kyungmin.p...@samsung.com>
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S: Maintained
-F: arch/arm/mach-s5pv210/mach-aquila.c
-F: arch/arm/mach-s5pv210/mach-goni.c
+F: arch/arm/mach-s5pv210/

 ARM/SAMSUNG S5P SERIES 2D GRAPHICS ACCELERATION (G2D) SUPPORT
 M: Kyungmin Park <kyungmin.p...@samsung.com>
@@ -1347,7 +1346,6 @@ F: arch/arm/boot/dts/sh*
 F: arch/arm/configs/ape6evm_defconfig
 F: arch/arm/configs/armadillo800eva_defconfig
 F: arch/arm/configs/bockw_defconfig
-F: arch/arm/configs/genmai_defconfig
 F: arch/arm/configs/koelsch_defconfig
 F: arch/arm/configs/kzm9g_defconfig
 F: arch/arm/configs/lager_defconfig
@@ -1383,7 +1381,6 @@ F: drivers/pinctrl/pinctrl-st.c
 F: drivers/media/rc/st_rc.c
 F: drivers/i2c/busses/i2c-st.c
 F: drivers/tty/serial/st-asc.c
-F: drivers/mmc/host/sdhci-st.c

 ARM/TECHNOLOGIC SYSTEMS TS7250 MACHINE SUPPORT
 M: Lennert Buytenhek <ker...@wantstofly.org>
@@ -7809,7 +7806,6 @@ S: Supported
 F: drivers/mfd/sec*.c
 F: drivers/regulator/s2m*.c
 F: drivers/regulator/s5m*.c
-F: drivers/rtc/rtc-sec.c
 F: include/linux/mfd/samsung/

 SAMSUNG S5P/EXYNOS4 SOC SERIES CAMERA SUBSYSTEM DRIVERS
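Stale F: entries like the ones listed above can be spotted mechanically. The sketch below uses a hypothetical helper, `stale_f_entries`, that prints every F: pattern in a MAINTAINERS-style file that matches nothing under a given tree; glob patterns such as `drivers/mfd/sec*.c` are expanded by the shell. It ignores X: exclusions and N: regex entries, so its output is a list of candidates to double-check, not a definitive answer.

```shell
#!/bin/sh
# Print F: patterns from a MAINTAINERS-style file ($1) that match nothing
# below the source tree root ($2). stale_f_entries is a hypothetical helper.
stale_f_entries() {
	maintainers=$1
	tree=$2
	grep '^F:' "$maintainers" | sed 's/^F:[[:space:]]*//' |
	while IFS= read -r pat; do
		# $pat is left unquoted on purpose so globs like sec*.c expand;
		# an unmatched glob stays literal and then fails the -e test.
		set -- "$tree"/$pat
		[ -e "$1" ] || printf '%s\n' "$pat"
	done
}
```

Run from the top of a kernel tree as `stale_f_entries MAINTAINERS .`.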
-- 
BOFH excuse #419:

Repeated reboots of the system failed to solve problem

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Fri, 14 Feb 2014 at 12:14, Dave Chinner wrote:
> > OK, so the "possible irq lock inversion dependency detected" is a lockdep 
> > regression, as you explained in the xfs-list thread. What about the 
> > "RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected" warning - I 
> > haven't seen it again though, only once with 3.14.0-rc2.
> 
> That was also an i_lock/mmapsem issue, so it's likely to be the same
> root cause. I'm testing a fix for it at the moment.

Understood. Thanks for looking into this.

Christian.
-- 
BOFH excuse #129:

The ring needs another token

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Fri, 14 Feb 2014 at 09:26, Dave Chinner wrote:
> > after upgrading from 3.13-rc8 to 3.14.0-rc2 on this PowerPC G4 machine, 
> > the WARNING below was printed.
> > 
> > Shortly after, a lockdep warning appeared (possibly related to my 
> > post to the XFS list yesterday[0]).
> 
> Unlikely.

OK, so the "possible irq lock inversion dependency detected" is a lockdep 
regression, as you explained in the xfs-list thread. What about the 
"RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected" warning? I 
haven't seen it again, though; it appeared only once with 3.14.0-rc2.

Christian.
-- 
BOFH excuse #108:

The air conditioning water supply pipe ruptured over the machine room

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Thu, 13 Feb 2014 at 11:53, Christian Kujau wrote:
> after upgrading from 3.13-rc8 to 3.14.0-rc2 on this PowerPC G4 machine, 
> the WARNING below was printed.
> 
> Shortly after, a lockdep warning appeared (possibly related to my 
> post to the XFS list yesterday[0]).

Sigh, only _after_ sending the email, I came across an earlier posting on 
lkml: http://marc.info/?l=linux-mm&m=139145788623391

Sorry for the noise. The out-of-memory messages below, however, appeared 
without the WARNING and started somewhere in 3.13, but they are impossible 
to bisect, as they only happen every few days or weeks.

Christian.

> Even later in the log an out-of-memory error appeared, that may or may not 
> be related to that WARNING at all but which I'm trying to chase down ever 
> since 3.13, but which tends to appear more often lately.
> 
> Can anyone take a look if this is something to worry about?
> 
> Full dmesg & .config: http://nerdbynature.de/bits/3.14-rc2/mm/
> 
> Thanks,
> Christian.
> 
> [0] http://oss.sgi.com/pipermail/xfs/2014-February/034054.html
> 
>  [ cut here ]
>  WARNING: at /usr/local/src/linux-git/mm/slub.c:1007
>  Modules linked in: md5 ecb nfs i2c_powermac therm_adt746x ecryptfs arc4 
> firewire_sbp2 b43 usb_storage mac80211 cfg80211
>  CPU: 0 PID: 9025 Comm: nfsd Not tainted 3.14.0-rc2 #1
>  task: efbf8000 ti: ed2a task.ti: ed2a
>  NIP: c00ccc28 LR: c00ccc20 CTR: 
>  REGS: ed2a1980 TRAP: 0700   Not tainted  (3.14.0-rc2)
>  MSR: 00021032 <ME,IR,DR,RI>  CR: 22f82b82  XER: 2000
>  
>  GPR00: c00ccc20 ed2a1a30 efbf8000  ef96e550   
> 2ce0 
>  GPR08:  0001 efbf86f8 05e7 82fc2b88  0001 
> 00080011 
>  GPR16:   c076  ef96e564 00100100 00200200 
> c1203914 
>  GPR24:  ef96e540 0002 ef96fa80   ed2a 
> c1203900 
>  NIP [c00ccc28] deactivate_slab+0x4c0/0x538
>  LR [c00ccc20] deactivate_slab+0x4b8/0x538
>  Call Trace:
>  [ed2a1a30] [c00ccc20] deactivate_slab+0x4b8/0x538 (unreliable)
>  [ed2a1ae0] [c055d5f0] __slab_alloc.constprop.77+0x260/0x38c
>  [ed2a1b50] [c00cd524] kmem_cache_alloc+0x118/0x140
>  [ed2a1b70] [c01de4bc] kmem_zone_alloc+0x94/0x108
>  [ed2a1ba0] [c01cccd4] xfs_inode_alloc+0x2c/0xd4
>  [ed2a1bc0] [c01cd7a4] xfs_iget+0x2e4/0x584
>  [ed2a1c30] [c020e664] xfs_lookup+0xc8/0xe4
>  [ed2a1c70] [c01d3c28] xfs_vn_lookup+0x64/0xbc
>  [ed2a1c90] [c00db3ac] lookup_real+0x30/0x70
>  [ed2a1ca0] [c00dc384] __lookup_hash+0x3c/0x58
>  [ed2a1cc0] [c00e1438] lookup_one_len+0x10c/0x15c
>  [ed2a1ce0] [c01a170c] nfsd4_encode_dirent+0xb4/0x328
>  [ed2a1d10] [c018f580] nfsd_readdir+0x1d4/0x288
>  [ed2a1d90] [c019d648] nfsd4_encode_readdir+0x138/0x1f4
>  [ed2a1dd0] [c01a1b18] nfsd4_encode_operation+0x8c/0xf0
>  [ed2a1df0] [c019aa4c] nfsd4_proc_compound+0x1b8/0x4f8
>  [ed2a1e30] [c0189d20] nfsd_dispatch+0x90/0x1a0
>  [ed2a1e50] [c0536b04] svc_process+0x3d0/0x698
>  [ed2a1e90] [c01895bc] nfsd+0xc0/0x120
>  [ed2a1eb0] [c004f8fc] kthread+0xbc/0xd0
>  [ed2a1f40] [c0010ae4] ret_from_kernel_thread+0x5c/0x64
>  Instruction dump:
>  7fe4fb78 800100b4 b9c10068 7d810120 7d808120 7c0803a6 382100b0 4bfffb00 
>  80610048 4bf95dc5 2f83 40beff4c <0fe0> 4b44 815e000c 394a0001 
>  ---[ end trace 1f5ed3ea8b3e4403 ]---
> 
> 
> -- 
> BOFH excuse #65:
> 
> system needs to be rebooted
> 
> ___
> xfs mailing list
> x...@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

-- 
BOFH excuse #65:

system needs to be rebooted

3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

Hi,

after upgrading from 3.13-rc8 to 3.14.0-rc2 on this PowerPC G4 machine, 
the WARNING below was printed.

Shortly after, a lockdep warning appeared (possibly related to my 
post to the XFS list yesterday[0]).

Even later in the log an out-of-memory error appeared, which may or may not 
be related to that WARNING at all, but which I've been trying to chase down 
ever since 3.13 and which tends to appear more often lately.

Could anyone take a look and tell me whether this is something to worry about?

Full dmesg & .config: http://nerdbynature.de/bits/3.14-rc2/mm/

Thanks,
Christian.

[0] http://oss.sgi.com/pipermail/xfs/2014-February/034054.html

 [ cut here ]
 WARNING: at /usr/local/src/linux-git/mm/slub.c:1007
 Modules linked in: md5 ecb nfs i2c_powermac therm_adt746x ecryptfs arc4 
firewire_sbp2 b43 usb_storage mac80211 cfg80211
 CPU: 0 PID: 9025 Comm: nfsd Not tainted 3.14.0-rc2 #1
 task: efbf8000 ti: ed2a task.ti: ed2a
 NIP: c00ccc28 LR: c00ccc20 CTR: 
 REGS: ed2a1980 TRAP: 0700   Not tainted  (3.14.0-rc2)
 MSR: 00021032 <ME,IR,DR,RI>  CR: 22f82b82  XER: 2000
 
 GPR00: c00ccc20 ed2a1a30 efbf8000  ef96e550   
2ce0 
 GPR08:  0001 efbf86f8 05e7 82fc2b88  0001 
00080011 
 GPR16:   c076  ef96e564 00100100 00200200 
c1203914 
 GPR24:  ef96e540 0002 ef96fa80   ed2a 
c1203900 
 NIP [c00ccc28] deactivate_slab+0x4c0/0x538
 LR [c00ccc20] deactivate_slab+0x4b8/0x538
 Call Trace:
 [ed2a1a30] [c00ccc20] deactivate_slab+0x4b8/0x538 (unreliable)
 [ed2a1ae0] [c055d5f0] __slab_alloc.constprop.77+0x260/0x38c
 [ed2a1b50] [c00cd524] kmem_cache_alloc+0x118/0x140
 [ed2a1b70] [c01de4bc] kmem_zone_alloc+0x94/0x108
 [ed2a1ba0] [c01cccd4] xfs_inode_alloc+0x2c/0xd4
 [ed2a1bc0] [c01cd7a4] xfs_iget+0x2e4/0x584
 [ed2a1c30] [c020e664] xfs_lookup+0xc8/0xe4
 [ed2a1c70] [c01d3c28] xfs_vn_lookup+0x64/0xbc
 [ed2a1c90] [c00db3ac] lookup_real+0x30/0x70
 [ed2a1ca0] [c00dc384] __lookup_hash+0x3c/0x58
 [ed2a1cc0] [c00e1438] lookup_one_len+0x10c/0x15c
 [ed2a1ce0] [c01a170c] nfsd4_encode_dirent+0xb4/0x328
 [ed2a1d10] [c018f580] nfsd_readdir+0x1d4/0x288
 [ed2a1d90] [c019d648] nfsd4_encode_readdir+0x138/0x1f4
 [ed2a1dd0] [c01a1b18] nfsd4_encode_operation+0x8c/0xf0
 [ed2a1df0] [c019aa4c] nfsd4_proc_compound+0x1b8/0x4f8
 [ed2a1e30] [c0189d20] nfsd_dispatch+0x90/0x1a0
 [ed2a1e50] [c0536b04] svc_process+0x3d0/0x698
 [ed2a1e90] [c01895bc] nfsd+0xc0/0x120
 [ed2a1eb0] [c004f8fc] kthread+0xbc/0xd0
 [ed2a1f40] [c0010ae4] ret_from_kernel_thread+0x5c/0x64
 Instruction dump:
 7fe4fb78 800100b4 b9c10068 7d810120 7d808120 7c0803a6 382100b0 4bfffb00 
 80610048 4bf95dc5 2f83 40beff4c <0fe0> 4b44 815e000c 394a0001 
 ---[ end trace 1f5ed3ea8b3e4403 ]---
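Read from the bottom up, the call trace above goes nfsd -> XFS lookup (xfs_iget, xfs_inode_alloc) -> kmem_cache_alloc -> deactivate_slab in SLUB, where the NIP shows the warning fired. A small sketch for pulling just the function names out of a saved trace, assuming POSIX `awk`; `trace_symbols` is a hypothetical helper and only recognises frames of the form `symbol+0xoff/0xsize`.

```shell
#!/bin/sh
# List the function names from the "Call Trace:" section of a saved oops or
# WARNING, innermost frame first, one per line. Handles the powerpc form
# "[ed2a1a30] [c00ccc20] deactivate_slab+0x4b8/0x538" as well as the x86 form
# "[<addr>] ? dump_stack+0x41/0x51". trace_symbols is a hypothetical helper.
trace_symbols() {
	awk '
		/Call Trace:/ { in_trace = 1; next }
		# the trace ends at the first line without a "[" (e.g. "Instruction dump:")
		in_trace && !/\[/ { in_trace = 0 }
		in_trace {
			for (i = 1; i <= NF; i++) {
				# symbol+offset/size, e.g. xfs_iget+0x2e4/0x584
				if ($i ~ /\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
					sub(/\+.*/, "", $i)
					print $i
					break
				}
			}
		}
	' "$1"
}
```

The NIP and LR lines are skipped on purpose, since only frames after the "Call Trace:" header are considered.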


-- 
BOFH excuse #65:

system needs to be rebooted

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Thu, 13 Feb 2014 at 11:53, Christian Kujau wrote:
 after upgrading from 3.13-rc8 to 3.14.0-rc2 on this PowerPC G4 machine, 
 the WARNING below was printed.
 
 Shortly after, a lockdep warning appeared (possibly related to my 
 post to the XFS list yesterday[0]).

Sigh, only _after_ sending the email, I came across an earlier posting on 
lkml: http://marc.info/?l=linux-mmm=139145788623391

Sorry for the noise. These out-of-memory messages below appeared without 
the WARNING though and started somewhere in 3.13, but are impossible to 
bisect, as they're happening only every few days / weeks.

Christian.

 Even later in the log an out-of-memory error appeared, that may or may not 
 be relatd to that WARNING at all but which I'm trying to chase down ever 
 since 3.13, but which tends to appear more often lately.
 
 Can anyone take a look if this is something to worry about?
 
 Full dmesg  .config: http://nerdbynature.de/bits/3.14-rc2/mm/
 
 Thanks,
 Christian.
 
 [0] http://oss.sgi.com/pipermail/xfs/2014-February/034054.html
 
  [ cut here ]
  WARNING: at /usr/local/src/linux-git/mm/slub.c:1007
  Modules linked in: md5 ecb nfs i2c_powermac therm_adt746x ecryptfs arc4 
 firewire_sbp2 b43 usb_storage mac80211 cfg80211
  CPU: 0 PID: 9025 Comm: nfsd Not tainted 3.14.0-rc2 #1
  task: efbf8000 ti: ed2a task.ti: ed2a
  NIP: c00ccc28 LR: c00ccc20 CTR: 
  REGS: ed2a1980 TRAP: 0700   Not tainted  (3.14.0-rc2)
  MSR: 00021032 ME,IR,DR,RI  CR: 22f82b82  XER: 2000
  
  GPR00: c00ccc20 ed2a1a30 efbf8000  ef96e550   
 2ce0 
  GPR08:  0001 efbf86f8 05e7 82fc2b88  0001 
 00080011 
  GPR16:   c076  ef96e564 00100100 00200200 
 c1203914 
  GPR24:  ef96e540 0002 ef96fa80   ed2a 
 c1203900 
  NIP [c00ccc28] deactivate_slab+0x4c0/0x538
  LR [c00ccc20] deactivate_slab+0x4b8/0x538
  Call Trace:
  [ed2a1a30] [c00ccc20] deactivate_slab+0x4b8/0x538 (unreliable)
  [ed2a1ae0] [c055d5f0] __slab_alloc.constprop.77+0x260/0x38c
  [ed2a1b50] [c00cd524] kmem_cache_alloc+0x118/0x140
  [ed2a1b70] [c01de4bc] kmem_zone_alloc+0x94/0x108
  [ed2a1ba0] [c01cccd4] xfs_inode_alloc+0x2c/0xd4
  [ed2a1bc0] [c01cd7a4] xfs_iget+0x2e4/0x584
  [ed2a1c30] [c020e664] xfs_lookup+0xc8/0xe4
  [ed2a1c70] [c01d3c28] xfs_vn_lookup+0x64/0xbc
  [ed2a1c90] [c00db3ac] lookup_real+0x30/0x70
  [ed2a1ca0] [c00dc384] __lookup_hash+0x3c/0x58
  [ed2a1cc0] [c00e1438] lookup_one_len+0x10c/0x15c
  [ed2a1ce0] [c01a170c] nfsd4_encode_dirent+0xb4/0x328
  [ed2a1d10] [c018f580] nfsd_readdir+0x1d4/0x288
  [ed2a1d90] [c019d648] nfsd4_encode_readdir+0x138/0x1f4
  [ed2a1dd0] [c01a1b18] nfsd4_encode_operation+0x8c/0xf0
  [ed2a1df0] [c019aa4c] nfsd4_proc_compound+0x1b8/0x4f8
  [ed2a1e30] [c0189d20] nfsd_dispatch+0x90/0x1a0
  [ed2a1e50] [c0536b04] svc_process+0x3d0/0x698
  [ed2a1e90] [c01895bc] nfsd+0xc0/0x120
  [ed2a1eb0] [c004f8fc] kthread+0xbc/0xd0
  [ed2a1f40] [c0010ae4] ret_from_kernel_thread+0x5c/0x64
  Instruction dump:
  7fe4fb78 800100b4 b9c10068 7d810120 7d808120 7c0803a6 382100b0 4bfffb00 
  80610048 4bf95dc5 2f83 40beff4c 0fe0 4b44 815e000c 394a0001 
  ---[ end trace 1f5ed3ea8b3e4403 ]---
 
 
 -- 
 BOFH excuse #65:
 
 system needs to be rebooted
 
 ___
 xfs mailing list
 x...@oss.sgi.com
 http://oss.sgi.com/mailman/listinfo/xfs
 

-- 
BOFH excuse #65:

system needs to be rebooted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Fri, 14 Feb 2014 at 09:26, Dave Chinner wrote:
> > after upgrading from 3.13-rc8 to 3.14.0-rc2 on this PowerPC G4 machine, 
> > the WARNING below was printed.
> > 
> > Shortly after, a lockdep warning appeared (possibly related to my 
> > post to the XFS list yesterday[0]).
> 
> Unlikely.

OK, so the "possible irq lock inversion dependency detected" warning is a
lockdep regression, as you explained in the xfs-list thread. What about the
"RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected" warning? I
haven't seen it again, though; it appeared only once, with 3.14.0-rc2.

Christian.
-- 
BOFH excuse #108:

The air conditioning water supply pipe ruptured over the machine room

Re: 3.14.0-rc2: WARNING: at mm/slub.c:1007

2014-02-13 Thread Christian Kujau

On Fri, 14 Feb 2014 at 12:14, Dave Chinner wrote:
> > OK, so the "possible irq lock inversion dependency detected" warning is a
> > lockdep regression, as you explained in the xfs-list thread. What about the
> > "RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected" warning? I
> > haven't seen it again, though; it appeared only once, with 3.14.0-rc2.
> 
> That was also an i_lock/mmapsem issue, so it's likely to be the same
> root cause. I'm testing a fix for it at the moment.

Understood. Thanks for looking into this.

Christian.
-- 
BOFH excuse #129:

The ring needs another token

3.13-rc3: BUG: soft lockup - CPU#0 stuck for 23s!

2013-12-27 Thread Christian Kujau

I noticed that my machine locks up quite often with 3.13-rc3.

PowerPC G4 again, but this machine was pretty much rock solid until now:
when there's lots of disk I/O going on, the system locks up, though not
entirely: the call trace is still sent to netconsole (but not written to
the local disk) and the machine still answers ping requests, but SSH logins
are impossible and a reset is needed. The workload of the machine has not
changed: when there's disk I/O, it means that either rsync is running or
some crazy remote Java application is scanning this machine's NFS shares.

"xfs" sometimes shows up in the call trace, and all of the disk I/O happens
on the xfs mounts; that's why I Cc'ed the xfs mailing list.

More details on: http://nerdbynature.de/bits/3.13-rc3/

Any ideas?

The most recent lockup, from today, is below. This time it wasn't rsync or
NFS: I was experimenting with xfs on a loop device backed by a 1 GB file,
like this:

  $ dd if=/dev/zero of=/usr/local/test.img bs=1M count=1k
  $ losetup -f /usr/local/test.img && mkfs.xfs /dev/loop0
  $ mount -t xfs /dev/loop0 /mnt/disk
  $ cd /mnt/disk
  $ cp -ax / /mnt/disk      # this filled the disk
  $ rm -rf lib/             # make some room
  $ i=1; while true; do printf '%s ' "$i"; dd if=/dev/zero of=f$i \
      count=100 bs=100k; i=$((i+1)); done   # fill the disk again

  => and then the machine locked up.

 [308783.613600] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u2:1:14542]
 [308783.613703] Modules linked in: md5 ecb nfs i2c_powermac therm_adt746x ecryptfs arc4 b43 firewire_sbp2 usb_storage mac80211 cfg80211
 [308783.613944] irq event stamp: 37770086
 [308783.613980] hardirqs last  enabled at (37770085): [c0546ff0] _raw_spin_unlock_irq+0x30/0x60
 [308783.614075] hardirqs last disabled at (37770086): [c0010700] reenable_mmu+0x30/0x88
 [308783.614156] softirqs last  enabled at (37764418): [c00354d4] __do_softirq+0x168/0x1e8
 [308783.614236] softirqs last disabled at (37764411): [c0035990] irq_exit+0xa4/0xc8
 [308783.614312] CPU: 0 PID: 14542 Comm: kworker/u2:1 Not tainted 3.13.0-rc3-00365-gc48b660 #1
 [308783.614384] Workqueue: writeback bdi_writeback_workfn  (flush-7:0)
 [308783.614454] task: e8d20bb0 ti: e0c5a000 task.ti: e0c5a000
 [308783.614499] NIP: c0546ffc LR: c0546ff0 CTR: 
 [308783.614543] REGS: e0c5ba80 TRAP: 0901   Not tainted  (3.13.0-rc3-00365-gc48b660)
 [308783.614596] MSR: 9032 ,ME ,IR ,DR ,RI > CR: 444c2224  XER: 2000
 [308783.614739] #012GPR00: #012GPR08: 
  
 [308783.614998] NIP [c0546ffc] _raw_spin_unlock_irq+0x3c/0x60
 [308783.615047] LR [c0546ff0] _raw_spin_unlock_irq+0x30/0x60
 [308783.615089] Call Trace:
 [308783.615121] [e0c5bb30] [c0546ff0] _raw_spin_unlock_irq+0x30/0x60 (unreliable)
 [308783.615202] [e0c5bb40] [c009f0e4] __set_page_dirty_nobuffers+0xc8/0x144
 [308783.615264] [e0c5bb60] [c01bec28] xfs_vm_writepage+0x90/0x57c
 [308783.615322] [e0c5bbf0] [c009e618] __writepage+0x24/0x7c
 [308783.615376] [e0c5bc00] [c009ec38] write_cache_pages+0x1d0/0x380
 [308783.615433] [e0c5bca0] [c009ee34] generic_writepages+0x4c/0x70
 [308783.615493] [e0c5bce0] [c00f9af8] __writeback_single_inode+0x34/0x12c
 [308783.615968] [e0c5bd00] [c00f9e74] writeback_sb_inodes+0x1f4/0x344
 [308783.616418] [e0c5bd70] [c00fa050] __writeback_inodes_wb+0x8c/0xd0
 [308783.616864] [e0c5bda0] [c00fa258] wb_writeback+0x1c4/0x1cc
 [308783.617306] [e0c5bdd0] [c00fae14] bdi_writeback_workfn+0x158/0x33c
 [308783.617751] [e0c5be50] [c004906c] process_one_work+0x19c/0x3f0
 [308783.618193] [e0c5be80] [c0049a0c] worker_thread+0x128/0x3c0
 [308783.618630] [e0c5beb0] [c004fa8c] kthread+0xbc/0xd0
 [308783.619071] [e0c5bf40] [c001099c] ret_from_kernel_thread+0x5c/0x64
 [308783.619501] Instruction dump:
 [308783.619915] 7ca802a6 
 [308783.620437] 4bb1c681 

-- 
BOFH excuse #446:

Mailer-daemon is busy burning your message in hell.

Re: fs: proc: lockdep spew and questions

2013-12-09 Thread Christian Kujau

On Sun, 8 Dec 2013 at 22:14, Sasha Levin wrote:
> So how would you suggest to deal with the execution issue in procfs?

Files in /proc will not be executable at all if /proc is mounted with
noexec, as some distributions now do by default.

C.
-- 
BOFH excuse #14:

sounds like a Windows problem, try calling Microsoft support

Re: [PATCH] scripts/kconfig/menu.c: warning: jump may be used uninitialized in this function

2013-10-27 Thread Christian Kujau

On Sun, 27 Oct 2013 at 18:28, Christian Kujau wrote:
> While doing "make oldconfig" on 3.12-rc7 with gcc-4.7.2 (Debian), the 
> following warning is printed:
> 
>   HOSTCC  scripts/kconfig/zconf.tab.o
> In file included from scripts/kconfig/zconf.tab.c:2537:0:
> /usr/local/src/linux-git/scripts/kconfig/menu.c: In function ‘get_symbol_str’:
> /usr/local/src/linux-git/scripts/kconfig/menu.c:586:18: warning: ‘jump’ may 
> be used uninitialized in this function [-Wmaybe-uninitialized]
> /usr/local/src/linux-git/scripts/kconfig/menu.c:547:19: note: ‘jump’ was 
> declared here

Grrr, only after I sent this message did I find that this had already been
reported in September by Madhavan Srinivasan: https://lkml.org/lkml/2013/9/19/24

Does anybody know the state of this fix?

Thanks,
Christian.

> The following patch seems to fix that:
> 
>  Signed-off-by: Christian Kujau 
> 
> diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
> index c1d5320..23b1827 100644
> --- a/scripts/kconfig/menu.c
> +++ b/scripts/kconfig/menu.c
> @@ -544,7 +544,7 @@ static void get_prompt_str(struct gstr *r, struct 
> property *prop,
>  {
>   int i, j;
>   struct menu *submenu[8], *menu, *location = NULL;
> - struct jump_key *jump;
> + struct jump_key *jump = NULL;
>  
>   str_printf(r, _("Prompt: %s\n"), _(prop->text));
>   menu = prop->menu->parent;
> 
> 
> Christian.
> -- 
> BOFH excuse #177:
> 
> sticktion

-- 
BOFH excuse #449:

greenpeace free'd the mallocs

[PATCH] scripts/kconfig/menu.c: warning: jump may be used uninitialized in this function

2013-10-27 Thread Christian Kujau

While doing "make oldconfig" on 3.12-rc7 with gcc-4.7.2 (Debian), the 
following warning is printed:

  HOSTCC  scripts/kconfig/zconf.tab.o
In file included from scripts/kconfig/zconf.tab.c:2537:0:
/usr/local/src/linux-git/scripts/kconfig/menu.c: In function ‘get_symbol_str’:
/usr/local/src/linux-git/scripts/kconfig/menu.c:586:18: warning: ‘jump’ may be 
used uninitialized in this function [-Wmaybe-uninitialized]
/usr/local/src/linux-git/scripts/kconfig/menu.c:547:19: note: ‘jump’ was 
declared here

The following patch seems to fix that:

 Signed-off-by: Christian Kujau 

diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index c1d5320..23b1827 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -544,7 +544,7 @@ static void get_prompt_str(struct gstr *r, struct property *prop,
 {
 	int i, j;
 	struct menu *submenu[8], *menu, *location = NULL;
-	struct jump_key *jump;
+	struct jump_key *jump = NULL;
 
 	str_printf(r, _("Prompt: %s\n"), _(prop->text));
 	menu = prop->menu->parent;
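
For readers unfamiliar with this warning: it typically fires when a pointer
is assigned only on some paths and dereferenced later under a condition the
compiler cannot tie back to the assignment. Below is a minimal, hypothetical
reduction of that pattern in plain C. It is not the kconfig code; struct
entry and last_flagged_offset() are made up for illustration. In code shaped
like this, initializing the pointer to NULL (as the patch does for jump)
should silence the warning without changing behavior, since the pointer is
only read on paths where it was assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a small record with an offset field. */
struct entry {
	int offset;
};

/*
 * Returns the offset of the last entry whose flag is set, or -1 if none is.
 * `found` is assigned only inside the loop and read only when `seen` is
 * set. Without the "= NULL", GCC may report -Wmaybe-uninitialized for
 * `found`, because it cannot always prove that seen != 0 implies `found`
 * was written.
 */
static int last_flagged_offset(const struct entry *entries,
			       const int *flags, size_t n)
{
	const struct entry *found = NULL;	/* silences the warning */
	int seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (flags[i]) {
			found = &entries[i];
			seen = 1;
		}
	}

	if (!seen)
		return -1;

	assert(found != NULL);	/* holds: seen implies found was assigned */
	return found->offset;
}
```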


Christian.
-- 
BOFH excuse #177:

sticktion