[Bug 1349711] Re: Machine lockup in btrfs-transaction

2017-01-02 Thread Neal McBurnett
The two upstream report links above via gmane now say "ArchivedAt Nothing found - bye", and http://web.archive.org has no record of them. Are updated links available? Did any discussion take place there? I'm seeing "INFO: task btrfs-transacti:979 blocked for more than 120 seconds." and wondering if it

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2016-04-09 Thread Dario Bertini
I got "the connection used to fetch this resource is insecure" and never received a response when sending the POST request to upload the attachment to launchpad.net. I'll try again later. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2016-04-09 Thread Dario Bertini
With btrfs-tools 4.0-2 and kernel 4.2.0-34-generic, I got a kthread pegging 1 core at 100%, and the whole system was almost completely unusable. This persisted for several minutes. Then it quieted down, and then the problem reappeared, even after rebooting. I had 2GB of free space. I now

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2015-01-11 Thread Tom Van Braeckel
Peter, would you consider re-formatting your disk with recent btrfs-tools and using a 3.18+ Linux kernel? I think it's fair to assume, or at least worth trying to confirm, that the problem would be resolved. And that would help you move forward as well. Consider how unstable older versions are warned to be: WARNING! -

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2015-01-09 Thread TomaszChmielewski
Personally, 3.18.x was the first kernel where btrfs finally behaves stably for me (so far).

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2015-01-09 Thread Christian Reis
I agree with Tomasz: running the 3.18.1 kernel, I've had a much better experience with btrfs than with Trusty's stock 3.13. ** Bug watch added: Linux Kernel Bug Tracker #90401 http://bugzilla.kernel.org/show_bug.cgi?id=90401 ** Also affects: linux via

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-07 Thread Stefan Bader
So the way I read the thread, there is a basic problem with btrfs (apparently better known to developers than it is currently documented): it can run out of space rather unexpectedly. I was a bit surprised as well to read that 500MB (while looking like a whole lot of space coming from

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-07 Thread Peter Waller
That's my understanding too, except that in one of the scenarios I observed 100% SYS CPU for long stretches even when a significant amount (~50GB) of the device was unused. However, if it was a soft lockup, it lasted 8 hours, during which the machine was totally unresponsive to HTTP requests,

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-05 Thread Peter Waller
Repost of what I sent to the mailing list just now: My current interpretation of this problem is that it is some pathological condition caused by not rebalancing and being nearly out of space for allocating more metadata, and hence it is rarely seen by anyone else (because most users are

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-02 Thread Peter Waller
The production machine hasn't had a lockup since moving to 3.15.7-031507-generic (it's been up for 4 days), even though we could reproduce the lockup on a new machine with that kernel using a snapshot of the old volume. Another twist is that on the production machine I'm now reliably seeing No

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Stefan Bader
It is hard to say for sure. I don't know the details of the EC2 guest setup. The block device could be local, but more likely it is some form of network attachment (iscsi, ndb). And btrfs is still a relatively new kid on the block, so there might still be surprises just because of that. Your description was

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Peter Waller
btrfs was created with `mkfs.btrfs /dev/mapper/vg-lv`. It isn't a hard requirement, except that it's a pain to migrate, since that requires downtime to move the files. Something I'd rather not do unless absolutely necessary. The machine freezes are inconvenient but represent a few minutes of downtime

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Stefan Bader
Hm, odd. To get output similar to `btrfs fi df` on the mount point, I have to add `-m dup` (otherwise I get no DUP lines for System and Metadata). And in particular, Metadata, DUP was relatively highly used: 4.65GiB of 5.50GiB in the output you showed us yesterday.
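Stefan's point rests on the Metadata, DUP line (4.65GiB used of 5.50GiB allocated, roughly 85%). A minimal sketch for pulling that ratio out of pasted `btrfs fi df` output follows. It assumes the "Kind, Profile: total=X, used=Y" line format; `parse_fi_df` and `usage_ratio` are hypothetical helpers, not part of btrfs-progs. A high ratio alone does not prove an out-of-space condition, since metadata can still grow into unallocated device space.

```python
import re

# One line of `btrfs filesystem df` output, e.g.
#   "Metadata, DUP: total=5.50GiB, used=4.65GiB"
# Assumption: some older btrfs-progs releases printed "GB"-style suffixes;
# the "i" is optional here and all suffixes are scaled by powers of 1024.
_LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tp>[KMGTP]?)i?B, "
    r"used=(?P<used>[\d.]+)(?P<up>[KMGTP]?)i?B$"
)
_SCALE = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def parse_fi_df(text):
    """Map (kind, profile) -> (total_bytes, used_bytes) for each recognised line."""
    result = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:  # lines in any other format are skipped
            result[(m["kind"], m["profile"])] = (
                float(m["total"]) * _SCALE[m["tp"]],
                float(m["used"]) * _SCALE[m["up"]],
            )
    return result


def usage_ratio(text, kind="Metadata"):
    """used / total, summed over every profile of the given chunk kind."""
    total = used = 0.0
    for (k, _profile), (t, u) in parse_fi_df(text).items():
        if k == kind:
            total += t
            used += u
    return used / total if total else 0.0
```

For the numbers quoted in the comment above, `usage_ratio` returns about 0.845.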

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Peter Waller
The filesystem may have been originally created with an older version of btrfs from Ubuntu Saucy, which I suppose may not have detected the SSD?

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Stefan Bader
Maybe different default options. I don't think a PV guest will know the type of the real backing device. Would those not all be just blkfront PV block devices? That's probably another detail we might be interested in here. If the device name for the PV was xvd*, then it's a PV disk (not using ec2

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-08-01 Thread Peter Waller
smb: Yeah, the system the filesystem was created on was PV, and the device name was xvd*. Now it's on HVM with xvd*.

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
I've got a way to rapidly reproduce the error now. I can do it reliably with a turnaround time of 5-10 minutes. I've reproduced the crash on the new kernel, so it has now been observed on both 3.13.0-32-generic and 3.15.7-031507-generic. I'll try 3.16 next. I've also discovered this new stack

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
Now reproduced on 3.16. I'm out of things to try for now.

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
This gist contains a stack trace every 10 seconds, taken with `echo l > /proc/sysrq-trigger` while the machine was spinning in the kernel but still responsive. https://gist.github.com/pwaller/c7dd0f4807459acedcdf The machine remained responsive for 5-10 minutes before becoming totally
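Peter's sampling loop (one `echo l > /proc/sysrq-trigger` every 10 seconds) can be scripted. The sketch below is an illustration, not his actual procedure: `sample_backtraces` is a hypothetical helper, the trigger path is a parameter so it can be pointed elsewhere for testing, writing the real trigger needs root, and the backtraces land in the kernel log (dmesg), not in this process.

```python
import time


def sample_backtraces(count, interval_s=10.0, trigger="/proc/sysrq-trigger"):
    """Request `count` all-CPU backtrace dumps, `interval_s` seconds apart.

    Each iteration is equivalent to `echo l > /proc/sysrq-trigger`
    ("l" = show a backtrace for all active CPUs). Returns the number of
    dumps requested.
    """
    for i in range(count):
        with open(trigger, "w") as f:
            f.write("l")
        if i + 1 < count:  # no need to sleep after the last dump
            time.sleep(interval_s)
    return count
```

Run as root while the machine is spinning, then collect the traces with `dmesg`.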

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
** Tags added: kernel-bug-exists-upstream ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Chris J Arges
From IRC, an example of a testcase:
1) Create around 300GB of small files (sqlite files, for example) and put their names into a list, sqlite-files.txt.
2) Start the copy: `cat sqlite-files.txt | xargs -n1 -I{} -P2 sudo sh -c 'rm -f {}.new; cat {} > {}.new; echo {}'`
3) When it hangs, identify where it
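The copy step of the IRC testcase can be approximated in Python. This is a sketch, not the reporter's script: `copy_one` and `run_copy` are hypothetical helpers, two threads stand in for the two `xargs -P2` processes, `sudo` is dropped, and generating the ~300GB of sqlite files is left out.

```python
import contextlib
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_one(path):
    """Mirror `rm -f {}.new; cat {} > {}.new; echo {}` for one file."""
    new = path + ".new"
    # Unlink first, as `rm -f` does, so each copy is written to a fresh inode
    # rather than truncating and rewriting the old one.
    with contextlib.suppress(FileNotFoundError):
        os.remove(new)
    with open(path, "rb") as src, open(new, "wb") as dst:
        shutil.copyfileobj(src, dst)
    print(path, flush=True)  # the `echo {}` progress line
    return new


def run_copy(list_file, workers=2):
    """Copy every file named in list_file, `workers` at a time (like -P2)."""
    with open(list_file) as f:
        paths = [line.strip() for line in f if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, paths))  # results keep list order
```

Usage: `run_copy("sqlite-files.txt")`, then watch for the point where the progress output stops.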

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Chris J Arges
Peter also tested on a single core machine and was able to reproduce the issue.

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Chris J Arges
So at this point it would be good to have a clear idea of how to get into this state, to better understand the issue. Could you write a detailed description of how you set up your machine/volume, so that others can run a similar test and see if we can reproduce the same results? Thanks,

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
Hm, I'm not sure I can give a thorough description, since I don't understand enough about the exact workload myself. It is a fairly arbitrary workload generated by our users. In the end, it boils down to creating, reading and writing many (~20,000) sqlite files of 16KB to 12GB across many

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-31 Thread Peter Waller
(otherwise unloaded test machines) On a dual-core machine, 100% system CPU usage with zero writes is seen on one core for 5-10 minutes, spending time in BTRFS threads. On a single-thread machine, 100% system CPU is used, and I haven't yet been able to make it hang entirely. I do observe almost
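The "100% system CPU on one core" observation can be quantified from two samples of `/proc/stat`. A minimal sketch, assuming the standard per-CPU line layout (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, in clock ticks); `parse_cpu_line`, `system_fraction` and `read_cpu_line` are hypothetical helpers, not part of any tool used in this thread.

```python
def parse_cpu_line(line):
    """Return the tick counters from a /proc/stat "cpu"/"cpuN" line."""
    fields = line.split()
    # fields[0] is the label; keep user, nice, system, idle, iowait, irq,
    # softirq, steal. guest/guest_nice are already counted in user/nice.
    return [int(x) for x in fields[1:9]]


def system_fraction(before, after):
    """Fraction of CPU time spent in kernel ("system") mode between samples."""
    deltas = [a - b for b, a in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return deltas[2] / total if total else 0.0  # index 2 is the "system" counter


def read_cpu_line(label="cpu", path="/proc/stat"):
    """Return the raw line for `label` ("cpu" = all CPUs, "cpu0" = first CPU)."""
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == label:
                return line
    raise ValueError(f"{label!r} not found in {path}")


# Usage (Linux): b = read_cpu_line("cpu0"); time.sleep(5)
#                print(system_fraction(b, read_cpu_line("cpu0")))
```

A value near 1.0 for one CPU, with no disk writes in the same window, matches the symptom described above.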

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-30 Thread Peter Waller
I found an additional stack trace from a previous machine lockup.
[1093202.136107] INFO: task kworker/u30:1:31455 blocked for more than 120 seconds.
[1093202.141596] Tainted: GF 3.13.0-30-generic #54-Ubuntu
[1093202.146201] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
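The "blocked for more than 120 seconds" lines here and in the 2017 report (btrfs-transacti:979) share one format, so they can be pulled out of a saved kernel log for comparison. A minimal sketch, assuming that exact wording; `hung_tasks` is a hypothetical helper. Task names can themselves contain ':' (kworker/u30:1), so the PID is taken as the last colon-separated number before " blocked".

```python
import re

# e.g. "[1093202.136107] INFO: task kworker/u30:1:31455 blocked for more than 120 seconds."
# \S+ is greedy and backtracks, so <pid> binds to the *last* ":<digits>".
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Yield (task name, pid, seconds) for each hung-task report in the text."""
    for m in HUNG_TASK.finditer(log_text):
        yield m["comm"], int(m["pid"]), int(m["secs"])
```

Usage: `list(hung_tasks(open("kern.log").read()))` lists which threads the watchdog flagged.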

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-29 Thread Peter Waller
I've also started a thread on linux-btrfs: http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-29 Thread Peter Waller
@brad-figg, apologies, I missed your response. Is there a way to generate the output without automatically uploading it? I would like to review it first. I tried `apport-cli --save`, but as far as I can tell, that doesn't do anything unless there are crash files.

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-29 Thread Joseph Salisbury
Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.16 kernel[0]. If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'. If the mainline kernel does not fix

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-29 Thread Peter Waller
One thing I am unsure about: originally, the bug did not manifest until the machine had been running for at least 12 days. So I'm not sure it will be possible to reliably determine that it is fixed by moving to a particular kernel. What is the standard here?

[Bug 1349711] Re: Machine lockup in btrfs-transaction

2014-07-29 Thread Peter Waller
The crashes became more frequent: the approximate uptime before each one was 12 days, then ~2 days, then 6 hours, then 1 hour. I have since moved to 3.15.7-031507-generic. One thing I have observed is that /var/log/nginx/access.log (on an EXT4 filesystem) contained ~2KB of NULL characters in place of any
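To check whether other logs on the machine show the same ~2KB run of NULL characters, a small scan can report where such runs sit. A minimal sketch; `nul_runs` is a hypothetical helper, and it reads the whole file into memory, which is fine for typical log sizes.

```python
import re


def nul_runs(path, min_len=1):
    """Return (offset, length) for each run of NUL bytes at least min_len long."""
    with open(path, "rb") as f:
        data = f.read()
    return [
        (m.start(), m.end() - m.start())
        for m in re.finditer(rb"\x00+", data)
        if m.end() - m.start() >= min_len
    ]
```

Usage: `nul_runs("/var/log/nginx/access.log", min_len=512)` lists only the larger holes, skipping stray single NULs.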