Re: timeout 120 teuthology-kill is highly recommended

2015-07-21 Thread Yuri Weinstein
I was thinking of teuthology-nuke though! Thx YuriW - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, July 21, 2015 9:33:26 AM Subject: Re: timeout 120 teuthology-kill is highly

Re: teuthology rados runs for next

2015-07-21 Thread Loic Dachary
Ok ! teuthology-kill -m multi -r teuthology-2015-07-18_21:00:09-rados-next-distro-basic-multi teuthology-kill -m multi -r teuthology-2015-07-20_21:00:09-rados-next-distro-basic-multi I observed that the older

Re: timeout 120 teuthology-kill is highly recommended

2015-07-21 Thread Yuri Weinstein
Loic, I don't use teuthology-kill simultaneously, only sequentially. As far as run time goes, just as a note, when we use the 'stale' arg and it invokes the ipmitool interface, it does take a while to finish. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Ceph Development

9.0.2 test/perf_local.cc on non-x86 architectures

2015-07-21 Thread Deneau, Tom
I was trying to do an rpmbuild of v9.0.2 for aarch64 and got the following error: test/perf_local.cc: In function 'double div32()': test/perf_local.cc:396:31: error: impossible constraint in 'asm' cc); It probably should have an #if defined(__i386__) guard around it. -- Tom
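A minimal sketch of the guard Tom suggests, with a stand-in function body since the actual inline-asm block from test/perf_local.cc is not reproduced in the snippet:

    // Hypothetical illustration only; the real div32() body is the x86
    // inline-asm timing loop in test/perf_local.cc, not shown here.
    #include <cstdio>

    double div32()
    {
    #if defined(__i386__) || defined(__x86_64__)
        // x86-only 'asm' benchmark would live here.
        return 0.0;
    #else
        // Non-x86 (e.g. aarch64): no equivalent asm, so skip the benchmark.
        return -1.0;
    #endif
    }

    int main() { std::printf("div32: %f\n", div32()); return 0; }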

Re: [Documentation] Hardware recommendation: RAM and PGLog

2015-07-21 Thread David Casier AEVOO
OK, I now understand the need for transactions so that the trim takes place after changing settings. What is the risk of having too low a value for the osd_min_pg_log_entries parameter (as opposed to osd_max_pg_log_entries in a degraded environment)? David. On 07/20/2015 03:13 PM, Sage Weil wrote: On Sun,

Re: local teuthology testing

2015-07-21 Thread Loic Dachary
Hi, Since July 18th teuthology no longer uses chef, so this issue has been resolved! Using ansible requires configuration (http://dachary.org/?p=3752 explains it briefly; maybe there is something in the documentation, but I did not pay enough attention to be sure). At the end of

timeout 120 teuthology-kill is highly recommended

2015-07-21 Thread Loic Dachary
Hi Ceph, Today I did something wrong and that blocked the lab for a good half hour. a) I ran two teuthology-kill simultaneously, and that made them deadlock each other. b) I let them run unattended, only to come back to the terminal 30 minutes later and see them stuck. Sure, two
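A sketch of the invocation recommended in the subject line; the run name is borrowed from the other thread and is only an example:

    # Wrap the kill in a 120-second timeout so a deadlocked teuthology-kill
    # cannot hold the lab indefinitely.
    timeout 120 teuthology-kill -m multi -r teuthology-2015-07-18_21:00:09-rados-next-distro-basic-multi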

Re: timeout 120 teuthology-kill is highly recommended

2015-07-21 Thread Gregory Farnum
On Tue, Jul 21, 2015 at 5:13 PM, Loic Dachary l...@dachary.org wrote: Hi Ceph, Today I did something wrong and that blocked the lab for a good half hour. a) I ran two teuthology-kill simultaneously and that makes them deadlock each other b) I let them run unattended only to come back to

Re: timeout 120 teuthology-kill is highly recommended

2015-07-21 Thread Loic Dachary
Greg, Yuri: I stand corrected; I should have been less affirmative on a topic I know little about. Thanks! On 21/07/2015 18:33, Yuri Weinstein wrote: Loic I don't use teuthology-kill simultaneously only sequentially. As far as run time, just as a note, when we use 'stale' arg and it

Re: The design of the eviction improvement

2015-07-21 Thread Matt W. Benjamin
Thanks for the explanations, Greg. - Gregory Farnum g...@gregs42.com wrote: On Tue, Jul 21, 2015 at 3:15 PM, Matt W. Benjamin m...@cohortfs.com wrote: Hi, Couple of points. 1) a successor to 2Q is MQ (Li et al). We have an intrusive MQ LRU implementation with 2 levels

Re: MVC in ceph-deploy.

2015-07-21 Thread Travis Rhoden
Hi Owen, The primary concern I have, and the one I want to discuss more, is cluster state discovery. I'm worried about how this scales. Normally when I think about MVC, I think of a long-running application or something with a persistent data store for the model (or both). ceph-deploy is

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Stefan Priebe
On 21.07.2015 at 16:32, Jason Dillaman wrote: Any chance that the snapshot was just created prior to the first export and you have a process actively writing to the image? Sadly not. I executed those commands manually at the bash prompt, exactly as I've posted them. I can reproduce this at 5 different

Ceph Tech Talk next week

2015-07-21 Thread Patrick McGarry
Hey cephers, Just a reminder that the Ceph Tech Talk on CephFS that was scheduled for last month (and cancelled due to technical difficulties) has been rescheduled for this month's talk. It will be happening next Thurs at 17:00 UTC (1p EST) on our Blue Jeans conferencing system. If you have any

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Jason Dillaman
Does this still occur if you export the images to the console (i.e. rbd export cephstor/disk-116@snap - > dump_file)? Would it be possible for you to provide logs from the two rbd export runs on your smallest VM image? If so, please add the following to the [client] section of your
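A hedged sketch of what is being asked for; the exact debug settings are truncated in the snippet, so the [client] options below are plausible guesses rather than Jason's actual list:

    # Export the snapshot to stdout ('-') and capture it to a file.
    rbd export cephstor/disk-116@snap - > dump_file

    # Possible [client] logging section for ceph.conf (values are assumptions):
    [client]
        debug rbd = 20
        debug rados = 20
        log file = /var/log/ceph/rbd-export.$pid.log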

Re: hdparm -W redux, bug in _check_disk_write_cache for RHEL6?

2015-07-21 Thread Dan van der Ster
On Tue, Jul 21, 2015 at 4:20 PM, Ilya Dryomov idryo...@gmail.com wrote: This one, I think: commit ab0a9735e06914ce4d2a94ffa41497dbc142fe7f Author: Christoph Hellwig h...@lst.de Date: Thu Oct 29 14:14:04 2009 +0100 blkdev: flush disk cache on ->fsync Thanks, that looks relevant! Looks

RE: The design of the eviction improvement

2015-07-21 Thread Wang, Zhiqiang
-Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Tuesday, July 21, 2015 9:29 PM To: Wang, Zhiqiang Cc: sj...@redhat.com; ceph-devel@vger.kernel.org Subject: RE: The design of the eviction improvement On Tue, 21 Jul 2015, Wang, Zhiqiang wrote: -Original

Re: The design of the eviction improvement

2015-07-21 Thread Matt W. Benjamin
Hi, - Zhiqiang Wang zhiqiang.w...@intel.com wrote: Hi Matt, -Original Message- From: Matt W. Benjamin [mailto:m...@cohortfs.com] Sent: Tuesday, July 21, 2015 10:16 PM To: Wang, Zhiqiang Cc: sj...@redhat.com; ceph-devel@vger.kernel.org; Sage Weil Subject: Re: The design

Subscription to the ceph-devel mailing list

2015-07-21 Thread Surabhi Bhalothia

RE: The design of the eviction improvement

2015-07-21 Thread Wang, Zhiqiang
Hi Matt, -Original Message- From: Matt W. Benjamin [mailto:m...@cohortfs.com] Sent: Tuesday, July 21, 2015 10:16 PM To: Wang, Zhiqiang Cc: sj...@redhat.com; ceph-devel@vger.kernel.org; Sage Weil Subject: Re: The design of the eviction improvement Hi, Couple of points. 1) a

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Stefan Priebe
On 21.07.2015 at 21:46, Josh Durgin wrote: On 07/21/2015 12:22 PM, Stefan Priebe wrote: On 21.07.2015 at 19:19, Jason Dillaman wrote: Does this still occur if you export the images to the console (i.e. rbd export cephstor/disk-116@snap - > dump_file)? Would it be possible for you to

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Stefan Priebe
So this really is this old bug? http://tracker.ceph.com/issues/9806 Stefan On 21.07.2015 at 21:46, Josh Durgin wrote: On 07/21/2015 12:22 PM, Stefan Priebe wrote: On 21.07.2015 at 19:19, Jason Dillaman wrote: Does this still occur if you export the images to the console (i.e. rbd export

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Stefan Priebe
On 21.07.2015 at 19:19, Jason Dillaman wrote: Does this still occur if you export the images to the console (i.e. rbd export cephstor/disk-116@snap - > dump_file)? Would it be possible for you to provide logs from the two rbd export runs on your smallest VM image? If so, please add the

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Josh Durgin
On 07/21/2015 12:22 PM, Stefan Priebe wrote: On 21.07.2015 at 19:19, Jason Dillaman wrote: Does this still occur if you export the images to the console (i.e. rbd export cephstor/disk-116@snap - > dump_file)? Would it be possible for you to provide logs from the two rbd export runs on your

Re: Ceph Tech Talk next week

2015-07-21 Thread Gregory Farnum
On Tue, Jul 21, 2015 at 6:09 PM, Patrick McGarry pmcga...@redhat.com wrote: Hey cephers, Just a reminder that the Ceph Tech Talk on CephFS that was scheduled for last month (and cancelled due to technical difficulties) has been rescheduled for this month's talk. It will be happening next

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Josh Durgin
Yes, I'm afraid it sounds like it is. You can double check whether the watch exists on an image by getting the id of the image from 'rbd info $pool/$image | grep block_name_prefix': block_name_prefix: rbd_data.105674b0dc51 The id is the hex number there. Append that to 'rbd_header.' and you
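Put together as a concrete example (pool and image names assumed from the thread), that check would look roughly like:

    # Get the image id from block_name_prefix, then list watchers on the
    # matching format 2 header object.
    rbd info cephstor/disk-116 | grep block_name_prefix
    #   block_name_prefix: rbd_data.105674b0dc51
    rados -p cephstor listwatchers rbd_header.105674b0dc51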

quick way to rebuild deb packages

2015-07-21 Thread Bartłomiej Święcki
Hi all, I'm currently working on a test environment for Ceph where we're using deb files to deploy new versions on the test cluster. To make this work efficiently I'd have to quickly build deb packages. I tried dpkg-buildpackage -nc, which should keep the results of the previous build, but it ends up in
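For reference, a sketch of the kind of incremental rebuild being attempted (whether -nc really preserves the previous build tree here is the open question):

    # -nc: skip the clean step, -b: binary packages only, -us/-uc: don't sign.
    cd ceph
    dpkg-buildpackage -nc -b -us -uc -j$(nproc)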

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Jason Dillaman
That sounds very odd. Could you verify via 'rados listwatchers' on an in-use rbd image's header object that there's still a watch established? How can I do this exactly? You need to determine the RBD header object name. For format 1 images (default for Firefly), the image header object
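For a format 1 image the header object is named after the image itself, so the same check would look something like this (pool/image names assumed from the thread):

    # Format 1 (Firefly default): the header object is '<image_name>.rbd'.
    rados -p cephstor listwatchers disk-116.rbd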

RE: local teuthology testing

2015-07-21 Thread Zhou, Yuan
Loic, thanks for the notes! Will try the new code and report out the issue I met. Thanks, -yuan -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Loic Dachary Sent: Tuesday, July 21, 2015 11:48 PM To: shin...@linux.com;

Re: rados/thrash on OpenStack

2015-07-21 Thread Loic Dachary
Hi Kefu, The following run is on OpenStack and the next branch: http://integration.ceph.dachary.org:8081/ubuntu-2015-07-21_00:04:04-rados-next---basic-openstack/ and 15 out of the 16 dead jobs (timed out after 3 hours) are from rados/thrash. A rados suite run on next dated a few days ago in the

Re: rados/thrash on OpenStack

2015-07-21 Thread Loic Dachary
Note however that only one of the dead (timed out) jobs has an assert (it looks like it's because the file system is not as it should be, which is expected since there are no attached disks on the instances, and therefore no way for the job to mkfs the file system of choice). All the others timed out just

Re: dmcrypt with luks keys in hammer

2015-07-21 Thread David Disseldorp
Hi, On Mon, 20 Jul 2015 15:21:50 -0700 (PDT), Sage Weil wrote: On Mon, 20 Jul 2015, Wyllys Ingersoll wrote: No luck with ceph-disk-activate (all or just one device). $ sudo ceph-disk-activate /dev/sdv1 mount: unknown filesystem type 'crypto_LUKS' ceph-disk: Mounting filesystem

Re: About Fio backend with ObjectStore API

2015-07-21 Thread Haomai Wang
Hi Casey, I checked your commits and understand what you fixed. I cherry-picked your new commits but I still hit the same problem. It's strange that it always hits a segmentation fault when entering _fio_setup_ceph_filestore_data; gdb says td->io_ops is NULL, but when I go up the stack, td->io_ops is not null.

hdparm -W redux, bug in _check_disk_write_cache for RHEL6?

2015-07-21 Thread Dan van der Ster
Hi, Following the sf.net corruption report I've been checking our config w.r.t data consistency. AFAIK the two main recommendations are: 1) don't mount FileStores with nobarrier 2) disable write-caching (hdparm -W 0 /dev/sdX) when using block dev journals and your kernel is 2.6.33
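For reference, checking and disabling the drive write cache with hdparm looks like:

    hdparm -W /dev/sdX      # report the current write-cache setting
    hdparm -W 0 /dev/sdX    # disable volatile write caching on the drive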

teuthology rados runs for next

2015-07-21 Thread Loic Dachary
Hi Sam, I noticed today that http://pulpito.ceph.com/?suite=rados&branch=next is lagging three days behind. Do we want to keep all the runs or should we kill the older ones? I suppose there would be value in having the results for all of them but given the current load in the sepia lab it also

Re: teuthology rados runs for next

2015-07-21 Thread Sage Weil
On Tue, 21 Jul 2015, Loic Dachary wrote: Hi Sam, I noticed today that http://pulpito.ceph.com/?suite=rados&branch=next is lagging three days behind. Do we want to keep all the runs or should we kill the older ones? I suppose there would be value in having the results for all of them but

Re: hdparm -W redux, bug in _check_disk_write_cache for RHEL6?

2015-07-21 Thread Sage Weil
On Tue, 21 Jul 2015, Dan van der Ster wrote: Hi, Following the sf.net corruption report I've been checking our config w.r.t data consistency. AFAIK the two main recommendations are: 1) don't mount FileStores with nobarrier 2) disable write-caching (hdparm -W 0 /dev/sdX) when using

Re: dmcrypt with luks keys in hammer

2015-07-21 Thread Sage Weil
On Tue, 21 Jul 2015, David Disseldorp wrote: Hi, On Mon, 20 Jul 2015 15:21:50 -0700 (PDT), Sage Weil wrote: On Mon, 20 Jul 2015, Wyllys Ingersoll wrote: No luck with ceph-disk-activate (all or just one device). $ sudo ceph-disk-activate /dev/sdv1 mount: unknown filesystem type

RE: The design of the eviction improvement

2015-07-21 Thread Sage Weil
On Tue, 21 Jul 2015, Wang, Zhiqiang wrote: -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil Sent: Tuesday, July 21, 2015 6:38 AM To: Wang, Zhiqiang Cc: sj...@redhat.com; ceph-devel@vger.kernel.org

upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Stefan Priebe - Profihost AG
Hi, I remember there was a bug in Ceph before (not sure in which release) where exporting the same rbd snap multiple times resulted in different raw images. Currently running upstream/firefly and I'm seeing the same again. # rbd export cephstor/disk-116@snap dump1 # sleep 10 # rbd export
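A reproduction sketch based on the commands in the report (the checksum comparison is an addition to make the difference visible):

    rbd export cephstor/disk-116@snap dump1
    sleep 10
    rbd export cephstor/disk-116@snap dump2
    md5sum dump1 dump2   # identical snapshots should produce identical checksums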

local teuthology testing

2015-07-21 Thread Zhou, Yuan
Hi David/Loic, I was also trying to set up some local Teuthology clusters here. The biggest issue I met is in ceph-qa-chef - there are lots of hardcoded URLs related to the sepia lab. I have to trace the code and change them line by line. Can you please kindly share how you got this

Re: The design of the eviction improvement

2015-07-21 Thread Matt W. Benjamin
Hi, Couple of points. 1) a successor to 2Q is MQ (Li et al). We have an intrusive MQ LRU implementation with 2 levels currently, plus a pinned queue, that addresses stuff like partitioning (sharding), scan resistance, and coordination w/lookup tables. We might extend/re-use it. 2) I'm a

Re: hdparm -W redux, bug in _check_disk_write_cache for RHEL6?

2015-07-21 Thread Ilya Dryomov
On Tue, Jul 21, 2015 at 4:54 PM, Sage Weil s...@newdream.net wrote: On Tue, 21 Jul 2015, Dan van der Ster wrote: Hi, Following the sf.net corruption report I've been checking our config w.r.t data consistency. AFAIK the two main recommendations are: 1) don't mount FileStores with

Re: The design of the eviction improvement

2015-07-21 Thread Gregory Farnum
On Tue, Jul 21, 2015 at 3:15 PM, Matt W. Benjamin m...@cohortfs.com wrote: Hi, Couple of points. 1) a successor to 2Q is MQ (Li et al). We have an intrusive MQ LRU implementation with 2 levels currently, plus a pinned queue, that addresses stuff like partitioning (sharding), scan

Re: dmcrypt with luks keys in hammer

2015-07-21 Thread Milan Broz
On 07/21/2015 01:14 PM, David Disseldorp wrote: A race condition (or other issue) with udev seems likely given that it's rather random which ones come up and which ones don't. A race condition during creation or activation? If it's activation I would expect ceph-disk activate ... to work

Re: dmcrypt with luks keys in hammer

2015-07-21 Thread Wyllys Ingersoll
ceph-disk activate-all does not fix the problem for non-systemd users. Once they are in the temporary-cryptsetup-PID state, they have to be manually cleared and remounted as follows: 1. cryptsetup close all of the ones in the temporary-cryptsetup state 2. find the UUID for each block device
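Roughly, the manual recovery described above might look like the following; the device name, key directory and mapping names are assumptions, not commands quoted from the thread:

    # 1. clear the stuck mapping
    cryptsetup close temporary-cryptsetup-12345
    # 2. find the partition UUID
    blkid /dev/sdv1
    # 3. reopen with the OSD's dmcrypt key (key path is an assumption)
    cryptsetup luksOpen --key-file /etc/ceph/dmcrypt-keys/<uuid>.luks.key /dev/sdv1 <uuid>
    # 4. activate the OSD on the mapped device
    ceph-disk activate /dev/mapper/<uuid>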

Re: upstream/firefly exporting the same snap 2 times results in different exports

2015-07-21 Thread Jason Dillaman
Any chance that the snapshot was just created prior to the first export and you have a process actively writing to the image? -- Jason Dillaman Red Hat dilla...@redhat.com http://www.redhat.com - Original Message - From: Stefan Priebe - Profihost AG s.pri...@profihost.ag To: