Re: new OSD re-using old OSD id fails to boot

2015-12-09 Thread David Zafman
to let me know. thanks!! vicente 2015-12-09 10:50 GMT+08:00 Sage Weil <sw...@redhat.com>: On Tue, 8 Dec 2015, David Zafman wrote: Remember I really think we want a disk replacement feature that would retain the OSD id so that it avoids unnecessary data movement. See tracke

Re: new OSD re-using old OSD id fails to boot

2015-12-08 Thread David Zafman
Remember I really think we want a disk replacement feature that would retain the OSD id so that it avoids unnecessary data movement. See tracker http://tracker.ceph.com/issues/13732 David On 12/5/15 8:49 AM, Loic Dachary wrote: Hi Sage, The problem described at "new OSD re-using old OSD

Re: 答复: [ceph-users] How long will the logs be kept?

2015-12-07 Thread David Zafman
dout() is used for an OSD to log information about what it is doing locally and might become very chatty. It is saved on the local node's disk only. clog is the cluster log and is used for major events that should be known by the administrator (see ceph -w). Clog should be used sparingly

Re: Error handling during recovery read

2015-12-04 Thread David Zafman
I can't remember the details now, but I know that recovery needed additional work. If it were a simple fix I would have done it when implementing that code. I found this bug related to recovery and ec errors (http://tracker.ceph.com/issues/13493) BUG #13493: osd: for ec, cascading crash

Re: OSD replacement feature

2015-11-23 Thread David Zafman
:00 David Zafman <dzaf...@redhat.com>: There are two reasons for having a ceph-disk replace feature. 1. To simplify the steps required to replace a disk 2. To allow a disk to be replaced proactively without causing any data movement. Hi David, It good to without causing any data movement w

Re: OSD replacement feature

2015-11-20 Thread David Zafman
There are two reasons for having a ceph-disk replace feature. 1. To simplify the steps required to replace a disk 2. To allow a disk to be replaced proactively without causing any data movement. So keeping the osd id the same is required and is what motivated the feature for me. David On

Re: pg scrub check problem

2015-10-28 Thread David Zafman
Initiating a manual deep-scrub like you are doing should always run. The command you are running doesn't report any information it just initiates a background process. If you follow the command with ceph -w you'll see what is happening: After I corrupted one of my replicas I see this. $

Re: pg scrub check problem

2015-10-28 Thread David Zafman
Good point. In my previous response I did "echo garbage > ./foo__head_7FC1F406__1" to corrupt a replica. David On 10/28/15 5:13 PM, Sage Weil wrote: Because you *just* wrote the object, and the FileStore caches open file handles. Vim renames a new inode over the old one so the open

Re: wip-addr

2015-10-12 Thread David Zafman
I don't understand how encode/decode of entity_addr_t is changing without versioning in the encode/decode. This means that this branch is changing the ceph-objectstore-tool export format if CEPH_FEATURE_MSG_ADDR2 is part of the features. So we could bump super_header::super_ver if the

Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-07 Thread David Zafman
There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after deep-scrub reads for objects not recently accessed by clients. I see the NewStore objectstore sometimes using the O_DIRECT flag for writes. This concerns me because the open(2) man page says: "Applications should avoid
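
As a rough illustration of the fadvise idea above (not Ceph code), the sketch below reads a file sequentially and then advises the kernel that the cached pages are no longer needed. The function name read_and_drop_cache and the chunk size are assumptions made for this example; os.posix_fadvise and os.POSIX_FADV_DONTNEED are the standard Python wrappers and exist only on Unix-like systems, so the call is guarded.

```python
import os


def read_and_drop_cache(path, chunk_size=1 << 20):
    """Read a file sequentially, then hint that its cached pages can be dropped.

    Hypothetical helper for illustration only; it is not part of Ceph.
    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        # offset=0 with length=0 covers the whole file. This is advisory:
        # the kernel may ignore the hint.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT, this keeps ordinary buffered reads and only hints afterwards that the pages are not worth keeping, which is why it fits objects that clients have not touched recently.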

Tracker 12577 repair won't fix replica with bad digest

2015-08-03 Thread David Zafman
Sage, I restored the branch wip-digest-repair which merged post-hammer in pull request #4365. Do you think that 4365 fixes the reported bug #12577? I cherry-picked the 9 commits off of hammer-backports-next as pull request #5458 and assigned to Loic. David -- To unsubscribe from this

Re: ceph-objectstore-tool import failures

2015-07-07 Thread David Zafman
at least aren't valuable to keep around. -Sam - Original Message - From: Sage Weil sw...@redhat.com To: Samuel Just sj...@redhat.com Cc: David Zafman dzaf...@redhat.com, ceph-devel@vger.kernel.org Sent: Tuesday, July 7, 2015 10:22:32 AM Subject: Re: ceph-objectstore-tool import failures On Tue

Re: ceph-objectstore-tool import failures

2015-07-06 Thread David Zafman
and assume that after replay the clear_temp_objects() will clean them up? David On 7/6/15 1:28 PM, Sage Weil wrote: On Fri, 19 Jun 2015, David Zafman wrote: This ghobject_t which has a pool of -3 is part of the export. This caused the assert: Read -3/1c/temp_recovering_1.1c_33'50_39_head/head

Re: deleting objects from a pool

2015-06-26 Thread David Zafman
Regards, Igor. -Original Message- From: David Zafman [mailto:dzaf...@redhat.com] Sent: Friday, June 26, 2015 3:46 AM To: Podoski, Igor; Deneau, Tom; Dałek, Piotr; ceph-devel Subject: Re: deleting objects from a pool If you have rados bench data around, you'll need to run cleanup a second

Re: deleting objects from a pool

2015-06-25 Thread David Zafman
If you have rados bench data around, you'll need to run cleanup a second time because the first time the benchmark_last_metadata object will be consulted to find what objects to remove. Also, using cleanup this way will only remove objects from the default namespace unless a namespace is

Re: ceph-objectstore-tool import failures

2015-06-19 Thread David Zafman
Have not seen this as an assert before. Given the code below in do_import() of master branch the assert is impossible (?). if (!curmap.have_pg_pool(pgid.pgid.m_pool)) { cerr << "Pool " << pgid.pgid.m_pool << " no longer exists" << std::endl; // Special exit code for this error, used by test

Re: ceph-objectstore-tool import failures

2015-06-19 Thread David Zafman
or recreate it on import with special handling. David On 6/19/15 7:38 PM, David Zafman wrote: Have not seen this as an assert before. Given the code below in do_import() of master branch the assert is impossible (?). if (!curmap.have_pg_pool(pgid.pgid.m_pool)) { cerr Pool

rsyslogd

2015-06-18 Thread David Zafman
Greg, Have you changed anything (log rotation related?) that would uninstall or cause rsyslog to not be able to start? I'm sometimes seeing machines fail with this error probably in teuthology/nuke.py reset_syslog_dir(). CommandFailedError: Command failed on plana94 with status 1: 'sudo

Re: 'Racing read got wrong version' during proxy write testing

2015-06-03 Thread David Zafman
I'm wondering if this issue could be the cause of #11511. Could a proxy write have raced with the fill_in_copy_get() so object_info_t size doesn't correspond with the size of the object in the filestore? David On 6/3/15 6:22 PM, Wang, Zhiqiang wrote: Making the 'copy get' op to be a cache

Re: should we prepare to release firefly v0.80.10 ?

2015-04-21 Thread David Zafman
In early March I ran rados:thrash on the firefly backport of the ceph-objectstore-tool changes (wip-cot-firefly). We considered it passed, even though an obscure segfault was seen: bug #11141: Segmentation Violation: ceph-objectstore-tool doing --op list-pgs David On 4/21/15 8:52 AM,

Re: regenerating man pages

2015-03-17 Thread David Zafman
I found that I could not build the docs on Ubuntu 14.10 with the proper packages installed. Kefu is looking into Asphyxiate which is very temperamental. I installed an Ubuntu 11.10 in order to generate docs. David On 3/17/15 10:11 AM, Sage Weil wrote: On Tue, 17 Mar 2015, Josh Durgin

Hammer incompat bits and ceph-objectstore-tool

2015-03-17 Thread David Zafman
. During upgrade testing it is interesting that one node has the transaction hints feature, but other nodes still running firefly don't. Is this a case where we don't have to wait for all OSDs to update before the cluster can start handling OP_COLL_HINT operations? David Zafman

Building documentation

2015-03-09 Thread David Zafman
in use old-releases.ubuntu.com to install additional packages. Just like gitbuilder-doc the admin/build-doc command runs without errors. I assume other distributions with more up to date packages will see the same problem. I filed bug #11077 with the sphinx log attached. David Zafman

Clocks out of sync

2015-02-20 Thread David Zafman
On 2 of my rados thrash runs the clocks were out of sync. Is this an occasional issue or did we have an infrastructure problem? On burnupi19 and burnupi25: 2015-02-20 12:52:52.636017 mon.1 10.214.134.14:6789/0 177 : cluster [WRN] message from mon.0 was stamped 0.501458s in the future, clocks not

Disk failing plana74

2015-02-20 Thread David Zafman
A recent test run had an EIO on the following disk: plana74 /dev/sdb The machine is locked right now. David Zafman Senior Developer -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http

sage-2015-02-15_07:44:23-rados-hammer-distro-basic-multi failures

2015-02-16 Thread David Zafman
: (1) Operation not permitted David Zafman

LTTNG

2015-02-03 Thread David Zafman
not behave properly David Zafman Senior Developer http://www.redhat.com

Re: 'Immutable bit' on pools to prevent deletion

2015-01-17 Thread David Zafman
The most secure way would be one in which you can only create pools with WORM set and can't ever change the WORM state of a pool. I like this simple/secure approach as a first cut. David On 1/17/15 11:09 AM, Alex Elsayed wrote: Sage Weil wrote: On Fri, 16 Jan 2015, Alex Elsayed wrote:

Some gitbuilders not working

2015-01-08 Thread David Zafman
We are seeing gitbuilder failures. This is what I saw on one. error: Failed build dependencies: xmlstarlet is needed by ceph-1:0.90-821.g680fe3c.el7.x86_64 David Zafman Senior Developer http://www.redhat.com

ceph-objectstore-tool and make check

2014-12-19 Thread David Zafman
of the tool will always be executed. David Zafman Senior Developer http://www.redhat.com

Re: Pull requests : speed up the reviews

2014-11-09 Thread David Zafman
time to remind people to dedicate time to code reviews. David Zafman Senior Developer http://www.inktank.com On Nov 9, 2014, at 4:08 AM, Joao Eduardo Luis j...@redhat.com wrote: On 11/08/2014 05:32 PM, Loic Dachary wrote: Hi Ceph, In the past few weeks the number of pending pull

Re: Can pid be reused ?

2014-10-22 Thread David Zafman
I just realized what it is. The way killall is used when stopping a vstart cluster, is to kill all processes by name! You can't stop vstarted tests running in parallel. David Zafman Senior Developer http://www.inktank.com On Oct 21, 2014, at 7:55 PM, Loic Dachary l...@dachary.org wrote
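
To illustrate the point about killing by name (this is not how vstart or its stop script is implemented), here is a minimal sketch in which each test run remembers the processes it started and signals only those. The helper names start_cluster and stop_cluster, and the placeholder sleeping "daemon", are assumptions made for the example.

```python
import subprocess
import sys


def start_cluster(n_daemons):
    """Start n placeholder 'daemons' and return their Popen handles.

    Hypothetical stand-in for launching real daemons.
    """
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(n_daemons)]


def stop_cluster(procs, timeout=10):
    """Terminate only the given processes, leaving same-named ones alone."""
    for p in procs:
        p.terminate()  # SIGTERM goes to this PID only, not to every match by name
    for p in procs:
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            p.kill()  # escalate if the process ignored SIGTERM
            p.wait()
```

With per-run process handles like these, two runs started in parallel can be stopped independently, which a name-based killall cannot do.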

Re: Can pid be reused ?

2014-10-22 Thread David Zafman
On Oct 22, 2014, at 3:43 PM, Sage Weil s...@newdream.net wrote: On Wed, 22 Oct 2014, David Zafman wrote: I just realized what it is. The way killall is used when stopping a vstart cluster, is to kill all processes by name! You can't stop vstarted tests running in parallel. Ah. FWIW

Re: vstart.sh crashes MON with --paxos-propose-interval=0.01 and one MDS

2014-10-16 Thread David Zafman
I have this change in my branch so that test/ceph_objectstore_tool.py works again after that change from John. I wonder if this would fix your case too: commit 18937cf49be616d32b4e2d0b6deef2882321fbe4 Author: David Zafman dzaf...@redhat.com Date: Tue Oct 14 18:45:41 2014 -0700 vstart.sh

Re: make check failures

2014-10-08 Thread David Zafman
After updating my master branch "make check" passes now. David Zafman Senior Developer http://www.inktank.com On Oct 7, 2014, at 11:28 PM, Loic Dachary l...@dachary.org wrote: [cc'ing the list in case someone else experiences problems with make check] Hi David, Yesterday you mentioned

wip-libcommon-rebase

2014-08-29 Thread David Zafman
are expressed or implied about the correctness or suitability of this branch for future use. David Zafman Senior Developer http://www.inktank.com http://www.redhat.com

Testing intermediate code for improved namespace handling

2014-08-28 Thread David Zafman
-N ns1 ls ns1-obj5 ns1-obj4 ns1-obj10 ns1-obj2 ns1-obj9 ns1-obj3 ns1-obj6 ns1-obj1 ns1-obj8 ns1-obj7 David Zafman Senior Developer http://www.inktank.com http://www.redhat.com

Building a tool which links with librados

2014-08-21 Thread David Zafman
: main (ceph_objectstore_tool.cc:1849) David Zafman Senior Developer http://www.inktank.com http://www.redhat.com

Re: Building a tool which links with librados

2014-08-21 Thread David Zafman
The import-rados feature (#8276) uses librados so in my wip-8231 branch I now link with librados. It is hard to reproduce, but I’ll play with that commit and branch. David Zafman Senior Developer http://www.inktank.com http://www.redhat.com On Aug 21, 2014, at 4:56 PM, Sage Weil sw

Re: [RFC] add rocksdb support

2014-06-13 Thread David Zafman
: $ ./autogen.sh $ ./configure $ make David Zafman Senior Developer http://www.inktank.com http://www.redhat.com On Jun 13, 2014, at 11:51 AM, Sushma Gurram sushma.gur...@sandisk.com wrote: Hi Xinxin, I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems

mon_command

2014-03-19 Thread David Zafman
to manipulate erasure coded pools. David Zafman Senior Developer http://www.inktank.com

Re: 6685 backfill head/snapdir issue brain dump

2014-02-20 Thread David Zafman
Another way to look at this is to enumerate the recovery cases: primary starts with head and no snapdir: A Recovery sets last_backfill_started to head and sends head object where needed head (1.b case while backfills in flight - 1.a when done) snapdir (2) B

wip-libcephfs-emp-rb

2013-10-07 Thread David Zafman
]: 2013-10-04 10:39:02.072487 7f57fa316780 -1 *** Caught signal (Segmentation fault) ** 2013-10-04T10:39:02.074 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:[10.214.132.22]: in thread 7f57fa316780 David Zafman Senior Developer http://www.inktank.com

Re: xattr limits

2013-10-04 Thread David Zafman
Here is the test script: xattr-test.sh Description: Binary data David Zafman Senior Developer http://www.inktank.com On Oct 3, 2013, at 11:02 PM, Loic Dachary l...@dachary.org wrote: Hi David, Would you mind attaching the script to the mail for completeness ? It's a useful thing

RESEND: xattr issue with 3.11 kernel

2013-10-04 Thread David Zafman
done rm src.$$ exit 0 David Zafman Senior Developer http://www.inktank.com

Re: [ceph-users] v0.67.4 released

2013-10-04 Thread David Zafman
` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' Aborted David Zafman Senior Developer http://www.inktank.com On Oct 4, 2013, at 4:55 PM, Sage Weil s...@inktank.com wrote: This point release fixes an important performance issue with radosgw

xattr limits

2013-10-03 Thread David Zafman
I want to record with the ceph-devel archive results from testing limits of xattrs for Linux filesystems used with Ceph. Script that creates xattrs with name user.test1, user.test2, …. on a single file 3.10 linux kernel ext4 value bytes number of entries 1 148 16

Re: 4 failed, 298 passed in dzafman-2013-09-23_17:50:06-rados-wip-5862-testing-basic-plana

2013-09-24 Thread David Zafman
executable` is needed to interpret this. David Zafman Senior Developer http://www.inktank.com On Sep 24, 2013, at 12:03 PM, Sage Weil s...@inktank.com wrote: On Tue, 24 Sep 2013, David Zafman wrote: Rados suite test run results for wip-5862. 2 scrub mismatch from mon (known problem). 2

Re: [ceph-users] cuttlefish countdown -- OSD doesn't get marked out

2013-04-26 Thread David Zafman
take responsibility for holding the data assigned to that rack. Though I didn't look at the data movement, I'm confident that it will work. You can simply mark your OSDs out manually to verify that missing replicas are replaced. David Zafman Senior Developer http://www.inktank.com On Apr 26

Re: [ceph-users] cuttlefish countdown -- OSD doesn't get marked out

2013-04-26 Thread David Zafman
defined. David Zafman Senior Developer http://www.inktank.com On Apr 26, 2013, at 6:44 AM, Mike Dawson mike.daw...@scholarstack.com wrote: David / Martin, I can confirm this issue. At present I am running monitors only with 100% of my OSD processes shutdown down. For the past couple hours

Re: [ceph-users] cuttlefish countdown -- OSD doesn't get marked out

2013-04-25 Thread David Zafman
I filed tracker bug 4822 and have wip-4822 with a fix. My manual testing shows that it works. I'm building a teuthology test. Given your osd tree has a single rack it should always mark OSDs down after 5 minutes by default. David Zafman Senior Developer http://www.inktank.com On Apr 25

.gitignore issues

2013-02-11 Thread David Zafman
# src/tpbench # src/xattr_bench nothing added to commit but untracked files present (use git add to track) David Zafman Senior Developer david.zaf...@inktank.com

Re: [PATCH 0/2] two small patches for CEPH wireshark plugin

2013-01-28 Thread David Zafman
You could look at the wip-wireshark-zafman branch. I rebased it and force pushed it. It has changes to the wireshark.patch and a minor change I needed to get it to build. I'm surprised the recent checkin didn't include the change to packet-ceph.c which I needed to get it to build. David

master branch issue in ceph.git

2013-01-17 Thread David Zafman
active+remapped, 5 active+degraded; 0 bytes data, 798 GB used, 3050 GB / 4055 GB avail mdsmap e2: 0/0/0 up David Zafman Senior Developer david.zaf...@inktank.com

Fwd: Interfaces proposed changes

2013-01-07 Thread David Zafman
I sent this proposal out to the developers that own the FSAL CEPH portion of Nfs-Ganesha. They have changes to Ceph that expose additional interfaces for this. This is our initial cut at improving the interfaces. David Zafman Senior Developer david.zaf...@inktank.com Begin forwarded

Re: [PATCH REPOST 0/4] rbd: four minor patches

2013-01-03 Thread David Zafman
I reviewed these. Reviewed-by: David Zafman david.zaf...@inktank.com David Zafman Senior Developer david.zaf...@inktank.com On Jan 3, 2013, at 11:04 AM, Alex Elder el...@inktank.com wrote: I'm re-posting my patch backlog, in chunks that may or may not match how they got posted before

testing branch of ceph-client repo was force pushed

2012-12-08 Thread David Zafman
I amended the last 5 commits which I committed to the testing branch last night. Please update your repositories accordingly. David

Re: 0.55 init script Issue?

2012-12-05 Thread David Zafman
Keep in mind that some of the init.d stuff doesn't work with a ceph-deploy installed system. Not clear to me if we need to fix ceph-deploy or for those types of setups only upstart should be used/available. David On Dec 5, 2012, at 11:41 AM, Dan Mick dan.m...@inktank.com wrote: The story as

Re: Hadoop and Ceph client/mds view of modification time

2012-11-27 Thread David Zafman
On Nov 27, 2012, at 9:03 AM, Sage Weil s...@inktank.com wrote: On Tue, 27 Nov 2012, Sam Lang wrote: 3. When a client acquires the cap for a file, have the mds provide its current time as well. As the client updates the mtime, it uses the timestamp provided by the mds and the time

Re: Hadoop and Ceph client/mds view of modification time

2012-11-27 Thread David Zafman
On Nov 27, 2012, at 11:05 AM, Sam Lang sam.l...@inktank.com wrote: On 11/27/2012 12:01 PM, Sage Weil wrote: On Tue, 27 Nov 2012, David Zafman wrote: On Nov 27, 2012, at 9:03 AM, Sage Weil s...@inktank.com wrote: On Tue, 27 Nov 2012, Sam Lang wrote: 3. When a client acquires the cap

Re: Hadoop and Ceph client/mds view of modification time

2012-11-27 Thread David Zafman
On Nov 27, 2012, at 1:14 PM, Sam Lang sam.l...@inktank.com wrote: On 11/27/2012 01:38 PM, David Zafman wrote: On Nov 27, 2012, at 11:05 AM, Sam Lang sam.l...@inktank.com wrote: On 11/27/2012 12:01 PM, Sage Weil wrote: On Tue, 27 Nov 2012, David Zafman wrote: On Nov 27, 2012, at 9:03

Re: getting kernel debug output

2012-10-24 Thread David Zafman
I also added a kcon_most teuthology task which does almost the same thing as ceph/src/script/kcon_most.sh to all or any set of clients. The teuthology version does not raise the console log level. For example: tasks: - ceph: - kclient: - kcon_most: - interactive: On Oct 24, 2012, at 11:14