Re: messenger refactor notes

2013-11-09 Thread Gregory Farnum
On Sat, Nov 9, 2013 at 10:13 AM, Samuel Just sam.j...@inktank.com wrote: Currently, the messenger delivers messages to the Dispatcher implementation from a single thread (See src/msg/DispatchQueue.h/cc). My take away from the performance work so far is that we probably need client IO related
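
A minimal sketch of the single-dispatch-thread pattern being described (not the real src/msg/DispatchQueue code; all names below are illustrative): one worker thread serializes every delivery to the Dispatcher, so one slow ms_dispatch() call stalls everything queued behind it.

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    struct Message;
    struct Dispatcher {
      virtual void ms_dispatch(Message *m) = 0;
      virtual ~Dispatcher() = default;
    };

    class SingleThreadDispatchQueue {
      std::queue<Message*> q;
      std::mutex lock;
      std::condition_variable cond;
      bool stop = false;
      std::thread worker;   // declared last so the members above exist before it starts
    public:
      explicit SingleThreadDispatchQueue(Dispatcher *d)
        : worker([this, d] {
            std::unique_lock<std::mutex> l(lock);
            while (!stop || !q.empty()) {
              if (q.empty()) { cond.wait(l); continue; }
              Message *m = q.front(); q.pop();
              l.unlock();
              d->ms_dispatch(m);   // every message funnels through this one thread
              l.lock();
            }
          }) {}
      void enqueue(Message *m) {
        { std::lock_guard<std::mutex> g(lock); q.push(m); }
        cond.notify_one();
      }
      ~SingleThreadDispatchQueue() {
        { std::lock_guard<std::mutex> g(lock); stop = true; }
        cond.notify_one();
        worker.join();
      }
    };

The mail seems headed toward giving client IO its own delivery path; in terms of this sketch, that means keeping more than one such queue/worker, keyed by message type or priority.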

Re: cache tier blueprint (part 2)

2013-11-08 Thread Gregory Farnum
On Thu, Nov 7, 2013 at 6:56 AM, Sage Weil s...@inktank.com wrote: I typed up what I think is remaining for the cache tier work for firefly. Greg, can you take a look? I'm most likely missing a bunch of stuff here.

Re: [ceph-users] radosgw - complete_multipart errors

2013-10-31 Thread Gregory Farnum
On Thu, Oct 31, 2013 at 6:22 AM, Dominik Mostowiec dominikmostow...@gmail.com wrote: Hi, I have strange radosgw error: == 2013-10-26 21:18:29.844676 7f637beaf700 0 setting object tag=_ZPeVs7d6W8GjU8qKr4dsilbGeo6NOgw 2013-10-26 21:18:30.049588 7f637beaf700 0 WARNING: set_req_state_err

Re: [PATCH] ceph: cleanup aborted requests when re-sending requests.

2013-10-23 Thread Gregory Farnum
A little delayed, but Sage just pushed this into our testing repo. Thanks! (Feel free to poke me in future if you know you have patches that have been hanging for a while.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Sep 25, 2013 at 11:25 PM, Yan, Zheng

Re: Removing disks / OSDs

2013-10-22 Thread Gregory Farnum
On Mon, Oct 21, 2013 at 11:13 PM, Loic Dachary l...@dachary.org wrote: On 21/10/2013 18:49, Gregory Farnum wrote: I'm not quite sure what questions you're actually asking here... In general, the OSD is not removed from the system without explicit admin intervention. When it is removed, all

Re: Removing disks / OSDs

2013-10-21 Thread Gregory Farnum
I'm not quite sure what questions you're actually asking here... In general, the OSD is not removed from the system without explicit admin intervention. When it is removed, all traces of it should be zapped (including its key), so it can't reconnect. If it hasn't been removed, then indeed it will

Re: issues when bucket index deep-scrubbing

2013-10-21 Thread Gregory Farnum
the bucket index itself, rather than sharding across buckets in the application. :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com Regards Dominik 2013/10/18 Gregory Farnum g...@inktank.com: On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec dominikmostow...@gmail.com

Re: Removing disks / OSDs

2013-10-21 Thread Gregory Farnum
On Mon, Oct 21, 2013 at 9:57 AM, Loic Dachary l...@dachary.org wrote: On 21/10/2013 18:49, Gregory Farnum wrote: I'm not quite sure what questions you're actually asking here... I guess I was asking if my understanding was correct. In general, the OSD is not removed from the system without

Re: issues when bucket index deep-scrubbing

2013-10-21 Thread Gregory Farnum
limitations in ceph that can affect us? -- Regards Dominik 2013/10/21 Gregory Farnum g...@inktank.com: On Mon, Oct 21, 2013 at 2:26 AM, Dominik Mostowiec dominikmostow...@gmail.com wrote: Hi, Thanks for your response. That is definitely the obvious next step, but it's a non-trivial amount

Re: set mds message priority to MSG_PRIO_HIGH

2013-10-21 Thread Gregory Farnum
On Sat, Oct 19, 2013 at 7:14 AM, Yan, Zheng uker...@gmail.com wrote: On Sat, Oct 19, 2013 at 8:58 PM, hjwsm1989-gmail hjwsm1...@gmail.com wrote: Hi, I'm testing ceph with samba. I have 20 OSD nodes on 4 hosts: dy01: 1 MON, 5 OSDs, 1 samba server; dy02: 1 MDS, 5 OSDs, 1 samba server; dy03: 5 OSDs,

Re: [RFC PATCH] ceph: add acl for cephfs

2013-10-18 Thread Gregory Farnum
Isn't the UID/GID mismatch a generic problem when using CephFS? ;) I've got this patch in my queue as well if nobody else beats me to it. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Oct 17, 2013 at 5:39 AM, Li Wang liw...@ubuntukylin.com wrote: Hi, I did not

Re: issues when bucket index deep-scrubbing

2013-10-18 Thread Gregory Farnum
On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec dominikmostow...@gmail.com wrote: Hi, I plan to shard my largest bucket because of deep-scrubbing issues (when the PG that this bucket's index is stored on is deep-scrubbed, many slow requests appear and the OSD grows in memory - after

Re: [PATCH] mds: update backtrace when old format inode is touched

2013-10-16 Thread Gregory Farnum
I came across this patch while going through my email backlog and it looks like we haven't pulled in this patch or anything like it. Did you do something about this problem in a different way? (The patch doesn't apply cleanly so I'll need to update it if this is still what we've got.) -Greg

Re: rados_clone_range for different pgs

2013-10-08 Thread Gregory Farnum
On Tue, Oct 8, 2013 at 7:40 AM, Oleg Krasnianskiy oleg.krasnians...@gmail.com wrote: We use ceph to store huge files stripped into small (4mb) objects. Due to the fact that files can be changed unpredictably (data insertion/modification/deletion in any part of a file), we have to copy parts of
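
For reference, the call under discussion; the signature here is from memory and may not be exact, so treat it as a sketch. The key constraint this thread is about is that source and destination must land in the same placement group, i.e. same pool and same object locator.

    #include <rados/librados.h>

    // Copy 1 MB starting at offset 0 of "src_obj" into "dst_obj" at offset 4096.
    // Object names are made up for the example; both must map to the same PG.
    int clone_part(rados_ioctx_t io) {
      return rados_clone_range(io, "dst_obj", 4096, "src_obj", 0, 1 << 20);
    }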

Re: thought on storing bloom (hit) info

2013-10-02 Thread Gregory Farnum
On Wed, Oct 2, 2013 at 5:02 PM, Sage Weil s...@inktank.com wrote: If we make this a special internal object we need to complicate recovery and namespacing to keep it separate from user data. We also need to implement a new API for retrieving, trimming, and so forth. Instead, we could just

Re: thought on storing bloom (hit) info

2013-10-02 Thread Gregory Farnum
On Wed, Oct 2, 2013 at 5:19 PM, Sage Weil s...@inktank.com wrote: On Wed, 2 Oct 2013, Gregory Farnum wrote: On Wed, Oct 2, 2013 at 5:02 PM, Sage Weil s...@inktank.com wrote: If we make this a special internal object we need to complicate recovery and namespacing to keep it separate from user

Re: bloom filter thoughts

2013-09-26 Thread Gregory Farnum
On Thu, Sep 26, 2013 at 8:52 AM, Sage Weil s...@inktank.com wrote: On Thu, 26 Sep 2013, Mark Nelson wrote: On 09/25/2013 07:34 PM, Sage Weil wrote: I spent some time on the plane playing with bloom filters. We're looking at using these on the OSD to (more) efficiently keep track of which
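
For readers new to the data structure: a toy Bloom filter, just to illustrate what is being sized and tuned in this thread. The bit count, hash count and hashing scheme below are arbitrary illustrations, not the parameters or implementation Ceph ended up with.

    #include <bitset>
    #include <cstdint>
    #include <functional>
    #include <string>

    class SimpleBloom {
      static constexpr size_t NBITS = 1 << 16;   // arbitrary sizing for the sketch
      static constexpr int NHASH = 4;
      std::bitset<NBITS> bits;

      static size_t hash_n(const std::string &key, int i) {
        // double hashing (h1 + i*h2) is a common way to derive k hash functions
        size_t h1 = std::hash<std::string>{}(key);
        size_t h2 = std::hash<std::string>{}(key + "#salt");
        return (h1 + i * h2) % NBITS;
      }
    public:
      void insert(const std::string &oid) {
        for (int i = 0; i < NHASH; ++i)
          bits.set(hash_n(oid, i));
      }
      // may return false positives, never false negatives
      bool maybe_contains(const std::string &oid) const {
        for (int i = 0; i < NHASH; ++i)
          if (!bits.test(hash_n(oid, i)))
            return false;
        return true;
      }
    };

The usual trade-off applies: more bits and more hashes per entry lower the false-positive rate at the cost of memory, which is exactly what makes the sizing interesting for per-PG tracking.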

Re: crc32 for erasure code

2013-09-23 Thread Gregory Farnum
On Mon, Sep 23, 2013 at 1:34 AM, Loic Dachary l...@dachary.org wrote: Hi, Unless I'm mistaken, ceph_crc32() is currently used in master via the crc32c() method of bufferlist to: * encode_with_checksum/decode_with_checksum a PGLog entry * Message::decode_message/Message::encode_message a
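
A hedged sketch of the checksum pattern those call sites share (includes and method names are from memory, so treat the exact signatures as approximate): compute crc32c over the encoded payload, store or send it alongside, and verify it on decode.

    #include "include/buffer.h"
    #include "include/encoding.h"

    void encode_with_checksum_sketch(const bufferlist &payload, bufferlist &out) {
      __u32 crc = payload.crc32c(0);   // crc32c over the encoded bytes, seed 0
      ::encode(payload, out);
      ::encode(crc, out);
    }

    bool decode_with_checksum_sketch(bufferlist::iterator &p, bufferlist &payload) {
      __u32 crc;
      ::decode(payload, p);
      ::decode(crc, p);
      return payload.crc32c(0) == crc; // false means the data was damaged in transit or on disk
    }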

Re: Object Write Latency

2013-09-20 Thread Gregory Farnum
On Fri, Sep 20, 2013 at 5:27 AM, Andreas Joachim Peters andreas.joachim.pet...@cern.ch wrote: Hi, we made some benchmarks about object read/write latencies on the CERN ceph installation. The cluster has 44 nodes and ~1k disks, all on 10GE and the pool configuration has 3 copies. Client

Re: [ceph-users] About ceph testing

2013-09-18 Thread Gregory Farnum
On Tue, Sep 17, 2013 at 10:07 PM, david zhang zhang.david2...@gmail.com wrote: Hi ceph-users, Previously I sent one mail to ask for help on ceph unit test and function test. Thanks to one of your guys, I got replied about unit test. Since we are planning to use ceph, but with strict quality

Re: Paxos vs Raft

2013-09-14 Thread Gregory Farnum
On Fri, Sep 13, 2013 at 11:39 PM, Loic Dachary l...@dachary.org wrote: Hi, Ceph ( http://ceph.com/ ) relies on a custom implementation of Paxos to provide exabyte scale distributed storage. Like most people recently exposed to Paxos, I struggle to understand it ... but will keep studying

Re: ocfs2 for OSDs?

2013-09-11 Thread Gregory Farnum
On Wed, Sep 11, 2013 at 12:55 PM, David Disseldorp dd...@suse.de wrote: Hi Sage, On Wed, 11 Sep 2013 09:18:13 -0700 (PDT) Sage Weil s...@inktank.com wrote: REFLINKs (inode-based writeable snapshots) This is the one item on this list I see that the ceph-osds could take real advantage of;

Re: Questions about mds locks

2013-08-29 Thread Gregory Farnum
On Wed, Aug 28, 2013 at 4:41 PM, 袁冬 yuandong1...@gmail.com wrote: Hello, everyone. I have some questions about mds locks. I searched Google and read almost all of Sage's papers, but found no details about mds locks. :( Unfortunately these encompass some of the most complicated and least

Re: Questions about mds locks

2013-08-29 Thread Gregory Farnum
On Thu, Aug 29, 2013 at 6:33 PM, Dong Yuan yuandong1...@gmail.com wrote: It seems that different lock items use different classes with different state machines for different MDRequest processing. :) Maybe I should concentrate on a particular lock item first. Can you give me some suggestions?

Re: How Might a Full-Text Searching Capability be Integrated with Ceph?

2013-08-29 Thread Gregory Farnum
On Wed, Aug 28, 2013 at 9:56 PM, Kevin Frey kevin.f...@internode.net.au wrote: Hello All, This is my first post to the list, and my question is very general to encourage discussion (perhaps derision). I am the team-leader involved with the development of an application of which one common

Re: [PATCH] enable mds rejoin with active inodes' old parent xattrs

2013-08-23 Thread Gregory Farnum
On Fri, Aug 23, 2013 at 4:00 AM, Alexandre Oliva ol...@gnu.org wrote: On Aug 22, 2013, Yan, Zheng uker...@gmail.com wrote: This is not a bug. Only the tail entry of the path encoded in the parent xattrs needs to be updated (the entry for the inode's parent directory). Why store the others, if

Re: [PATCH 2/2] client: trim deleted inode

2013-08-23 Thread Gregory Farnum
Looks like this patch hasn't been merged in yet, although its partner to make the MDS notify about deleted inodes was. Any particular reason, or just still waiting for review? :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Jul 20, 2013 at 7:21 PM, Yan, Zheng

Re: [PATCH 3/3] ceph: rework trim caps code

2013-08-23 Thread Gregory Farnum
Did this patch get dropped on purpose? I also don't see it in our testing branch. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sun, Aug 4, 2013 at 11:10 PM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com The trim caps code that handles

Re: [PATCH] mds: remove waiting lock before merging with neighbours

2013-08-23 Thread Gregory Farnum
Hi David, I'm really sorry it took us so long to get back to you on this. :( However, I've reviewed the patch and, apart from going over the code making me want to strangle myself for structuring it that way, everything looks good. I changed the last paragraph in the commit message very slightly

Re: [ceph-users] Flapping osd / continuously reported as failed

2013-08-19 Thread Gregory Farnum
On Fri, Aug 16, 2013 at 5:47 AM, Mostowiec Dominik dominik.mostow...@grupaonet.pl wrote: Hi, Thanks for your response. It's possible, as deep scrub in particular will add a bit of load (it goes through and compares the object contents). It is possible that the scrubbing blocks access (RW or

Re: [ceph-users] Flapping osd / continuously reported as failed

2013-08-19 Thread Gregory Farnum
On Mon, Aug 19, 2013 at 3:09 PM, Mostowiec Dominik dominik.mostow...@grupaonet.pl wrote: Hi, Yes, it definitely can as scrubbing takes locks on the PG, which will prevent reads or writes while the message is being processed (which will involve the rgw index being scanned). It is possible to

Re: github pull requests, comments and rebase

2013-08-14 Thread Gregory Farnum
On Thu, Aug 8, 2013 at 1:46 PM, Sage Weil s...@inktank.com wrote: On Thu, 8 Aug 2013, Loic Dachary wrote: Hi Sage, During the discussions about continuous integration at the CDS this week ( http://youtu.be/cGosx5zD4FM?t=1h16m05s ) you mentioned that github was able to keep track of the

Re: cephfs set_layout - EINVAL - solved

2013-08-14 Thread Gregory Farnum
On Fri, Aug 9, 2013 at 2:03 AM, Kasper Dieter dieter.kas...@ts.fujitsu.com wrote: OK, I found this nice page: http://ceph.com/docs/next/dev/file-striping/ which explains --stripe_unit --stripe_count --object_size But still I'm not sure about (1) what is the equivalent command on cephfs to
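
The striping doc linked above boils down to a small amount of arithmetic. A worked sketch of the mapping as I understand it from that page (the names here are illustrative, and this is not a copy of the actual ceph_calc_file_object_mapping code):

    #include <cstdint>
    #include <cstdio>

    struct ChunkLocation { uint64_t objectno, offset_in_object; };

    // Map a file offset to (object number, offset within that object) given
    // stripe_unit (su), stripe_count (sc) and object_size (os); os must be a
    // multiple of su.
    ChunkLocation map_file_offset(uint64_t off, uint64_t su, uint64_t sc, uint64_t os) {
      uint64_t stripes_per_object = os / su;
      uint64_t blockno   = off / su;          // which stripe-unit-sized block of the file
      uint64_t stripeno  = blockno / sc;      // which stripe (row across the object set)
      uint64_t stripepos = blockno % sc;      // which object within the object set
      uint64_t objectsetno = stripeno / stripes_per_object;
      uint64_t objectno    = objectsetno * sc + stripepos;
      uint64_t off_in_obj  = (stripeno % stripes_per_object) * su + (off % su);
      return {objectno, off_in_obj};
    }

    int main() {
      // with a simple layout (4 MB stripe unit, stripe count 1, 4 MB objects) this
      // is plain chunking: offset 10 MB lands in object 2 at offset 2 MB
      ChunkLocation c = map_file_offset(10ull << 20, 4 << 20, 1, 4 << 20);
      std::printf("object %llu, offset %llu\n",
                  (unsigned long long)c.objectno,
                  (unsigned long long)c.offset_in_object);
      return 0;
    }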

Re: cephfs set_layout - tuning

2013-08-14 Thread Gregory Farnum
On Wed, Aug 14, 2013 at 1:38 PM, Kasper Dieter dieter.kas...@ts.fujitsu.com wrote: On Wed, Aug 14, 2013 at 10:17:24PM +0200, Gregory Farnum wrote: On Fri, Aug 9, 2013 at 2:03 AM, Kasper Dieter dieter.kas...@ts.fujitsu.com wrote: OK, I found this nice page: http://ceph.com/docs/next/dev/file

Re: Blueprint: Add LevelDB support to ceph cluster backend store

2013-07-30 Thread Gregory Farnum
On Tue, Jul 30, 2013 at 3:54 PM, Alex Elsayed eternal...@gmail.com wrote: I posted this as a comment on the blueprint, but I figured I'd say it here: The thing I'd worry about here is that LevelDB's performance (along with that of various other K/V stores) falls off a cliff for large values.

Re: [ceph-users] Flapping osd / continuously reported as failed

2013-07-25 Thread Gregory Farnum
On Thu, Jul 25, 2013 at 12:47 AM, Mostowiec Dominik dominik.mostow...@grupaonet.pl wrote: Hi We found something else. After osd.72 flapp, one PG '3.54d' was recovering long time. -- ceph health details HEALTH_WARN 1 pgs recovering; recovery 1/39821745 degraded (0.000%) pg 3.54d is

Re: a few rados blueprints

2013-07-25 Thread Gregory Farnum
On Thu, Jul 25, 2013 at 4:01 PM, Sage Weil s...@inktank.com wrote: I've added a blueprint for avoiding double-writes when using btrfs: http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_clone_from_journal_on_btrfs This should improve throughput significantly when the journal

Re: a few rados blueprints

2013-07-25 Thread Gregory Farnum
On Thu, Jul 25, 2013 at 4:28 PM, Sage Weil s...@inktank.com wrote: On Thu, 25 Jul 2013, Gregory Farnum wrote: On Thu, Jul 25, 2013 at 4:01 PM, Sage Weil s...@inktank.com wrote: I've added a blueprint for avoiding double-writes when using btrfs: http://wiki.ceph.com/01Planning

Re: [ceph-users] Flapping osd / continuously reported as failed

2013-07-23 Thread Gregory Farnum
On Tue, Jul 23, 2013 at 3:20 PM, Studziński Krzysztof krzysztof.studzin...@grupaonet.pl wrote: On Tue, Jul 23, 2013 at 2:50 PM, Studziński Krzysztof krzysztof.studzin...@grupaonet.pl wrote: Hi, We've got a problem with our cluster - it continuously reports one osd as failed, and after

Re: ceph file system: extended attributes differ between ceph.ko and ceph-fuse

2013-07-23 Thread Gregory Farnum
On Thu, Jul 18, 2013 at 3:49 AM, Andreas Bluemle andreas.blue...@itxperts.de wrote: Hi, I am looking at ceph filesystem both via the kernel module and ceph-fuse. I am running on CentOS6.4 with - kernel 3.8.13 (for ceph.ko) and - ceph v0.61.4 userland components I encounter an

Re: Set object mtime

2013-07-12 Thread Gregory Farnum
On Thu, Jul 11, 2013 at 6:56 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, Okay thanks! A call in the C API would be handy. I was wanting to look at creating a tool to sync RADOS between clusters. Is that anything that's in the development plan already? I was just thinking of

Re: Re: [PATCH] ceph:Update the file time after mmap-write.

2013-07-11 Thread Gregory Farnum
On Thu, Jul 11, 2013 at 2:03 AM, Yan, Zheng uker...@gmail.com wrote: On Thu, Jul 11, 2013 at 3:53 PM, majianpeng majianp...@gmail.com wrote: On Thu, Jul 11, 2013 at 9:17 AM, majianpeng majianp...@gmail.com wrote: Although, mmap-write of ceph update the time of file using

Re: 4x write amplification?

2013-07-10 Thread Gregory Farnum
On Tue, Jul 9, 2013 at 7:08 PM, Li Wang liw...@ubuntukylin.com wrote: Hi, We did a simple throughput test on Ceph with 2 OSD nodes configured with one replica policy. For each OSD node, the throughput measured by 'dd' run locally is 117MB/s. Therefore, in theory, the two OSDs could provide
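
Back-of-the-envelope arithmetic for the "4x" in the subject, under two assumptions that may not match the poster's setup: the pool keeps 2 copies, and each OSD writes every byte twice (once to the journal, once to the filestore) on the same disk.

    #include <cstdio>

    int main() {
      const double raw_per_node = 117.0;  // MB/s measured locally with dd (from the mail)
      const int nodes = 2;
      const int copies = 2;               // assumption: replication size 2
      const int writes_per_copy = 2;      // assumption: journal write + data write per OSD
      double expected = raw_per_node * nodes / (copies * writes_per_copy);
      std::printf("expected aggregate client throughput ~%.1f MB/s (%dx write amplification)\n",
                  expected, copies * writes_per_copy);
      return 0;
    }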

tcmalloc memory leak on squeeze

2013-06-19 Thread Gregory Farnum
Hi Daigo, In the Ceph project we use tcmalloc by default for its much-improved (over the standard) handling of memory fragmentation under our workload. Unfortunately, one of our users seems to have discovered a memory leak when running the standard Debian squeeze packages:

Re: How many Pipe per Ceph OSD daemon will keep?

2013-06-06 Thread Gregory Farnum
On Thu, Jun 6, 2013 at 12:25 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote: Hi, From the code, each pipe (contains a TCP socket) will fork 2 threads, a reader and a writer. We really observe 100+ threads per OSD daemon with 30 instances of rados bench as clients. But this
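
Rough thread-count arithmetic for what is being observed, using the two-threads-per-pipe figure from the mail; the peer counts below are purely illustrative assumptions, not measurements.

    #include <cstdio>

    int main() {
      const int threads_per_pipe = 2;  // reader + writer (from the mail)
      const int client_pipes     = 30; // one per rados bench instance
      const int osd_peer_pipes   = 19; // assumption: connections to the other OSD daemons
      const int mon_pipes        = 1;  // assumption: one monitor session
      int pipes = client_pipes + osd_peer_pipes + mon_pipes;
      std::printf("~%d messenger threads per OSD daemon\n", pipes * threads_per_pipe);
      return 0;
    }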

Re: two osd stack on peereng after start osd to recovery

2013-06-06 Thread Gregory Farnum
We don't have your logs (vger doesn't forward them). Can you describe the situation more completely in terms of what failures occurred and what steps you took? (Also, this should go on ceph-users. Adding that to the recipients list.) -Greg Software Engineer #42 @ http://inktank.com |

Re: How many Pipe per Ceph OSD daemon will keep?

2013-06-06 Thread Gregory Farnum
On Thu, Jun 6, 2013 at 3:37 PM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote: But in ceph_user, Mark and some users are really discussing some Supermicro chassis that can have 24 spindles per 2U or 36/48 spindles per 4U. Even with 20 osds per node, the thread count will be more than 5000, and if take

Re: Operation per second meaning

2013-06-04 Thread Gregory Farnum
On Tue, Jun 4, 2013 at 3:26 AM, Roman Alekseev rs.aleks...@gmail.com wrote: Hello, Please help me to understand the op/s (operations per second) value in the 'pgmap v71520: 3352 pgs: 3352 active+clean; 212 GB data, 429 GB used, 23444 GB / 23874 GB avail; 89237KB/s wr, 24op/s' line? What does

Re: [ceph-users] Ceph killed by OS because of OOM under high load

2013-06-03 Thread Gregory Farnum
On Mon, Jun 3, 2013 at 8:47 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote: Hi, As my previous mail reported some weeks ago, we are suffering from OSD crashes / OSD flipping / system reboots etc., and all these stability issues really stop us from digging further into ceph characterization.

Re: Segmentation faults in ceph-osd

2013-05-21 Thread Gregory Farnum
to each OSD and almost no reads (yet). What interface are you writing with? How many OSD servers are there? -Greg /Emil On 21 May 2013 17:10, Gregory Farnum g...@inktank.com wrote: That looks like an attempt at a 370MB memory allocation. :? What's the memory use like on those nodes, and what's

Re: [ceph-users] mon IO usage

2013-05-21 Thread Gregory Farnum
On Tue, May 21, 2013 at 8:52 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: So, AFAICT, the bulk of the write would be writing out the pgmap to disk every second or so. Is it really needed to write it in full ? It doesn't change all that much AFAICT, so writing incremental changes

Re: Segmentation faults in ceph-osd

2013-05-21 Thread Gregory Farnum
On Tue, May 21, 2013 at 9:01 AM, Emil Renner Berthing c...@esmil.dk wrote: On 21 May 2013 17:55, Gregory Farnum g...@inktank.com wrote: On Tue, May 21, 2013 at 8:44 AM, Emil Renner Berthing c...@esmil.dk wrote: Hi Greg, Here are some more stats on our servers: - each server has 64GB ram

Re: [ceph-users] shared images

2013-05-13 Thread Gregory Farnum
On Mon, May 13, 2013 at 9:10 AM, Harald Rößler harald.roess...@btd.de wrote: Hi together, is there a description of how a shared image works in detail? Can such an image be used for a shared file system mounted on two virtual machines (KVM)? In my case, write on one machine and read only

Re: [ceph-users] shared images

2013-05-13 Thread Gregory Farnum
On Mon, May 13, 2013 at 11:35 AM, Harald Rößler harald.roess...@btd.de wrote: On Mon, 2013-05-13 at 18:55 +0200, Gregory Farnum wrote: On Mon, May 13, 2013 at 9:10 AM, Harald Rößler harald.roess...@btd.de wrote: Hi together, is there a description of how a shared image works in detail

CephFS standup

2013-05-06 Thread Gregory Farnum
On Friday we decided to kill off the CephFS standup for a while since I'm about to move off the project for a while, and we don't have any other regular reports for it. I'll be moving into the RGW standup next week and will probably float around the core standup this week. Mark is making

Re: Erasure encoding as a storage backend

2013-05-04 Thread Gregory Farnum
On Sat, May 4, 2013 at 11:47 AM, Noah Watkins jayh...@cs.ucsc.edu wrote: On May 4, 2013, at 11:36 AM, Loic Dachary l...@dachary.org wrote: On 05/04/2013 08:27 PM, Noah Watkins wrote: On May 4, 2013, at 10:16 AM, Loic Dachary l...@dachary.org wrote: it would be great to get feedback

Re: [ceph-users] interesting crush rules

2013-05-01 Thread Gregory Farnum
On Wed, May 1, 2013 at 2:44 PM, Sage Weil s...@inktank.com wrote: I added a blueprint for extending the crush rule language. If there are interesting or strange placement policies you'd like to do and aren't able to currently express using CRUSH, please help us out by enumerating them on that

Re: [PATCH] libceph: fix safe completion

2013-04-30 Thread Gregory Farnum
://inktank.com | http://ceph.com -- Forwarded message -- From: Sage Weil s...@inktank.com Date: Thu, Apr 18, 2013 at 1:41 PM Subject: Re: use of osd_client's r_callback To: Gregory Farnum g...@inktank.com Cc: Alex Elder el...@inktank.com, Josh Durgin josh.dur...@inktank.com It is a bit

Re: mount.ceph: modprobe failed

2013-04-29 Thread Gregory Farnum
On Mon, Apr 29, 2013 at 4:49 AM, Dyweni - Ceph-Devel ys3fpfe2y...@dyweni.com wrote: Hello List! I'd like to report the following bug. # ceph -v ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c) Mount.ceph command still attempts to load the ceph (ceph fs) kernel module even

Re: Ceph FS - File Locking Status

2013-04-26 Thread Gregory Farnum
On Thu, Apr 25, 2013 at 8:01 PM, Dyweni - Ceph-Devel ys3fpfe2y...@dyweni.com wrote: Hi All (again), I'm specifically interested in POSIX advisory locks... as referred to by SQLite Version 3 (http://www.sqlite.org/lockingv3.html, section 6). I want to store a set of SQLite database files on a
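
A quick way to see what is being asked about: a minimal fcntl() byte-range (POSIX advisory) lock test of the kind SQLite relies on. Run two copies against the same file on a CephFS mount (the path below is made up); the second copy should be refused while the first holds the lock, if the locks are enforced across clients.

    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
      int fd = open("/mnt/cephfs/sqlite-lock-test", O_RDWR | O_CREAT, 0644);
      if (fd < 0) { perror("open"); return 1; }

      struct flock fl{};
      fl.l_type = F_WRLCK;      // exclusive write lock
      fl.l_whence = SEEK_SET;
      fl.l_start = 0;
      fl.l_len = 0;             // 0 = lock the whole file

      if (fcntl(fd, F_SETLK, &fl) == -1) {
        std::printf("lock held elsewhere: %s\n", std::strerror(errno));
      } else {
        std::printf("got the lock; sleeping so a second copy can test contention\n");
        sleep(30);
      }
      close(fd);
      return 0;
    }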

Re: final mon issues

2013-04-26 Thread Gregory Farnum
Yeah, looking again... (and at your logs, too). -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Apr 26, 2013 at 12:57 PM, Mike Dawson mike.daw...@scholarstack.com wrote: Oops. Looks like you are already on next. Greg, Perhaps the issue still exists? - Mike On

Re: OSD abstract class

2013-04-25 Thread Gregory Farnum
On Thu, Apr 25, 2013 at 1:17 AM, Loic Dachary l...@dachary.org wrote: Hi, In the context of the implementation of http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/Erasure_encoding_as_a_storage_backend I'm preparing tests to assert that the modifications that will be made to the

Re: OSD abstract class

2013-04-25 Thread Gregory Farnum
On Thu, Apr 25, 2013 at 11:28 AM, Loic Dachary l...@dachary.org wrote: On 04/25/2013 06:45 PM, Gregory Farnum wrote: On Thu, Apr 25, 2013 at 1:17 AM, Loic Dachary l...@dachary.org wrote: Hi, In the context of the implementation of http://wiki.ceph.com/01Planning/02Blueprints/Dumpling

Re: clean shutdown and failover of osd

2013-04-21 Thread Gregory Farnum
On Sat, Apr 20, 2013 at 10:51 PM, James Harper james.har...@bendigoit.com.au wrote: [ This is a good query for ceph-users. ] Well... this is embarrassing. In reading the docs at http://ceph.com/docs/master/start/get-involved/ there was no mention of a users list so I just assumed there

Re: clean shutdown and failover of osd

2013-04-20 Thread Gregory Farnum
[ This is a good query for ceph-users. ] On Sat, Apr 20, 2013 at 10:15 PM, James Harper james.har...@bendigoit.com.au wrote: I'm doing some testing with ceph trying to figure out why my performance is so bad, and have noticed that there doesn't seem to be a way to cleanly stop an osd, or at

Re: [RESEND][PATCH 0/2] fix few root xattr bugs

2013-04-18 Thread Gregory Farnum
Thanks! I merged these into next (going to be Cuttlefish) in commits f379ce37bfdcb3670f52ef47c02787f82e50e612 and 87634d882fda80c4a2e3705c83a38bdfd613763f. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 17, 2013 at 11:43 PM, Kuan Kai Chiu big.c...@bigtera.com

Re: [PATCH 1/2] mds: pass proper mask to CInode::get_caps_issued

2013-04-17 Thread Gregory Farnum
This looks good to me, and I got Sage to review the second patch as well. These are both merged into the next branch — thanks! -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Apr 12, 2013 at 1:11 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng

Re: Using locally replicated OSDs to reduce Ceph replication

2013-04-17 Thread Gregory Farnum
On Wed, Apr 17, 2013 at 3:02 PM, Steve Barber steve.bar...@nist.gov wrote: On Wed, Apr 17, 2013 at 04:49:53PM -0400, Jeff Mitchell wrote in another thread: ... If you set up the OSDs such that each OSD is based off of a ZFS mirror, you get these benefits locally. For some people, especially

Re: Using locally replicated OSDs to reduce Ceph replication

2013-04-17 Thread Gregory Farnum
On Wed, Apr 17, 2013 at 4:02 PM, Steve Barber steve.bar...@nist.gov wrote: On Wed, Apr 17, 2013 at 06:23:43PM -0400, Gregory Farnum wrote: On Wed, Apr 17, 2013 at 3:02 PM, I wrote: In particular, has anyone tried making a big RAID set (of any type) and carving out space (logical volumes

Re: [PATCH] mds: fix setting/removing xattrs on root

2013-04-16 Thread Gregory Farnum
On Mon, Apr 15, 2013 at 3:23 AM, Kuan Kai Chiu big.c...@bigtera.com wrote: MDS crashes while journaling dirty root inode in handle_client_setxattr and handle_client_removexattr. We should use journal_dirty_inode to safely log root inode here. --- src/mds/Server.cc |6 ++ 1 file

Re: [PATCH v2] os/LevelDBStore: tune LevelDB data blocking options to be more suitable for PGStat values

2013-04-16 Thread Gregory Farnum
On Fri, Apr 12, 2013 at 12:41 PM, Jim Schutt jasc...@sandia.gov wrote: Hi Greg, On 04/10/2013 06:39 PM, Gregory Farnum wrote: Jim, I took this patch as a base for setting up config options which people can tune manually and have pushed those changes to wip-leveldb-config. I was out

Re: ceph and efficient access of distributed resources

2013-04-15 Thread Gregory Farnum
is less busy. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com Sent from my iPhone. On 2013-4-13, 0:20, Gregory Farnum g...@inktank.com wrote: I was in the middle of writing a response to this when Mark's email came in, so I'll just add a few things: On Fri, Apr 12, 2013 at 9:08 AM, Mark

Re: [ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Gregory Farnum
On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote: Le lundi 15 avril 2013 à 10:16 -0700, Gregory Farnum a écrit : Are you saying you saw this problem more than once, and so you completely wiped the OSD in question, then brought it back into the cluster, and now it's

Re: new ceph-0.60 always crash mds

2013-04-12 Thread Gregory Farnum
Have you made any changes from the default settings? That assert indicates the MDS couldn't load some data that it needed; the two most likely things are that the data doesn't exist or that the MDS isn't allowed to access it due to some permissions problem on the pool in question. -Greg Software

Re: ceph and efficient access of distributed resources

2013-04-12 Thread Gregory Farnum
I was in the middle of writing a response to this when Mark's email came in, so I'll just add a few things: On Fri, Apr 12, 2013 at 9:08 AM, Mark Nelson mark.nel...@inktank.com wrote: On 04/11/2013 10:59 PM, Matthias Urlichs wrote: As I understand it, in Ceph one can cluster storage nodes, but

Re: new ceph-0.60 always crash mds

2013-04-12 Thread Gregory Farnum
On Fri, Apr 12, 2013 at 5:19 PM, Drunkard Zhang gongfan...@gmail.com wrote: 2013/4/13 Gregory Farnum g...@inktank.com: Have you made any changes from the default settings? That assert Do you mean one of these settings? auth cluster required = cephx auth service required = cephx auth client

Re: [ceph-users] pool-info error in version 0.60

2013-04-11 Thread Gregory Farnum
It's more or less a Ceph bug; the patch fixing this is in the 3.9-rc's (although it should backport trivially if you're willing to build a kernel: 92a49fb0f79f3300e6e50ddf56238e70678e4202). You can look at http://tracker.ceph.com/issues/3793 if you want details. -Greg Software Engineer #42 @

Re: [PATCH] ceph: Use pseudo-random numbers to choose mds

2013-04-11 Thread Gregory Farnum
Yup, that is a change to not use entropy and to short-circuit if we only have one choice. And this function has no side effects so it just needs to return an int. Reviewed-by: Greg Farnum g...@inktank.com Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 10, 2013 at 8:05
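
For illustration only (a userspace paraphrase of the reviewed behavior, not the kernel patch itself): skip the random pick entirely when there is a single candidate, and otherwise use a plain pseudo-random index rather than an entropy-consuming source.

    #include <cstdlib>
    #include <vector>

    int choose_random_mds(const std::vector<int> &candidates) {
      if (candidates.empty())
        return -1;                         // nothing up/available
      if (candidates.size() == 1)
        return candidates[0];              // short-circuit: no RNG call needed
      // pseudo-random is enough just to spread load across MDS ranks;
      // no need to consume real entropy for this
      return candidates[std::rand() % candidates.size()];
    }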

Re: [PATCH v2] os/LevelDBStore: tune LevelDB data blocking options to be more suitable for PGStat values

2013-04-10 Thread Gregory Farnum
Jim, I took this patch as a base for setting up config options which people can tune manually and have pushed those changes to wip-leveldb-config. Thanks very much for figuring out how to set up the cache et al! For now I restructured quite a bit of the data ingestion, and I took your defaults

Re: [ceph-users] Ceph mon quorum

2013-04-05 Thread Gregory Farnum
On Fri, Apr 5, 2013 at 10:46 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 04/05/2013 12:32 PM, Gregory Farnum wrote: ... Or just a VM running somewhere that's got a VPN connection to your room-based monitors, yes. Ceph is a strongly consistent system and you're not going to get split

Re: Trouble getting a new file system to start, for v0.59 and newer

2013-04-03 Thread Gregory Farnum
On Wed, Apr 3, 2013 at 10:09 AM, Jim Schutt jasc...@sandia.gov wrote: Hi Sage, On 04/03/2013 09:58 AM, Sage Weil wrote: Hi Jim, What happens if you change 'osd mon ack timeout = 300' (from the default of 30)? I suspect part of the problem is that the mons are just slow enough that the

Re: Trouble getting a new file system to start, for v0.59 and newer

2013-04-03 Thread Gregory Farnum
On Wed, Apr 3, 2013 at 10:14 AM, Gregory Farnum g...@inktank.com wrote: On Wed, Apr 3, 2013 at 10:09 AM, Jim Schutt jasc...@sandia.gov wrote: Hi Sage, On 04/03/2013 09:58 AM, Sage Weil wrote: Hi Jim, What happens if you change 'osd mon ack timeout = 300' (from the default of 30)? I

Re: Trouble getting a new file system to start, for v0.59 and newer

2013-04-03 Thread Gregory Farnum
On Wed, Apr 3, 2013 at 3:40 PM, Jim Schutt jasc...@sandia.gov wrote: On 04/03/2013 12:25 PM, Sage Weil wrote: Sorry, guess I forgot some of the history since this piece at least is resolved now. I'm surprised if 30-second timeouts are causing issues without those overloads you were

Re: [PATCH 00/39] fixes for MDS cluster recovery

2013-04-01 Thread Gregory Farnum
On Mon, Apr 1, 2013 at 1:46 AM, Yan, Zheng zheng.z@intel.com wrote: On 03/17/2013 10:51 PM, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com This series fixes issues I encountered when running random MDS restart tests. With these patches, my 3 MDS setup that runs fsstress +

Re: [PATCH 05/39] mds: send table request when peer is in proper state.

2013-03-29 Thread Gregory Farnum
On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com Table client/server should send request/reply when the peer is active. Anchor query is an exception, because an MDS in the rejoin stage may need to fetch files before sending the rejoin ack; the

Re: [PATCH 06/39] mds: make table client/server tolerate duplicated message

2013-03-29 Thread Gregory Farnum
I believe this patch has been outdated thanks to the tid exchange you're doing now, right? -Greg On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com Anchor client re-sends queries when the anchor server becomes active. So it's

Re: [PATCH 18/39] mds: fix MDS recovery involving cross authority rename

2013-03-29 Thread Gregory Farnum
Yep, this all looks good in your tree now. Reviewed-by: Greg Farnum g...@inktank.com On Thu, Mar 21, 2013 at 8:04 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/22/2013 01:59 AM, Gregory Farnum wrote: On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng

Re: [PATCH 22/39] mds: handle linkage mismatch during cache rejoin

2013-03-29 Thread Gregory Farnum
Updated version looks good. Reviewed-by: Greg Farnum g...@inktank.com On Tue, Mar 26, 2013 at 12:21 AM, Yan, Zheng zheng.z@intel.com wrote: Updated update Thanks Yan, Zheng -- From c1d3576556f5ad2849d3079845dc26ef7612e8d3 Mon Sep 17 00:00:00 2001 From: Yan, Zheng

Re: Bad Blocks

2013-03-28 Thread Gregory Farnum
The OSDs expect the underlying filesystem to keep their data clean and fail-crash in order to prevent accidentally introducing corruption into the system. There's some ongoing work to make that a little friendlier, but it's not done yet. -Greg On Wed, Mar 20, 2013 at 11:55 AM, Dyweni - Ceph-Devel

Re: rados_pool_list usage

2013-03-27 Thread Gregory Farnum
On Wednesday, March 27, 2013 at 1:59 AM, Wido den Hollander wrote: Hi, While working with rados_pool_list I stumbled upon what I think is a documentation issue. librados.h tells me this: /** * List objects in a pool * * Gets a list of pool names as NULL-terminated strings. The
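
A usage sketch showing what the function actually does despite the quoted "List objects in a pool" summary: it fills the buffer with pool names as NUL-terminated strings, back to back, ending with an empty string. The calls are the standard librados C API as I remember them; my recollection is that rados_pool_list() returns the buffer length required, which the retry below relies on.

    #include <rados/librados.h>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
      rados_t cluster;
      rados_create(&cluster, "admin");           // connect as client.admin
      rados_conf_read_file(cluster, NULL);       // default ceph.conf search path
      if (rados_connect(cluster) < 0) {
        std::fprintf(stderr, "connect failed\n");
        return 1;
      }

      std::vector<char> buf(4096);
      int needed = rados_pool_list(cluster, buf.data(), buf.size());
      if (needed > (int)buf.size()) {            // buffer too small: retry with the reported size
        buf.resize(needed);
        rados_pool_list(cluster, buf.data(), buf.size());
      }

      // the buffer holds pool names as NUL-terminated strings, ending with an empty string
      for (const char *p = buf.data(); *p; p += std::strlen(p) + 1)
        std::printf("pool: %s\n", p);

      rados_shutdown(cluster);
      return 0;
    }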

Re: [PATCH 22/39] mds: handle linkage mismatch during cache rejoin

2013-03-25 Thread Gregory Farnum
On Thu, Mar 21, 2013 at 8:05 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/22/2013 05:23 AM, Gregory Farnum wrote: On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com For MDS cluster, not all file system namespace operations

Re: crush changes via cli

2013-03-24 Thread Gregory Farnum
On Fri, Mar 22, 2013 at 3:58 PM, Sage Weil s...@inktank.com wrote: On Fri, 22 Mar 2013, Gregory Farnum wrote: I suspect users are going to easily get in trouble without a more rigid separation between multi-linked and single-linked buckets. It's probably best

Re: crush changes via cli

2013-03-24 Thread Gregory Farnum
On Sun, Mar 24, 2013 at 5:04 PM, Sage Weil s...@inktank.com wrote: On Sun, 24 Mar 2013, Gregory Farnum wrote: On Fri, Mar 22, 2013 at 3:58 PM, Sage Weil s...@inktank.com wrote: On Fri, 22 Mar 2013, Gregory Farnum wrote: I suspect users are going to easily get

Re: crush changes via cli

2013-03-22 Thread Gregory Farnum
On Fri, Mar 22, 2013 at 3:38 PM, Sage Weil s...@inktank.com wrote: There's a branch pending that lets you do the remainder of the most common crush map changes via the cli. The command set breaks down like so: Updating leaves (devices): ceph osd crush set osd-id weight loc1 [loc2 ...]

Re: [PATCH 22/39] mds: handle linkage mismatch during cache rejoin

2013-03-21 Thread Gregory Farnum
On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com For the MDS cluster, not all file system namespace operations that impact multiple MDSes use two-phase commit. Some operations use dentry link/unlink messages to update a replica dentry's

Re: [PATCH 08/39] mds: consider MDS as recovered when it reaches clientreply state.

2013-03-21 Thread Gregory Farnum
On Wed, Mar 20, 2013 at 7:22 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/21/2013 02:40 AM, Greg Farnum wrote: The idea of this patch makes sense, but I'm not sure if we guarantee that each daemon sees every map update — if they don't then if an MDS misses the map moving an MDS into

Re: [PATCH 11/39] mds: don't delay processing replica buffer in slave request

2013-03-21 Thread Gregory Farnum
On Wed, Mar 20, 2013 at 9:15 PM, Sage Weil s...@inktank.com wrote: On Thu, 21 Mar 2013, Yan, Zheng wrote: On 03/21/2013 05:19 AM, Greg Farnum wrote: On Sunday, March 17, 2013 at 7:51 AM, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com Replicated objects need to be added into the

Re: [PATCH 13/39] mds: don't send resolve message between active MDS

2013-03-21 Thread Gregory Farnum
On Wed, Mar 20, 2013 at 7:55 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/21/2013 05:56 AM, Gregory Farnum wrote: On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng zheng.z@intel.com When MDS cluster is resolving, current behavior is sending

Re: [PATCH 29/39] mds: avoid double auth pin for file recovery

2013-03-21 Thread Gregory Farnum
Went over those mechanisms quickly but a bit more carefully; looks good. Reviewed-by: Greg Farnum g...@inktank.com On Wed, Mar 20, 2013 at 8:20 PM, Gregory Farnum g...@inktank.com wrote: This looks good on its face but I haven't had the chance to dig through the recovery queue stuff yet (it's

Re: [PATCH 21/39] mds: encode dirfrag base in cache rejoin ack

2013-03-21 Thread Gregory Farnum
On Wed, Mar 20, 2013 at 11:41 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/21/2013 07:33 AM, Gregory Farnum wrote: This needs to handle versioning the encoding based on peer feature bits too. On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng zheng.z@intel.com wrote: From: Yan, Zheng
