Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-17 Thread Dan Jakubiec
Also worth pointing out something a bit obvious: this kind of faster/destructive migration should only be attempted if all your pools are at least 3x replicated. For example, if you had a 1x replicated pool you would lose data using this approach. -- Dan > On Jan 11, 2018, at 14:24, Reed

Re: [ceph-users] CephFS log jam prevention

2017-12-05 Thread Dan Jakubiec
To add a little color here... we started an rsync last night to copy about 4TB worth of files to CephFS. Paused it this morning because CephFS was unresponsive on the machine (e.g. can't cat a file from the filesystem). Been waiting about 3 hours for the log jam to clear. Slow requests have s

Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Dan Jakubiec
> On Dec 2, 2016, at 10:48, Sage Weil wrote: > > On Fri, 2 Dec 2016, Dan Jakubiec wrote: >> For what it's worth... this sounds like the condition we hit when we >> re-enabled scrub on our 16 OSDs (after 6 to 8 weeks of noscrub). They >> flapped for about 30 minu

Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Dan Jakubiec
For what it's worth... this sounds like the condition we hit when we re-enabled scrub on our 16 OSDs (after 6 to 8 weeks of noscrub). They flapped for about 30 minutes as most of the OSDs randomly hit suicide timeouts here and there. This settled down after about an hour and the OSDs stopped dying.

Re: [ceph-users] How to pick the number of PGs for a CephFS metadata pool?

2016-11-08 Thread Dan Jakubiec
Thanks Greg, makes sense. Our ceph cluster currently has 16 OSDs, each with an 8TB disk. Sounds like 32 PGs at 3x replication might be a reasonable starting point? Thanks, -- Dan > On Nov 8, 2016, at 14:02, Gregory Farnum wrote: > > On Tue, Nov 8, 2016 at 9:37 AM, Dan Jakubi
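For context, a rough sketch of the back-of-the-envelope math commonly cited at the time (roughly 100 PGs per OSD across all pools, divided by the replica count, rounded to a nearby power of two). This is not taken from the thread itself; the metadata pool normally gets only a small slice of that budget since it holds comparatively little data:

    /* Hedged sketch of the usual PG-count rule of thumb; the constants
     * are the cluster described above (16 OSDs, 3x replication). */
    #include <stdio.h>

    int main(void)
    {
        unsigned osds = 16, replicas = 3, target_pgs_per_osd = 100;

        /* Cluster-wide budget shared by all pools: ~533 here, usually
         * rounded to a nearby power of two (512). */
        unsigned budget = osds * target_pgs_per_osd / replicas;

        printf("cluster-wide PG budget: ~%u\n", budget);
        /* A metadata pool holds little data, so a small count such as
         * 32 or 64 out of that budget is in line with the reply above. */
        return 0;
    }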

[ceph-users] How to pick the number of PGs for a CephFS metadata pool?

2016-11-08 Thread Dan Jakubiec
Hello, Picking the number of PGs for the CephFS data pool seems straightforward, but how does one do this for the metadata pool? Any rules of thumb or recommendations? Thanks, -- Dan Jakubiec

[ceph-users] Multi-tenancy and sharing CephFS data pools with other RADOS users

2016-11-02 Thread Dan Jakubiec
We currently have one master RADOS pool in our cluster that is shared among many applications. All objects stored in the pool are currently stored using specific namespaces -- nothing is stored in the default namespace. We would like to add a CephFS filesystem to our cluster, and would like to
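Not from the thread, but as an illustration of the per-application isolation described above, a minimal librados C sketch (the pool and namespace names are made up) of how each client confines its I/O to its own namespace within the shared pool:

    #include <rados/librados.h>
    #include <string.h>

    /* Write one object into a specific namespace of a shared pool.
     * Assumes the caller already has a connected rados_t handle. */
    int write_in_namespace(rados_t cluster, const char *pool,
                           const char *nspace, const char *oid,
                           const char *data)
    {
        rados_ioctx_t io;
        int ret = rados_ioctx_create(cluster, pool, &io);
        if (ret < 0)
            return ret;

        /* All subsequent I/O on this ioctx is confined to this namespace. */
        rados_ioctx_set_namespace(io, nspace);

        ret = rados_write_full(io, oid, data, strlen(data));
        rados_ioctx_destroy(io);
        return ret;
    }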

Re: [ceph-users] CephFS in existing pool namespace

2016-11-02 Thread Dan Jakubiec
Hi John, How does one configure namespaces for file/dir layouts? I'm looking here, but am not seeing any mentions of namespaces: http://docs.ceph.com/docs/jewel/cephfs/file-layouts/ Thanks, -- Dan > On Oct 28, 2016, at 04:11, John Spray wrote: > > On Thu, Oct 27, 2016 at 9:43 PM, Reed Di
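The preview cuts off before the answer, but for context: directory layouts (including the pool_namespace field added around Jewel) are set through CephFS virtual xattrs on a mounted filesystem, typically with setfattr(1). A hedged sketch using the setxattr(2) syscall directly; the mount point and namespace name here are hypothetical:

    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(void)
    {
        const char *dir = "/mnt/cephfs/tenant-a";  /* hypothetical CephFS dir */
        const char *ns  = "tenant-a";              /* hypothetical namespace  */

        /* New files created under this directory should land in the data
         * pool under the given RADOS namespace (assuming pool_namespace
         * layout support in the running MDS and clients). */
        if (setxattr(dir, "ceph.dir.layout.pool_namespace",
                     ns, strlen(ns), 0) != 0) {
            perror("setxattr");
            return 1;
        }
        return 0;
    }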

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-24 Thread Dan Jakubiec
Thanks Kostis, great read. We also had a Ceph disaster back in August and a lot of this experience looked familiar. Sadly, in the end we were not able to recover our cluster but glad to hear that you were successful. LevelDB corruptions were one of our big problems. Your note below about r

Re: [ceph-users] Recovery/Backfill Speedup

2016-10-05 Thread Dan Jakubiec
ng how many pg's are backfilling and the load on machines and network. kind regards Ronny Aasen -- Dan Jakubiec VP Development Focus VQ

[ceph-users] Is rados_write_op_* any more efficient than issuing the commands individually?

2016-09-06 Thread Dan Jakubiec
Hello, I need to issue the following commands on millions of objects: rados_write_full(oid1, ...) rados_setxattr(oid1, "attr1", ...) rados_setxattr(oid1, "attr2", ...) Would it make it any faster if I combined all 3 of these into a single rados_write_op and issued them "together" as a single cal
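The preview is truncated, but for reference, a sketch of what batching those three calls into one compound operation looks like with the librados write-op API; a single rados_write_op_operate() call sends everything in one round trip and applies it atomically on the OSD (error handling trimmed):

    #include <rados/librados.h>

    int write_with_attrs(rados_ioctx_t io, const char *oid,
                         const char *data, size_t len,
                         const char *attr1, size_t len1,
                         const char *attr2, size_t len2)
    {
        rados_write_op_t op = rados_create_write_op();

        /* Queue the same three operations shown above. */
        rados_write_op_write_full(op, data, len);
        rados_write_op_setxattr(op, "attr1", attr1, len1);
        rados_write_op_setxattr(op, "attr2", attr2, len2);

        /* One network round trip instead of three. */
        int ret = rados_write_op_operate(op, io, oid, NULL, 0);
        rados_release_write_op(op);
        return ret;
    }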

Re: [ceph-users] OSD daemon randomly stops

2016-09-03 Thread Dan Jakubiec
Hi Brad, thank you very much for the response: > On Sep 3, 2016, at 17:05, Brad Hubbard wrote: > > > > On Sun, Sep 4, 2016 at 6:21 AM, Dan Jakubiec <mailto:dan.jakub...@gmail.com>> wrote: > >> 2016-09-03 16:12:44.124033 7fec728c9700 15 >> f

Re: [ceph-users] OSD daemon randomly stops

2016-09-03 Thread Dan Jakubiec
Hi Samuel, Here is another assert, but this time with debug filestore = 20. Does this reveal anything? 2016-09-03 16:12:44.122451 7fec728c9700 20 list_by_hash_bitwise prefix 08F3 2016-09-03 16:12:44.123046 7fec728c9700 20 list_by_hash_bitwise prefix 08F30042 2016-09-03 16:12:44.123068 7fec728c97

Re: [ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-03 Thread Dan Jakubiec
there is no command to remove the old OSD, I think our next step will be to bring up a new/real/empty OSD.8 and see if that will clear the log jam. But it seems like there should be a tool to deal with this kind of thing? Thanks, -- Dan > On Sep 2, 2016, at 15:01, Dan Jakubiec wrote: >

[ceph-users] Can someone explain the strange leftover OSD devices in CRUSH map -- renamed from osd.N to deviceN?

2016-09-02 Thread Dan Jakubiec
A while back we removed two damaged OSDs from our cluster, osd.0 and osd.8. They are now gone from most Ceph commands, but are still showing up in the CRUSH map with weird device names: ... # devices device 0 device0 device 1 osd.1 device 2 osd.2 device 3 osd.3 device 4 osd.4 device 5 osd.5 de

[ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-02 Thread Dan Jakubiec
Re-packaging this question which was buried in a larger, less-specific thread from a couple of days ago. Hoping this will be more useful here. We have been working on restoring our Ceph cluster after losing a large number of OSDs. We have all PGs active now except for 80 PGs that are stuck in

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thanks you for all the help Wido: > On Sep 1, 2016, at 14:03, Wido den Hollander wrote: > > You have to mark those OSDs as lost and also force create the incomplete PGs. > This might be the root of our problems. We didn't mark the parent OSD as "lost" before we removed it. Now ceph won't le

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thanks Wido. Reed and I have been working together to try to restore this cluster for about 3 weeks now. I have been accumulating a number of failure modes that I am hoping to share with the Ceph group soon, but have been holding off a bit until we see the full picture clearly so that we can p

Re: [ceph-users] librados Java support for rados_lock_exclusive()

2016-08-25 Thread Dan Jakubiec
> > You are more than welcome to send a Pull Request though! > https://github.com/ceph/rados-java/pulls > > Wido > >> On 24 August 2016 at 21:58, Dan Jakubiec wrote: >> >> >> Hello, >> >> Is anyone planning to implement support for

[ceph-users] librados Java support for rados_lock_exclusive()

2016-08-24 Thread Dan Jakubiec
Hello, Is anyone planning to implement support for Rados locks in the Java API anytime soon? Thanks, -- Dan J
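For reference, the underlying librados C call a Java binding would need to wrap; the lock name, cookie, and 30-second expiry below are arbitrary example values:

    #include <rados/librados.h>
    #include <sys/time.h>

    int take_exclusive_lock(rados_ioctx_t io, const char *oid)
    {
        struct timeval dur = { .tv_sec = 30, .tv_usec = 0 };  /* auto-expire */

        /* Returns 0 on success, -EBUSY if another client holds the lock. */
        return rados_lock_exclusive(io, oid, "example-lock", "example-cookie",
                                    "illustrative exclusive lock", &dur, 0);
    }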

Re: [ceph-users] How can we repair OSD leveldb?

2016-08-17 Thread Dan Jakubiec
Hi Wido, Thank you for the response: > On Aug 17, 2016, at 16:25, Wido den Hollander wrote: > > >> On 17 August 2016 at 17:44, Dan Jakubiec wrote: >> >> >> Hello, we have a Ceph cluster with 8 OSDs that recently lost power to all 8 >> mach

[ceph-users] How can we repair OSD leveldb?

2016-08-17 Thread Dan Jakubiec
Hello, we have a Ceph cluster with 8 OSDs that recently lost power to all 8 machines. We've managed to recover the XFS filesystems on 7 of the machines, but the OSD service is only starting on 1 of them. The other 5 machines all have complaints similar to the following: 2016-08-17 09:32