[ceph-users] CephFS Recovery/Internals Questions

2019-08-02 Thread Pierre Dittes
Hi, we had a major mess-up with our CephFS. Long story short: no journal backup, and the journal was truncated. Now... I still see a metadata pool with all objects, and the data pool is fine; from what I know, neither was corrupted. The last mount attempt showed a blank FS, though. What are the proper steps now to

[ceph-users] Lifecycle and dynamic resharding

2019-08-02 Thread Sean Purdy
Hi, A while back I reported a bug in luminous where lifecycle on a versioned bucket wasn't removing delete markers. I'm interested in this phrase in the pull request: "you can't expect lifecycle to work with dynamic resharding enabled." Why not? https://github.com/ceph/ceph/pull/29122 https:

Re: [ceph-users] Adventures with large RGW buckets

2019-08-02 Thread Harald Staub
Right now our main focus is on the Veeam use case (VMWare backup), used with an S3 storage tier. Currently we host a bucket with 125M objects and one with 100M objects. As Paul stated, searching common prefixes can be painful. We had some cases that did not work (taking too much time, radosgw

Re: [ceph-users] Urgent Help Needed (regarding rbd cache)

2019-08-02 Thread Muhammad Junaid
Thanks Oliver and all others. This was really helpful. Regards. Muhammad Junaid On Thu, Aug 1, 2019 at 5:25 PM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > Hi together, > > Am 01.08.19 um 08:45 schrieb Janne Johansson: > > Den tors 1 aug. 2019 kl 07:31 skrev Muhammad Junaid < > ju

Re: [ceph-users] Lifecycle and dynamic resharding

2019-08-02 Thread Abhishek Lekshmanan
"Sean Purdy" writes: > Hi, > > A while back I reported a bug in luminous where lifecycle on a versioned > bucket wasn't removing delete markers. > > I'm interested in this phrase in the pull request: > > "you can't expect lifecycle to work with dynamic resharding enabled." the luminous backport

Re: [ceph-users] Balancer in HEALTH_ERR

2019-08-02 Thread Smith, Eric
Great – what does ceph health detail show? I’m guessing you most likely need to remove the OSDs on CEPH006 (as well as the CEPH006 host itself) to get Ceph to move the data where it needs to be. The OSD removal process is here: https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/ Eric From: EDH -

Re: [ceph-users] Adventures with large RGW buckets [EXT]

2019-08-02 Thread Lars Marowsky-Bree
On 2019-08-01T15:20:19, Matthew Vernon wrote: > One you don't mention is that multipart uploads break during resharding - so > if our users are filling up a bucket with many writers uploading multipart > objects, some of these will fail (rather than blocking) when the bucket is > resharded. Is t

Re: [ceph-users] Adventures with large RGW buckets [EXT]

2019-08-02 Thread Matthew Vernon
Hi, On 02/08/2019 13:23, Lars Marowsky-Bree wrote: > On 2019-08-01T15:20:19, Matthew Vernon wrote: > >> One you don't mention is that multipart uploads break during resharding - so >> if our users are filling up a bucket with many writers uploading multipart >> objects, some of these will fail (

Re: [ceph-users] bluestore write iops calculation

2019-08-02 Thread vitalif
1. For 750 object write requests, data is written directly into the data partition, and since we use EC 4+1 there will be 5 IOPS across the cluster for each object write. This makes 750 * 5 = 3750 IOPS. Don't forget about the metadata and the deferring of small writes: deferred write queue + metadata,
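
As a rough illustration of the fan-out being discussed here, a back-of-the-envelope sketch in Python. The 750 objects/s and EC 4+1 figures come from the thread; the metadata/deferred-write overhead factor is an assumed placeholder for the effects mentioned above, not a measured value.

# Back-of-the-envelope estimate of raw disk IOPS for EC object writes.
# 750 objects/s and EC 4+1 are taken from the thread; overhead_factor is
# an illustrative assumption standing in for RocksDB metadata writes and
# the deferred-write (WAL) path, not a measured number.
def theoretical_write_iops(object_writes_per_s, k, m, overhead_factor=2.0):
    chunk_writes = object_writes_per_s * (k + m)   # 750 * (4 + 1) = 3750
    return chunk_writes * overhead_factor          # metadata + deferred writes on top

print(theoretical_write_iops(750, k=4, m=1))       # 7500.0 with the assumed 2x factor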

[ceph-users] backfilling causing a crash in osd.

2019-08-02 Thread response
Hi all, I've got a strange situation that hopefully someone can help with. We have a backfill occurring that never completes; the destination OSD of the recovery predictably crashes. Outing the destination OSD so another OSD takes the backfill then causes a different OSD in the cluster to crash, boo

Re: [ceph-users] Adventures with large RGW buckets

2019-08-02 Thread Paul Emmerich
On Thu, Aug 1, 2019 at 10:48 PM Gregory Farnum wrote: > > On Thu, Aug 1, 2019 at 12:06 PM Eric Ivancich wrote: > I expect RGW could do this, but unfortunately deleting namespaces at > the RADOS level is not practical. People keep asking and maybe in some > future world it will be cheaper, but a n

Re: [ceph-users] bluestore write iops calculation

2019-08-02 Thread Paul Emmerich
On Fri, Aug 2, 2019 at 2:51 PM wrote: > > > 1. For 750 object write requests, data written directly into data > > partition and since we use EC 4+1 there will be 5 iops across the > > cluster for each object write. This makes 750 * 5 = 3750 iops > > don't forget about the metadata and the deferri

Re: [ceph-users] Adventures with large RGW buckets

2019-08-02 Thread Josh Durgin
On 8/2/19 3:04 AM, Harald Staub wrote: Right now our main focus is on the Veeam use case (VMWare backup), used with an S3 storage tier. Currently we host a bucket with 125M objects and one with 100M objects. As Paul stated, searching common prefixes can be painful. We had some cases that did

[ceph-users] Built-in HA?

2019-08-02 Thread Volodymyr Litovka
Dear colleagues, at the moment we use Ceph in a routed environment (OSPF, ECMP) and everything is OK; reliability is high and there is nothing to complain about. But for hardware reasons (to be more precise, RDMA offload), we are faced with the need to run Ceph directly on physical interfaces

Re: [ceph-users] bluestore write iops calculation

2019-08-02 Thread vitalif
where small means 32 KB or smaller going to BlueStore, so <= 128 KB writes from the client. Also: please don't do 4+1 erasure coding, see older discussions for details. Can you point me to the discussion about the problems of 4+1? It's not easy to google :) -- Vitaliy Filippov
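
To spell out the arithmetic behind the 32 KB / 128 KB statement above: with k=4 data chunks, each data OSD sees roughly client_size / k, and BlueStore defers chunk writes at or below its small-write threshold. A minimal sketch, assuming the 32 KiB HDD default and the bluestore_prefer_deferred_size_hdd option apply to your setup:

# With EC k=4, a client write is split into k data chunks; BlueStore defers
# chunk writes at or below its small-write threshold (assumed here: 32 KiB,
# the HDD default for bluestore_prefer_deferred_size_hdd -- check your config).
DEFERRED_THRESHOLD = 32 * 1024

def is_deferred(client_write_bytes, k=4):
    chunk_bytes = client_write_bytes / k           # size each data OSD actually sees
    return chunk_bytes <= DEFERRED_THRESHOLD

print(is_deferred(128 * 1024))   # True: 128 KiB / 4 = 32 KiB per chunk, goes via the WAL
print(is_deferred(256 * 1024))   # False: 64 KiB per chunk is written directly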

Re: [ceph-users] Adventures with large RGW buckets [EXT]

2019-08-02 Thread J. Eric Ivancich
A few interleaved responses below On 8/1/19 10:20 AM, Matthew Vernon wrote: > Hi, > > On 31/07/2019 19:02, Paul Emmerich wrote: > > Some interesting points here, thanks for raising them :) > We've had some problems with large buckets (from around the 70Mobject > mark). > > One you don't m

Re: [ceph-users] Ceph Scientific Computing User Group

2019-08-02 Thread Mike Perez
We have scheduled the next meeting on the community calendar for August 28 at 14:30 UTC. Each meeting will then take place on the last Wednesday of each month. Here's the pad to collect agenda/notes: https://pad.ceph.com/p/Ceph_Science_User_Group_Index -- Mike Perez (thingee) On Tue, Jul 23, 20

[ceph-users] compat weight reset

2019-08-02 Thread Reed Dier
Hi all, I am trying to find a simple way to better distribute my data as I wrap up my Nautilus upgrades. I'm currently rebuilding some OSDs with a bigger block.db to prevent BlueFS spillover where it isn't difficult to do so, and I'm once again struggling with unbalanced distributi

Re: [ceph-users] bluestore write iops calculation

2019-08-02 Thread Maged Mokhtar
On 02/08/2019 08:54, nokia ceph wrote: Hi team, could you please help us understand the write IOPS inside the Ceph cluster? There seems to be a mismatch between the theoretical IOPS and what we see in disk status. Our platform: a 5-node cluster with 120 OSDs, each node having 24 HDD disks ( d

Re: [ceph-users] How to add 100 new OSDs...

2019-08-02 Thread Robert LeBlanc
On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini wrote: > On 26.07.19 15:03, Stefan Kooman wrote: > > Quoting Peter Sabaini (pe...@sabaini.at): > >> What kind of commit/apply latency increases have you seen when adding a > >> large numbers of OSDs? I'm nervous how sensitive workloads might react > >

Re: [ceph-users] Problems understanding 'ceph-features' output

2019-08-02 Thread Robert LeBlanc
On Tue, Jul 30, 2019 at 2:06 AM Janne Johansson wrote: > Someone should make a webpage where you can enter that hex-string and get > a list back. > Providing a minimum bitmap would allow someone to do so, and someone like me to do it manually until then. Robert LeBlanc PGP Finge
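
Until such a page exists, a minimal sketch of the manual approach: treat the hex string from ceph features as a bitmask and list which bit positions are set. Mapping a bit number to its CEPH_FEATURE_* name still has to be looked up by hand in the Ceph source; the sample value below is made up for illustration.

# List the set bit positions in a `ceph features` hex string.  Translating a
# bit position into a CEPH_FEATURE_* name is still a manual lookup against
# the Ceph source; the sample value is made up for illustration.
def set_feature_bits(hex_string):
    value = int(hex_string, 16)
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]

print(set_feature_bits("0x27018fb86aa42ada"))   # -> list of set bit positions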

Re: [ceph-users] bluestore write iops calculation

2019-08-02 Thread Nathan Fish
Any EC pool with m=1 is fragile. By default, min_size = k+1, so you'd immediately stop IO the moment you lose a single OSD. min_size can be lowered to k, but that can cause data loss and corruption. You should set m=2 at a minimum. 4+2 doesn't take much more space than 4+1, and it's far safer. On
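
To make the min_size arithmetic explicit, a tiny sketch assuming only the default rule min_size = k + 1:

# Default min_size for EC pools is k + 1; a PG serves IO while at least
# min_size shards are available, so it tolerates (k + m) - min_size
# failures without blocking writes.
def ec_pool_tolerance(k, m):
    min_size = k + 1
    failures_before_io_stops = (k + m) - min_size
    return min_size, failures_before_io_stops

print(ec_pool_tolerance(4, 1))   # (5, 0): losing a single OSD stops IO
print(ec_pool_tolerance(4, 2))   # (5, 1): one failure tolerated; a second blocks IO but data is still recoverable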