[ceph-users] Re: Elasticsearch Sync module bug ?

2020-05-11 Thread Cervigni, Luca (Pawsey, Kensington WA)
Thanks for the reply. I moved to 6.8.8 and the JSON parsing error seems gone. I am trying to do a simple request, though I get very weird results: --- Request URL = https://xxx.xxx.xxx.xxx:/test2?query=name%3D%3Dfile-020 {'Host': 'xxx.xxx.xxx', 'Content-length': '0', 'X-Amz-Content-SHA256':
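
For reference, a metadata-search request of this shape can be reproduced with a recent curl (7.75+ has built-in SigV4 signing); the host, port, credentials and region/zonegroup name below are placeholders and just a sketch, not what the poster used:

  curl -sk --user "$ACCESS_KEY:$SECRET_KEY" \
       --aws-sigv4 "aws:amz:default:s3" \
       "https://rgw.example.com:8000/test2?query=name%3D%3Dfile-020"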

[ceph-users] data increase after multisite syncing

2020-05-11 Thread Zhenshi Zhou
Hi, I deployed a multisite setup in order to sync data from a Mimic cluster zone to a Nautilus cluster zone. The data syncs well at present. However, when I check the cluster status I find something strange: the data in my new cluster seems larger than that in the old one, even though the data is far from fully synced
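
A couple of commands that help compare what has actually been synced (the bucket name is a placeholder):

  radosgw-admin sync status
  radosgw-admin bucket stats --bucket=mybucket    # compare num_objects / size_kb_actual on both zones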

[ceph-users] Need help on cache tier monitoring

2020-05-11 Thread icy chan
Hi, I have configured a cache tier with the parameters below: cache_target_dirty_ratio: 0.1 cache_target_dirty_high_ratio: 0.7 cache_target_full_ratio: 0.9 The cache tier did improve performance a lot, and my target is to keep the cache tier with only 10% dirty data. The remaining data (80%)
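
These thresholds are per-pool settings; a minimal sketch of how they are applied and checked (the pool name is a placeholder):

  ceph osd pool set cache-pool cache_target_dirty_ratio 0.1
  ceph osd pool set cache-pool cache_target_dirty_high_ratio 0.7
  ceph osd pool set cache-pool cache_target_full_ratio 0.9
  ceph df detail    # the DIRTY column shows how many objects in the cache pool are currently dirty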

[ceph-users] nfs migrate to rgw

2020-05-11 Thread Zhenshi Zhou
Hi all, We have several NFS servers providing file storage, with an nginx in front of the NFS servers to serve the clients. The files are mostly small files, about 30 TB in total. I'm planning to use Ceph RGW as the storage and I want to know whether it's appropriate to do so. The data

[ceph-users] Re: Yet another meltdown starting

2020-05-11 Thread Frank Schilder
To assess the criticality of the MGR beacon loop-of-doom outage: during my somewhat desperate attempts to get this under control, I saw this: [root@gnosis ~]# ceph status cluster: id: --- health: HEALTH_WARN no active mgr

[ceph-users] Re: Yet another meltdown starting

2020-05-11 Thread Frank Schilder
For everyone who does not want to read the details below: I am now running with (dramatically?) increased grace periods for OSD (3600s) and MGR (90s) beacons, and I am wondering what the downside of this is and whether there are better tuning parameters for my issues. --- Hi Lenz, I'm wondering
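
For readers wondering which knobs are meant: the values above most likely map to the mon-side grace options below, settable at runtime. The exact option names are my assumption, not confirmed in the post:

  ceph config set mon mon_osd_report_timeout 3600   # grace before OSDs that stop reporting are marked down
  ceph config set mon mon_mgr_beacon_grace 90       # grace before an unresponsive MGR is failed over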

[ceph-users] 1 pg unknown (from cephfs data pool)

2020-05-11 Thread Marc Roos
I had a 1x replicated CephFS data test pool. When an OSD died I had '1 pg stale+active+clean of cephfs'[1]; after a cluster reboot this turned into '1 pg unknown'. ceph pg repair did not fix anything (for either the stale or the unknown state), so I recreated the PG with: ceph osd force-create-pg pg.id
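
Spelled out, the commands involved look roughly like this (the PG id is a hypothetical example; force-create-pg discards whatever data the PG held, so it is a last resort, and newer releases require the confirmation flag):

  ceph pg repair 2.1f
  ceph osd force-create-pg 2.1f --yes-i-really-mean-it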

[ceph-users] Re: MDS_CACHE_OVERSIZED warning

2020-05-11 Thread Patrick Donnelly
Hello Jesper, On Thu, Apr 16, 2020 at 4:06 AM wrote: > > Hi. > > I have a cluster that has been running for close to 2 years now - pretty > much with the same setting, but over the past day I'm seeing this warning. > > (and the cache seem to keep growing) - Can I figure out which clients is >
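
To see which clients hold the caps that keep the cache large, and to raise the cache limit if that is the chosen fix, something along these lines usually helps (the MDS name and the 8 GiB value are placeholders):

  ceph tell mds.<name> session ls                           # per-client num_caps shows who holds the most capabilities
  ceph config set mds mds_cache_memory_limit 8589934592     # e.g. 8 GiB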

[ceph-users] Recover datas from pg incomplete

2020-05-11 Thread Francois Legrand
Hi, After a major crash in which we lost a few OSDs, we are stuck with incomplete PGs. At first, peering was blocked with peering_blocked_by_history_les_bound, so we set osd_find_best_info_ignore_history_les to true for all OSDs involved in the PG and set the primary OSD down to force
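
For anyone following along, the workaround described above roughly corresponds to the commands below (OSD ids are placeholders; the option can also be set in ceph.conf followed by an OSD restart, and is normally reverted once peering completes):

  ceph config set osd.12 osd_find_best_info_ignore_history_les true
  ceph osd down 12    # forces the primary to re-peer with the relaxed setting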

[ceph-users] Re: Yet another meltdown starting

2020-05-11 Thread Frank Schilder
OK, the command finally executed and it looks like the cluster is running stable for now. However, I'm afraid that 90s might not be sustainable. Questions: Can I leave the beacon_grace at 90s? Is there a better parameter to set? Why is the MGR getting overloaded on a rather small cluster with

[ceph-users] Yet another meltdown starting

2020-05-11 Thread Frank Schilder
Hi all, another client-load-induced meltdown. It is just starting and I hope we get it under control. This time, it's the MGRs failing under the load. It looks like they don't manage to get their beacons to the mons and are kicked out as unresponsive. However, the processes are up and fine. It's

[ceph-users] Unable to reshard bucket

2020-05-11 Thread Timothy Geier
Hello all, I'm having an issue with a bucket that refuses to be resharded. For the record, the cluster was recently upgraded from 13.2.4 to 13.2.10. # radosgw-admin reshard add --bucket foo --num-shards 3300 ERROR: the bucket is currently undergoing resharding and cannot be added to the
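
Useful companions to the command above when a reshard appears stuck (bucket name as in the post):

  radosgw-admin reshard list                    # shows reshard entries still queued
  radosgw-admin reshard status --bucket foo     # per-shard resharding state
  radosgw-admin reshard cancel --bucket foo     # often clears a stale "undergoing resharding" flag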

[ceph-users] Erasure coded pool queries

2020-05-11 Thread Biswajeet Patra
Hi, I have created an erasure-coded pool and the default parameters below, related to stripe sizes, are present: "osd_pool_erasure_code_stripe_width": "4096" --> 4 KB "rgw_obj_stripe_size": "4194304" --> 4 MB Let's say the k+m values are 10+5 for the erasure pool, and we upload an object of, let's say,
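
A rough sketch of the space math, under the assumption that each 4 MiB RGW stripe becomes one RADOS object that is erasure-coded across k+m = 15 shards:

  data chunk per OSD  = 4 MiB / 10            = ~410 KiB  (rounded to the stripe-unit alignment)
  shards written      = 10 data + 5 coding    = 15
  raw space consumed  = 4 MiB * 15 / 10       = 6 MiB     (1.5x overhead)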

[ceph-users] Re: Yet another meltdown starting

2020-05-11 Thread Lenz Grimmer
Hi Frank, On 5/11/20 3:03 PM, Frank Schilder wrote: > OK, the command finally executed and it looks like the cluster is > running stable for now. However, I'm afraid that 90s might not be > sustainable. > > Questions: Can I leave the beacon_grace at 90s? Is there a better > parameter to set?

[ceph-users] Re: Cluster network and public network

2020-05-11 Thread Frank Schilder
Hi Anthony and Phil, since my meltdown case was mentioned and I might have a network capacity issue, here is a question about why having separate VLANs for the private and public networks might have its merits: In the part of our Ceph cluster that was overloaded (our cluster has 2 sites logically

[ceph-users] Re: Write Caching to hot tier not working as expected

2020-05-11 Thread Steve Hughes
Thank you Eric. That 'sounds like' exactly my issue. Though I'm surprised to bump into something like that on such a small system and at such low bandwidth. But the information I can find on those parameters is sketchy to say the least. Can you point me at some doco that explains what they

[ceph-users] Re: Write Caching to hot tier not working as expected

2020-05-11 Thread steveh
Interestingly, I have found that if I limit the rate at which data is written the tiering behaves as expected. I'm using a robocopy job from a Windows VM to copy large files from my existing storage array to a test Ceph volume. By using the /IPG parameter I can roughly control the rate at

[ceph-users] Re: adding block.db to OSD

2020-05-11 Thread Stefan Priebe - Profihost AG
On 11.05.20 at 13:25, Igor Fedotov wrote: > Hi Stefan, > > I don't have specific preferences, hence any public storage you prefer. > > Just one note - I presume you collected the logs for the full set of 10 > runs. Which is redundant, could you please collect detailed logs (one > per OSD) for

[ceph-users] Re: adding block.db to OSD

2020-05-11 Thread Igor Fedotov
Hi Stefan, I don't have specific preferences, hence any public storage you prefer. Just one note - I presume you collected the logs for the full set of 10 runs. Which is redundant, could you please collect detailed logs (one per OSD) for single shot runs. Sorry for the unclear previous

[ceph-users] Write Caching to hot tier not working as expected

2020-05-11 Thread steveh
Hi all, I'm a newbie to Ceph. I'm an MSP and a small-scale cloud hoster. I'm intending to use Ceph as production storage for a small-scale private hosting cloud. We run ESXi as our HVs so we want to present Ceph as iSCSI. We've got Ceph Nautilus running on a 3-node cluster. Each node

[ceph-users] ceph-volume/batch fails in non-interactive mode

2020-05-11 Thread MichaƂ Nasiadka
Hello, I stumbled across this weird behaviour today: after initially creating the disks and DB LVs on NVMe (using ceph-ansible), a second run of ceph-volume ends up with an error: --> RuntimeError: 4 devices were filtered in non-interactive mode, bailing out What is interesting is that the same command
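
The filtering usually means ceph-volume considers the devices already prepared. A dry run shows what it would do, and --yes skips the interactive prompt; device names are placeholders and flag names vary a little between releases:

  ceph-volume lvm batch --report /dev/sdb /dev/sdc --db-devices /dev/nvme0n1
  ceph-volume lvm batch --yes    /dev/sdb /dev/sdc --db-devices /dev/nvme0n1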

[ceph-users] radosgw Swift bulk upload

2020-05-11 Thread Martin Zurowietz
Hi, I'm interested in the Swift extract archive/bulk upload feature [1]. First I thought that this feature is not implemented in radosgw, as it is not mentioned anywhere in the docs [2] and some test requests of mine were not successful. However, while browsing the issue tracker, I found some
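
For reference, this is the request shape the Swift bulk/extract-archive middleware expects; whether radosgw accepts it is exactly the open question here, and the endpoint, token and container name are placeholders:

  curl -i -X PUT \
       -H "X-Auth-Token: $TOKEN" \
       -T archive.tar.gz \
       "https://rgw.example.com/swift/v1/mycontainer?extract-archive=tar.gz"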

[ceph-users] Re: Cluster network and public network

2020-05-11 Thread Wido den Hollander
On 5/8/20 12:13 PM, Willi Schiegel wrote: > Hello Nghia, > > I once asked a similar question about network architecture and got the > same answer as Martin wrote from Wido den Hollander: > > There is no need to have a public and cluster network with Ceph. Working > as a Ceph consultant I've

[ceph-users] Re: adding block.db to OSD

2020-05-11 Thread Stefan Priebe - Profihost AG
Hi Igor, where should I post the logs? On 06.05.20 at 09:23, Stefan Priebe - Profihost AG wrote: > Hi Igor, > > On 05.05.20 at 16:10, Igor Fedotov wrote: >> Hi Stefan, >> >> so (surprise!) some DB access counters show a significant difference, e.g. >> >> "kv_flush_lat": { >>
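
Counters like kv_flush_lat come from the per-OSD perf dump on the admin socket, e.g. (OSD id is a placeholder):

  ceph daemon osd.0 perf dump | grep -A 5 '"kv_flush_lat"'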