Thanks.
I found that both the minimum and active sets are very large in my cluster; is
that expected?
By the way, I take a snapshot of each image every half hour, and keep snapshots
for two days.
Journal status:
minimum_set: 671839
active_set: 1197917
registered clients:
[id=, commit_position=[positi
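For reference, output like the above comes from the journal status command; a
hedged example with placeholder pool/image names:

rbd journal status --pool rbd --image myimage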
From the balancer module's code for v12.2.7, I noticed [1] these lines which
reference [2] these 2 config options for upmap. You might try using
more max iterations or a smaller max deviation to see if you can get a
better balance in your cluster. I would try to start with [3] these
commands
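The referenced commands are cut off above; a hedged sketch of setting those two
upmap options on a Luminous mgr (the values here are illustrative, not
recommendations):

ceph config-key set mgr/balancer/upmap_max_iterations 20
ceph config-key set mgr/balancer/upmap_max_deviation .01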
On 11/6/18 6:03 AM, Hayashida, Mami wrote:
> WOW. With you two guiding me through every step, the 10 OSDs in
> question are now added back to the cluster as Bluestore disks!!! Here
> are my responses to the last email from Hector:
>
> 1. I first checked the permissions and they looked like this
From what I observed, however, until I made that last change in the UDEV
rule, I simply could not get those OSDs started. I will try converting the
next 10 OSDs (osd.70-79) tomorrow, following all the steps you have shown
me in this email thread, and will report back to you guys if/where I
en
On 2018/11/5 7:08 PM, Mykola Golub wrote:
On Mon, Nov 05, 2018 at 06:14:09PM +0800, Dengke Du wrote:
-1 osd.0 20 class rbd open got (2) No such file or directory
So the rbd cls was not loaded. Look at the directory returned by this
command:
ceph-conf --name osd.0 -D | grep osd_class_dir
Yes
Gotcha. Yeah, I think we are going to continue the scanning to build a new
metadata pool. I am making some progress on a script to extract files from
the data store. Just need to find the exact format of the xattrs and the
object hierarchy for large files. If I end up taking the script to the
finish li
With cppool you got a bunch of useless zero-sized objects because, unlike
"export", cppool does not copy the omap data which actually holds all the
inode info.
I suggest truncating journals only as an effort to reduce downtime, followed
by an immediate backup of available files to a fresh fs. After res
Workload is mixed.
We ran a rados cppool to back up the metadata pool.
So you're thinking that truncating the journal and purge queue (we are on
luminous) with a reset could bring us online, missing just data from that day
(mostly from when the issue started)?
If so we could continue our scan into our recovery par
What was your recent workload? There are chances of not losing much if it was
mostly read ops. If so, you must back up your metadata pool via "rados export"
in order to preserve omap data, then try truncating journals (along with the
purge queue if supported by your ceph version), wiping session tabl
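A hedged sketch of that sequence with Luminous-era tools (pool name is a
placeholder; check the CephFS disaster-recovery docs for your exact version
before running any of it):

rados -p cephfs_metadata export metadata.bak              # unlike cppool, preserves omap
cephfs-journal-tool journal reset                         # truncate the MDS journal
cephfs-journal-tool --journal=purge_queue journal reset   # only on versions with a purge queue
cephfs-table-tool all reset session                       # wipe the session table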
That was our original plan. So we migrated to bigger disks and have space,
but "recover dentries" uses up all our memory (128 GB) and crashes out.
On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin wrote:
> I had the same problem with multi-mds. I solved it by freeing up a little
> space on OSDs, doing "r
I had the same problem with multi-mds. I solved it by freeing up a little space
on OSDs, doing "recover dentries", truncating the journal, and then "fs reset".
After that I was able to revert to single-active MDS and kept on running for a
year until it failed on 13.2.2 upgrade :))
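For reference, a hedged sketch of that sequence (the fs name is a placeholder;
"recover dentries" is the step that ran out of memory for the original poster):

cephfs-journal-tool event recover_dentries summary   # salvage dentries from the journal
cephfs-journal-tool journal reset
ceph fs reset myfs --yes-i-really-mean-it            # back to a single active MDS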
> On 6.11.20
Our metadata pool went from 700 MB to 1 TB in size in a few hours, used all
the space on the OSDs, and now 2 ranks report damage. The recovery tools on the
journal fail as they run out of memory, leaving us with the option of
truncating the journal and losing data, or recovering using the scan tools.
Any ide
What kind of damage have you had? Maybe it is worth trying to get the MDS to
start and back up valuable data instead of doing a long-running recovery?
> On 6.11.2018, at 02:59, Rhian Resnick wrote:
>
> Sounds like I get to have some fun tonight.
>
> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin
Thanks both Matt and Eric,
That is really interesting. I do tend to use "mc" since it can handle
multiple keys readily (e.g. when a user reports a problem). It was
noticeable that when getting the recursive listing of the "slow" bucket
using mc, the output did appear in a "chunked" manner, consis
Inode linkage (i.e. folder hierarchy) and file names are stored in the omap data
of objects in the metadata pool. You can write a script that traverses the
whole metadata pool to find the file names corresponding to objects in the data
pool and fetch the required files via the 'rados get' command.
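A minimal sketch of such a traversal, assuming default pool names and the usual
<inode-hex>.<extent-index> naming of data-pool objects (both are assumptions;
adjust to your cluster):

# dentry names are stored as omap keys on directory objects in the metadata pool
for obj in $(rados -p cephfs_metadata ls); do
    echo "== $obj =="
    rados -p cephfs_metadata listomapkeys "$obj"
done
# once an inode number is known, fetch the file's first object from the data pool
rados -p cephfs_data get 10000000000.00000000 recovered_chunk0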
> On 6.1
Yes, 'rados -h'.
> On 6.11.2018, at 02:25, Rhian Resnick wrote:
>
> Does a tool exist to recover files from a cephfs data partition? We are
> rebuilding metadata but have a user who needs data asap.
Does a tool exist to recover files from a cephfs data partition? We are
rebuilding metadata but have a user who needs data asap.
Using "noop" makes sense only with ssd/nvme drives. "noop" is a simple fifo and
using it with HDDs can result in unexpected blocking of useful IO in case when
the queue is poisoned with burst of IO requests like background purge, which
would become foreground in such case.
> On 5.11.2018, at 2
It depends on the store backend. Bluestore has its own scheduler which works
properly only with CFQ, while filestore configuration is narrowed to setting
the OSD IO threads' scheduling class and priority, just like using the
'ionice' utility.
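For filestore, the ionice-like knobs are the disk thread ioprio options; a
sketch of a ceph.conf snippet (these only take effect when the OSD's disk uses
the CFQ scheduler):

[osd]
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7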
> On 5.11.2018, at 21:45, Bastiaan Visser wrote:
>
>
> There
Hi,
I just did some testing to confirm, and can report that "mc ls -r" does
appear to induce latency related to Unix path emulation.
Matt
On Mon, Nov 5, 2018 at 3:10 PM, J. Eric Ivancich wrote:
> I did make an inquiry and someone here does have some experience w/ the
> mc command -
On Mon, Nov 5, 2018 at 4:21 PM Hayashida, Mami wrote:
>
> Yes, I still have the volume log showing the activation process for ssd0/db60
> (and 61-69 as well). I will email it to you directly as an attachment.
In the logs, I see that ceph-volume does set the permissions correctly:
[2018-11-02
Correct, it's just that the ceph-kvstore-tool for Luminous doesn't have the
ability to migrate between them. It exists in Jewel 10.2.11 and in Mimic,
but it doesn't exist in Luminous. There's no structural difference in the
omap backend, so I'm planning to just use a Mimic version of the tool to
up
Not sure I understand that, but starting with Luminous, the filestore omap
backend is rocksdb by default.
From: David Turner
Date: Monday, November 5, 2018 at 3:25 PM
To: Pavan Rallabhandi
Cc: ceph-users
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the cluster
unusable and take
Yes, I still have the volume log showing the activation process for
ssd0/db60 (and 61-69 as well). I will email it to you directly as an
attachment.
On Mon, Nov 5, 2018 at 4:14 PM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami
> wrote:
> >
> > WOW. With you two guidin
On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami wrote:
>
> WOW. With you two guiding me through every step, the 10 OSDs in question are
> now added back to the cluster as Bluestore disks!!! Here are my responses to
> the last email from Hector:
>
> 1. I first checked the permissions and they lo
WOW. With you two guiding me through every step, the 10 OSDs in question
are now added back to the cluster as Bluestore disks!!! Here are my
responses to the last email from Hector:
1. I first checked the permissions and they looked like this
root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
total 56
We simply use the "noop" scheduler on our NAND-based ceph cluster
On 11/05/2018 09:33 PM, solarflow99 wrote:
> I'm interested to know about this too.
>
>
> On Mon, Nov 5, 2018 at 10:45 AM Bastiaan Visser wrote:
>
>>
>> There are lots of rumors around about the benefit of changing
>> io-schedu
I'm interested to know about this too.
On Mon, Nov 5, 2018 at 10:45 AM Bastiaan Visser wrote:
>
> There are lots of rumors around about the benefit of changing
> io-schedulers for OSD disks.
> Even some benchmarks can be found, but they are all more than a few years
> old.
> Since ceph is movin
Digging into the code a little more, that functionality was added in
10.2.11 and 13.0.1, but it still isn't anywhere in the 12.x.x Luminous
version. That's so bizarre.
On Sat, Nov 3, 2018 at 11:56 AM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:
> Not exactly, this feature was support
I did make an inquiry and someone here does have some experience w/ the
mc command -- minio client. We're curious how "ls -r" is implemented
under mc. Does it need to get a full listing and then do some path
parsing to produce nice output? If so, it may be playing a role in the
delay as well.
Eric
On 11/6/18 3:31 AM, Hayashida, Mami wrote:
> 2018-11-05 12:47:01.075573 7f1f2775ae00 -1
> bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block
> device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
Looks like the permissions on the block.db device are wrong. As far as
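A hedged example of checking and fixing that, following the symlink because
block.db normally points at the raw partition (paths are placeholders):

ls -l /var/lib/ceph/osd/ceph-60/block.db
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-60/block.db)"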
The numbers you're reporting strike me as surprising as well. Which version are
you running?
In case you're not aware, listing of buckets is not a very efficient operation,
given that the listing is required to return objects in lexical order.
They are distributed across the shards via a ha
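One hedged way to separate client-side path emulation from server-side listing
cost is to time a raw listing directly on the gateway host (bucket name is a
placeholder):

time radosgw-admin bucket list --bucket=mybucket --max-entries=1000 > /dev/null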
On Sun, Nov 4, 2018 at 11:59 PM Wei Jin wrote:
>
> Hi, Jason,
>
> I have a question about rbd mirroring. When enable mirroring, we observed
> that there are a lot of objects prefix with journal_data, thus it consumes a
> lot of disk space.
>
> When will these journal objects be deleted? And are
On 11/6/18 3:21 AM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 11:51 AM Hector Martin wrote:
>>
>> Those units don't get triggered out of nowhere, there has to be a
>> partition table with magic GUIDs or a fstab or something to cause them
>> to be triggered. The better way should be to get rid o
There are lots of rumors around about the benefit of changing io-schedulers for
OSD disks.
Even some benchmarks can be found, but they are all more than a few years old.
Since ceph is moving forward at quite a pace, I am wondering what the common
practice is to use as the io-scheduler on OSDs.
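For context, the scheduler is set per block device and can be inspected or
changed at runtime; sdX is a placeholder:

cat /sys/block/sdX/queue/scheduler          # the active scheduler is shown in brackets
echo noop > /sys/block/sdX/queue/scheduler  # immediate, but not persistent across reboots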
I already ran the "ceph-volume lvm activate --all" command right after I
prepared (using "lvm prepare") those OSDs. Do I need to run the "activate"
command again?
On Mon, Nov 5, 2018 at 1:24 PM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 12:54 PM Hayashida, Mami
> wrote:
> >
> > I commen
On Mon, Nov 5, 2018 at 12:54 PM Hayashida, Mami wrote:
>
> I commented out those lines and, yes, I was able to restart the system and
> all the Filestore OSDs are now running. But I cannot start the converted
> Bluestore OSDs (service). When I look up the log for osd.60, this is what I
> se
On Mon, Nov 5, 2018 at 11:51 AM Hector Martin wrote:
>
> Those units don't get triggered out of nowhere, there has to be a
> partition table with magic GUIDs or a fstab or something to cause them
> to be triggered. The better way should be to get rid of that instead of
> overriding the ceph-disk s
scan_extents should not saturate links much because it doesn't read entire
objects; it only performs rados' stat() call on them, which returns just a
few bytes of data.
You can learn about the scan progress by monitoring pool stats via 'rados df'
or daemon metrics.
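For example (a trivial illustration; the pool names depend on your setup):

watch -n 30 rados df   # per-pool object and usage counters tick up as the scan proceeds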
> On 5.11.2018, at 20:02, Rhian
Hi Zachary,
Thanks for contributing this mirror to the community! It has now been added:
https://ceph.com/get/
On Tue, Oct 30, 2018 at 8:30 AM Zachary Muller
wrote:
>
> Hi all,
>
> We are GigeNET, a datacenter based in Arlington Heights, IL (close to
> Chicago). We are starting to mirror ceph a
+ ceph-users
-- Forwarded message --
From: Neha Ojha
Date: Mon, Nov 5, 2018 at 9:50 AM
Subject: pg log hard limit upgrade bug
To: Ceph Development
Cc: Nathan Cutler , Yuri Weinstein
, Josh Durgin
Hi All,
We have discovered an issue with the pg log hard limit
patches (https://
I commented out those lines and, yes, I was able to restart the system, and
all the Filestore OSDs are now running. But I cannot start the converted
Bluestore OSDs (service). When I look up the log for osd.60, this is what
I see:
2018-11-05 12:47:00.756794 7f1f2775ae00 0 set uid:gid to 64045:6
On 11/6/18 2:01 AM, Hayashida, Mami wrote:
> I did find in /etc/fstab entries like this for those 10 disks
>
> /dev/sdh1 /var/lib/ceph/osd/ceph-60 xfs noatime,nodiratime 0 0
>
> Should I comment all 10 of them out (for osd.{60-69}) and try rebooting
> again?
Yes. Anything that references any
What type of bandwidth did you see during the recovery process? We are
seeing around 2 Mbps on each box running 20 processes each.
On Mon, Nov 5, 2018 at 11:31 AM Sergey Malinin wrote:
> Although I was advised not to use caching during recovery, I didn't notice
> any improvements after disabling
I did find in /etc/fstab entries like this for those 10 disks
/dev/sdh1 /var/lib/ceph/osd/ceph-60 xfs noatime,nodiratime 0 0
Should I comment all 10 of them out (for osd.{60-69}) and try rebooting
again?
On Mon, Nov 5, 2018 at 11:54 AM, Hayashida, Mami
wrote:
> I was just going to write tha
I was just going to write that the "ln" command did not solve the problem.
When I rebooted the node, it again went into emergency mode and I got
exactly the same errors (systemd[1]: Timed out waiting for device
dev-sdh1.device.; -- Subject: Unit dev-sdh1.device has failed...). I will
look into /
Those units don't get triggered out of nowhere, there has to be a
partition table with magic GUIDs or a fstab or something to cause them
to be triggered. The better way should be to get rid of that instead of
overriding the ceph-disk service instances, I think.
Given dev-sdh1.device is trying to s
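One hedged way to do that, assuming the trigger is ceph-disk's partition type
GUID: retype the partition to plain Linux so udev no longer matches it
(partition number and device are placeholders):

sgdisk --typecode=1:8300 /dev/sdh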
Alright. Thanks -- I will try this now.
On Mon, Nov 5, 2018 at 11:36 AM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 11:33 AM Hayashida, Mami
> wrote:
> >
> > But I still have 50 other Filestore OSDs on the same node, though.
> Wouldn't doing it all at once (by not identifying the osd-id) be
On Mon, Nov 5, 2018 at 11:33 AM Hayashida, Mami wrote:
>
> But I still have 50 other Filestore OSDs on the same node, though. Wouldn't
> doing it all at once (by not identifying the osd-id) be a problem for those?
> I have not migrated data out of those 50 OSDs yet.
Sure, like I said, if you
But I still have 50 other Filestore OSDs on the same node, though.
Wouldn't doing it all at once (by not identifying the osd-id) be a problem
for those? I have not migrated data out of those 50 OSDs yet.
On Mon, Nov 5, 2018 at 11:31 AM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 11:24 AM Haya
Although I was advised not to use caching during recovery, I didn't notice any
improvements after disabling it.
> On 5.11.2018, at 17:32, Rhian Resnick wrote:
>
> We are running cephfs-data-scan to rebuild metadata. Would changing the cache
> tier mode of our cephfs data partition improve per
On Mon, Nov 5, 2018 at 11:24 AM Hayashida, Mami wrote:
>
> Thank you for all of your replies. Just to clarify...
>
> 1. Hector: I did unmount the file system if what you meant was unmounting
> the /var/lib/ceph/osd/ceph-$osd-id for those disks (in my case osd.60-69)
> before running the ceph-
Thank you for all of your replies. Just to clarify...
1. Hector: I did unmount the file system if what you meant was unmounting
the /var/lib/ceph/osd/ceph-$osd-id for those disks (in my case osd.60-69)
before running the ceph-volume lvm zap command
2. Alfredo: so I can at this point run the "l
On Mon, Nov 5, 2018 at 10:43 AM Hayashida, Mami wrote:
>
> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
> nothing.). They don't show up when I run "df", either.
>
> On Mon, Nov 5, 2018 at 10:15 AM,
On 11/6/18 1:08 AM, Hector Martin wrote:
> On 11/6/18 12:42 AM, Hayashida, Mami wrote:
>> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
>> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
>> nothing.). They don't show up when I run "df", either.
> This
On 11/6/18 12:42 AM, Hayashida, Mami wrote:
> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
> nothing.). They don't show up when I run "df", either.
This is expected. ceph-volume with BlueStore does no
On Mon, 5 Nov 2018, 21:13 Hayashida, Mami, wrote:
> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
> nothing.). They don't show up when I run "df", either.
>
The ceph-volume command automatically mounts ce
Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
nothing.). They don't show up when I run "df", either.
On Mon, Nov 5, 2018 at 10:15 AM, Hayashida, Mami
wrote:
> Well, over the weekend the whole server w
Well, over the weekend the whole server went down and is now in
emergency mode. (I am running Ubuntu 16.04). When I run "journalctl -p
err -xb" I see that
systemd[1]: Timed out waiting for device dev-sdh1.device.
-- Subject: Unit dev-sdh1.device has failed
-- Defined-By: systemd
-- Support
On Fri, Nov 2, 2018 at 5:04 PM Hayashida, Mami wrote:
>
> I followed all the steps Hector suggested, and almost everything seems to
> have worked fine. I say "almost" because one out of the 10 osds I was
> migrating could not be activated even though everything up to that point
> worked just a
We are running cephfs-data-scan to rebuild metadata. Would changing the
cache tier mode of our cephfs data partition improve performance? If so
what should we switch to?
Thanks
Rhian
On Mon, Nov 05, 2018 at 06:14:09PM +0800, Dengke Du wrote:
> -1 osd.0 20 class rbd open got (2) No such file or directory
So the rbd cls was not loaded. Look at the directory returned by this
command:
ceph-conf --name osd.0 -D | grep osd_class_dir
to see if it contains libcls_rbd.so. And if the list re
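A small sketch of that check, assuming the usual 'key = value' format of
ceph-conf's dump output:

CLASS_DIR=$(ceph-conf --name osd.0 -D | awk -F' = ' '/^osd_class_dir/ {print $2}')
ls -l "$CLASS_DIR" | grep libcls_rbd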
On Mon, Nov 5, 2018 at 9:45 AM Erwin Bogaard wrote:
>
> Hi,
>
>
>
> Is there any way to determine the activity per cephfs client?
>
> For example, is there a way to get the requests/sec, bytes/sec,
> connections/sec, or any other relevant performance parameters per client?
Currently, you're limi
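For what it's worth, one per-client view that does exist is the MDS session
list (a hedged example; the mds id is a placeholder):

ceph daemon mds.a session ls   # per-client sessions, caps held, client addresses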
On Sun, Nov 4, 2018 at 10:24 PM Bryan Henderson wrote:
>
> >OSD write errors are not usual events: any issues with the underlying
> >storage are expected to be handled by RADOS, and write operations to
> >an unhealthy cluster should block, rather than returning an error. It
> >would not be correc
On 2018/11/5 5:15 PM, Mykola Golub wrote:
On Mon, Nov 05, 2018 at 03:19:29PM +0800, Dengke Du wrote:
Hi all
ceph: 13.2.2
When run command:
rbd create libvirt-pool/dimage --size 10240
Error happen:
rbd: create error: 2018-11-04 23:54:56.224 7ff22e7fc700 -1
librbd::image::CreateRequ
Hi,
Is there any way to determine the activity per cephfs client?
For example, is there a way to get the requests/sec, bytes/sec,
connections/sec, or any other relevant performance parameters per client?
As ceph is multi host/multi client, it’s hard to trace all activity of all
clients on al
On Sat, Nov 3, 2018 at 10:41 AM wrote:
>
> Hi.
>
> I tried to enable the "new smart balancing" - backend are on RH luminous
> clients are Ubuntu 4.15 kernel.
>
> As per: http://docs.ceph.com/docs/mimic/rados/operations/upmap/
> $ sudo ceph osd set-require-min-compat-client luminous
> Error EPERM:
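If every connected client really is luminous-capable (for kernel clients,
roughly 4.13+ for upmap support), the EPERM check can be overridden; a hedged
example:

ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it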
On Mon, Nov 05, 2018 at 03:19:29PM +0800, Dengke Du wrote:
> Hi all
>
> ceph: 13.2.2
>
> When run command:
>
> rbd create libvirt-pool/dimage --size 10240
>
> Error happen:
>
> rbd: create error: 2018-11-04 23:54:56.224 7ff22e7fc700 -1
> librbd::image::CreateRequest: 0x55e4fc8bf620 han