Hi,
osd_scrub_auto_repair still defaults to false and I was wondering how we
think about enabling this feature by default.
Would we say it's safe to enable this with BlueStore?
Wido
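For anyone who wants to experiment on a test cluster first, a minimal sketch of turning it on (standard OSD options; the error threshold shown is just the stock default, verify against your release):

```
# In ceph.conf on the OSD hosts
[osd]
    osd scrub auto repair = true
    # auto-repair is skipped if scrub finds more than this many errors (default 5)
    osd scrub auto repair num errors = 5

# Or at runtime, without restarting the OSDs
ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
```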
On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> Hi!
>
> I have a fresh Ceph cluster: 12 hosts and 3 OSDs on each host (one HDD and
> two SSDs). Each host is located in its own rack.
>
> I make such crush configuration on fresh ceph installation:
>
> sudo ceph osd crush add-bucket R-26-3-1 rack
>
We recently upgraded to Luminous (you can see the device classes in the output).
So it should be possible to have one single root, no fake hosts and just use
the device-class.
We added some hosts/OSDs recently which back some new pools, so we also created a
new hierarchy and crush rules for those.
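As a rough sketch of what that single-root, class-based setup could look like (rule and pool names here are only examples):

```
# List the device classes Luminous assigned to the OSDs
ceph osd crush class ls

# One replicated rule per class, all under the single "default" root,
# with host as the failure domain
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd

# Point an existing pool at the class-specific rule (this triggers rebalancing)
ceph osd pool set mypool crush_rule replicated_ssd
```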
Hi!
I have a fresh Ceph cluster: 12 hosts and 3 OSDs on each host (one HDD and two
SSDs). Each host is located in its own rack.
I make such crush configuration on fresh ceph installation:
sudo ceph osd crush add-bucket R-26-3-1 rack
sudo ceph osd crush add-bucket R-26-3-2 rack
sudo ceph osd cr
On Thu, Aug 23, 2018 at 7:47 PM Bryan Henderson
wrote:
> I've been reading MDS log code, and I have a question: why does it "probe
> for
> the end of the log" after reading the log header when starting up?
>
> As I understand it, the log header says the log had been written up to
> Location X ("w
I've been reading MDS log code, and I have a question: why does it "probe for
the end of the log" after reading the log header when starting up?
As I understand it, the log header says the log had been written up to
Location X ("write_pos") the last time the log was committed, but the
end-probe co
On Thu, Aug 23, 2018 at 4:11 PM Cody wrote:
> Hi everyone,
>
> As a newbie, I am grateful for receiving much help from people on
> this mailing list. I would like to quickly test my understanding on
> 'step choose|chooseleaf' and wish you could point out any of my
> mistakes.
>
> Suppose I have
Thanks. Unfortunately even my version of Hammer is too old, at 0.94.5. I think
my only route to address this issue is to figure out the upgrade, at the very
least to 0.94.10. The biggest issue, again, is that the deployment tool originally
used is pinned to 0.94.5, is pretty convoluted, and no longer receiv
Thanks for the info. I was investigating BlueStore as well. My hosts don't
go unresponsive, but I do see parallel I/O slow down.
On Thu, Aug 23, 2018, 8:02 PM Andras Pataki
wrote:
> We are also running some fairly dense nodes with CentOS 7.4 and ran into
> similar problems. The nodes ran filestore
We are also running some fairly dense nodes with CentOS 7.4 and ran into
similar problems. The nodes ran filestore OSDs (Jewel, then Luminous).
Sometimes a node would be so unresponsive that one couldn't even ssh to
it (even though the root disk was a physically separate drive on a
separate c
Hi everyone,
As a newbie, I am grateful for receiving much help from people on
this mailing list. I would like to quickly test my understanding on
'step choose|chooseleaf' and wish you could point out any of my
mistakes.
Suppose I have the following topology:
root=default
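For concreteness, a small example of the kind of rule in question ('choose' picks buckets of the named type, 'chooseleaf' descends from the chosen bucket down to an OSD); names and numbers are illustrative:

```
rule example_rule {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    # pick 3 racks...
    step choose firstn 3 type rack
    # ...then one OSD under a host in each chosen rack
    step chooseleaf firstn 1 type host
    step emit
}
```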
Great discussion!
On Thu, Aug 23, 2018 at 12:38 PM, Gregory Farnum wrote:
> And the recording from this session is now available at
> https://www.youtube.com/watch?v=0WHHTjdgarQ
>
> We didn't come to any conclusions but I think we've got a good
> understanding of the motivations and concerns, al
On 08/23/2018 02:52 PM, Robert Stanford wrote:
>
> I just installed a new luminous cluster. When I run this command:
> ceph mgr module enable dashboard
>
> I get this response:
> all mgr daemons do not support module 'dashboard'
>
> All daemons are Luminous (I confirmed this by running ceph ver
I just installed a new luminous cluster. When I run this command:
ceph mgr module enable dashboard
I get this response:
all mgr daemons do not support module 'dashboard'
All daemons are Luminous (I confirmed this by running ceph version).
Why would this error appear?
Thank you
R
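In case it helps anyone hitting the same message, a few generic checks (hedged; the mgr id is a placeholder):

```
# Confirm every daemon, including the mgrs, really reports a luminous version
ceph versions

# See which modules the active mgr knows about and which are enabled
ceph mgr module ls

# If a mgr that was started before the upgrade finished is still active,
# restarting it usually clears the "daemons do not support module" error
systemctl restart ceph-mgr@<mgr-id>
```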
Sending back, forgot the plain text for ceph-devel.
Sorry about that.
On Thu, Aug 23, 2018 at 9:57 PM Adrien Gillard wrote:
>
> We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, just
> default local logging.
>
> After looking a bit deeper into pprof, --alloc_space seems t
We are running CentOS 7.5 with upstream Ceph packages, no remote syslog,
just default local logging.
After looking a bit deeper into pprof, --alloc_space seems to represent
allocations that happened since the program started, which goes along with
the quick deallocation of the memory. --inuse_space
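A minimal sketch of how the two views compare on a heap dump (binary and dump file names are illustrative):

```
# Cumulative allocations since profiling started, including memory
# that has already been freed
pprof --text --alloc_space /usr/bin/ceph-osd osd.12.profile.0001.heap

# Only memory still live at the time of the dump; usually the more
# useful view when chasing RSS growth
pprof --text --inuse_space /usr/bin/ceph-osd osd.12.profile.0001.heap
```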
Yes I've reviewed all the logs from monitor and host. I am not
getting useful errors (or any) in dmesg or general messages.
I have 2 Ceph clusters; the other cluster is 300 SSDs and I never have
issues like this. That's why I'm looking for help.
On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev w
On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
wrote:
>
> During high load testing I'm only seeing user and sys cpu load around 60%...
> my load doesn't seem to be anything crazy on the host and iowait stays
> between 6 and 10%. I have very good `ceph osd perf` numbers too.
>
> I am using 10.2.1
Did all that... even tried to change the port.
Also, SELinux and firewalld are disabled.
Thanks for taking the trouble to suggest something
Steven
On Thu, 23 Aug 2018 at 13:46, John Spray wrote:
> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote:
> >
> > Hi All,
> >
> > I am trying to enable p
On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote:
>
> Hi All,
>
> I am trying to enable prometheus plugin with no success due to "no socket
> could be created"
>
> The instructions for enabling the plugin are very straightforward and simple
>
> Note
> My ultimate goal is to use Prometheus w
On Thu, Aug 23, 2018 at 10:21 AM Cody wrote:
> So, is it okay to say that compared to the 'firstn' mode, the 'indep'
> mode may have the least impact on a cluster in the event of an OSD
> failure? Could I use 'indep' for a replicated pool as well?
>
You could, but shouldn't. Imagine if the primary OSD fa
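As background, 'firstn' re-shuffles the later choices when an OSD fails, while 'indep' remaps only the failed position, which is why it is the usual choice for erasure-coded pools. A hedged sketch of where 'indep' normally appears (names and sizes are illustrative):

```
rule ecpool_rule {
    id 2
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step take default
    # indep: if an OSD fails, only that shard's position is remapped;
    # the surviving shards stay where they are
    step chooseleaf indep 0 type host
    step emit
}
```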
I installed a new Ceph cluster with Luminous, after a long time working
with Jewel. I created my RGW pools the same as always (pool create
default.rgw.buckets.data etc.), but they don't show up in ceph df with
Luminous. Has the command changed?
Thanks
R
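Without more detail I can only guess, but a couple of quick checks (pool name as in the example above):

```
# All pools the cluster knows about
ceph osd lspools

# Per-pool usage, including pools that have never stored an object
ceph df detail

# Luminous expects an application tag on each pool; rgw normally sets
# this itself the first time it touches the pool
ceph osd pool application enable default.rgw.buckets.data rgw
```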
So, is it okay to say that compared to the 'firstn' mode, the 'indep'
mode may have the least impact on a cluster in the event of an OSD
failure? Could I use 'indep' for a replicated pool as well?
Thank you!
Regards,
Cody
On Wed, Aug 22, 2018 at 7:12 PM Gregory Farnum wrote:
>
> On Wed, Aug 22, 2018 at 1
On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard
wrote:
> With a bit of profiling, it seems all the memory is allocated to
> ceph::logging::Log::create_entry (see below)
>
> Should this be normal? Is it because some OSDs are down and it logs the
> results of its osd_ping ?
>
Hmm, is that where
On Thu, Aug 23, 2018 at 12:26 AM mj wrote:
> Hi,
>
> Thanks John and Gregory for your answers.
>
> Gregory's answer worries us. We thought that with a 3/2 pool, and one PG
> corrupted, the assumption would be: the two similar ones are correct,
> and the third one needs to be adjusted.
>
> Can we
On Thu, Aug 23, 2018 at 3:01 PM William Lawton
wrote:
>
> Hi John.
>
> Just picking up this thread again after coming back from leave. Our ceph
> storage project has progressed and we are now making sure that the active MON
> and MDS are kept on separate nodes which has helped reduce the inciden
On Thu, Aug 23, 2018 at 11:32 AM, Hervé Ballans
wrote:
> On 23/08/2018 at 16:13, Alfredo Deza wrote:
>
> What you mean is that, at this stage, I must directly declare the UUID paths
> as the value of --block.db (i.e. replace /dev/nvme0n1p1 with its PARTUUID),
> is that right?
>
> No, this all looks corre
And the recording from this session is now available at
https://www.youtube.com/watch?v=0WHHTjdgarQ
We didn't come to any conclusions but I think we've got a good
understanding of the motivations and concerns, along with several
options and some of the constraints of each.
-Greg
On Tue, Aug 21, 2
Hi All,
I am trying to enable the prometheus plugin, with no success, due to "no socket
could be created"
The instructions for enabling the plugin are very straightforward and
simple
Note
My ultimate goal is to use Prometheus with Cephmetrics
Some of you suggested deploying ceph-exporter, but why do we
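For reference, "no socket could be created" usually points at the bind address/port; a sketch of what I would check (Luminous-style config keys, adjust to your version):

```
# Tell the prometheus module where to bind (default port is 9283);
# 0.0.0.0 avoids IPv6-only surprises on some hosts
ceph config-key set mgr/prometheus/server_addr 0.0.0.0
ceph config-key set mgr/prometheus/server_port 9283

# Restart the module so it picks up the new address
ceph mgr module disable prometheus
ceph mgr module enable prometheus

# Verify something is listening
curl http://<mgr-host>:9283/metrics
```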
With a bit of profiling, it seems all the memory is allocated to
ceph::logging::Log::create_entry (see below)
Should this be normal? Is it because some OSDs are down and it logs the
results of its osd_ping ?
The debug level of the OSD is below also.
Thanks,
Adrien
$ pprof /usr/bin/ceph-osd
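For context, a sketch of one way to collect such a profile from a running OSD (OSD id and paths are illustrative):

```
# Start the tcmalloc heap profiler inside a running OSD
ceph tell osd.12 heap start_profiler

# ...let it run under load for a while, then dump and stop
ceph tell osd.12 heap dump
ceph tell osd.12 heap stop_profiler

# Analyse the dump written next to the OSD logs
pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.12.profile.0001.heap
```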
On 23/08/2018 at 16:13, Alfredo Deza wrote:
What you mean is that, at this stage, I must directly declare the UUID paths
as the value of --block.db (i.e. replace /dev/nvme0n1p1 with its PARTUUID), is
that right?
No, this all looks correct. How do the ceph-volume.log and
ceph-volume-systemd.log look w
I'm really having a hard time trying to delete a bucket, and it is freaking
me out!
All S3 clients claim that the bucket is empty, except for 2 multipart
uploads that I am not able to get rid of.
radosgw-admin bucket check --bucket=whatever
[
"_multipart_DISK_P/collection_1/a
On Thu, Aug 23, 2018 at 10:56 AM sat wrote:
>
> Hi,
>
>
> I'm trying to make a one-way RBD mirrored cluster between two Ceph clusters,
> but it hasn't worked yet. It seems to succeed, but after making an RBD image
> from the local cluster, it's considered "unknown".
>
> ```
> $ sudo rbd --clus
Hi,
I'm trying to make a one-way RBD mirrored cluster between two Ceph clusters,
but it hasn't worked yet. It seems to succeed, but after making an RBD image
from the local cluster, it's considered "unknown".
```
$ sudo rbd --cluster local create rbd/local.img --size=1G
--image-feature=exclusiv
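For comparison, a minimal sketch of the checks for a one-way setup (cluster and image names follow the example above, the rest is illustrative):

```
# Journaling must be enabled on the image for RBD mirroring
rbd --cluster local feature enable rbd/local.img journaling

# Mirroring has to be enabled on the pool on both sides
rbd --cluster local mirror pool enable rbd pool
rbd --cluster remote mirror pool enable rbd pool

# With one-way replication the rbd-mirror daemon runs only on the
# remote (backup) cluster; then check what it reports for the image
rbd --cluster remote mirror pool status rbd --verbose
rbd --cluster remote mirror image status rbd/local.img
```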
Hi All,
I think my problem was that I had quotas set at multiple levels of a
subtree, and maybe some were conflicting. (E.g. Parent said quota=1GB,
child said quota=200GB.) I could not reproduce the problem, but setting
quotas only on the user's subdirectory and not elsewhere along the way
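For the record, a small sketch of one way to inspect and clear such quotas (paths are illustrative; CephFS quotas are plain xattrs on directories):

```
# Set a 200 GB quota only on the user's own subdirectory
setfattr -n ceph.quota.max_bytes -v 214748364800 /mnt/cephfs/users/alice

# Make sure no conflicting quota is left on the parents
getfattr -n ceph.quota.max_bytes /mnt/cephfs/users
getfattr -n ceph.quota.max_bytes /mnt/cephfs

# A quota is removed by setting it back to 0
setfattr -n ceph.quota.max_bytes -v 0 /mnt/cephfs/users
```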
After upgrading to luminous, we see the exact same behaviour, with OSDs
eating as much as 80/90 GB of memory.
We'll try some memory profiling, but at this point we're a bit lost. Are
there any specific logs that could help us?
On Thu, Aug 23, 2018 at 2:34 PM Adrien Gillard
wrote:
> Well after a
On Thu, Aug 23, 2018 at 9:56 AM, Hervé Ballans
wrote:
> On 23/08/2018 at 15:20, Alfredo Deza wrote:
>
> Thanks Alfredo for your reply. I'm using the very latest version of Luminous
> (12.2.7) and ceph-deploy (2.0.1).
> I have no problem creating my OSDs; that works perfectly.
> My issue only co
Hi John.
Just picking up this thread again after coming back from leave. Our ceph
storage project has progressed and we are now making sure that the active MON
and MDS are kept on separate nodes which has helped reduce the incidence of
delayed client reconnects on ceph node failure. We've also
On 23/08/2018 at 15:20, Alfredo Deza wrote:
Thanks Alfredo for your reply. I'm using the very latest version of Luminous
(12.2.7) and ceph-deploy (2.0.1).
I have no problem creating my OSDs; that works perfectly.
My issue only concerns the problem of the mount names of the NVMe partitions
whic
On Thu, Aug 23, 2018 at 9:12 AM, Hervé Ballans
wrote:
> On 23/08/2018 at 12:51, Alfredo Deza wrote:
>>
>> On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans
>> wrote:
>>>
>>> Hello all,
>>>
>>> I would like to continue a thread that dates back to last May (sorry if
>>> this
>>> is not a good practi
On 23/08/2018 at 12:51, Alfredo Deza wrote:
On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans
wrote:
Hello all,
I would like to continue a thread that dates back to last May (sorry if this
is not a good practice ?..)
Thanks David for your useful tips on this thread.
On my side, I created my OS
Hello,
I have a Ceph cluster working on a subnet where clients on the same subnet can
connect without problems, but now I need to connect some clients that are on
another subnet and I'm getting a connection timeout error.
Both subnets are connected, and I've disabled the firewall to test whether it is the
blocker,
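Hard to say more without details, but a couple of generic connectivity checks (addresses and the client id are placeholders):

```
# The client must reach every monitor on its public address
nc -zv 192.0.2.10 6789

# ...and the OSD hosts on the OSD port range (6800-7300 by default)
nc -zv 192.0.2.21 6800

# ceph.conf on the client needs the mon addresses the cluster advertises
# (mon_host / public addr); then test with the client's own keyring
ceph --id myclient -s
```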
Hello,
when we started with Ceph we wanted to mix different disk types per host. Since
that was before device classes were available, we followed the advice to create
a multi-root hierarchy and disk-type-specific hosts.
So currently the osd tree looks kind of like this
-8 218.21320 root
Well after a few hours, still nothing new in the behaviour. With half of
the OSDs (so 6 per host) up and peering and the nodown flag set to limit
the creation of new maps, all the memory is consumed and OSDs get killed by
the OOM killer.
We observe a lot of threads being created for each OSD (roughly
Hi Mark, others,
I took my info from the following page:
https://ceph.com/geen-categorie/ceph-manually-repair-object/
where it is written: "Of course the above works well when you have 3
replicas when it is easier for Ceph to compare two versions against
another one."
Based on that info, I assumed
On 23/08/2018 12:47, Ernesto Puerta wrote:
@Willem, given your comments come from a technical ground, let's
address those technically. As you say, dashboard_v2 is already in
Mimic and will be soon in Nautilus when released, so for FreeBSD the
issue will anyhow be there. Let's look for a technica
On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans
wrote:
> Hello all,
>
> I would like to continue a thread that dates back to last May (sorry if this
> is not a good practice ?..)
>
> Thanks David for your useful tips on this thread.
> In my side, I created my OSDs with ceph-deploy (in place of ceph
Thanks all for sharing your views, and thanks to Lenz & Kai for the
clarifications.
For those, like David, not familiar with dashboard_v2 (or even with
dashboard_v1), you may check this short clip
(https://youtu.be/m5i3x4eR6k4), which goes through the dashboard_v2 as
per this first backport (featu
On 23/08/2018 11:22, Lenz Grimmer wrote:
On 08/22/2018 08:57 PM, David Turner wrote:
My initial reaction to this PR/backport was questioning why such a
major update would happen on a dot release of Luminous. Your
reaction to keeping both dashboards viable goes to support that.
Should we really
On Thu, 2018-08-23 at 09:26 +0200, mj wrote:
> Gregory's answer worries us. We thought that with a 3/2 pool, and one
> PG
> corrupted, the assumption would be: the two similar ones are
> correct,
> and the third one needs to be adjusted.
>
> Can we determine from this output, if I created corrup
On 22/08/2018 19:42, Ernesto Puerta wrote:
Thanks for your feedback, Willem!
The old dashboard does not need any package fetch while
building/installing. Something that is not very handy when building
FreeBSD packages. And I haven't gotten around to determining how to get
around that.
I th
Hi,
I think it does have a positive effect on the messages, because I get fewer
messages than before.
That's nice. I also receive definitely fewer cache pressure messages
than before.
I also started to play around with the client side cache
configuration. I halved the client object cache size f
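For reference, the options I believe are involved (the values only illustrate "halving the defaults"; check them against your client version):

```
# ceph.conf on the CephFS clients
[client]
    # object/data cache, default 200 MB, halved here
    client oc size = 104857600
    # inode/cap cache, default 16384 entries, halved here
    client cache size = 8192
```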
Hello all,
I would like to continue a thread that dates back to last May (sorry if
this is not a good practice ?..)
Thanks David for your useful tips on this thread.
On my side, I created my OSDs with ceph-deploy (in place of ceph-volume)
[1], but this is exactly the same context as this ment
On 08/22/2018 08:57 PM, David Turner wrote:
> My initial reaction to this PR/backport was questioning why such a
> major update would happen on a dot release of Luminous. Your
> reaction to keeping both dashboards viable goes to support that.
> Should we really be backporting features into a dot
An hour ago host5 started to report the OSDs on host4 as down (still
no clue why), resulting in slow requests. This time no flapping
occurred, the cluster recovered a couple of minutes later. No other
OSDs reported that, only those two on host5. There's nothing in the
logs of the reporting o
I need a bucket without an index for 5000 objects. How do I properly create
an indexless bucket next to the indexed buckets? This is a "default radosgw"
Luminous instance.
I took a look at the CLI; as far as I understand, I will need to create a
placement rule via "zone placement add" and add this key t
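For what it's worth, a sketch of adding an indexless placement target next to the default one (placement id and pool names are only examples; commit the period afterwards if you use realms):

```
# Add the placement target to the zonegroup
radosgw-admin zonegroup placement add \
      --rgw-zonegroup=default \
      --placement-id=indexless-placement

# Add it to the zone with an indexless index type
radosgw-admin zone placement add \
      --rgw-zone=default \
      --placement-id=indexless-placement \
      --data-pool=default.rgw.buckets.data \
      --index-pool=default.rgw.buckets.index \
      --data-extra-pool=default.rgw.buckets.non-ec \
      --placement-index-type=indexless

# Restart the radosgw daemons, then create buckets under this placement
# (e.g. via the user's default_placement) to get indexless buckets
```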
Hi,
Thanks John and Gregory for your answers.
Gregory's answer worries us. We thought that with a 3/2 pool, and one PG
corrupted, the assumption would be: the two similar ones are correct,
and the third one needs to be adjusted.
Can we determine from this output, if I created corruption in o
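To see which replica actually differs before repairing anything, a hedged sketch (the PG id is a placeholder):

```
# After deep-scrub has flagged the PG inconsistent
rados list-inconsistent-obj 2.1f --format=json-pretty

# The per-shard errors and digest fields show which OSD holds the odd
# copy; only then decide whether a repair is safe
ceph pg repair 2.1f
```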