Hello
I have a little ceph cluster with 3 nodes, each with 3x1TB HDD and
1x240GB SSD. I created this cluster after the Luminous release, so all OSDs
are Bluestore. In my crush map I have two rules, one targeting the SSDs
and one targeting the HDDs. I have 4 pools, one using the SSD rule and
the others using the HDD rule.
Hi,
might be http://tracker.ceph.com/issues/22464
Can you check the OSD log file to see if the reported checksum is 0x6706be76?
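Something like this should find it (just a sketch; the paths assume the
default /var/log/ceph log locations):

    grep -n '0x6706be76' /var/log/ceph/ceph-osd.*.log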
Paul
> On 28.02.2018 at 11:43, Marco Baldini - H.S. Amiata wrote:
>
> Hello
>
> I have a little ceph cluster with 3 nodes, each with 3x1TB HDD and 1x240GB
> SSD.
Hi
I read the bug tracker issue and it looks a lot like my problem, even if
I can't check the reported checksum: I don't have it in my logs, perhaps
because of debug osd = 0/0 in ceph.conf.
I just raised the OSD log level:
ceph tell osd.* injectargs --debug-osd 5/5
I'll check the OSD logs.
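If it turns out to be useful, I might also make the setting persistent in
ceph.conf (just a sketch, assuming it goes in the [osd] section):

    [osd]
        debug osd = 5/5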
Hi
After some days with debug_osd 5/5, I found [ERR] entries on different
days, in different PGs, on different OSDs, on different hosts. This is
what I get in the OSD logs:
*OSD.5 (host 3)*
2018-03-01 20:30:02.702269 7fdf4d515700 2 osd.5 pg_epoch: 16486 pg[9.1c( v
16486'51798 (16431'50251,16486'51798] local-
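For reference, I collect these entries roughly like this on each node
(assuming the default log paths):

    grep '\[ERR\]' /var/log/ceph/ceph-osd.*.log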
Hi,
yeah, the cluster that I'm seeing this on also has only one host that
reports that specific checksum. Two other hosts only report the same error
that you are seeing.
Could you post to the tracker issue that you are also seeing this?
Paul
2018-03-05 12:21 GMT+01:00 Marco Baldini - H.S. Amiata:
> candidate had a read error
speaks for itself: while scrubbing, it couldn't read data.
I had a similar issue, and it was just an OSD dying: errors and reallocated
sectors in SMART, so I just replaced the disk. But in your case it seems the
errors are on different OSDs? Are your OSDs all healthy?
You can use smartctl to check the SMART data of your disks.
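For example (the device name here is only a placeholder):

    smartctl -a /dev/sdX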
Hi
I just posted to the ceph tracker with my logs and my issue.
Let's hope this will be fixed.
Thanks
On 05/03/2018 13:36, Paul Emmerich wrote:
> Hi,
> yeah, the cluster that I'm seeing this on also has only one host that
> reports that specific checksum. Two other hosts only report the same
> error that you are seeing.
Hi, and thanks for the reply.
The OSDs are all healthy; in fact, after a ceph pg repair, the ceph
health is back to OK, and in the OSD log I see "repair ok, 0 fixed".
The SMART data of the 3 OSDs seems fine:
*OSD.5*
# ceph-disk list | grep osd.5
/dev/sdd1 ceph data, active, cluster ceph, osd.5, block
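For reference, the SMART check I ran was roughly this (attribute names vary
by vendor, so the pattern is approximate):

    smartctl -A /dev/sdd | grep -Ei 'reallocated|pending|uncorrect'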
> always solved by ceph pg repair
That doesn't necessarily mean that there's no hardware issue. In my case,
repair also worked fine and returned the cluster to the OK state every
time, but in time the faulty disk failed another scrub operation, and this
repeated multiple times before we replaced that disk.
One
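Before repairing, it can also help to see what exactly is inconsistent,
roughly like this (the PG id here is the one from your log above):

    ceph health detail
    rados list-inconsistent-obj 9.1c --format=json-pretty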
Hi
I monitor dmesg on each of the 3 nodes; no hardware issue reported. And
the problem happens with various OSDs in different nodes, so to me it is
clear it's not a hardware problem.
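For completeness, I watch for disk errors roughly like this (the grep
pattern is approximate):

    dmesg -wT | grep -iE 'ata|scsi|error'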
Thanks for the reply.
On 05/03/2018 21:45, Vladimir Prokofev wrote:
> always solved by ceph pg repair
On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata <
mbald...@hsamiata.it> wrote:
> Hi
>
> I monitor dmesg on each of the 3 nodes; no hardware issue reported. And
> the problem happens with various OSDs in different nodes, so to me
> it is clear it's not a hardware problem.
>
If
debug_osd that is... :)
On Tue, Mar 6, 2018 at 7:10 PM, Brad Hubbard wrote:
>
>
> On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata <
> mbald...@hsamiata.it> wrote:
>
>> Hi
>>
>> I monitor dmesg on each of the 3 nodes; no hardware issue reported. And
>> the problem happens with various