Hi Christian, Hi Robert,
thank you for your replies!
I was already expecting something like this, but I am seriously worried
about it!
Just assume that this happens at night. Our shift does not necessarily
have enough knowledge to perform all the steps in Sebastien's
article. And if we always hav
Hello,
Could you do another EC run with differing block sizes, as described
here:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043949.html
and look for write amplification?
I'd suspect that by the very nature of EC and the additional local checksums
it (potentially) wr
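(Purely as a sketch of such a run, with "ecpool" as a placeholder pool name: write
with small and large block sizes via rados bench and compare client throughput with
what the OSD devices actually write, e.g. via iostat on the OSD nodes.)
rados bench -p ecpool 60 write -b 4096 -t 16 --no-cleanup     # 4 KB writes
rados bench -p ecpool 60 write -b 4194304 -t 16 --no-cleanup  # 4 MB writes
iostat -x 5                                                   # on each OSD node, watch device write rates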
Hello,
I can only nod emphatically to what Robert said: don't issue repairs
unless you
a) don't care about the data or
b) have verified that your primary OSD is good.
See this for some details on how to establish which replica(s) are actually
good or not:
http://www.sebastien-han.fr/blog/2015/04/
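(Roughly, that check looks like the following; the PG id, OSD id and object name
are placeholders, and a filestore layout under /var/lib/ceph is assumed.)
ceph health detail | grep inconsistent      # find the inconsistent PG, e.g. 17.1c1
ceph pg map 17.1c1                          # shows the up/acting set, e.g. [5,12,33]
# on each of those OSD hosts, checksum the suspect object and compare:
find /var/lib/ceph/osd/ceph-5/current/17.1c1_head/ -name '*OBJECTNAME*' -exec md5sum {} \;
# only run "ceph pg repair 17.1c1" once you know the primary's copy is a good one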
Patrick,
At the moment, you do not have any problems related to slow queries.
2015-05-12 8:56 GMT+03:00 Patrik Plank :
> OK, understood.
>
> But what can I do if the scrubbing process hangs on one PG since last
> night:
>
>
> root@ceph01:~# ceph health detail
> HEALTH_OK
>
> root@ceph01:~
OK, understood.
But what can I do if the scrubbing process hangs on one PG since last night:
root@ceph01:~# ceph health detail
HEALTH_OK
root@ceph01:~# ceph pg dump | grep scrub
pg_stat objects mip degr misp unf bytes log disklog
state state_stamp v r
Scrubbing greatly affects I/O and can cause slow queries on the OSDs. For more
information, look at 'ceph health detail' and 'ceph pg dump | grep scrub'.
2015-05-12 8:42 GMT+03:00 Patrik Plank :
> Hi,
>
>
> is that the reason for the Health Warn or the scrubbing notification?
>
>
>
> thanks
>
> re
Hi Patrik,
You need to configure the I/O priority for scrubbing:
http://dachary.org/?p=3268
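(For reference, the settings that article describes can be injected at runtime
roughly as below; option names are from the Firefly/Hammer era, so please check
them against your version, and note they only take effect with the CFQ scheduler
on the OSD disks.)
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
cat /sys/block/sda/queue/scheduler    # should show [cfq] for ioprio to matter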
2015-05-12 8:03 GMT+03:00 Patrik Plank :
> Hi,
>
>
> the ceph cluster always shows scrubbing notifications, although it does
> not scrub.
>
> And what does the "Health Warn" mean?
>
> Does an
Hi,
the ceph cluster always shows scrubbing notifications, although it does not
scrub.
And what does the "Health Warn" mean?
Does anybody have an idea why the warning is displayed?
How can I solve this?
cluster 78227661-3a1b-4e56-addc-c2a272933ac2
health HEALTH_WARN 6 requests are
Greetings,
We have been testing a full-SSD Ceph cluster for a few weeks now and are still
testing. One of the outcomes (we will post a full report on our tests soon, but
for now this email is only about replicas) is that as soon as you keep more
than 1 copy on the cluster, it kills the performance b
It's the wip-rgw-orphans branch.
- Original Message -
> From: "Daniel Hoffman"
> To: "Yehuda Sadeh-Weinraub"
> Cc: "Ben" , "David Zafman" ,
> "ceph-users"
> Sent: Monday, May 11, 2015 4:30:11 PM
> Subject: Re: [ceph-users] Shadow Files
>
> Thanks.
>
> Can you please let me know the s
Thanks.
Can you please let me know the suitable/best git version/tree to pull in order
to compile and use this feature/patch?
Thanks
On Tue, May 12, 2015 at 4:38 AM, Yehuda Sadeh-Weinraub
wrote:
>
>
> --
>
> *From: *"Daniel Hoffman"
> *To: *"Yehuda Sadeh-Weinraub"
>
Agree that 99+% of the inconsistent PGs I see correlate directly to disk flern.
Check /var/log/kern.log*, /var/log/messages*, etc. and I'll bet you find errors
correlating.
-- Anthony
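(Something along these lines, with device names and log paths adjusted to your
distro:)
grep -iE 'medium error|i/o error|sector' /var/log/kern.log* /var/log/messages*
smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'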
On Fri, May 8, 2015 at 1:34 AM, Yan, Zheng wrote:
> On Fri, May 8, 2015 at 11:15 AM, Dexter Xiong wrote:
>> I tried "echo 3 > /proc/sys/vm/drop_caches" and dentry_pinned_count dropped.
>>
>> Thanks for your help.
>>
>
> could you please try the attached patch
I haven't followed the whole convers
On Mon, May 11, 2015 at 1:57 AM, Kenneth Waegeman
wrote:
> Hi all,
>
> I have a few questions about ceph-fuse options:
> - Is the fuse writeback cache being used? How can we see this? Can it be
> turned on with allow_wbcache somehow?
I'm not quite sure what you mean here. ceph-fuse does maintain
Fellow Cephers,
I'm scratching my head on this one. Somehow a bunch of objects were lost in
my cluster, which is currently ceph version 0.87.1
(283c2e7cfa2457799f534744d7d549f83ea1335e).
The symptoms are that "ceph -s" reports a bunch of inconsistent PGs:
cluster 8a2c9e43-9f17-42e0-92fd-88a4
Thanks Loic..
<< inline
Regards
Somnath
-Original Message-
From: Loic Dachary [mailto:l...@dachary.org]
Sent: Monday, May 11, 2015 3:02 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; Ceph Development
Subject: Re: EC backend benchmark
Hi,
[Sorry I missed the body of your questions, here
Hi,
[Sorry I missed the body of your questions, here is my answer ;-]
On 11/05/2015 23:13, Somnath Roy wrote:
> Summary:
>
> -
>
>
>
> 1. It is doing pretty well on reads, and 4 Rados Bench clients are saturating
> the 40 GB network. With more physical servers, it is scaling almost lin
I had an issue with my calamari server, so I built a new one from scratch.
I've been struggling to get the new server to start up and see my
ceph cluster. I went so far as to remove salt and diamond from my ceph
nodes and reinstalled again. On my calamari server, it sees the hosts
connect
Loic,
I thought this one didn't go through!
I have sent another mail with attached doc.
This is the data with rados bench .
In case you missed it, could you please share your thoughts on the questions I
posted below (way down in the mail, not sure how so much space came along!!)?
Thanks & Rega
Hi,
Thanks for sharing :-) Have you published the tools that you used to gather
these results? It would be great to have a way to reproduce the same
measurements in different contexts.
Cheers
On 11/05/2015 23:13, Somnath Roy wrote:
>
>
> Hi Loic and community,
>
>
>
> I have gathered the f
Hi Loic and community,
I have gathered the following data on the EC backend (all flash). I have
decided to use Jerasure since space saving is the utmost priority.
Setup:
41 OSDs (each on 8 TB flash), 5-node Ceph cluster, 48-core HT-enabled CPU / 64 GB
RAM. Tested with Rados Bench clients.
R
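(For anyone wanting to reproduce: a Rados Bench run of this kind typically looks
like the following; the pool name, runtime and thread count are placeholders, not
the exact parameters of this test.)
rados bench -p ecpool 300 write -t 64 --no-cleanup   # write phase, keep the objects
rados bench -p ecpool 300 seq -t 64                  # sequential reads
rados bench -p ecpool 300 rand -t 64                 # random reads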
We are still laying the foundations for eventual VMware integration and
indeed the Red Hat acquisition has made this more real now.
The first step is iSCSI support and work is ongoing in the kernel to get HA
iSCSI working with LIO and kRBD. See the blueprint and CDS sessions with
Mike Christie for
- Original Message -
> From: "Daniel Hoffman"
> To: "Yehuda Sadeh-Weinraub"
> Cc: "Ben" , "ceph-users"
> Sent: Sunday, May 10, 2015 5:03:22 PM
> Subject: Re: [ceph-users] Shadow Files
> Any updates on when this is going to be released?
> Daniel
> On Wed, May 6, 2015 at 3:51 AM, Yehud
Thanks for the help! We've lowered the number of PGs per pool to 64, so
with 12 pools and a replica count of 3, all 3 OSDs have a full 768 PGs.
If anyone has any concerns or objections (particularly folks from the
Ceph/Redhat team), please let me know.
Thanks again!
On Fri, May 8, 2015 at 1:21 P
- Original Message -
> From: "Daniel Hoffman"
> To: "ceph-users"
> Sent: Sunday, May 10, 2015 10:54:21 PM
> Subject: [ceph-users] civetweb lockups
> Hi All.
> We have a weird issue where civetweb just locks up: it fails to respond
> to HTTP, and a restart resolves the problem. This
Did not work.
$ ls -l /usr/lib64/|grep liburcu-bp
lrwxrwxrwx 1 root root 19 May 10 05:27 liburcu-bp.so ->
liburcu-bp.so.2.0.0
lrwxrwxrwx 1 root root 19 May 10 05:26 liburcu-bp.so.2 ->
liburcu-bp.so.2.0.0
-rwxr-xr-x 1 root root 32112 Feb 25 20:27 liburcu-bp.so.2.0.0
Can you point
Hi,
I'm currently doing benchmarks too, and I don't see this behavior:
>>I get very nice performance of up to 200k IOPS. However, once the volume is
>>written to (i.e. when I map it using rbd map and dd the whole volume with some
>>random data) and I repeat the benchmark, random performance drops to ~23k
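(For comparison, the pre-fill step being quoted is roughly the following; the pool
and image names are placeholders:)
rbd map rbdbench/testimage                            # exposes e.g. /dev/rbd0
dd if=/dev/urandom of=/dev/rbd0 bs=4M oflag=direct    # fully allocate the image
fio --name=randread --filename=/dev/rbd0 --rw=randread --bs=4k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60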
Personally, I would not just run this command automatically because, as you
stated, it only copies the primary PG's data to the replicas, and if the primary
is corrupt, you will corrupt your secondaries. I think the monitor log shows
which OSD has the problem, so if it is not your primary, then just issue the
On 05/05/2015 04:13 AM, Yujian Peng wrote:
Emmanuel Florac writes:
On Mon, 4 May 2015 07:00:32 + (UTC),
Yujian Peng 126.com> wrote:
I'm encountering a data disaster. I have a ceph cluster with 145 OSDs.
The data center had a power problem yesterday, and all of the ceph
nodes were down.
Under the OSD directory, you can look at where the symlink points. It is
generally called 'journal' and should point to a device.
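(For example, assuming the default data path:)
ls -l /var/lib/ceph/osd/ceph-*/journal
# typically shows: journal -> /dev/disk/by-partuuid/<uuid>
ceph-disk list    # also maps data and journal partitions to drives, if ceph-disk is installed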
> On 06 May 2015, at 06:54, Patrik Plank wrote:
>
> Hi,
>
> I can't remember on which drive I installed which OSD journal :-||
> Is there any command to show this?
>
>
If you use ceph-disk (and I believe ceph-deploy) to create your OSDs, or
you go through the manual steps to set up the partition UUIDs, then yes,
udev and the init script will do all the magic. Your disks can be moved to
another box without problems. I've moved disks to different ports on
controller
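(If you want to double-check that the partitions carry the type GUIDs udev keys on,
assuming GPT disks prepared by ceph-disk, something like this should do; /dev/sdb is
just an example device:)
ceph-disk list                 # shows which partition is data and which is journal
sgdisk -i 1 /dev/sdb           # "Partition GUID code" should be the Ceph data/journal type
ls -l /dev/disk/by-partuuid/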
I had the same problem when doing benchmarks with small block sizes (<8k) to
RBDs. These settings seemed to fix the problem for me.
sudo ceph tell osd.* injectargs '--filestore_merge_threshold 40'
sudo ceph tell osd.* injectargs '--filestore_split_multiple 8'
After you apply the settings give it
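(Note that injectargs only lasts until the OSDs restart; to make the values stick,
something like the following in ceph.conf on the OSD nodes should work:)
[osd]
filestore merge threshold = 40
filestore split multiple = 8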
Hi Robert,
just to make sure I got it correctly:
Do you mean that the /etc/mtab entries are completely ignored, and no matter
what the order of the /dev/sdX devices is, Ceph will just mount osd/ceph-X
correctly by default?
In addition, assuming that an OSD node fails for a reason other than
On 09/05/2015 00:55, Joao Eduardo Luis wrote:
A command being DEPRECATED must be:
- clearly marked as DEPRECATED in usage;
- kept around for at least 2 major releases;
- kept compatible for the duration of the deprecation period.
Once two major releases go by, the command will then ente
>>I tried searching on the internet and could not find an el7 package with the
>>liburcu-bp.la file; let me know which rpm package has this libtool archive.
Hi, maybe you can try
./install-deps.sh
to install the needed dependencies.
- Original Message -
From: "Srikanth Madugundi"
To: "Somnath Roy"
C
Oops... too fast to answer...
G.
On Mon, 11 May 2015 12:13:48 +0300, Timofey Titovets wrote:
Hey! I caught it again. It's a kernel bug. The kernel crashed when I tried to
map an rbd device with a map like the one above!
Hooray!
2015-05-11 12:11 GMT+03:00 Timofey Titovets :
FYI and history
Rule:
# rules
rule replicat
Timofey,
glad that you've managed to get it working :-)
Best,
George
FYI and history
Rule:
# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step choose firstn 0 type room
step choose firstn 0 type rack
step choose firstn 0 t
Hey! I caught it again. It's a kernel bug. The kernel crashed when I tried to
map an rbd device with a map like the one above!
Hooray!
2015-05-11 12:11 GMT+03:00 Timofey Titovets :
> FYI and history
> Rule:
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step ta
FYI and history
Rule:
# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type room
    step choose firstn 0 type rack
    step choose firstn 0 type host
    step chooseleaf firstn 0 type osd
    step emit
}
And after reset
Hi Christian,
In my experience, inconsistent PGs almost always trace back to a bad
drive somewhere. They are going to keep happening, and with that many drives
you still need to be diligent/aggressive about dropping bad drives and replacing
them.
If a drive returns an incorrect read, it can
Hi all,
I have a few questions about ceph-fuse options:
- Is the fuse writeback cache being used? How can we see this? Can it be
turned on with allow_wbcache somehow?
- What is the default of the big_writes option? (as seen in
/usr/bin/ceph-fuse --help). Where can we see this?
If we run cep
Hi all!
We are experiencing approximately 1 scrub error / inconsistent PG every
two days. As far as I know, to fix this you can issue a "ceph pg
repair", which works fine for us. I have a few questions regarding the
behavior of the ceph cluster in such a case:
1. After ceph detects the scrub error
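(For context, the repair flow being referred to is roughly this, with the PG id as
a placeholder, and with the usual caveat from elsewhere in this thread about first
making sure the primary's copy is good:)
ceph health detail          # lists the inconsistent PGs, e.g. "pg 3.45 is ... inconsistent"
ceph pg repair 3.45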
Nik,
If you increase num_jobs beyond 4, does it help further? Try 8 or so.
Yeah, libsoft* is definitely consuming some CPU cycles, but I don't know how
to resolve that.
Also, acpi_processor_ffh_cstate_enter popped up and is consuming a lot of CPU.
Try disabling C-states and running the CPU at maximum per
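(One common way to pin C-states and frequency for such tests, assuming an Intel
box; the kernel command line change needs a reboot and is only a suggestion:)
cpupower frequency-set -g performance
# and/or append to the kernel command line:
#   intel_idle.max_cstate=0 processor.max_cstate=1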