Hi,
I have a small number of OSDs running Ubuntu Trusty 14.04 and Ceph
Firefly 0.80.9. Due to stability issues I would like to disable the
btrfs snapshot feature (filestore btrfs snap = false).
Is it possible to apply this change to an existing OSD (stop OSD, change
config, restart OSD), or
> I have a cluster currently on Giant - is Hammer stable/ready for production
> use?
Assume so; I upgraded 0.87-1 to 0.94-1, and the only thing that came up was that
Ceph will now warn if you have too many PGs (>300/OSD), which it turned out I and
others had. So I had to do pool consolidation in order to a
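The PG count that triggers that warning can be estimated with a quick back-of-the-envelope calculation (the pool sizes and OSD count below are hypothetical examples, not from the poster's cluster):

```python
# Rough sketch of the >300 PGs/OSD check mentioned above.
# Every PG is replicated 'size' times, and each replica counts
# against some OSD's PG total.

def pgs_per_osd(pools, num_osds):
    """pools: list of (pg_num, replica_size) tuples."""
    total_pg_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_pg_replicas / num_osds

# e.g. three pools of 2048 PGs at size 3 on only 18 OSDs:
print(pgs_per_osd([(2048, 3)] * 3, 18))  # -> 1024.0, far above ~300
```

Consolidating pools (fewer pools, or pools with smaller pg_num) is what brings this number back down, since pg_num cannot be decreased on an existing pool in these releases.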
Hello,
On Thu, 23 Apr 2015 09:10:13 +0200 Burkhard Linke wrote:
> Hi,
>
> I have a small number of OSDs running Ubuntu Trusty 14.04 and Ceph
> Firefly 0.80.9. Due to stability issues I would like to disable the
> btrfs snapshot feature (filestore btrfs snap = false).
>
> Is it possible to ap
On 22.04.2015 at 19:31, J David wrote:
> On Wed, Apr 22, 2015 at 7:12 AM, Stefan Priebe - Profihost AG
> wrote:
>> Also a reweight-by-utilization does nothing.
>
> As a fellow sufferer from this issue, mostly what I can offer you is
> sympathy rather than actual help. However, this may be bene
> But in the menu, the use case "cephfs only" doesn't exist and I have
> no idea of the %data for each pools metadata and data. So, what is
> the proportion (approximately) of %data between the "data" pool and
> the "metadata" pool of cephfs in a cephfs-only cluster?
>
> Is it rather metadata=20
Dear folks,
I'm sorry for the strange subject, but that might show my current
confusion too.
From what I know, writes to an OSD are also journaled, for speed and
consistency. Currently that is done to the/a filesystem, which is why a
lot of suggestions are to use SSDs for journals.
So far, that's
Hi,
I'm hitting this bug again today.
So it doesn't seem to be NUMA related (I have tried flushing the Linux buffers to be sure),
and tcmalloc is patched (though I don't know how to verify that it's OK).
I haven't restarted the OSDs yet.
Maybe some perf traces could be useful?
- Mail original -
De: "ad
Dear Ceph experts,
I'm a very new Ceph user. I made a blunder: I removed some OSDs (and
all files in the related directories) before Ceph finished rebalancing
data and migrating PGs.
Not to mention the data loss, I am facing the following problems:
1) There are always stale pgs showing in ceph sta
What about running multiple clusters on the same host?
There is a separate mail thread about being able to run clusters with different
conf files on the same host.
Will the new systemd service scripts cope with this?
Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1
Hi,
I've noticed that the btrfs file storage is performing some
cleanup/compacting operations during OSD startup.
Before OSD start:
/dev/sdc1 2.8T 2.4T 390G 87% /var/lib/ceph/osd/ceph-58
After OSD start:
/dev/sdc1 2.8T 2.2T 629G 78% /var/lib/ceph/osd/ceph-58
O
Hi, everyone!
Consider N nodes that receive and store some objects, and a node N+1
acting as a central storage. No one of N nodes can see the objects of
each other but the central node can see everything. We would run
independent Ceph storage on each of N nodes and replicate objects to
the central
> On 23/04/2015, at 10.24, Florent B wrote:
>
> I come back with this problem because it persists even after upgrading
> to Hammer.
>
> With CephFS, it does not work, and the only workaround I found does not
> work 100% of time :
I also found issues at reboots, because starting Ceph fuse
Dear All,
I have multiple disk types (15k & 7k) on each ceph node, which I assign
to different pools, but have a problem: whenever I reboot a node, the
OSDs move in the CRUSH map.
i.e. on host ceph4, before a reboot I have this osd tree
-10 7.68980 root 15k-disk
(snip)
-9 2.19995 h
Hello all,
At this moment we have a scenario on which I would like your opinion.
Scenario:
Currently we have a ceph environment with 1 rack of hardware, this rack
contains a couple of OSD nodes with 4T disks. In a few months time we will
deploy 2 more racks with OSD nodes, these nodes have 6
=== Facts ===
1. RadosGW disabled, rados bench write - 10 x bigger traffic served without
any slow_request.
2. RadosGW enabled - first slow_requests.
3. Traffic via RadosGW – 20-50 slow_requests per minute (~0.1% of IOPS)
4. We compacted leveldb on MONs 48h before first slow_requests. Maybe the
On 22/04/15 20:32, Robert LeBlanc wrote:
> I believe your problem is that you haven't created bootstrap-osd key
> and distributed it to your OSD node in /var/lib/ceph/bootstrap-osd/.
Hi Robert,
Thank you for your reply.
In my original post, steps performed, I did include copying over the
bootstr
Hi,
I will upgrade my existing hardware (in 3 SC847 cases with 30 HDDs each)
in the next few days.
Should I buy a SAS expander 9300-8i or keep my existing RAID controller 9750-4i?
The 9300 will give me real JBOD, the 9750 only single-disk RAIDs.
Does this make a real difference? Should I spend or keep m
Hi,
I had a similar problem during reboots. It was solved by adding
'_netdev' to the options for the fstab entry. Otherwise the system may
try to mount the cephfs mount point before the network is available.
This solution is for Ubuntu, YMMV.
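For reference, a hypothetical fstab entry with that option might look like this (the monitor address, mount point, and secret file path are made-up examples):

```
# cephfs mount; _netdev defers mounting until the network is up
192.168.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/secret,_netdev,noatime  0  0
```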
Best regards,
Burkhard
--
Dr. rer. nat. Burkhard
On Thu, Apr 23, 2015 at 11:18 AM, Jake Grimmett wrote:
> Dear All,
>
> I have multiple disk types (15k & 7k) on each ceph node, which I assign to
> different pools, but have a problem as whenever I reboot a node, the OSD's
> move in the CRUSH map.
I just found out that you can customize the way O
Hi,
On 04/23/2015 11:18 AM, Jake Grimmett wrote:
Dear All,
I have multiple disk types (15k & 7k) on each ceph node, which I
assign to different pools, but have a problem as whenever I reboot a
node, the OSD's move in the CRUSH map.
i.e. on host ceph4, before a reboot I have this osd tree
-
Maybe it's tcmalloc related.
I thought I had patched it correctly, but perf shows a lot of
tcmalloc::ThreadCache::ReleaseToCentralCache
before the osd restart (100k)
--
 11.66%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
  8.51%  ce
The 9750-4i may support JBOD mode. If you would like to test, install the
MegaCLI tools. (Warning, this will clear the RAID configuration on your device)
This is the command for switching the RAID controller to JBOD mode:
#megacli -AdpSetProp -EnableJBOD -1 -aAll
Thanks,
Jacob
-Original
Thanks for the testing Alexandre!
If you have the means to compile the same version of ceph with jemalloc,
I would be very interested to see how it does.
In some ways I'm glad it turned out not to be NUMA. I still suspect we
will have to deal with it at some point, but perhaps not today. ;)
Hi all,
After I upgraded Ceph to 0.94.1, it prints the following log message every time
I restart ceph-osd. Is there some way to disable these logs?
libust[19291/19291]: Warning: HOME environment variable not set. Disabling
LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm
>>If you have the means to compile the same version of ceph with jemalloc,
>>I would be very interested to see how it does.
Yes, sure. (I have around 3-4 weeks to do all the benchmarks.)
But I don't know how to do it.
I'm running the cluster on CentOS 7.1; maybe it would be easy to patch the SRPMs
to
Thanks Wido ...
It worked.
On Wed, Apr 22, 2015 at 5:33 PM, Wido den Hollander wrote:
>
>
> > Op 22 apr. 2015 om 16:54 heeft 10 minus het
> volgende geschreven:
> >
> > Hi,
> >
> > Is there a recommended way of powering down a ceph cluster and bringing
> it back up ?
> >
> > I have looked t
On 04/22/2015 07:35 PM, Gregory Farnum wrote:
On Wed, Apr 22, 2015 at 8:17 AM, Kenneth Waegeman
wrote:
Hi,
I changed the cluster network parameter in the config files, restarted the
monitors , and then restarted all the OSDs (shouldn't have done that).
Do you mean that you changed the IP a
I have done this via a restart of the OSDs after adding the configuration
option in ceph.conf. It works fine. My Ceph version is 0.80.5.
One thing worth noting is that you'll sooner or later want to remove stale
snap_* subvolumes, as leaving them will cause a slow increase of disk usage
on your OS
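As a sketch, the change discussed in this thread goes in the [osd] section of ceph.conf before the daemons are restarted (one OSD at a time, so the cluster stays available):

```
[osd]
filestore btrfs snap = false
```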
Sorry, I was reading too fast. That key isn't from a previous attempt, correct?
But I doubt that is the problem, as you would receive an access denied
message in the logs.
Try running ceph-disk zap and recreating the OSD. Also remove the auth key
and the osd (ceph osd rm ) then do a ceph-disk prepare. I do
The appearance of these socket closed messages seems to coincide with
the slowdown symptoms. What is the cause?
2015-04-23T14:08:47.111838+00:00 i-65062482 kernel: [ 4229.485489] libceph:
osd1 192.168.160.4:6800 socket closed (con state OPEN)
2015-04-23T14:09:06.961823+00:00 i-65062482 kernel:
Hi Jeff,
I believe these are normal; they are just idle connections to the OSDs timing
out because no traffic has flowed recently. They are probably a symptom
rather than a cause.
Nick
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>
Hi again,
On my testbed, I have 5 ceph nodes, each containing 23 OSDs (2TB btrfs drives).
For these tests, I've set up a RAID0 across the 23 disks.
For now, I'm not using SSDs, as I discovered my vendor apparently decreased
their performance on purpose...
So : 5 server nodes of which 3 are MONS too.
I also
On 04/22/2015 06:51 PM, Gregory Farnum wrote:
If you look at the "ceph --help" output you'll find some commands for
removing MDSes from the system.
Yes, this works for all but the last mds..
[root@mds01 ~]# ceph mds rm 35632 mds.mds03
Error EBUSY: cannot remove active mds.mds03 rank 0
I sto
A full CRUSH dump would be helpful, as well as knowing which OSDs you took
out. If you didn't take 17 out as well as 15, then you might be OK. If the
OSDs still show up in your CRUSH map, then try to remove them from the CRUSH
map with 'ceph osd crush rm osd.15'.
If you took out both OSDs, you will ne
On Thu, 23 Apr 2015, HEWLETT, Paul (Paul)** CTR ** wrote:
> What about running multiple clusters on the same host?
>
> There is a separate mail thread about being able to run clusters with
> different conf files on the same host.
> Will the new systemd service scripts cope with this?
As currentl
Hi Frederic,
If you are using EC pools, the primary OSD requests the remaining shards of
the object from the other OSD's, reassembles it and then sends the data to
the client. The entire object needs to be reconstructed even for a small IO
operation, so 4kb reads could lead to quite a large IO
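That read amplification is easy to quantify; a rough sketch follows (the object size and k value are hypothetical examples, not a statement about Frederic's pool):

```python
# Sketch of EC small-read amplification: to serve any client read, the
# primary must fetch all k data shards and reassemble the full object.

def ec_small_read_amplification(object_size, read_size, k):
    shard = object_size / k
    bytes_read = k * shard        # == object_size: the whole object comes back
    return bytes_read / read_size

# 4 KiB client read from a 4 MiB object on a k=4 EC pool:
print(ec_small_read_amplification(4 * 1024 * 1024, 4 * 1024, 4))  # -> 1024.0
```

So each 4 KiB read can pull three orders of magnitude more data off the OSDs than the client actually asked for.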
All,
I was hoping for some advice. I have recently built a Ceph cluster on RHEL
6.5 and have configured RGW. I want to test Swift API access, and as a
result have created a user, swift subuser and swift keys as per the output
below:
1. Create user
radosgw-admin user create --ui
If you force CRUSH to put copies in each rack, then you will be limited by
the smallest rack. You can face some severe limitations if you try to keep
your copies in two racks (see the thread titled "CRUSH rule for 3 replicas
across 2 hosts" for some of my explanation of this).
If I were you, I w
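The smallest-rack limitation above can be sketched numerically (the rack capacities below are made-up figures):

```python
# With CRUSH pinning a copy of every object into each rack, every object
# consumes space in all racks, so the smallest rack caps usable capacity.

def usable_raw_capacity(rack_capacities_tb, copies_per_rack=1):
    return min(rack_capacities_tb) / copies_per_rack

# Three racks of 100, 100 and 60 TB raw: the 60 TB rack is the ceiling.
print(usable_raw_capacity([100, 100, 60]))  # -> 60.0
```

Adding capacity to the two larger racks would then buy you nothing until the small rack grows too.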
Alexandre,
You can configure with --with-jemalloc or ./do_autogen -J to build ceph with
jemalloc.
Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Alexandre DERUMIER
Sent: Thursday, April 23, 2015 4:56 AM
To: Mark Nelso
On Wed, Apr 22, 2015 at 4:30 PM, Nick Fisk wrote:
> I suspect you are hitting problems with sync writes, which Ceph isn't known
> for being the fastest thing for.
There's "not being the fastest thing" and "an expensive cluster of
hardware that performs worse than a single SATA drive." :-(
> I'm
On Wed, Apr 22, 2015 at 4:07 PM, Somnath Roy wrote:
> I am suggesting synthetic workload like fio to run on top of VM to identify
> where the bottleneck is. For example, if fio is giving decent enough output,
> I guess ceph layer is doing fine. It is your client that is not driving
> enough.
A
On Thu, Apr 23, 2015 at 5:20 AM, Kenneth Waegeman
>
> So it is all fixed now, but is it explainable that at first about 90% of
> the OSDs went into shutdown over and over, and only reached a stable
> situation after some time, because of one host's network failure?
>
> Thanks again!
Yes, unless yo
Sounds like you're hitting a known issue that was fixed a while back (although
might not be fixed on the specific version you're running). Can you try
creating a second subuser for the same user, see if that one works?
Yehuda
- Original Message -
> From: "alistair whittle"
> To: ceph-u
On Thu, Apr 23, 2015 at 12:55 AM, Steffen W Sørensen wrote:
>> But in the menu, the use case "cephfs only" doesn't exist and I have
>> no idea of the %data for each pools metadata and data. So, what is
>> the proportion (approximately) of %data between the "data" pool and
>> the "metadata" pool
On Thu, Apr 23, 2015 at 1:25 AM, Burkhard Linke
wrote:
> Hi,
>
> I've noticed that the btrfs file storage is performing some
> cleanup/compacting operations during OSD startup.
>
> Before OSD start:
> /dev/sdc1 2.8T 2.4T 390G 87% /var/lib/ceph/osd/ceph-58
>
> After OSD start:
> /de
I think you have to "ceph mds fail" the last one up, then you'll be
able to remove it.
-Greg
On Thu, Apr 23, 2015 at 7:52 AM, Kenneth Waegeman
wrote:
>
>
> On 04/22/2015 06:51 PM, Gregory Farnum wrote:
>>
>> If you look at the "ceph --help" output you'll find some commands for
>> removing MDSes f
Can you explain this a bit more? You mean try and create a second subuser for
testuser1 or testuser2?
As an aside, I am running Ceph 0.80.7 as is packaged with ICE 1.2.2. I believe
that is the Firefly release.
-Original Message-
From: Yehuda Sadeh-Weinraub [mailto:yeh...@redhat.com]
I think you're hitting issue #8587 (http://tracker.ceph.com/issues/8587). This
issue has been fixed at 0.80.8, so you might want to upgrade to that version
(available with ICE 1.2.3).
Yehuda
- Original Message -
> From: "alistair whittle"
> To: yeh...@redhat.com
> Cc: ceph-users@lists.
Hello everyone.
I'm new to the list and also just a beginner at using Ceph, and I'd like
to get some advice from you on how to create the right infrastructure
for our scenario.
We'd like to provide storage to three different applications, but each
should have its own "area". Also, ideally we
David,
With a similar 128K profile I am getting ~200MB/s bandwidth with the entire
OSD on SSD. I never tested with HDDs, but it seems you are reaching Ceph's
limit on this. Probably nothing is wrong in your setup!
Thanks & Regards
Somnath
-Original Message-
From: jdavidli...@gmail.com [ma
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> J David
> Sent: 23 April 2015 17:51
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Having trouble getting good performance
>
> On Wed, Apr 22, 2015 at 4:30 PM,
BTW, I am not writing any replicas with the numbers below.
Performance will degrade further depending on the replica count. How many
replicas are you writing?
Thanks & Regards
Somnath
-Original Message-
From: Somnath Roy
Sent: Thursday, April 23, 2015 12:04 PM
To: 'J David'
Cc: ceph-users@lists
Hi!
I have copied two of my pools recently, because the old ones had too many PGs.
Both of them contain RBD images, with 1GB and ~30GB of data.
Both pools were copied without errors; the RBD images are mountable and seem
to be fine.
CEPH version is 0.94.1
Pavel.
> On 7 Apr 2015, at 18:29, Kapil Shar
On Thu, Apr 23, 2015 at 3:05 PM, Nick Fisk wrote:
> If you can let us know the avg queue depth that ZFS is generating that will
> probably give a good estimation of what you can expect from the cluster.
How would that be measured?
> I have had a look through the fio runs, could you also try and
On Thu, 23 Apr 2015, Pavel V. Kaygorodov wrote:
> Hi!
>
> I have copied two of my pools recently, because the old ones had too many pgs.
> Both of them contain RBD images, with 1GB and ~30GB of data.
> Both pools were copied without errors; RBD images are mountable and seem to
> be fine.
> CEPH vers
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> J David
> Sent: 23 April 2015 20:19
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Having trouble getting good performance
>
> On Thu, Apr 23, 2015 at 3:05 PM,
Hi Rafael,
Do you require a shared FS for these applications or would a block device
with a traditional filesystem be suitable?
If it is, then you could create separate pools with a RBD block device in
each.
Just out of interest what is the reason for separation, security or
performance?
Nick
Hey cephers,
Just wanted to let you all know that the OAuth portion of the wiki
login has been removed in favor of stand-alone auth for now. Our plan
longer-term is to replace the wiki with something that will scale with
us a bit better and be more open. If you never set a password you
should stil
On Thu, Apr 23, 2015 at 3:05 PM, Nick Fisk wrote:
> I have had a look through the fio runs, could you also try and run a couple
> of jobs with iodepth=64 instead of numjobs=64. I know they should do the
> same thing, but the numbers with the former are easier to understand.
Maybe it's an issue of
On 04/23/2015 03:22 PM, J David wrote:
On Thu, Apr 23, 2015 at 3:05 PM, Nick Fisk wrote:
I have had a look through the fio runs, could you also try and run a couple
of jobs with iodepth=64 instead of numjobs=64. I know they should do the
same thing, but the numbers with the former are easier
> -Original Message-
> From: jdavidli...@gmail.com [mailto:jdavidli...@gmail.com] On Behalf Of J
> David
> Sent: 23 April 2015 21:22
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Having trouble getting good performance
>
> On Thu, Apr 23, 2015 at 3:05 PM,
Hi,
I would like to use the gf-complete library for Erasure coding since it has
some ARM v8 based optimizations. I see that the code is part of my tree, but
not sure if these libraries are included in the final build.
I only see the libec_jerasure*.so in my libs folder after installation.
Are th
Hi Nick,
Thanks for answering.
Each application runs on its own cluster (these are Glassfish clusters,
and are distributed as nodes appA01, appA02,..., appB01, appB02, etc)
and each node on the cluster has to have access to the same files.
Currently we are using NFS for this, but it has its l
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Rafael Coninck Teigão
> Sent: 23 April 2015 22:35
> To: Nick Fisk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Serving multiple applications with a single
cluster
>
> Hi Nick,
>
> T
Hi,
The ARMv8 optimizations for gf-complete are in Hammer, not in Firefly. The
libec_jerasure*.so plugin contains gf-complete.
Cheers
On 23/04/2015 23:29, Garg, Pankaj wrote:
> Hi,
>
>
>
> I would like to use the gf-complete library for Erasure coding since it has
> some ARM v8 based optim
Thanks Loic. I was just looking at the source trees for gf-complete and saw
that the v2-ceph tag has the optimizations and is associated with Hammer.
One more question: on Hammer, will the optimizations kick in automatically for
ARM? Do all of the different techniques have ARM optimizations, or d
To update you on the current test in our lab:
1. We tested the Samsung OSDs in recovery mode and the speed was able to max
out 2x 10GbE ports (transferring data at 2200+ MB/s during recovery). So for
normal write operation without O_DSYNC writes, Samsung drives seem OK.
2. We then tested a couple of d
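As a back-of-the-envelope check on point 1 (theoretical line rate only, ignoring Ethernet and protocol overhead):

```python
# Theoretical line rate of bonded links: 10 Gb/s ~= 1250 MB/s per link.

def line_rate_mb_s(gbit_per_link, links):
    return gbit_per_link * 1000 / 8 * links

print(line_rate_mb_s(10, 2))  # -> 2500.0
```

So the observed 2200+ MB/s is close to saturating the 2x 10GbE bond, consistent with the network, not the drives, being the bottleneck during recovery.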
On Thu, Apr 23, 2015 at 4:23 PM, Mark Nelson wrote:
> If you want to adjust the iodepth, you'll need to use an asynchronous
> ioengine like libaio (you also need to use direct=1)
Ah yes, libaio makes a big difference. With 1 job:
testfile: (g=0): rw=randwrite, bs=128K-128K/128K-128K/128K-128K,
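For reference, a hypothetical fio job along those lines (sizes, runtime, and filename are made-up examples):

```
[randwrite-test]
ioengine=libaio   ; async engine, so iodepth actually takes effect
direct=1          ; O_DIRECT is required for libaio to queue I/O
rw=randwrite
bs=128k
iodepth=64
size=4g
filename=testfile
```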
Hi, thank you for your response.
Well, I've not only taken out but also totally removed the both OSDs (by
"ceph osd rm" and delete everything in /var/lib/ceph/osd/)
of that pg (and similar to all other stale pgs.)
The main problem I have is those stale pgs (miss all OSDs I've removed)
not me
Hello,
On Thu, 23 Apr 2015 18:40:38 -0400 Anthony Levesque wrote:
> To update you on the current test in our lab:
>
> 1.We tested the Samsung OSD in Recovery mode and the speed was able to
> maxout 2x 10GbE port(transferring data at 2200+ MB/s during recovery).
> So for normal write operation w
What hosts were those OSDs on? I'm concerned that two OSDs for some of the
PGs were adjacent, and if that placed them on the same host, it would be
contrary to your rules and something deeper is wrong.
Did you format the disks that were taken out of the cluster? Can you mount
the partitions and see
You could map the RBD to each host and put a cluster file system like OCFS2
on it so all cluster nodes can read and write at the same time. If these
are VMs, then you can present the RBD in libvirt and the root user would
not have access to mount other RBD in the same pool.
Robert LeBlanc
Sent fr
We are still experiencing a problem with our gateway not properly
clearing out shadow files.
I have done numerous tests where I have:
-Uploaded a file of 1.5GB in size using s3browser application
-Done an object stat on the file to get its prefix
-Done rados ls -p .rgw.buckets | grep to count t
Dear Robert,
Yes, you're right. The two OSDs removed of the PGs are from the same
host and contradict my rules (that's the reason I removed them).
Unfortunately the partitions of the disk are all formatted, so I cannot
recover the data.
However, the command "ceph pg force_create_pg " and res