[ceph-users] Ceph cluster works UNTIL the OSDs are rebooted

2019-11-14 Thread Richard Geoffrion
I had a working ceph cluster running nautilus in a test lab just a few months ago. Now that I'm trying to take ceph live on production hardware, I can't seem to get the cluster to stay up and available even though all three OSDs are UP and IN. I believe the problem is that the OSDs don't

[ceph-users] Ceph Cluster Replication / Disaster Recovery

2019-06-12 Thread DHilsbos
All; I'm testing and evaluating Ceph for the next generation of storage architecture for our company, and so far I'm fairly impressed, but I've got a couple of questions around cluster replication and disaster recovery. First; intended uses. Ceph Object Gateway will be used to support new
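
For the Object Gateway side of this question, the usual building block is RGW multisite replication between two zones. A minimal sketch, assuming a fresh realm and placeholder names, endpoints and keys (none of these values come from the thread):

  # primary cluster: realm, master zonegroup and master zone (names are examples)
  radosgw-admin realm create --rgw-realm=example-realm --default
  radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:8080 --master --default
  radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --endpoints=http://rgw1:8080 --master --default
  radosgw-admin period update --commit
  # secondary cluster: pull the realm and add a second zone that syncs from the master
  radosgw-admin realm pull --url=http://rgw1:8080 --access-key=<system-user-key> --secret=<system-user-secret>
  radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west --endpoints=http://rgw2:8080 --access-key=<system-user-key> --secret=<system-user-secret>
  radosgw-admin period update --commit

Block storage (RBD) replication is configured separately, typically with rbd-mirror; object and block DR are independent pieces.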

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread solarflow99
eeded. > > > > Regards > > Manuel > > > > > > *De:* ceph-users *En nombre de *Martin > Verges > *Enviado el:* viernes, 3 de mayo de 2019 11:36 > *Para:* Hervé Ballans > *CC:* ceph-users > *Asunto:* Re: [ceph-users] Ceph cluster available to clients

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread EDH - Manuel Rios Fernandez
: Hervé Ballans CC: ceph-users Asunto: Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ? Hello, configure a gateway on your router or use a good rack switch that can provide such features and use layer3 routing to connect different vlans / ip zones. -- Martin

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread Martin Verges
Hello, configure a gateway on your router or use a good rack switch that can provide such features and use layer3 routing to connect different vlans / ip zones. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH,

[ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread Hervé Ballans
Hi all, I have a Ceph cluster on Luminous 12.2.10 with 3 mon and 6 osd servers. My current network setup is a separate public and cluster (private IP) network. I would like my cluster to be available to clients on another VLAN than the default one (which is the public network in ceph.conf)
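
One configuration-level sketch (an assumption on my part, not what the replies above recommend, which is plain layer-3 routing): the public network option accepts a comma-separated list of subnets, so a second client VLAN can be listed there as long as the mon/OSD addresses are reachable from it. Subnets are illustrative:

  [global]
  # two client-facing subnets plus the private replication network (example addresses)
  public network  = 10.0.0.0/24, 192.168.20.0/24
  cluster network = 10.0.1.0/24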

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Christian Balzer
On Tue, 5 Mar 2019 10:39:14 -0600 Mark Nelson wrote: > On 3/5/19 10:20 AM, Darius Kasparavičius wrote: > > Thank you for your response. > > > > I was planning to use a 100GbE or 45GbE bond for this cluster. It was > > acceptable for our use case to lose sequential/larger I/O speed for > > it.

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Mark Nelson
On 3/5/19 10:20 AM, Darius Kasparavičius wrote: Thank you for your response. I was planning to use a 100GbE or 45GbE bond for this cluster. It was acceptable for our use case to lose sequential/larger I/O speed for it. Dual socket would be an option, but I do not want to touch numa, cgroups

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Darius Kasparavičius
Thank you for your response. I was planning to use a 100GbE or 45GbE bond for this cluster. It was acceptable for our use case to lose sequential/larger I/O speed for it. Dual socket would be an option, but I do not want to touch numa, cgroups and the rest of the settings. Most of the time is just

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Mark Nelson
Hi, I've got a ryzen7 1700 box that I regularly run tests on along with the upstream community performance test nodes that have Intel Xeon E5-2650v3 processors in them.  The Ryzen is 3.0GHz/3.7GHz turbo while the Xeons are 2.3GHz/3.0GHz.  The Xeons are quite a bit faster clock/clock in the

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Ashley Merrick
arius Kasparaviius [mailto:daz...@gmail.com] > Sent: 05 March 2019 10:50 > To: ceph-users > Subject: [ceph-users] Ceph cluster on AMD based system. > > Hello, > > > I was thinking of using AMD based system for my new nvme based cluster. > In particular I'm looking at > https:/

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Marc Roos
: 05 March 2019 10:50 To: ceph-users Subject: [ceph-users] Ceph cluster on AMD based system. Hello, I was thinking of using AMD based system for my new nvme based cluster. In particular I'm looking at https://www.supermicro.com/Aplus/system/1U/1113/AS-1113S-WN10RT.cfm and https://www.amd.com/en

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Paul Emmerich
Not with this particular server, but we've played around with two EPYC systems with 10 NVMe in each and a 100 Gbit/s network between them. Make sure to use a recent Linux kernel, but other than that it works fine. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at

[ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Darius Kasparavičius
Hello, I was thinking of using an AMD based system for my new nvme based cluster. In particular I'm looking at https://www.supermicro.com/Aplus/system/1U/1113/AS-1113S-WN10RT.cfm and https://www.amd.com/en/products/cpu/amd-epyc-7451 CPUs. Has anyone tried running it on this particular hardware?

Re: [ceph-users] Ceph cluster stability

2019-02-25 Thread Darius Kasparavičius
I think this should give you a bit of insight on using large scale clusters. https://www.youtube.com/watch?v=NdGHE-yq1gU and https://www.youtube.com/watch?v=WpMzAFH6Mc4 . Watch the second video; I think it relates more to your problem. On Mon, Feb 25, 2019, 11:33 M Ranga Swami Reddy wrote: > We

Re: [ceph-users] Ceph cluster stability

2019-02-25 Thread M Ranga Swami Reddy
We have taken care of all HW recommendations, but missed that the ceph mons are VMs with a good configuration (4 cores, 64G RAM + 500G disk)... Could this ceph-mon configuration cause issues? On Sat, Feb 23, 2019 at 6:31 AM Anthony D'Atri wrote: > > > ? Did we start recommending that production mons

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Anthony D'Atri
? Did we start recommending that production mons run on a VM? I'd be very hesitant to do that, though probably some folks do. I can say for sure that in the past (Firefly) I experienced outages related to mons running on HDDs. That was a cluster of 450 HDD OSDs with colo journals and

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Oops... does this really have an impact? Will change this right away and test it. On Fri, Feb 22, 2019 at 5:29 PM Janne Johansson wrote: > > Den fre 22 feb. 2019 kl 12:35 skrev M Ranga Swami Reddy > : >> >> No seen the CPU limitation because we are using the 4 cores per osd daemon. >> But still using

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
But the ceph recommendation is to use a VM (a HW node is not even recommended). Will try to change the mon disk to SSD and a HW node. On Fri, Feb 22, 2019 at 5:25 PM Darius Kasparavičius wrote: > > If your using hdd for monitor servers. Check their load. It might be > the issue there. > > On Fri, Feb

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Janne Johansson
Den fre 22 feb. 2019 kl 12:35 skrev M Ranga Swami Reddy < swamire...@gmail.com>: > No seen the CPU limitation because we are using the 4 cores per osd daemon. > But still using "ms_crc_data = true and ms_crc_header = true". Will > disable these and try the performance. > I am a bit sceptical to

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Darius Kasparavičius
If you're using HDDs for the monitor servers, check their load. It might be the issue there. On Fri, Feb 22, 2019 at 1:50 PM M Ranga Swami Reddy wrote: > > ceph-mon disk with 500G with HDD (not journals/SSDs). Yes, mon use > folder on FS on a disk > > On Fri, Feb 22, 2019 at 5:13 PM David Turner

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
ceph-mon disk with 500G with HDD (not journals/SSDs). Yes, mon use folder on FS on a disk On Fri, Feb 22, 2019 at 5:13 PM David Turner wrote: > > Mon disks don't have journals, they're just a folder on a filesystem on a > disk. > > On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy > wrote:

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread David Turner
Mon disks don't have journals, they're just a folder on a filesystem on a disk. On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy wrote: > ceph mons looks fine during the recovery. Using HDD with SSD > journals. with recommeded CPU and RAM numbers. > > On Fri, Feb 22, 2019 at 4:40 PM David

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
ceph mons look fine during the recovery. Using HDDs with SSD journals, with recommended CPU and RAM numbers. On Fri, Feb 22, 2019 at 4:40 PM David Turner wrote: > > What about the system stats on your mons during recovery? If they are having > a hard time keeping up with requests during a

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread David Turner
What about the system stats on your mons during recovery? If they are having a hard time keeping up with requests during a recovery, I could see that impacting client io. What disks are they running on? CPU? Etc. On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy wrote: > Debug setting defaults

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Debug settings are using the defaults... like 1/5 and 0/5 for almost all. Shall I try with 0 for all debug settings? On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius wrote: > > Hello, > > > Check your CPU usage when you are doing those kind of operations. We > had a similar issue where our CPU

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Not seeing the CPU limitation, because we are using 4 cores per osd daemon. But still using "ms_crc_data = true and ms_crc_header = true". Will disable these and try the performance. And using filestore + LevelDB only. filestore_op_threads = 2. Rest of the recovery and backfill settings done
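
For reference, a sketch of how those two CRC options could be flipped on a running cluster (whether disabling message checksums is actually worth it is debatable; on recent CPUs the overhead is usually small):

  # runtime change on all OSDs (reverts on restart)
  ceph tell osd.\* injectargs '--ms_crc_data=false --ms_crc_header=false'
  # persistent change, ceph.conf [global] or [osd]:
  #   ms_crc_data = false
  #   ms_crc_header = false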

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread M Ranga Swami Reddy
Yep... these settings are already in place. And we also followed all recommendations to get performance, but it still impacts us when an OSD goes down... even though we have 2000+ OSDs. And we use 3 pools with different HW nodes for each pool. One pool's OSD going down also impacts the other pools' performance... which is not expected with

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Alexandru Cucu
Hi, I would decrease max active recovery processes per osd and increase recovery sleep. osd recovery max active = 1 (default is 3) osd recovery sleep = 1 (default is 0 or 0.1) osd max backfills defaults to 1 so that should be OK if he's using the default :D Disabling scrubbing during
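
A sketch of applying the values suggested above to a running cluster (injectargs syntax, which fits the Luminous-era clusters in this thread; the numbers are just the ones proposed):

  ceph tell osd.\* injectargs '--osd_recovery_max_active=1 --osd_recovery_sleep=1 --osd_max_backfills=1'
  # optionally pause scrubs while recovery/backfill is running
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # ... and unset them afterwards
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub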

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Darius Kasparavičius
Hello, Check your CPU usage when you are doing those kind of operations. We had a similar issue where our CPU monitoring was reporting fine < 40% usage, but our load on the nodes was high mid 60-80. If it's possible try disabling ht and see the actual cpu usage. If you are hitting CPU limits you

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread M Ranga Swami Reddy
That's expected from Ceph by design. But in our case, we are using all the recommendations like rack failure domain, replication n/w, etc., and still face client IO performance issues when one OSD is down. On Tue, Feb 19, 2019 at 10:56 PM David Turner wrote: > > With a RACK failure domain, you should be

Re: [ceph-users] Ceph cluster stability

2019-02-19 Thread David Turner
With a RACK failure domain, you should be able to have an entire rack powered down without noticing any major impact on the clients. I regularly take down OSDs and nodes for maintenance and upgrades without seeing any problems with client IO. On Tue, Feb 12, 2019 at 5:01 AM M Ranga Swami Reddy

[ceph-users] Ceph cluster stability

2019-02-12 Thread M Ranga Swami Reddy
Hello - I have a couple of questions on ceph cluster stability, even when we follow all the recommendations below: - Having separate replication n/w and data n/w - RACK is the failure domain - Using SSDs for journals (1:4 ratio) Q1 - If one OSD goes down, cluster IO drops drastically and customer Apps
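
For context, a rack failure domain like the one described is normally expressed in the CRUSH rule assigned to the pool; a minimal sketch assuming a Luminous-or-later cluster and placeholder names:

  # replicated rule that places each copy in a different rack
  ceph osd crush rule create-replicated rack-spread default rack
  ceph osd pool set <poolname> crush_rule rack-spread
  # verify the rack hierarchy CRUSH sees
  ceph osd crush tree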

Re: [ceph-users] Ceph Cluster to OSD Utilization not in Sync

2018-12-21 Thread Pardhiv Karri
Thank you Dyweni for the quick response. We have 2 Hammer clusters which are due for upgrade to Luminous next month and 1 Luminous 12.2.8. Will try this on Luminous, and if it works will apply the same once the Hammer clusters are upgraded, rather than adjusting the weights. Thanks, Pardhiv Karri On

Re: [ceph-users] Ceph Cluster to OSD Utilization not in Sync

2018-12-21 Thread Dyweni - Ceph-Users
Hi, If you are running Ceph Luminous or later, use the Ceph Manager Daemon's Balancer module. (http://docs.ceph.com/docs/luminous/mgr/balancer/). Otherwise, tweak the OSD weights (not the OSD CRUSH weights) until you achieve uniformity. (You should be able to get under 1 STDDEV). I would
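
A short sketch of both approaches mentioned, assuming a Luminous-or-later cluster for the balancer and an arbitrary OSD id for the manual reweight:

  # option 1: mgr balancer module
  ceph mgr module enable balancer
  ceph balancer mode crush-compat     # or 'upmap' if every client is Luminous or newer
  ceph balancer on
  ceph balancer eval                  # distribution score, lower is better
  # option 2: tweak OSD (not CRUSH) weights, by hand or automatically
  ceph osd reweight 12 0.95           # osd.12 is an example id
  ceph osd test-reweight-by-utilization
  ceph osd reweight-by-utilization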

[ceph-users] Ceph Cluster to OSD Utilization not in Sync

2018-12-21 Thread Pardhiv Karri
Hi, We have Ceph clusters which are greater than 1PB. We are using tree algorithm. The issue is with the data placement. If the cluster utilization percentage is at 65% then some of the OSDs are already above 87%. We had to change the near_full ratio to 0.90 to circumvent warnings and to get back

Re: [ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-11-02 Thread vitalif
If you simply multiply number of objects and rbd object size you will get 7611672*4M ~= 29T and that is what you should see in USED field, and 29/2*3=43.5T of raw space. Unfortunately no idea why they consume less; probably because not all objects are fully written. It seems some objects

Re: [ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-11-02 Thread Aleksei Gutikov
If you simply multiply number of objects and rbd object size you will get 7611672*4M ~= 29T and that is what you should see in USED field, and 29/2*3=43.5T of raw space. Unfortunately no idea why they consume less; probably because not all objects are fully written. What ceph version? Can you

Re: [ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-11-02 Thread vitalif
Hi again. It seems I've found the problem, although I don't understand the root cause. I looked into OSD datastore using ceph-objectstore-tool and I see that for almost every object there are two copies, like: 2#13:080008d8:::rbd_data.15.3d3e1d6b8b4567.00361a96:28#

Re: [ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-10-29 Thread Виталий Филиппов
Is there a way to force OSDs to remove old data? Hi After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 for 9 osds) the cluster started to eat additional disk space. First I thought that was caused by the moved PGs using

[ceph-users] Ceph cluster uses substantially more disk space after rebalancing

2018-10-29 Thread Виталий Филиппов
Hi After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 for 9 osds) the cluster started to eat additional disk space. First I thought that was caused by the moved PGs using additional space during unfinished backfills. I pinned most of

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-21 Thread Sergey Malinin
It is just a block size and it has no impact on data safety except that OSDs need to be redeployed in order for them to create bluefs with given block size. > On 21.10.2018, at 19:04, Waterbly, Dan wrote: > > Thanks Sergey! > > Do you know where I can find details on the repercussions of

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-21 Thread Waterbly, Dan
Thanks Sergey! Do you know where I can find details on the repercussions of adjusting this value? Performance (read/writes), for once, not critical for us, data durability and disaster recovery is our focus. -Dan Get Outlook for iOS On Sun, Oct 21, 2018 at 8:37 AM

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-21 Thread Sergey Malinin
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024589.html > On 21.10.2018, at 16:12, Waterbly, Dan wrote: > > Awesome! Thanks Serian! > > Do you know where the 64KB comes from? Can that be

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-21 Thread Waterbly, Dan
Awesome! Thanks Serian! Do you know where the 64KB comes from? Can that be tuned down for a cluster holding smaller objects? Get Outlook for iOS On Sat, Oct 20, 2018 at 10:49 PM -0700, "Serkan Çoban" mailto:cobanser...@gmail.com>> wrote: you have 24M objects, not

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Serkan Çoban
you have 24M objects, not 2.4M. Each object will eat 64KB of storage, so 24M objects uses 1.5TB storage. Add 3x replication to that, it is 4.5TB On Sat, Oct 20, 2018 at 11:47 PM Waterbly, Dan wrote: > > Hi Jakub, > > No, my setup seems to be the same as yours. Our system is mainly for >
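
The 64KB figure corresponds to BlueStore's minimum allocation size for HDD-backed OSDs in this era (bluestore_min_alloc_size_hdd, default 64K; the SSD default is smaller), so every tiny object still occupies one allocation unit per replica. A hedged sketch of inspecting and changing it; as noted later in the thread, OSDs have to be redeployed for a new value to take effect:

  # check what a running OSD was built with (run on the OSD's host; osd.0 is an example)
  ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
  # ceph.conf [osd], applies only to OSDs created after the change, e.g. for small-object workloads:
  #   bluestore_min_alloc_size_hdd = 4096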

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Waterbly, Dan
Hi Jakub, No, my setup seems to be the same as yours. Our system is mainly for archiving loads of data. This data has to be stored forever and allow reads, albeit seldom considering the number of objects we will store vs the number of objects that ever will be requested. It just really seems

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Jakub Jaszewski
Hi Dan, Did you configure block.wal/block.db as separate devices/partitions (osd_scenario: non-collocated or lvm for clusters installed using ceph-ansible playbooks)? I run Ceph version 13.2.1 with non-collocated data.db and have the same situation - the sum of block.db partitions' size is

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Waterbly, Dan
I get that, but isn’t 4TiB to track 2.45M objects excessive? These numbers seem very high to me. Get Outlook for iOS On Sat, Oct 20, 2018 at 10:27 AM -0700, "Serkan Çoban" mailto:cobanser...@gmail.com>> wrote: 4.65TiB includes size of wal and db partitions too. On

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Serkan Çoban
4.65TiB includes size of wal and db partitions too. On Sat, Oct 20, 2018 at 7:45 PM Waterbly, Dan wrote: > > Hello, > > > > I have inserted 2.45M 1,000 byte objects into my cluster (radosgw, 3x > replication). > > > > I am confused by the usage ceph df is reporting and am hoping someone can >

[ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Waterbly, Dan
Hello, I have inserted 2.45M 1,000 byte objects into my cluster (radosgw, 3x replication). I am confused by the usage ceph df is reporting and am hoping someone can shed some light on this. Here is what I see when I run ceph df GLOBAL: SIZE  AVAIL  RAW USED  %RAW USED

[ceph-users] Ceph cluster "hung" after node failure

2018-08-29 Thread Brett Chancellor
Hi All. I have a ceph cluster that's partially upgraded to Luminous. Last night a host died and since then the cluster is failing to recover. It finished backfilling, but was left with thousands of requests degraded, inactive, or stale. In order to move past the issue, I put the cluster in

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Lenz Grimmer
On 07/24/2018 07:02 AM, Satish Patel wrote: > My 5 node ceph cluster is ready for production, now i am looking for > good monitoring tool (Open source), what majority of folks using in > their production? There are several, using Prometheus with the Ceph Exporter Manager module is a popular
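
A minimal sketch of the built-in exporter, assuming Luminous or later; the port is the module default:

  ceph mgr module enable prometheus
  # metrics are then served by the active mgr, default port 9283
  curl http://<active-mgr-host>:9283/metrics

From there a standard Prometheus scrape job plus the community Grafana dashboards cover most day-to-day monitoring.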

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Guilherme Steinmüller
Satish, I'm currently working on monasca's roles for openstack-ansible. We have plugins that monitor ceph as well and I use them in production. Below you can see an example: https://imgur.com/a/6l6Q2K6 On Tue, 24 Jul 2018 at 02:02, Satish Patel wrote: > My 5 node ceph cluster is

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Matthew Vernon
Hi, On 24/07/18 06:02, Satish Patel wrote: > My 5 node ceph cluster is ready for production, now i am looking for > good monitoring tool (Open source), what majority of folks using in > their production? This does come up from time to time, so it's worth checking the list archives. We use

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Marc Roos
Just use collectd to start with. That is easiest with influxdb. However, do not expect too much of the support on influxdb. -Original Message- From: Satish Patel [mailto:satish@gmail.com] Sent: Tuesday 24 July 2018 7:02 To: ceph-users Subject: [ceph-users] ceph cluster monitoring

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Robert Sander
On 24.07.2018 07:02, Satish Patel wrote: > My 5 node ceph cluster is ready for production, now i am looking for > good monitoring tool (Open source), what majority of folks using in > their production? Some people already use Prometheus and the exporter from the Ceph Mgr. Some use more

[ceph-users] ceph cluster monitoring tool

2018-07-23 Thread Satish Patel
My 5 node ceph cluster is ready for production; now I am looking for a good monitoring tool (open source). What are the majority of folks using in their production?

Re: [ceph-users] ceph cluster

2018-06-12 Thread Ronny Aasen
On 12 June 2018 12:17, Muneendra Kumar M wrote: conf file as shown below. If I reconfigure my IP address from 10.xx.xx.xx to 192.xx.xx.xx by changing the public network and mon_host field in the ceph.conf, will my cluster work as it is? Below are my ceph.conf file details. Any

[ceph-users] ceph cluster

2018-06-12 Thread Muneendra Kumar M
Hi, I have created a ceph cluster with 3 osds and everything is running fine. Our public network configuration parameter was set to 10.xx.xx.0/24 in the ceph.conf file as shown below. If I reconfigure my IP address from 10.xx.xx.xx to 192.xx.xx.xx by changing the public network and
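
For reference, a hedged sketch of the pieces that have to change together when re-addressing a cluster: the public network line, mon_host, and, most importantly, the monitor map itself, which cannot be changed by editing ceph.conf alone. Addresses below are placeholders:

  [global]
  public network = 192.xx.xx.0/24
  mon_host = 192.xx.xx.11, 192.xx.xx.12, 192.xx.xx.13
  # the monitors' own addresses live in the monmap; changing them is a separate,
  # documented procedure (extract the monmap, monmaptool --rm/--add each mon with
  # its new address, and inject it back with the mons stopped).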

Re: [ceph-users] Ceph Cluster with 3 Machines

2018-05-29 Thread David Turner
Using the kernel driver to map RBDs to a host with OSDs is known to cause system locks. The answer to avoiding this is to use rbd-nbd or rbd-fuse instead of the kernel driver if you NEED to map the RBD to the same host as any OSDs. On Tue, May 29, 2018 at 7:34 AM Joshua Collins wrote: > Hi > >
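
A sketch of the suggested alternatives, with placeholder pool/image names and assuming the rbd-nbd package is installed:

  # kernel client - the mapping that can deadlock on a host that also runs OSDs
  rbd map mypool/myimage
  # userspace alternatives that avoid that kernel memory deadlock
  rbd-nbd map mypool/myimage
  rbd device map -t nbd mypool/myimage    # equivalent spelling on newer releases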

[ceph-users] Ceph Cluster with 3 Machines

2018-05-29 Thread Joshua Collins
Hi I've had a go at setting up a Ceph cluster but I've run into some issues. I have 3 physical machines to set up a Ceph cluster, and two of these machines will be part of a HA pair using corosync and Pacemaker. I keep running into filesystem lock issues on unmount when I have a machine

Re: [ceph-users] Ceph cluster network bandwidth?

2017-11-20 Thread Anthony Verevkin
> From: "John Spray" > Sent: Thursday, November 16, 2017 11:01:35 AM > > On Thu, Nov 16, 2017 at 3:32 PM, David Turner > wrote: > > That depends on another question. Does the client write all 3 > > copies or > > does the client send the copy to the

Re: [ceph-users] Ceph cluster network bandwidth?

2017-11-16 Thread Blair Bethwaite
What type of SAS disks, spinners or SSD? You really need to specify the sustained write throughput of your OSD nodes if you want to figure out whether your network is sufficient/appropriate. At 3x replication if you want to sustain e.g. 1 GB/s of write traffic from clients then you will need 2

Re: [ceph-users] Ceph cluster network bandwidth?

2017-11-16 Thread John Spray
On Thu, Nov 16, 2017 at 3:32 PM, David Turner wrote: > That depends on another question. Does the client write all 3 copies or > does the client send the copy to the primary OSD and then the primary OSD > sends the write to the secondaries? Someone asked this recently,

Re: [ceph-users] Ceph cluster network bandwidth?

2017-11-16 Thread David Turner
Another ML thread currently happening is "[ceph-users] Cluster network slower than public network" And It has some good information that might be useful for you. On Thu, Nov 16, 2017 at 10:32 AM David Turner wrote: > That depends on another question. Does the client

Re: [ceph-users] Ceph cluster network bandwidth?

2017-11-16 Thread David Turner
That depends on another question. Does the client write all 3 copies or does the client send the copy to the primary OSD and then the primary OSD sends the write to the secondaries? Someone asked this recently, but I don't recall if an answer was given. I'm not actually certain which is the

[ceph-users] Ceph cluster network bandwidth?

2017-11-16 Thread Sam Huracan
Hi, We intend to build a new Ceph cluster with 6 Ceph OSD hosts, 10 SAS disks per host, using 10Gbps NICs for the client network; objects are replicated 3x. So, how should I size the cluster network for best performance? As I have read, 3x replication means 3x the client network bandwidth = 30 Gbps, is it
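
Rough, hedged arithmetic for that question: with a separate cluster network the client write crosses the public network once, and the primary OSD then sends one copy to each of the two replicas over the cluster network, so the cluster network carries about 2x the client write rate (plus recovery/backfill traffic), not 3x:

  client writes        : 10 Gbps on the public network
  replication (size 3) : 2 extra copies x 10 Gbps = ~20 Gbps on the cluster network
  headroom             : add margin for recovery, backfill and scrubbing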

Re: [ceph-users] Ceph cluster with SSDs

2017-09-12 Thread Christian Balzer
Please don't remove the ML. I'm not a support channel and if I reply to mails it is so that others hopefully will learn from that. ML re-added. On Mon, 11 Sep 2017 16:30:18 +0530 M Ranga Swami Reddy wrote: > >>> >> Here I have NVMes from Intel. but as the support of these NVMes not > >>> >>

Re: [ceph-users] Ceph cluster with SSDs

2017-08-23 Thread Christian Balzer
On Wed, 23 Aug 2017 16:48:12 +0530 M Ranga Swami Reddy wrote: > On Mon, Aug 21, 2017 at 5:37 PM, Christian Balzer wrote: > > On Mon, 21 Aug 2017 17:13:10 +0530 M Ranga Swami Reddy wrote: > > > >> Thank you. > >> Here I have NVMes from Intel. but as the support of these NVMes not

Re: [ceph-users] Ceph cluster with SSDs

2017-08-23 Thread M Ranga Swami Reddy
On Mon, Aug 21, 2017 at 5:37 PM, Christian Balzer wrote: > On Mon, 21 Aug 2017 17:13:10 +0530 M Ranga Swami Reddy wrote: > >> Thank you. >> Here I have NVMes from Intel. but as the support of these NVMes not >> there from Intel, we decided not to use these NVMes as a journal. > >

Re: [ceph-users] Ceph cluster with SSDs

2017-08-21 Thread Christian Balzer
On Mon, 21 Aug 2017 17:13:10 +0530 M Ranga Swami Reddy wrote: > Thank you. > Here I have NVMes from Intel. but as the support of these NVMes not > there from Intel, we decided not to use these NVMes as a journal. You again fail to provide with specific model numbers... No support from Intel

Re: [ceph-users] Ceph cluster with SSDs

2017-08-21 Thread M Ranga Swami Reddy
Thank you. Here I have NVMes from Intel. But as support for these NVMes is not there from Intel, we decided not to use these NVMes as a journal. Btw, if we split this SSD across multiple OSDs (for ex: 1 SSD with 4 or 2 OSDs), does this help the performance numbers at all? On Sun, Aug 20, 2017 at 9:33 AM,

Re: [ceph-users] Ceph cluster with SSDs

2017-08-20 Thread Christian Balzer
On Mon, 21 Aug 2017 01:48:49 + Adrian Saul wrote: > > SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ- > > 75E4T0B/AM | Samsung > > The performance difference between these and the SM or PM863 range is night > and day. I would not use these for anything you care

Re: [ceph-users] Ceph cluster with SSDs

2017-08-20 Thread Adrian Saul
> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ- > 75E4T0B/AM | Samsung The performance difference between these and the SM or PM863 range is night and day. I would not use these for anything you care about with performance, particularly IOPS or latency. Their write

Re: [ceph-users] Ceph cluster with SSDs

2017-08-20 Thread Christian Balzer
On Sun, 20 Aug 2017 08:38:54 +0200 Sinan Polat wrote: > What has DWPD to do with performance / IOPS? The SSD will just fail earlier, > but it should not have any affect on the performance, right? > Nothing, I listed BOTH reasons why these are unsuitable. You just don't buy something huge like

Re: [ceph-users] Ceph cluster with SSDs

2017-08-20 Thread Sinan Polat
What has DWPD to do with performance / IOPS? The SSD will just fail earlier, but it should not have any effect on the performance, right? Correct me if I am wrong, just want to learn. > On 20 Aug 2017 at 06:03, Christian Balzer wrote the following: > > DWPD

Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread Christian Balzer
Hello, On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote: > SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - > MZ-75E4T0B/AM | Samsung > And there's your answer. A bit of googling in the archives here would have shown you that these are TOTALLY unsuitable for use

Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ-75E4T0B/AM | Samsung On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy wrote: > Yes, Its in production and used the pg count as per the pg calcuator @ > ceph.com. > > On Fri, Aug 18, 2017 at 3:30

Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
I did not only "osd bench". Performed rbd image mapped and DD test on it... here also got very less number with image on SSD pool as compared with image on HDD pool. As per SSD datasheet - they claim 500 MB/s, but I am getting somewhat near 50 MB/s with dd cmd. On Fri, Aug 18, 2017 at 6:32 AM,
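
For comparison, a sketch of a slightly more telling test than a single dd stream; a queue-depth-1 dd mostly measures per-write latency, which is exactly where consumer SSDs without power-loss protection fall off under Ceph's sync journal writes. Pool, image and device names are placeholders:

  # sequential direct+sync writes through the mapped image
  dd if=/dev/zero of=/dev/rbd0 bs=4M count=1024 oflag=direct,dsync
  # parallel 4K random writes via fio's rbd engine
  fio --name=ssdpool-test --ioengine=rbd --clientname=admin --pool=ssdpool \
      --rbdname=testimg --rw=randwrite --bs=4k --iodepth=32 --direct=1 \
      --runtime=60 --time_based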

Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
Yes, it's in production and we used the pg count as per the pg calculator @ ceph.com. On Fri, Aug 18, 2017 at 3:30 AM, Mehmet wrote: > Which ssds are used? Are they in production? If so how is your PG Count? > > On 17 August 2017 20:04:25 CEST, M Ranga Swami Reddy >

Re: [ceph-users] ceph Cluster attempt to access beyond end of device

2017-08-17 Thread Hauke Homburg
On 15.08.2017 at 16:34, ZHOU Yuan wrote: > Hi Hauke, > > It's possibly the XFS issue as discussed in the previous thread. I > also saw this issue in some JBOD setup, running with RHEL 7.3 > > > Sincerely, Yuan > > On Tue, Aug 15, 2017 at 7:38 PM, Hauke Homburg >

Re: [ceph-users] Ceph cluster with SSDs

2017-08-17 Thread Christian Balzer
Hello, On Fri, 18 Aug 2017 00:00:09 +0200 Mehmet wrote: > Which ssds are used? Are they in production? If so how is your PG Count? > What he wrote. W/o knowing which apples you're comparing to what oranges, this is pointless. Also testing osd bench is the LEAST relevant test you can do, as it

Re: [ceph-users] Ceph cluster with SSDs

2017-08-17 Thread Mehmet
Which ssds are used? Are they in production? If so how is your PG Count? On 17 August 2017 20:04:25 CEST, M Ranga Swami Reddy wrote: >Hello, >I am using the Ceph cluster with HDDs and SSDs. Created separate pool >for each. >Now, when I ran the "ceph osd bench", HDD's

[ceph-users] Ceph cluster with SSDs

2017-08-17 Thread M Ranga Swami Reddy
Hello, I am using a Ceph cluster with HDDs and SSDs, with a separate pool created for each. Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s and the SSD OSDs show around 280 MB/s. Ideally, what I expected was that the SSD OSDs should be at least 40% higher than the HDD OSD bench.
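
For reference, the bench in question is the per-OSD tell command, which writes to the OSD's local object store and is only a rough, isolated measure (as the replies point out, it says little about real client IO). A sketch with the default parameters spelled out:

  # write 1 GiB in 4 MiB chunks to osd.0's object store
  ceph tell osd.0 bench 1073741824 4194304
  # run the same against an SSD-backed OSD id and compare
  ceph tell osd.12 bench 1073741824 4194304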

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-16 Thread Mandar Naik
Thanks a lot for the reply. To eliminate issue of root not being present and duplicate entries in crush map I have updated my crush map. Now I have default root and I have crush hierarchy without duplicate entries. I have now created one pool local to host "ip-10-0-9-233" while other pool local

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-16 Thread Etienne Menguy
have some production data, do a backup first) Étienne From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Mandar Naik <mandar.p...@gmail.com> Sent: Wednesday, August 16, 2017 09:39 To: ceph-users@lists.ceph.com Subject: Re: [ceph

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-16 Thread Luis Periquito
Not going into the obvious - that crush map just does not look correct or even sane, and the policy itself doesn't sound very sane either - but I'm sure you'll understand the caveats and issues it may present... what's most probably happening is that a (or several) pool is using those same

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-16 Thread Mandar Naik
Hi, I just wanted to give a friendly reminder for this issue. I would appreciate if someone can help me out here. Also, please do let me know in case some more information is required here. On Thu, Aug 10, 2017 at 2:41 PM, Mandar Naik wrote: > Hi Peter, > Thanks a lot for

Re: [ceph-users] ceph Cluster attempt to access beyond end of device

2017-08-15 Thread ZHOU Yuan
Hi Hauke, It's possibly the XFS issue as discussed in the previous thread. I also saw this issue in some JBOD setup, running with RHEL 7.3 Sincerely, Yuan On Tue, Aug 15, 2017 at 7:38 PM, Hauke Homburg wrote: > Hello, > > > I found some error in the Cluster with dmes

Re: [ceph-users] ceph Cluster attempt to access beyond end of device

2017-08-15 Thread David Turner
The error found in that thread, iirc, is that the block size of the disk does not match the block size of the FS and is trying to access the rest of a block at the end of a disk. I also remember that the error didn't cause any problems. Why raid 6? Rebuilding a raid 6 seems like your cluster

[ceph-users] ceph Cluster attempt to access beyond end of device

2017-08-15 Thread Hauke Homburg
Hello, I found some errors in the cluster with dmesg -T: attempt to access beyond end of device I found the following post: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39101.html Is this a problem with the size of the filesystem itself or "only" a driver bug? I ask because we

Re: [ceph-users] Ceph Cluster with Deeo Scrub Error

2017-08-14 Thread Hauke Homburg
On 04.07.2017 at 17:58, Etienne Menguy wrote: > rados list-inconsistent-obj Hello, Sorry for my late reply. We installed some new servers and now we have osd pool default size = 3. At this point I tried again to repair with ceph pg repair and ceph pg deep-scrub. I tried to delete again rados
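
For completeness, the usual sequence around an inconsistent PG, as a generic sketch with a placeholder PG id (with size 3 there are two good copies to repair from, which is part of why the move away from size 2 helps):

  ceph health detail | grep inconsistent
  rados list-inconsistent-obj 2.1f --format=json-pretty
  ceph pg deep-scrub 2.1f
  ceph pg repair 2.1f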

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-10 Thread Mandar Naik
Hi Peter, Thanks a lot for the reply. Please find 'ceph osd df' output here - # ceph osd df ID WEIGHT REWEIGHT SIZE USEAVAIL %USE VAR PGS 2 0.04399 1.0 46056M 35576k 46021M 0.08 0.00 0 1 0.04399 1.0 46056M 40148k 46017M 0.09 0.00 384 0 0.04399 1.0 46056M 43851M

Re: [ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-10 Thread Peter Maloney
I think a `ceph osd df` would be useful. And how did you set up such a cluster? I don't see a root, and you have each osd in there more than once...is that even possible? On 08/10/17 08:46, Mandar Naik wrote: > * > > Hi, > > I am evaluating ceph cluster for a solution where ceph could be used >

[ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

2017-08-10 Thread Mandar Naik
Hi, I am evaluating a ceph cluster for a solution where ceph could be used for provisioning pools which could be either stored local to a node or replicated across a cluster. This way ceph could be used as a single point of solution for writing both local as well as replicated data. Local storage helps

Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread Nick Fisk
@sony.com>; Payno, > Victor <victor.pa...@sony.com>; Yip, Rae <rae....@sony.com> > Subject: Re: [ceph-users] ceph cluster experiencing major performance issues > > On 08/08/17 10:50 AM, David Turner wrote: > > Are you also seeing osds marking themselves down for a litt

Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread Mclean, Patrick
On 08/08/17 10:50 AM, David Turner wrote: > Are you also seeing osds marking themselves down for a little bit and > then coming back up? There are 2 very likely problems > causing/contributing to this. The first is if you are using a lot of > snapshots. Deleting snapshots is a very expensive

Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread David Turner
Are you also seeing osds marking themselves down for a little bit and then coming back up? There are 2 very likely problems causing/contributing to this. The first is if you are using a lot of snapshots. Deleting snapshots is a very expensive operation for your cluster and can cause a lot of

[ceph-users] ceph cluster experiencing major performance issues

2017-08-07 Thread Mclean, Patrick
High CPU utilization and inexplicably slow I/O requests We have been having similar performance issues across several ceph clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK for a while, but eventually performance worsens and becomes (at first intermittently, but eventually

  1   2   3   >