Re: [ceph-users] about rgw region sync

2015-05-12 Thread Craig Lewis
Are you trying to setup replication on one cluster right now? Generally replication is setup between two different clusters, each having one zone. Both clusters are in the same region. I can't think of a reason why two zones in one cluster wouldn't work. It's more complicated to setup though.

Re: [ceph-users] about rgw region sync

2015-05-06 Thread Craig Lewis
System users are the only ones that need to be created in both zones. Non-system users (and their sub-users) should be created in the primary zone. radosgw-agent will replicate them to the secondary zone. I didn't create sub-users for my system users, but I don't think it matters. I can read my

Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-06 Thread Craig Lewis
This is an older post of mine on this topic: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/038484.html. The only thing that's changed since then is that Hammer now supports RadosGW object versioning. A combination of RadosGW replication, versioning, and access control meets my ne

Re: [ceph-users] RadosGW - Hardware recomendations

2015-05-06 Thread Craig Lewis
RadosGW is pretty light compared to the rest of Ceph, but it depends on your use case. RadosGW just needs network bandwidth and a bit of CPU. It doesn't access the cluster network, just the public network. If you have some spare public network bandwidth, you can run on existing nodes. If you p

Re: [ceph-users] Ceph Radosgw multi zone data replication failure

2015-04-27 Thread Craig Lewis
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-east-1 > [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1 Are you trying to setup two zones on one cluster? That's possible, but you'll also want to spend some time on your CRUSH map making sure that the two zones are as ind

Re: [ceph-users] cluster not coming up after reboot

2015-04-23 Thread Craig Lewis
On Thu, Apr 23, 2015 at 5:20 AM, Kenneth Waegeman > > So it is all fixed now, but is it explainable that at first about 90% of > the OSDs going into shutdown over and over, and only after some time got in > a stable situation, because of one host network failure ? > > Thanks again! Yes, unless yo

Re: [ceph-users] Odp.: Odp.: CEPH 1 pgs incomplete

2015-04-22 Thread Craig Lewis
> I try out some osd and add to my cluster but recovery after this things > don't rebuild my cluster. > > > -- > *Od:* Craig Lewis > *Wysłane:* 22 kwietnia 2015 20:40 > *Do:* MEGATEL / Rafał Gawron > *Temat:* Re: Odp.: [ceph-users] CEPH 1

Re: [ceph-users] unbalanced OSDs

2015-04-22 Thread Craig Lewis
ceph osd reweight-by-utilization needs another argument to do something. The recommended starting value is 120. Run it again with lower and lower values until you're happy. The value is a percentage, and I'm not sure what happens if you go below 100. If you get into trouble with this (too much
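For example, something along these lines (the threshold values are only illustrative):
    ceph osd reweight-by-utilization 120
    ceph osd tree    # check the reweight column, then re-run with 115, 110, ... until the spread looks sane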

Re: [ceph-users] What is a "dirty" object

2015-04-20 Thread Craig Lewis
On Mon, Apr 20, 2015 at 3:38 AM, John Spray wrote: > > I hadn't noticed that we presented this as nonzero for regular pools > before, it is a bit weird. Perhaps we should show zero here instead for > non-cache-tier pools. > > I have always planned to add a cold EC tier later, once my cluster was

Re: [ceph-users] Managing larger ceph clusters

2015-04-17 Thread Craig Lewis
I'm running a small cluster, but I'll chime in since nobody else has. Cern had a presentation a while ago (dumpling time-frame) about their deployment. They go over some of your questions: http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern My philosophy on Config Management is that it s

Re: [ceph-users] many slow requests on different osds (scrubbing disabled)

2015-04-17 Thread Craig Lewis
I've seen something like this a few times. Once, I lost the battery in my battery backed RAID card. That caused all the OSDs on that host to be slow, which triggered slow request notices pretty much cluster wide. It was only when I histogrammed the slow request notices that I saw most of them we

Re: [ceph-users] Recovering incomplete PGs with ceph_objectstore_tool

2015-04-06 Thread Craig Lewis
In that case, I'd set the crush weight to the disk's size in TiB, and mark the osd out: ceph osd crush reweight osd. ceph osd out Then your tree should look like: -9 *2.72* host ithome 30 *2.72* osd.30 up *0* An OSD can be UP and OUT, which ca
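A sketch of those two commands for the osd.30 in that tree (2.72 being the disk's size in TiB):
    ceph osd crush reweight osd.30 2.72
    ceph osd out 30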

Re: [ceph-users] Rebalance after empty bucket addition

2015-04-06 Thread Craig Lewis
Yes, it's expected. The crush map contains the inputs to the CRUSH hashing algorithm. Every change made to the crush map causes the hashing algorithm to behave slightly differently. It is consistent though. If you removed the new bucket, it would go back to the way it was before you made the ch

Re: [ceph-users] Radosgw multi-region user creation question

2015-04-02 Thread Craig Lewis
You need to create both system users in both zones, with the same access and secret keys. The replication process needs these users to do the replication. Location support isn't currently supported... I think that's targeted for Hammer maybe? http://ceph.com/docs/master/release-notes/ indicates

Re: [ceph-users] Error DATE 1970

2015-04-02 Thread Craig Lewis
No, but I've seen it in RadosGW too. I've been meaning to post about it. I get about ten a day, out of about 50k objects/day. clewis@clewis-mac ~ (-) $ s3cmd ls s3://live32/ | grep '1970-01' | head -1 1970-01-01 00:00 0 s3://live-32/39020f17716a18b39efd8daa96e8245eb2901f353ba1004e724cb56

Re: [ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-04-01 Thread Craig Lewis
[117,118,177] 117 > 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288* > > > [root@pouta-s04 current]# ceph pg map 3.7d0 > osdmap e262813 pg 3.7d0 (3.7d0) -> up [117,118,177] acting [117,118,177] > [root@pouta-s04 current]# > > > *D

Re: [ceph-users] Question Blackout

2015-03-20 Thread Craig Lewis
I'm not a CephFS user, but I have had a few cluster outages. Each OSD has a journal, and Ceph ensures that a write is in all of the journals (primary and replicas) before it acknowledges the write. If an OSD process crashes, it replays the journal on startup, and recovers the write. I've lost po

Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-20 Thread Craig Lewis
I would say you're a little light on RAM. With 4TB disks 70% full, I've seen some ceph-osd processes using 3.5GB of RAM during recovery. You'll be fine during normal operation, but you might run into issues at the worst possible time. I have 8 OSDs per node, and 32G of RAM. I've had ceph-osd pr

Re: [ceph-users] RADOS Gateway Maturity

2015-03-20 Thread Craig Lewis
I have found a few incompatibilities, but so far they're all on the Ceph side. One example I remember was having to change the way we delete objects. The function we originally used fetches a list of object versions, and deletes all versions. Ceph is implementing object versions now (I believe

Re: [ceph-users] Ceiling on number of PGs in a OSD

2015-03-20 Thread Craig Lewis
This isn't a hard limit on the number, but it's recommended that you keep it around 100. Smaller values cause data distribution evenness problems. Larger values cause the OSD processes to use more CPU, RAM, and file descriptors, particularly during recovery. With that many OSDs, you're going to w

Re: [ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-03-20 Thread Craig Lewis
> osdmap e261536: 239 osds: 239 up, 238 in Why is that last OSD not IN? The history you need is probably there. Run ceph pg query on some of the stuck PGs. Look for the recovery_state section. That should tell you what Ceph needs to complete the recovery. If you need more help, post the ou
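A sketch, using one of the PGs quoted earlier in this thread:
    ceph pg 3.7d0 query | less    # scroll to the recovery_state section near the bottom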

Re: [ceph-users] PGs issue

2015-03-20 Thread Craig Lewis
This seems to be a fairly consistent problem for new users. The create-or-move is adjusting the crush weight, not the osd weight. Perhaps the init script should set the default weight to 0.01 if it's <= 0? It seems like there's a downside to this, but I don't see it. On Fri, Mar 20, 2015 at 1
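To illustrate the difference between the two knobs (osd.0 is just a placeholder id):
    ceph osd crush reweight osd.0 0.01    # crush weight, which create-or-move sets
    ceph osd reweight 0 1.0               # osd (override) weight, a separate setting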

Re: [ceph-users] Shadow files

2015-03-16 Thread Craig Lewis
Out of curiosity, what's the frequency of the peaks and troughs? RadosGW has configs on how long it should wait after deleting before garbage collecting, how long between GC runs, and how many objects it can GC per run. The defaults are 2 hours, 1 hour, and 32 respectively. Search http://doc
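Those defaults correspond to roughly this in ceph.conf (the section name is whatever your gateway instance is called):
    [client.radosgw.gateway]
    rgw gc obj min wait = 7200        # 2 hours before a deleted object is eligible for GC
    rgw gc processor period = 3600    # 1 hour between GC runs
    rgw gc max objs = 32              # objects processed per run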

Re: [ceph-users] query about mapping of Swift/S3 APIs to Ceph cluster APIs

2015-03-16 Thread Craig Lewis
On Sat, Mar 14, 2015 at 3:04 AM, pragya jain wrote: > Hello all! > > I am working on Ceph object storage architecture from last few months. > > I am unable to search a document which can describe how Ceph object > storage APIs (Swift/S3 APIs) are mapped with Ceph storage cluster APIs > (librado

Re: [ceph-users] RadosGW Direct Upload Limitation

2015-03-16 Thread Craig Lewis
> > > Maybe, but I'm not sure if Yehuda would want to take it upstream or > not. This limit is present because it's part of the S3 spec. For > larger objects you should use multi-part upload, which can get much > bigger. > -Greg > > Note that the multi-part upload has a lower limit of 4MiB per part

Re: [ceph-users] PGs stuck unclean "active+remapped" after an osd marked out

2015-03-16 Thread Craig Lewis
> > > If I remember/guess correctly, if you mark an OSD out it won't > necessarily change the weight of the bucket above it (ie, the host), > whereas if you change the weight of the OSD then the host bucket's > weight changes. > -Greg That sounds right. Marking an OSD out is a ceph osd reweight

Re: [ceph-users] Mapping users to different rgw pools

2015-03-16 Thread Craig Lewis
15, 2015 at 11:53 PM, Sreenath BH wrote: > Thanks. > > Is this possible outside of multi-zone setup. (With only one Zone)? > > For example, I want to have pools with different replication > factors(or erasure codings) and map users to these pools. > > -Sreenath > >

Re: [ceph-users] Mapping users to different rgw pools

2015-03-13 Thread Craig Lewis
Yes, RadosGW has the concept of Placement Targets and Placement Pools. You can create a target, and point it a set of RADOS pools. Those pools can be configured to use different storage strategies by creating different crushmap rules, and assigning those rules to the pool. RGW users can be assig

Re: [ceph-users] Can not list objects in large bucket

2015-03-13 Thread Craig Lewis
By default, radosgw only returns the first 1000 objects. Looks like radosgw-admin has the same limit. Looking at the man page, I don't see any way to page through the list. I must be missing something. The S3 API does have the ability to page through the list. I use the command line tool s3cm
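For example (bucket name is a placeholder), s3cmd follows the S3 list markers for you:
    s3cmd ls s3://mybucket/ > objects.txt
    wc -l objects.txt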

Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Craig Lewis
You've either modified the crushmap, or changed the pool size to 1. The defaults create 3 replicas on different hosts. What does `ceph osd dump | grep ^pool` output? If the size param is 1, then you reduced the replica count. If the size param is > 1, you must've adjusted the crushmap. Either
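Something like this (the output line is only illustrative):
    ceph osd dump | grep ^pool
    # pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 ...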

Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Craig Lewis
It depends. There are a lot of variables, like how many nodes and disks you currently have. Are you using journals on SSD. How much data is already in the cluster. What the client load is on the cluster. Since you only have 40 GB in the cluster, it shouldn't take long to backfill. You may fin

Re: [ceph-users] backfill_toofull, but OSDs not full

2015-01-09 Thread Craig Lewis
What was the osd_backfill_full_ratio? That's the config that controls backfill_toofull. By default, it's 85%. The mon_osd_*_ratio affect the ceph status. I've noticed that it takes a while for backfilling to restart after changing osd_backfill_full_ratio. Backfilling usually restarts for me in
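A sketch for bumping it at runtime (remember to put it back once backfilling finishes):
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'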

Re: [ceph-users] Slow/Hung IOs

2015-01-09 Thread Craig Lewis
It doesn't seem like the problem here, but I've noticed that slow OSDs have a large fan-out. I have less than 100 OSDs, so every OSD talks to every other OSD in my cluster. I was getting slow notices from all of my OSDs. Nothing jumped out, so I started looking at disk write latency graphs. I no

Re: [ceph-users] Different disk usage on different OSDs

2015-01-08 Thread Craig Lewis
The short answer is that uniform distribution is a lower priority feature of the CRUSH hashing algorithm. CRUSH is designed to be consistent and stable in its hashing. For the details, you can read Sage's paper ( http://ceph.com/papers/weil-rados-pdsw07.pdf). The goal is that if you make a chan

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-07 Thread Craig Lewis
On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote: > However, I suspect that temporarily setting min size to a lower number > could be enough for the PGs to recover. If "ceph osd pool set > min_size 1" doesn't get the PGs going, I suppose restarting at least one > of the OSDs involved in t

Re: [ceph-users] Any Good Ceph Web Interfaces?

2014-12-23 Thread Craig Lewis
Are you asking because you want to manage a Ceph cluster point and click? Or do you need some shiny to show the boss? I'm using a combination of Chef and Zabbix. I'm not running RHEL though, but I would assume those are available in the repos. It's not as slick as Calamari, and it really doesn'

Re: [ceph-users] Behaviour of a cluster with full OSD(s)

2014-12-23 Thread Craig Lewis
On Tue, Dec 23, 2014 at 3:34 AM, Max Power < mailli...@ferienwohnung-altenbeken.de> wrote: > I understand that the status "osd full" should never be reached. As I am > new to > ceph I want to be prepared for this case. I tried two different scenarios > and > here are my experiences: > For a real

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-22 Thread Craig Lewis
On Mon, Dec 22, 2014 at 2:57 PM, Sean Sullivan wrote: > Thanks Craig! > > I think that this may very well be my issue with osds dropping out but I > am still not certain as I had the cluster up for a small period while > running rados bench for a few days without any status changes. > Mine were

Re: [ceph-users] ceph-deploy & state of documentation [was: OSD & JOURNAL not associated - ceph-disk list ?]

2014-12-22 Thread Craig Lewis
I get the impression that more people on the ML are using a config management system. ceph-deploy questions seem to come from new users following the quick start guide. I know both Puppet and Chef are fairly well represented here. I've seen a few posts about Salt and Ansible, but not much. Cala

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont wrote: > > > So, indeed, I have to use routing *or* maybe create 2 monitors > by server like this: > > [mon.node1-public1] > host = ceph-node1 > mon addr = 10.0.1.1 > > [mon.node1-public2] > host = ceph-node1 > mon addr = 10.

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Fri, Dec 19, 2014 at 4:03 PM, Francois Lafont wrote: > > Le 19/12/2014 19:17, Craig Lewis a écrit : > > > I'm not using mon addr lines, and my ceph-mon daemons are bound to > 0.0.0.0:*. > > And do you have several IP addresses on your server? > Can you contact

Re: [ceph-users] Placement groups stuck inactive after down & out of 1/9 OSDs

2014-12-19 Thread Craig Lewis
min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 1024 pgp_num 1024 last_change 187 flags hashpspool > stripe_width 0 > root@ceph25:~# ceph pg dump_stuck > ok > > > The more I think about this problem, the less I think there'll be an easy > answer, and it's

Re: [ceph-users] Placement groups stuck inactive after down & out of 1/9 OSDs

2014-12-19 Thread Craig Lewis
That seems odd. So you have 3 nodes, with 3 OSDs each. You should've been able to mark osd.0 down and out, then stop the daemon without having those issues. It's generally best to mark an osd down, then out, and wait until the cluster has recovered completely before stopping the daemon and remov
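Roughly the sequence I have in mind, with osd.5 as a placeholder:
    ceph osd out 5                # then wait for recovery to finish; watch with ceph -w
    stop ceph-osd id=5            # or: /etc/init.d/ceph stop osd.5
    ceph osd crush remove osd.5
    ceph auth del osd.5
    ceph osd rm 5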

Re: [ceph-users] Recovering from PG in down+incomplete state

2014-12-19 Thread Craig Lewis
Why did you remove osd.7? Something else appears to be wrong. With all 11 OSDs up, you shouldn't have any PGs stuck in stale or peering. How badly are the clocks skewed between nodes? If it's bad enough, it can cause communication problems between nodes. Ceph will complain if the clocks are m
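To check the skew (run the second command on each monitor host):
    ceph health detail | grep -i skew
    ntpq -p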

Re: [ceph-users] Need help from Ceph experts

2014-12-19 Thread Craig Lewis
fer to install Ceph in my local > server. > > Again thanks guys !! > > Kind Regards > Debashish Das > > On Fri, Dec 19, 2014 at 6:08 AM, Robert LeBlanc > wrote: >> >> Thanks, I'll look into these. >> >> On Thu, Dec 18, 2014 at 5:12 PM, Craig Le

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Thu, Dec 18, 2014 at 10:47 PM, Francois Lafont wrote: > > Le 19/12/2014 02:18, Craig Lewis a écrit : > > The daemons bind to *, > > Yes but *only* for the OSD daemon. Am I wrong? > > Personally I must provide IP addresses for the monitors > in the /etc/ceph/ceph.c

Re: [ceph-users] Have 2 different public networks

2014-12-18 Thread Craig Lewis
The daemons bind to *, so adding the 3rd interface to the machine will allow you to talk to the daemons on that IP. I'm not really sure how you'd setup the management network though. I'd start by setting the ceph.conf public network on the management nodes to have the public network 10.0.2.0/24,

Re: [ceph-users] Need help from Ceph experts

2014-12-18 Thread Craig Lewis
gt; > Thanks, > Robert LeBlanc > > On Thu, Dec 18, 2014 at 3:43 PM, Craig Lewis > wrote: > >> >> >> On Thu, Dec 18, 2014 at 5:16 AM, Patrick McGarry >> wrote: >>> >>> >>> > 2. What should be the minimum hardware requirement of

Re: [ceph-users] Need help from Ceph experts

2014-12-18 Thread Craig Lewis
On Thu, Dec 18, 2014 at 5:16 AM, Patrick McGarry wrote: > > > > 2. What should be the minimum hardware requirement of the server (CPU, > > Memory, NIC etc) > > There is no real "minimum" to run Ceph, it's all about what your > workload will look like and what kind of performance you need. We have

Re: [ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Craig Lewis
So the problem started once remapping+backfilling started, and lasted until the cluster was healthy again? Have you adjusted any of the recovery tunables? Are you using SSD journals? I had a similar experience the first time my OSDs started backfilling. The average RadosGW operation latency wen

Re: [ceph-users] Test 6

2014-12-16 Thread Craig Lewis
I always wondered why my posts didn't show up until somebody replied to them. I thought it was my filters. Thanks! On Mon, Dec 15, 2014 at 10:57 PM, Leen de Braal wrote: > > If you are trying to see if your mails come through, don't check on the > list. You have a gmail account, gmail removes m

Re: [ceph-users] Dual RADOSGW Network

2014-12-16 Thread Craig Lewis
You may need split horizon DNS. The internal machines' DNS should resolve to the internal IP, and the external machines' DNS should resolve to the external IP. There are various ways to do that. The RadosGW config has an example of setting up Dnsmasq: http://ceph.com/docs/master/radosgw/config/#
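A minimal Dnsmasq sketch for the internal view (names and IPs are placeholders):
    address=/rgw.example.com/10.0.1.50
    # the external DNS view resolves rgw.example.com to the public IP instead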

Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Craig Lewis
On Sun, Dec 14, 2014 at 6:31 PM, Benjamin wrote: > > The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of > disk. They have between 10% and 30% disk utilization but common between all > of them is that they *have free disk space* meaning I have no idea what > the heck is causing

Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Craig Lewis
I was going with a low perf scenario, and I still ended up adding SSDs. Everything was fine in my 3 node cluster, until I wanted to add more nodes. Admittedly, I was a bit aggressive with the expansion. I added a whole node at once, rather than one or two disks at a time. Still, I wasn't expect

Re: [ceph-users] Dual RADOSGW Network

2014-12-15 Thread Craig Lewis
That shouldn't be a problem. Just have Apache bind to all interfaces instead of the external IP. In my case, I only have Apache bound to the internal interface. My load balancer has an external and internal IP, and I'm able to talk to it on both interfaces. On Mon, Dec 15, 2014 at 2:00 PM, Geor

Re: [ceph-users] my cluster has only rbd pool

2014-12-15 Thread Craig Lewis
If you're running Ceph 0.88 or newer, only the rbd pool is created by default now. Greg Farnum mentioned that the docs are out of date there. On Sat, Dec 13, 2014 at 8:25 PM, wang lin wrote: > > Hi All > I set up my first ceph cluster according to instructions in >

Re: [ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Craig Lewis
When I first created a test cluster, I used 1 GiB disks. That causes problems. Ceph has a CRUSH weight. By default, the weight is the size of the disk in TiB, truncated to 2 decimal places. ie, any disk smaller than 10 GiB will have a weight of 0.00. I increased all of my virtual disks to 10 G

Re: [ceph-users] Scrub while cluster re-balancing

2014-12-03 Thread Craig Lewis
0 0 11238999 44955958 11259655 45038593 > total used 18004641796 1463173 > total avail32330689516 > total space50335331312 > ems@rack6-ramp-4:~$ > > -Thanks & regards, > Mallikarjun Biradar > > On Wed, Dec 3, 2014 at

Re: [ceph-users] Rebuild OSD's

2014-12-02 Thread Craig Lewis
You have a total of 2 OSDs, and 2 disks, right? The safe method is to mark one OSD out, and wait for the cluster to heal. Delete, reformat, add it back to the cluster, and wait for the cluster to heal. Repeat. But that only works when you have enough OSDs that the cluster can heal. So you'll ha

Re: [ceph-users] Scrub while cluster re-balancing

2014-12-02 Thread Craig Lewis
, > Mallikarjun Biradar > On 3 Dec 2014 00:15, "Craig Lewis" wrote: > >> You mean `ceph -w` and `ceph -s` didn't show any PGs in >> the active+clean+scrubbing state while pool 2's PGs were being scrubbed? >> >> I see that happen with my really s

Re: [ceph-users] Slow Requests when taking down OSD Node

2014-12-02 Thread Craig Lewis
setting "ceph osd set noout" I do a "service ceph stop osd.51" > and as soon as I do this I get growing numbers (200) of slow requests, > although there is not a big load on my cluster. > > Christoph > > On Tue, Dec 02, 2014 at 10:40:13AM -0800, Craig Lewi

Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-02 Thread Craig Lewis
On Mon, Dec 1, 2014 at 1:51 AM, Daniel Schneller < daniel.schnel...@centerdevice.com> wrote: > > I could not find any way to throttle the background deletion activity > > (the command returns almost immediately). > I'm only aware of osd snap trim sleep. I haven't tried this since my Firefly upgr

Re: [ceph-users] Scrub while cluster re-balancing

2014-12-02 Thread Craig Lewis
You mean `ceph -w` and `ceph -s` didn't show any PGs in the active+clean+scrubbing state while pool 2's PGs were being scrubbed? I see that happen with my really small pools. I have a bunch of RadosGW pools that contain <5 objects, and ~1kB of data. When I scrub the PGs in those pools, they comp

Re: [ceph-users] Slow Requests when taking down OSD Node

2014-12-02 Thread Craig Lewis
I've found that it helps to shut down the osds before shutting down the host. Especially if the node is also a monitor. It seems that some OSD shutdown messages get lost while monitors are holding elections. On Tue, Dec 2, 2014 at 10:10 AM, Christoph Adomeit < christoph.adom...@gatworks.de> wrot

Re: [ceph-users] Optimal or recommended threads values

2014-12-01 Thread Craig Lewis
I'm still using the default values, mostly because I haven't had time to test. On Thu, Nov 27, 2014 at 2:44 AM, Andrei Mikhailovsky wrote: > Hi Craig, > > Are you keeping the filestore, disk and op threads at their default > values? or did you also change them? > > Cheers > > > Tuning these valu

Re: [ceph-users] Create OSD on ZFS Mount (firefly)

2014-11-25 Thread Craig Lewis
There was a good thread on the mailing list a little while ago. There were several recommendations in that thread, maybe some of them will help. Found it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html On Tue, Nov 25, 2014 at 4:16 AM, Lindsay Mathieson < lindsay.mathie...@

Re: [ceph-users] private network - VLAN vs separate switch

2014-11-25 Thread Craig Lewis
It's mostly about bandwidth. With VLANs, the public and cluster networks are going to be sharing the inter-switch links. For a cluster that size, I don't see much advantage to the VLANs. You'll save a few ports by having the inter-switch links shared, at the expense of contention on those links.

Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals

2014-11-25 Thread Craig Lewis
I have suffered power losses in every data center I've been in. I have lost SSDs because of it (Intel 320 Series). The worst time, I lost both SSDs in a RAID1. That was a bad day. I'm using the Intel DC S3700 now, so I don't have a repeat. My cluster is small enough that losing a journal SSD w

Re: [ceph-users] Negative number of objects degraded for extended period of time

2014-11-24 Thread Craig Lewis
ection works and how to disable garbage collector for some of the > radosgw to prove that's the issue. > > Yehuda Sadeh mentioned back in 2012 that ""we may also want to explore > doing that as part of a bigger garbage collection scheme that we'll soon be > working o

Re: [ceph-users] Optimal or recommended threads values

2014-11-24 Thread Craig Lewis
Tuning these values depends on a lot more than just the SSDs and HDDs. Which kernel and IO scheduler are you using? Does your HBA do write caching? It also depends on what your goals are. Tuning for a RadosGW cluster is different than for an RBD cluster. The short answer is that you are the only

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2014-11-24 Thread Craig Lewis
933617 7f73b07c0700 20 get_obj_state: > rctx=0x7f73dc006b30 obj=.us-west.users.uid:east-user state=0x7f73dc006498 > s->prefetch_data=0 > 2014-11-22 14:19:21.933620 7f73b07c0700 20 state for > obj=.us-west.users.uid:east-user is not atomic, not appending atomic test > 2014-11

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
uild the cluster altogether. > > —Jiten > > On Nov 20, 2014, at 1:40 PM, Craig Lewis > wrote: > > So you have your crushmap set to choose osd instead of choose host? > > Did you wait for the cluster to recover between each OSD rebuild? If you > rebuilt all 3 OSDs at

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
-cephosd004 > 3 0.0 osd.3 up 1 > > > [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query > Error ENOENT: i don't have pgid 2.33 > > —Jiten > > > On Nov 20, 2014, at 11:18 AM, Craig Lewis > wrote: > > Just to be clear, this is from a cluster that was healthy, had a

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
Just to be clear, this is from a cluster that was healthy, had a disk replaced, and hasn't returned to healthy? It's not a new cluster that has never been healthy, right? Assuming it's an existing cluster, how many OSDs did you replace? It almost looks like you replaced multiple OSDs at the same

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2014-11-20 Thread Craig Lewis
You need to create two system users, in both zones. They should have the same name, access key, and secret in both zones. By convention, these system users are named the same as the zones. You shouldn't use those system users for anything other than replication. You should create a non-system us
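A sketch of the system user creation, run against both zones with identical keys (uid, keys, and client name are placeholders):
    radosgw-admin user create --uid=us-east --display-name="us-east (system user)" \
        --access-key=XXXXXXXX --secret=YYYYYYYY --system --name client.radosgw.us-east-1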

Re: [ceph-users] osd crashed while there was no space

2014-11-18 Thread Craig Lewis
ould choose to migrate? And in > the migrating, other > OSDs will crashed one by one until the cluster could not work. > > 2014-11-18 5:28 GMT+08:00 Craig Lewis : > > At this point, it's probably best to delete the pool. I'm assuming the > pool > > only contains b

Re: [ceph-users] OSD commits suicide

2014-11-18 Thread Craig Lewis
single node, it takes 4-5 days to drain, format, and backfill. That was months ago, and I'm still dealing with the side effects. I'm not eager to try again. On Mon, Nov 17, 2014 at 2:04 PM, Andrey Korolyov wrote: > On Tue, Nov 18, 2014 at 12:54 AM, Craig Lewis > wrote:

Re: [ceph-users] Negative number of objects degraded for extended period of time

2014-11-17 Thread Craig Lewis
Well, after 4 days, this is probably moot. Hopefully it's finished backfilling, and your problem is gone. If not, I believe that if you fix those backfill_toofull, the negative numbers will start approaching zero. I seem to recall that negative degraded is a special case of degraded, but I don't

Re: [ceph-users] Deep scrub parameter tuning

2014-11-17 Thread Craig Lewis
The minimum value for osd_deep_scrub_interval is osd_scrub_min_interval, and it wouldn't be advisable to go that low. I can't find the documentation, but basically Ceph will attempt a scrub sometime between osd_scrub_min_interval and osd_scrub_max_interval. If the PG hasn't been deep-scrubbed in
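Illustrative values only; tune to taste:
    [osd]
    osd scrub min interval = 86400       # 1 day
    osd scrub max interval = 604800      # 7 days
    osd deep scrub interval = 2592000    # deep scrub each PG at least every 30 days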

Re: [ceph-users] OSD commits suicide

2014-11-17 Thread Craig Lewis
ported by Dmitry Smirnov 26 days ago, but the report has no > response yet. Any ideas? > > In my experience, OSD's are quite unstable in Giant and very easily > stressed, causing chain effects, further worsening the issues. It would > be nice to know if this is also noticed by oth

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Craig Lewis
I use `dd` to force activity to the disk I want to replace, and watch the activity lights. That only works if your disks aren't 100% busy. If they are, stop the ceph-osd daemon, and see which drive stops having activity. Repeat until you're 100% confident that you're pulling the right drive. On
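Something like this, where the device is the guess you're trying to confirm:
    dd if=/dev/sdX of=/dev/null bs=1M count=10000 iflag=direct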

Re: [ceph-users] osd crashed while there was no space

2014-11-17 Thread Craig Lewis
At this point, it's probably best to delete the pool. I'm assuming the pool only contains benchmark data, and nothing important. Assuming you can delete the pool: First, figure out the ID of the data pool. You can get that from ceph osd dump | grep '^pool' Once you have the number, delete the d
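A sketch, assuming the benchmark pool is called bench:
    ceph osd dump | grep '^pool'
    ceph osd pool delete bench bench --yes-i-really-really-mean-it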

Re: [ceph-users] OSDs down

2014-11-17 Thread Craig Lewis
Firstly, any chance of getting node4 and node5 back up? You can move the disks (monitor and osd) to a new chasis, and bring it back up. As long as it has the same IP as the original node4 and node5, the monitor should join. How much is the clock skewed on node2? I haven't had problems with smal

Re: [ceph-users] Federated gateways

2014-11-14 Thread Craig Lewis
cluster, but I don’t know > what to think anymore. > > Also do both users need to be system users on both ends? > > Aaron > > > > On Nov 12, 2014, at 4:00 PM, Craig Lewis > wrote: > > http://tracker.ceph.com/issues/9206 > > My post to the ML: http://www.spinics

Re: [ceph-users] Federated gateways

2014-11-12 Thread Craig Lewis
ett wrote: > In playing around with this a bit more, I noticed that the two users on > the secondary node cant see each others buckets. Is this a problem? > IIRC, the system user couldn't see each other's buckets, but they could read and write the objects. > On Nov 11, 2014,

Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis
> > I see you're running 0.80.5. Are you using Apache 2.4? There is a known > issue with Apache 2.4 on the primary and replication. It's fixed, just > waiting for the next firefly release. Although, that causes 40x errors > with Apache 2.4, not 500 errors. > > It is apache 2.4, but I’m actually

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Craig Lewis
How many OSDs are nearfull? I've seen Ceph want two toofull OSDs to swap PGs. In that case, I dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, then put it back to normal once the scheduling deadlock finished. Keep in mind that ceph osd reweight is temporary. If you m

Re: [ceph-users] Federated gateways

2014-11-11 Thread Craig Lewis
cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0 > 2014-11-11 14:37:06.701728 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> > 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 > cs=1 l=1 c=0x7f53f00053f0).writer sleeping > 2014-11-11 14:37:06.70

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-10 Thread Craig Lewis
I had the same experience with force_create_pg too. I ran it, and the PGs sat there in creating state. I left the cluster overnight, and sometime in the middle of the night, they created. The actual transition from creating to active+clean happened during the recovery after a single OSD was kick

Re: [ceph-users] osd down

2014-11-10 Thread Craig Lewis
e that is why I am asking. > > Anyway...thanks again for all the help. > > Shain > > Sent from my iPhone > > On Nov 7, 2014, at 2:09 PM, Craig Lewis wrote: > > I'd stop that osd daemon, and run xfs_check / xfs_repair on that > partition. > > If you repair

Re: [ceph-users] Stuck in stale state

2014-11-10 Thread Craig Lewis
"nothing to send, going to standby" isn't necessarily bad, I see it from time to time. It shouldn't stay like that for long though. If it's been 5 minutes, and the cluster still isn't doing anything, I'd restart that osd. On Fri, Nov 7, 2014 at 1:55 PM, Jan Pekař wrote: > Hi, > > I was testing

Re: [ceph-users] How to remove hung object

2014-11-10 Thread Craig Lewis
Do you have any OSDs that are offline that you can bring back online? ceph pg query 6.9d8 should tell you. At the bottom, there is a section with down_osds_we_would_probe. Focus on getting those OSDs back up. On Sat, Nov 8, 2014 at 11:13 PM, Tuân Tạ Bá wrote: > > Hi all, > > I want to remov
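For example:
    ceph pg 6.9d8 query > pg.json
    grep -A5 down_osds_we_would_probe pg.json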

Re: [ceph-users] An OSD always crash few minutes after start

2014-11-10 Thread Craig Lewis
You're running 0.87-6. There were various fixes for this problem in Firefly. Were any of these snapshots created on early version of Firefly? So far, every fix for this issue has gotten developers involved. I'd see if you can talk to some devs on IRC, or post to the ceph-devel mailing list. M

Re: [ceph-users] OSD commits suicide

2014-11-10 Thread Craig Lewis
Have you tuned any of the recovery or backfill parameters? My ceph.conf has:
    [osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1
Still, if it's running for a few hours, then failing, it sounds like there might be something else at play. OSDs use a lot of RA
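The same values should also take effect at runtime via injectargs, something like:
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'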

Re: [ceph-users] PG inconsistency

2014-11-10 Thread Craig Lewis
For #1, it depends what you mean by fast. I wouldn't worry about it taking 15 minutes. If you mark the old OSD out, ceph will start remapping data immediately, including a bunch of PGs on unrelated OSDs. Once you replace the disk, and put the same OSDID back in the same host, the CRUSH map will

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-10 Thread Craig Lewis
If all of your PGs now have an empty down_osds_we_would_probe, I'd run through this discussion again. The commands to tell Ceph to give up on lost data should have an effect now. That's my experience anyway. Nothing progressed until I took care of down_osds_we_would_probe. After that was empty,

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-07 Thread Craig Lewis
ceph-disk-prepare will give you the next unused number. So this will work only if the osd you remove is greater than 20. On Thu, Nov 6, 2014 at 12:12 PM, Chad Seys wrote: > Hi Craig, > > > You'll have trouble until osd.20 exists again. > > > > Ceph really does not want to lose data. Even if yo

Re: [ceph-users] osd down

2014-11-07 Thread Craig Lewis
I'd stop that osd daemon, and run xfs_check / xfs_repair on that partition. If you repair anything, you should probably force a deep-scrub on all the PGs on that disk. I think ceph osd deep-scrub will do that, but you might have to manually grep ceph pg dump . Or you could just treat it like a
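A rough outline, with osd.12 and /dev/sdX1 as placeholders:
    stop ceph-osd id=12               # or: service ceph stop osd.12
    umount /var/lib/ceph/osd/ceph-12
    xfs_repair /dev/sdX1
    mount /var/lib/ceph/osd/ceph-12   # assumes an fstab entry; then start the OSD again
    start ceph-osd id=12
    ceph osd deep-scrub 12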

Re: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest?

2014-11-07 Thread Craig Lewis
It depends on which version of ceph, but it's pretty normal under newer versions. There are a bunch of variables. How many PGs per OSD, how much data is in the PGs, etc. I'm a bit light on the PGs (~60 PGs per OSD), and heavy on the data (~3 TiB of data on each OSD). In the production cluster,

Re: [ceph-users] buckets and users

2014-11-07 Thread Craig Lewis
hought it was using the > default region, so I didn't have to create extra regions. > Let me try to figure this out, the docs are a little bit confusing. > > Marco Garcês > > > > On Thu, Nov 6, 2014 at 6:39 PM, Craig Lewis > wrote: > > You need to tell each
