Re: [ceph-users] Fwd: List of SSDs

2016-03-03 Thread Shinobu Kinjo
Comparing these SSDs (S3710, S3610, SM863, 845DC Pro), which one is the most reasonable in terms of performance, cost, or whatever? The S3710 does not sound reasonable to me. > And I had no luck at all getting the newer versions into a generic kernel > or Debian. So it's not always better to use

Re: [ceph-users] Fwd: List of SSDs

2016-03-03 Thread Christian Balzer
Hello, On Mon, 29 Feb 2016 15:00:08 -0800 Heath Albritton wrote: > > Did you just do these tests or did you also do the "suitable for Ceph" > > song and dance, as in sync write speed? > > These were done with libaio, so async. I can do a sync test if that > helps. My goal for testing wasn't
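
For readers wondering what the "suitable for Ceph" test refers to: it is usually a single-threaded fio run doing small synchronous writes, which approximates journal behaviour. A minimal sketch, assuming the device under test is /dev/sdX (placeholder) and that overwriting it is acceptable:

    fio --name=journal-sync-test --filename=/dev/sdX \
        --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based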

Re: [ceph-users] OSDs are crashing during PG replication

2016-03-03 Thread Shinobu Kinjo
Thank you for your explanation. > Every time, 2 of 18 OSDs are crashing. I think it's happening when PG > replication runs, because only 2 OSDs are crashing and every time they are the same. First you said that 2 OSDs crashed every time. From the log you pasted, it makes sense to do something for

Re: [ceph-users] OSDs are crashing during PG replication

2016-03-03 Thread Alexander Gubanov
I decided to stop using the SSD cache pool and create just 2 pools: the 1st pool only of SSDs for fast storage, the 2nd only of HDDs for slow storage. As for this file, honestly, I don't know why it is created. As I said, I flush the journal for the fallen OSD, remove this file and then start the OSD daemon:

[ceph-users] abort slow requests ?

2016-03-03 Thread Ben Hines
I have a few bad objects in ceph which are 'stuck on peering'. The clients hit them and they build up and eventually stop all traffic to the OSD. I can open up traffic by resetting the OSD (aborting those requests) temporarily. Is there a way to tell ceph to cancel/abort these 'slow requests'
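
As far as I know there is no dedicated "abort slow request" command; the workaround described above amounts to bouncing the OSD so the blocked ops are dropped and resent. A minimal sketch, with osd.12 as a placeholder id:

    ceph health detail                      # lists the OSDs holding blocked/slow requests
    ceph daemon osd.12 dump_ops_in_flight   # inspect the stuck ops on the OSD node itself
    ceph osd down 12                        # mark the OSD down; clients resend once it rejoins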

Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests

2016-03-03 Thread Robin H. Johnson
On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote: > Hi, > > I think this is a really serious problem - again: > > - we silently lost S3/RGW objects in clusters > > Moreover, our situation looks very similar to the one described in > unfixed bug #13764 (Hammer) and in fixed

Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests

2016-03-03 Thread Yehuda Sadeh-Weinraub
On Thu, Feb 25, 2016 at 7:17 AM, Ritter Sławomir wrote: > Hi, > > > > We have two CEPH clusters running on Dumpling 0.67.11 and some of our > "multipart objects" are incomplete. It seems that some slow requests could > cause corruption of related S3 objects. Moreover

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread Adrian Saul
> Samsung EVO... > Which exact model, I presume this is not a DC one? > > If you had put your journals on those, you would already be pulling your hair > out due to abysmal performance. > > Also with the Evo ones, I'd be worried about endurance. No, I am using the P3700DCs for journals. The

Re: [ceph-users] [Hammer upgrade]: procedure for upgrade

2016-03-03 Thread ceph
Hi, as the docs say: mon, then osd, then rgw. Restart each daemon after upgrading the code. Works fine. On 03/03/2016 22:11, Andrea Annoè wrote: > Hi to all, > A Ceph architecture has: > 1 RGW > 3 MON > 4 OSD > > Has someone tested a procedure for upgrading a Ceph architecture with RGW, MON, OSD? >
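
A minimal sketch of that order on a Debian-style node, with package manager, host and daemon names as placeholders; upgrade and restart one node at a time and wait for HEALTH_OK before moving on:

    apt-get update && apt-get install --only-upgrade ceph ceph-common   # mon nodes first
    service ceph restart mon.ceph-mon-1                                 # restart the upgraded mon
    ceph health                                                         # wait for HEALTH_OK
    # then repeat the upgrade/restart cycle on each OSD node, and finally on the RGW node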

[ceph-users] [Hammer upgrade]: procedure for upgrade

2016-03-03 Thread Andrea Annoè
Hi to all, a Ceph architecture has: 1 RGW, 3 MON, 4 OSD. Has someone tested a procedure for upgrading a Ceph architecture with RGW, MON, OSD? Which component should I upgrade first? Apart from the RGW, will all services stay up while the upgrade is applied? Thanks in advance for sharing your experience. Best regards

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Philip S. Hempel
On 03/03/2016 03:00 PM, Richard Arends wrote: On 03/03/2016 08:32 PM, Philip S. Hempel wrote: On 03/03/2016 01:49 PM, Richard Arends wrote: On 03/03/2016 07:21 PM, Philip S. Hempel wrote: Philip, Sorry, can't help you with the segfault. What I would do is set debug options in ceph.conf

Re: [ceph-users] Help: pool not responding

2016-03-03 Thread Dimitar Boichev
But the whole cluster or what ? Regards. Dimitar Boichev SysAdmin Team Lead AXSMarine Sofia Phone: +359 889 22 55 42 Skype: dimitar.boichev.axsmarine E-mail: dimitar.boic...@axsmarine.com On Mar 3, 2016, at 22:47, Mario Giammarco

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Richard Arends
On 03/03/2016 09:04 PM, Philip S. Hempel wrote: On 03/03/2016 03:00 PM, Richard Arends wrote: Do you have more info before and after this message? There are about 40 lines like this above; these are the last few lines: -8> 2016-03-03 14:47:54.244421 7f5b57c01840 5 osd.34 pg_epoch: 89826

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Richard Arends
On 03/03/2016 08:32 PM, Philip S. Hempel wrote: On 03/03/2016 01:49 PM, Richard Arends wrote: On 03/03/2016 07:21 PM, Philip S. Hempel wrote: Philip, Sorry, can't help you with the segfault. What I would do is set debug options in ceph.conf and start the OSD; maybe that extra debug info

Re: [ceph-users] CEPH FS - all_squash option equivalent

2016-03-03 Thread Gregory Farnum
On Thu, Mar 3, 2016 at 11:05 AM, Lincoln Bryant wrote: > Also very interested in this if there are any docs available! > > --Lincoln > >> On Mar 3, 2016, at 1:04 PM, Fred Rolland wrote: >> >> Can you share a link describing the UID squashing feature?

Re: [ceph-users] CEPH FS - all_squash option equivalent

2016-03-03 Thread Lincoln Bryant
Also very interested in this if there are any docs available! --Lincoln > On Mar 3, 2016, at 1:04 PM, Fred Rolland wrote: > > Can you share a link describing the UID squashing feature? > > On Mar 3, 2016 9:02 PM, "Gregory Farnum" wrote: > On Wed, Mar

Re: [ceph-users] CEPH FS - all_squash option equivalent

2016-03-03 Thread Fred Rolland
Can you share a link describing the UID squashing feature? On Mar 3, 2016 9:02 PM, "Gregory Farnum" wrote: > On Wed, Mar 2, 2016 at 11:22 PM, Fred Rolland wrote: > > Thanks for your reply. > > > > Server : > > [root@ceph-1 ~]# rpm -qa | grep ceph > >

Re: [ceph-users] CEPH FS - all_squash option equivalent

2016-03-03 Thread Gregory Farnum
On Wed, Mar 2, 2016 at 11:22 PM, Fred Rolland wrote: > Thanks for your reply. > > Server : > [root@ceph-1 ~]# rpm -qa | grep ceph > ceph-mon-0.94.1-13.el7cp.x86_64 That would be a Hammer release. Nothing there for doing anything with permission checks at all. -Greg >

Re: [ceph-users] Upgrade from Hammer LTS to Infernalis or wait for Jewel LTS?

2016-03-03 Thread Oliver Dzombic
Hi, I was unable to find any timetable of EOLs for the different versions. Can you please tell me where your information comes from (EOL/release dates, LTSes)? Wiki » Planning » @tracker.ceph.com did not really help; the Roadmap did not help either. Thank you! -- With kind regards /

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Richard Arends
On 03/03/2016 07:21 PM, Philip S. Hempel wrote: Philip, Sorry, can't help you with the segfault. What I would do is set debug options in ceph.conf and start the OSD; maybe that extra debug info will give something you can work with. On 03/03/2016 01:15 PM, Richard Arends wrote: On
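
A minimal sketch of that suggestion, reusing osd.34 from this thread; the debug levels are only common starting points:

    # add to the [osd] section of ceph.conf:  debug osd = 20, debug filestore = 20, debug ms = 1
    # or pass the same levels on the command line and run the OSD in the foreground:
    ceph-osd -i 34 -f --debug-osd 20 --debug-filestore 20 --debug-ms 1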

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread Nick Fisk
You can also dump the historic ops from the OSD admin socket. It will give a brief overview of each step and how long each one is taking. But generally what you are seeing is not unusual. Currently the best case for an RBD on a replicated pool will be somewhere between 200 and 500 IOPS. The Ceph code is
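
For reference, a minimal sketch of that admin-socket query, run on the node hosting the OSD and with osd.12 as a placeholder id:

    ceph daemon osd.12 dump_historic_ops    # slowest recent ops with per-step timestamps
    ceph daemon osd.12 dump_ops_in_flight   # ops currently being processed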

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Richard Arends
On 03/03/2016 06:56 PM, Philip Hempel wrote: I did the import after using the objectstore tool to remove the PG, and that osd (34) segfaults now. Segfault output is not my cup of tea, but is that exactly the same segfault as the one you posted earlier? -- Regards, Richard.

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Philip S. Hempel
On 03/03/2016 12:44 PM, Richard Arends wrote: On 03/03/2016 06:40 PM, Philip Hempel wrote: osd 45. But that import causes a segfault on the osd Did that OSD already have info (files) for that PG? --- Regards, Richard. No, this was a new import.

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults

2016-03-03 Thread Richard Arends
On 03/03/2016 06:40 PM, Philip Hempel wrote: osd 45. But that import causes a segfault on the osd Did that OSD already have info (files) for that PG? --- Regards, Richard.

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults - Hire a consultant

2016-03-03 Thread Richard Arends
On 03/03/2016 06:12 PM, Philip Hempel wrote: Philip, I forgot to CC the list, now I did... To export the data I used ceph-objectstore-tool with the export command. I am trying to repair a cluster that has 74 PGs that are down; I have seen that the PGs in question are presently
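
A minimal sketch of such an export/import, with placeholder paths and a hypothetical PG id 3.1a, reusing the OSD ids 34 and 45 from this thread; both OSD daemons must be stopped while the tool runs:

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --journal-path /var/lib/ceph/osd/ceph-34/journal \
        --op export --pgid 3.1a --file /root/pg.3.1a.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-45 \
        --journal-path /var/lib/ceph/osd/ceph-45/journal \
        --op import --file /root/pg.3.1a.export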

Re: [ceph-users] PG's stuck inactive, stuck unclean, incomplete, imports cause osd segfaults - Hire a consultant

2016-03-03 Thread Philip S. Hempel
On 03/02/2016 01:40 PM, Philip S. Hempel wrote: Hello everyone, I am trying to repair a cluster that has 74 PGs that are down; I have seen that the PGs in question presently have 0 data on the OSD. I have exported data from OSDs that were pulled when the client had thought the disks were

[ceph-users] OSDs go down with infernalis

2016-03-03 Thread Yoann Moulin
Hello, I'm (almost) a new user of Ceph (a couple of months). At my university, we started to do some tests with Ceph a couple of months ago. We have 2 clusters. Each cluster has 100 OSDs on 10 servers. Each server has this setup: CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Memory: 128GB

[ceph-users] ceph upgrade and the impact to rbd clients

2016-03-03 Thread Xu (Simon) Chen
Hi all, I am running ceph for cinder backend of my OpenStack deployment. I am curious if I upgrade ceph (say from an older version of firefly to a newer version of firefly, or from firefly to hammer), what do I need to do with my VMs, which continue to run with librbd of the previous version? I

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread Jan Schermer
I think the latency comes from journal flushing. Try tuning filestore min sync interval = .1 and filestore max sync interval = 5, and also /proc/sys/vm/dirty_bytes (I suggest 512MB) and /proc/sys/vm/dirty_background_bytes (I suggest 256MB). See if that helps. It would be useful to see the job you are
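
A minimal sketch of applying those values at runtime (they can also be made persistent in ceph.conf and /etc/sysctl.conf); the numbers are simply the ones suggested above:

    ceph tell osd.* injectargs '--filestore_min_sync_interval 0.1 --filestore_max_sync_interval 5'
    echo 536870912 > /proc/sys/vm/dirty_bytes              # 512 MB
    echo 268435456 > /proc/sys/vm/dirty_background_bytes   # 256 MB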

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread RDS
A couple of suggestions: 1) # of pgs per OSD should be 100-200 2) When dealing with SSDs or Flash, the performance of these devices hinges on how you partition them and how you tune Linux: a) if using partitions, did you align the partitions on a 4k boundary? I start at sector 2048 using
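
A minimal sketch of creating a 4k-aligned partition starting at sector 2048, with the device and size as placeholders:

    sgdisk --new=1:2048:+10G /dev/sdX   # partition 1, start sector 2048 (1 MiB boundary), 10 GiB
    sgdisk --print /dev/sdX             # confirm the reported start sector is 2048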

Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests

2016-03-03 Thread Ritter Sławomir
Hi, I think this is a really serious problem - again: - we silently lost S3/RGW objects in clusters. Moreover, our situation looks very similar to the one described in unfixed bug #13764 (Hammer) and in fixed #8269 (Dumpling). Regards, SR -Original Message- From: ceph-users

[ceph-users] Details of project

2016-03-03 Thread Nishant karn
I wanted to know more about two of the projects that are on your ideas page; they are listed below: 1. RADOS PROXY 2. RBD DIFF CHECKSUMS. Please guide me through this. What should I do to get selected for these projects? I have a working knowledge of C/C++ and really want to join open source

Re: [ceph-users] XFS and nobarriers on Intel SSD

2016-03-03 Thread Maxime Guyot
Hello, it looks like this thread is one of the main Google hits on this issue, so let me bring an update. I experienced the same symptoms with Intel S3610 and LSI2208. The logs have reported “task abort!” messages on a daily basis since November: Write(10): 2a 00 0e 92 88 90 00 00 10 00 scsi

Re: [ceph-users] Fwd: Help: pool not responding

2016-03-03 Thread Mario Giammarco
I have tried "force create". It says "creating" but at the end problem persists. I have restarted ceph as usual. I am evaluating ceph and I am shocked because it semeed a very robust filesystem and now for a glitch I have an entire pool blocked and there is no simple procedure to force a recovery.

Re: [ceph-users] Restore properties to default?

2016-03-03 Thread Max A. Krasilnikov
Hello! On Thu, Mar 03, 2016 at 09:53:22AM +1000, lindsay.mathieson wrote: > Ok, reduced my recovery I/O with > ceph tell osd.* injectargs '--osd-max-backfills 1' > ceph tell osd.* injectargs '--osd-recovery-max-active 1' > ceph tell osd.* injectargs '--osd-client-op-priority 63' > Now
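
A minimal sketch of putting a tunable back to its default, with osd.0 as an example; the value 10 below is only illustrative, so verify the actual default for your release first:

    ceph --show-config | grep osd_max_backfills          # built-in/default value for this release
    ceph daemon osd.0 config get osd_max_backfills       # value currently in effect on one OSD
    ceph tell osd.* injectargs '--osd-max-backfills 10'  # inject the default reported above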

[ceph-users] ceph mon failed to restart

2016-03-03 Thread M Ranga Swami Reddy
I tried to restart one of the ceph mons and got the below error: == 2016-03-03 08:16:00.120355 7f43b067d7c0 -1 obtain_monmap unable to find a monmap 2016-03-03 08:16:00.120374 7f43b067d7c0 -1 unable to obtain a monmap: (2) No such file or directory 2016-03-03 08:16:00.124437 7f43b067d7c0
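
If the monitor's store is otherwise intact and only the monmap is missing, one common recovery path (sketched here with a placeholder mon id) is to fetch the current monmap from the surviving quorum and inject it into the stopped monitor; if the mon data directory is actually empty, the monitor has to be removed and re-added instead:

    ceph mon getmap -o /tmp/monmap                  # fetch the cluster's current monmap
    ceph-mon -i mon-a --inject-monmap /tmp/monmap   # run with the broken monitor stopped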

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread Christian Balzer
Hello, On Thu, 3 Mar 2016 07:41:09 + Adrian Saul wrote: > Hi Ceph-users, > > TL;DR - I can't seem to pin down why an unloaded system with flash based > OSD journals has higher than desired write latencies for RBD devices. > Any ideas? > > > I am developing a storage system based on