Re: [ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
Resolved. Apparently it took the OSD almost 2.5 hours to fully boot. I had not seen this behavior before, but it eventually booted itself back into the CRUSH map. Bookend log stamps below. > 2016-10-07 21:33:39.241720 7f3d59a97800 0 set uid:gid to 64045:64045 > (ceph:ceph) > 2016-10-07 23:53:
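The quoted bookend stamps put the boot window at roughly two hours twenty minutes. A quick way to check a gap like that (a minimal sketch assuming GNU `date`; the end stamp is truncated in the quote, so `:00` seconds is an assumption):

```shell
# Compute the gap between the bookend log stamps (GNU date).
# ":00" seconds on the end stamp is an assumption; the quote is truncated.
start="2016-10-07 21:33:39"
end="2016-10-07 23:53:00"
delta=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo "$((delta / 60)) minutes"   # prints 139 minutes
```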

[ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
Attempting to adjust some of my recovery options, I restarted a single OSD in the cluster with the following syntax: > sudo restart ceph-osd id=0 The OSD restarts without issue, and status shows it running with the PID. > sudo status ceph-osd id=0 > ceph-osd (ceph/0) start/running, proc
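A sketch of the restart-and-verify sequence (upstart syntax as in the thread, plus the usual follow-up checks; wrapped in an echo helper so the commands can be shown without a live cluster — this is an outline, not the poster's exact procedure):

```shell
# Dry run: echo each command instead of executing it (no live cluster needed).
run() { echo "+ $*"; }

run sudo restart ceph-osd id=0   # restart the daemon (upstart syntax)
run sudo status ceph-osd id=0    # should report start/running with a PID
run ceph osd tree                # check whether osd.0 is marked "up"
run ceph -w                      # watch cluster events while the OSD rejoins
```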

Re: [ceph-users] Ceph + VMWare

2016-10-07 Thread Jake Young
Hey Patrick, I work for Cisco. We have a 200TB cluster (108 OSDs on 12 OSD Nodes) and use the cluster for both OpenStack and VMware deployments. We are using iSCSI now, but it really would be much better if VMware did support RBD natively. We present a 1-2TB Volume that is shared between 4-8 ES

Re: [ceph-users] maintenance questions

2016-10-07 Thread Gregory Farnum
On Fri, Oct 7, 2016 at 1:21 PM, Jeff Applewhite wrote: > Hi All > > I have a few questions pertaining to management of MONs and OSDs. This is in > a Ceph 2.x context only. You mean Jewel? ;) > --- > 1) Can MONs be placed in something resembling mainten

Re: [ceph-users] CephFS: No space left on device

2016-10-07 Thread Mykola Dvornik
10.2.2 -Mykola On 7 October 2016 at 15:43, Yan, Zheng wrote: > On Thu, Oct 6, 2016 at 4:11 PM, wrote: > > Is there any way to repair pgs/cephfs gracefully? > > > > So far no. We need to write a tool to repair this type of corruption. > > Which version of ceph did you use before upgrading to

Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to 10.2.3

2016-10-07 Thread Graham Allan
Dear Orit, On 10/07/2016 04:21 AM, Orit Wasserman wrote: Hi, On Wed, Oct 5, 2016 at 11:23 PM, Andrei Mikhailovsky wrote: Hello everyone, I've just updated my ceph to version 10.2.3 from 10.2.2 and I am no longer able to start the radosgw service. When executing I get the following error: 20

[ceph-users] maintenance questions

2016-10-07 Thread Jeff Applewhite
Hi All I have a few questions pertaining to management of MONs and OSDs. This is in a Ceph 2.x context only. --- 1) Can MONs be placed in something resembling maintenance mode (for firmware updates, patch reboots, etc.). If so how? If not how addressed?
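One common pattern for the OSD side of this question is to set the `noout` flag before taking a node down, so stopped OSDs are not marked out and no rebalancing starts. A dry-run sketch (systemd unit name and OSD id are assumptions, not from the thread):

```shell
# Dry run: echo each command instead of executing it.
run() { echo "+ $*"; }

run ceph osd set noout          # stop CRUSH from marking stopped OSDs "out"
run systemctl stop ceph-osd@0   # stop the daemon(s) on the node under maintenance
# ... firmware updates / patching / reboot happen here ...
run systemctl start ceph-osd@0
run ceph osd unset noout        # re-enable normal out-marking afterwards
```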

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread Kjetil Jørgensen
Hi On Fri, Oct 7, 2016 at 6:31 AM, Yan, Zheng wrote: > On Fri, Oct 7, 2016 at 8:20 AM, Kjetil Jørgensen > wrote: > > And - I just saw another recent thread - > > http://tracker.ceph.com/issues/17177 - can be an explanation of > most/all of > > the above ? > > > > Next question(s) would then be:

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread Kjetil Jørgensen
On Fri, Oct 7, 2016 at 4:46 AM, John Spray wrote: > On Fri, Oct 7, 2016 at 1:05 AM, Kjetil Jørgensen > wrote: > > Hi, > > > > context (i.e. what we're doing): We're migrating (or trying to) migrate > off > > of an nfs server onto cephfs, for a workload that's best described as > "big > > piles"

Re: [ceph-users] rsync kernel client cepfs mkstemp no space left on device

2016-10-07 Thread Gregory Farnum
On Fri, Oct 7, 2016 at 7:15 AM, Hauke Homburg wrote: > Hello, > > I have a Ceph Cluster with 5 Server, and 40 OSD. Aktual on this Cluster > are 85GB Free Space, and the rsync dir has lots of Pictures and a Data > Volume of 40GB. > > The Linux is a Centos 7 and the Last stable Ceph. The Client is a

Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to 10.2.3

2016-10-07 Thread Graham Allan
The fundamental problem seems to be the same in each case, related to a missing master_zone in the zonegroup. Like yours, our cluster has been running for several years with few config changes, though in our case, the 10.2.3 radosgw simply doesn't start at all, logging the following error: 201
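For the missing `master_zone` symptom, a hedged outline of the usual repair on a Jewel single-zone setup (zone/zonegroup names `default` are assumptions; shown as a dry run, and the thread itself does not confirm this was the fix applied):

```shell
# Dry run: echo the radosgw-admin commands instead of executing them.
run() { echo "+ $*"; }

# Inspect the zonegroup; on affected setups master_zone comes back empty ("").
run radosgw-admin zonegroup get --rgw-zonegroup=default
# Mark the existing zone as master, then commit the updated period.
run radosgw-admin zone modify --rgw-zone=default --master
run radosgw-admin period update --commit
```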

Re: [ceph-users] CephFS: No space left on device

2016-10-07 Thread Yan, Zheng
On Thu, Oct 6, 2016 at 4:11 PM, wrote: > Is there any way to repair pgs/cephfs gracefully? > So far no. We need to write a tool to repair this type of corruption. Which version of ceph did you use before upgrading to 10.2.3 ? Regards Yan, Zheng > > > -Mykola > > > > From: Yan, Zheng > Sent:

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread Yan, Zheng
On Fri, Oct 7, 2016 at 8:20 AM, Kjetil Jørgensen wrote: > And - I just saw another recent thread - > http://tracker.ceph.com/issues/17177 - can be an explanation of most/all of > the above ? > > Next question(s) would then be: > > How would one deal with duplicate stray(s) Here is an untested met

[ceph-users] rsync kernel client cepfs mkstemp no space left on device

2016-10-07 Thread Hauke Homburg
Hello, I have a Ceph cluster with 5 servers and 40 OSDs. Currently this cluster has 85GB of free space, and the rsync dir has lots of pictures and a data volume of 40GB. The OS is CentOS 7 with the latest stable Ceph. The client is a Debian 8 with kernel 4, and the cluster is mounted with cephfs. W

Re: [ceph-users] Hammer OSD memory usage very high

2016-10-07 Thread Haomai Wang
Did you try restarting the OSD to see the memory usage? On Fri, Oct 7, 2016 at 1:04 PM, David Burns wrote: > Hello all, > > We have a small 160TB Ceph cluster used only as a test s3 storage repository > for media content. > > Problem > Since upgrading from Firefly to Hammer we are experiencing very hi

Re: [ceph-users] Ceph Mon Crashing after creating Cephfs

2016-10-07 Thread John Spray
On Fri, Oct 7, 2016 at 12:37 PM, James Horner wrote: > Hi John > > Thanks for that, life saver! Running on Debian Jessie and I replaced the > mail ceph repo in source.d to: > > deb > http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/wip-17466-jewel/ > jessie main > > Updated and Upgraded

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread John Spray
On Fri, Oct 7, 2016 at 1:05 AM, Kjetil Jørgensen wrote: > Hi, > > context (i.e. what we're doing): We're migrating (or trying to) migrate off > of an nfs server onto cephfs, for a workload that's best described as "big > piles" of hardlinks. Essentially, we have a set of "sources": > foo/01/ > foo

[ceph-users] Crash in ceph_read_iter->__free_pages due to null page

2016-10-07 Thread Nikolay Borisov
Hello, I've encountered yet another cephfs crash: [990188.822271] BUG: unable to handle kernel NULL pointer dereference at 001c [990188.822790] IP: [] __free_pages+0x5/0x30 [990188.823090] PGD 180dd8f067 PUD 1bf2722067 PMD 0 [990188.823506] Oops: 0002 [#1] SMP [990188.831274] CPU

Re: [ceph-users] Ceph Mon Crashing after creating Cephfs

2016-10-07 Thread James Horner
Hi John, thanks for that, life saver! Running on Debian Jessie, I replaced the main ceph repo in sources.list.d with: deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/wip-17466-jewel/ jessie main Updated and upgraded Ceph, tried to manually run my mon, which failed as it had already been

Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to 10.2.3

2016-10-07 Thread Orit Wasserman
On Fri, Oct 7, 2016 at 12:24 PM, Andrei Mikhailovsky wrote: > Hi Orit, > > The radosgw service has been configured about two years ago using the > documentation on the ceph.com website. No changes to configuration has been > done since. The service was working fine until the 10.2.3 update recent

Re: [ceph-users] Ceph Mon Crashing after creating Cephfs

2016-10-07 Thread John Spray
On Fri, Oct 7, 2016 at 8:04 AM, James Horner wrote: > Hi All > > Just wondering if anyone can help me out here. Small home cluster with 1 > mon, the next phase of the plan called for more but I hadn't got there yet. > > I was trying to setup Cephfs and I ran "ceph fs new" without having an MDS > a

Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to 10.2.3

2016-10-07 Thread Andrei Mikhailovsky
Hi Orit, The radosgw service was configured about two years ago using the documentation on the ceph.com website. No changes to the configuration have been made since. The service was working fine until the 10.2.3 update recently. I have been updating ceph to include every major release and prac

Re: [ceph-users] unable to start radosgw after upgrade from 10.2.2 to 10.2.3

2016-10-07 Thread Orit Wasserman
Hi, On Wed, Oct 5, 2016 at 11:23 PM, Andrei Mikhailovsky wrote: > Hello everyone, > > I've just updated my ceph to version 10.2.3 from 10.2.2 and I am no longer > able to start the radosgw service. When executing I get the following error: > > 2016-10-05 22:14:10.735883 7f1852d26a00 0 ceph versi

Re: [ceph-users] [EXTERNAL] Benchmarks using fio tool gets stuck

2016-10-07 Thread Mario Rodríguez Molins
After adding the parameter "--iodirect=1" to the fio command line, it no longer gets stuck. This is how my script looks now: for operation in read write randread randwrite; do for rbd in 4K 64K 1M 4M; do for bs in 4k 64k 1M 4M ; do # - create rbd image with block size $rbd f
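A reconstructed sketch of that benchmark loop (image name and rbd flags are assumptions; note that fio's usual direct-I/O flag is spelled `--direct=1`). The commands are echoed rather than executed so the loop structure can run without a cluster:

```shell
# Dry run: echo each rbd/fio invocation instead of executing it.
# "bench" image name and --object-size flag are assumptions for illustration.
run() { echo "+ $*"; }

for operation in read write randread randwrite; do
  for rbd in 4K 64K 1M 4M; do          # rbd image object size
    run rbd create bench --size 1024 --object-size "$rbd"
    for bs in 4k 64k 1M 4M; do         # fio block size
      run fio --name=bench --rw="$operation" --bs="$bs" --direct=1 \
              --filename=/dev/rbd/rbd/bench
    done
    run rbd rm bench
  done
done
```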

Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-07 Thread Paweł Sadowski
Hi, I work with Tomasz and I'm investigating this situation. We still don't fully understand why there was an unfound object after removing a single OSD. From logs[1] it looks like all PGs were active+clean before marking that OSD out. After that, backfills started on multiple OSDs. Three minutes later