[ceph-users] Fwd: [SOLVED] ceph-disk activate fails (after 33 osd drives)

Alexey Sheplyakov Mon, 15 Feb 2016 07:14:10 -0800

[forwarding to the list so people know how to solve the problem]

---------- Forwarded message ----------
From: John Hogenmiller (yt) <j...@yourtech.us>
Date: Fri, Feb 12, 2016 at 6:48 PM
Subject: Re: [ceph-users] ceph-disk activate fails (after 33 osd drives)
To: Alexey Sheplyakov <asheplya...@mirantis.com>



That was it, thank you.  Definitely documenting that item.  I'll
proceed a bit more slowly, one node at a time, and wait for HEALTH_OK
before moving on.
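
For anyone documenting the same item: the "unable to setup io_context"
error in the quoted log appears when the kernel refuses a new AIO context,
which happens once the system-wide count in /proc/sys/fs/aio-nr reaches
the limit in /proc/sys/fs/aio-max-nr. Below is a minimal sketch for
checking the remaining headroom before activating more OSDs. The function
names and the 90% warning threshold are illustrative assumptions, not part
of Ceph or ceph-disk.

```python
from pathlib import Path

# Kernel-provided counters; passed as parameters so other paths can be used.
AIO_NR = Path("/proc/sys/fs/aio-nr")
AIO_MAX_NR = Path("/proc/sys/fs/aio-max-nr")


def read_counter(path):
    """Return the integer stored in a one-line /proc/sys file."""
    return int(Path(path).read_text().strip())


def aio_headroom(nr_path=AIO_NR, max_path=AIO_MAX_NR):
    """Return (in_use, limit, remaining) for system-wide AIO events."""
    in_use = read_counter(nr_path)
    limit = read_counter(max_path)
    return in_use, limit, limit - in_use


def report(in_use, limit, remaining, warn_fraction=0.9):
    """Format a one-line summary; warn_fraction is an arbitrary threshold."""
    state = "NEAR LIMIT" if in_use >= warn_fraction * limit else "ok"
    return f"aio-nr={in_use} aio-max-nr={limit} remaining={remaining} [{state}]"
```

Calling print(report(*aio_headroom())) on an OSD node prints the current
usage. Note that writing to /proc/sys/fs/aio-max-nr with echo does not
survive a reboot; setting fs.aio-max-nr in /etc/sysctl.conf (or a file
under /etc/sysctl.d/) makes the value persistent.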

I was actually looking at the supermicro 72-drive ceph nodes myself.
We have what is effectively supermicro "white label" 2UTwin hardware
attached to a 60 drive DAE. The hardware I'm testing on has 6TB
drives, though our newer models have 8TB.  In the near future, I'll
have to really dive into the placement group docs and try and work out
bottlenecks and optimizations for configurations between 1 and 4 racks
(480, 960, 1440, and 1920 OSDs) as well as monitoring nodes. We're
doing strictly object store.
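
As a starting point for those rack sizes, here is a rough sketch of the
commonly cited rule of thumb from the Ceph placement group docs: total
PGs across all pools is about (OSDs x 100) / replica count, rounded up to
a power of two. The replica count of 3, the target of 100 PGs per OSD and
the function name are assumptions for illustration; the total still has
to be divided among the pools in use.

```python
def suggested_total_pgs(num_osds, replicas=3, pgs_per_osd=100):
    """Rule-of-thumb total PG count across all pools.

    replicas and pgs_per_osd are assumed defaults, not values from this
    cluster. The result is rounded up to the next power of two.
    """
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


# The four rack configurations mentioned above (1 to 4 racks).
for osds in (480, 960, 1440, 1920):
    print(osds, "OSDs ->", suggested_total_pgs(osds), "PGs in total")
```

Because of the rounding, 1440 and 1920 OSDs land on the same power of
two under these assumptions, so the per-OSD PG load differs noticeably
between the 3-rack and 4-rack layouts.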

Thanks again,
John

On Fri, Feb 12, 2016 at 10:25 AM, Alexey Sheplyakov
<asheplya...@mirantis.com> wrote:
>
> John,
>
> > 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal FileJournal::_open: 
> > unable to setup io_context (0) Success
>
> Try increasing aio-max-nr:
>
> echo 131072 > /proc/sys/fs/aio-max-nr
>
> Best regards,
>       Alexey
>
>
> On Fri, Feb 12, 2016 at 4:51 PM, John Hogenmiller (yt) <j...@yourtech.us> 
> wrote:
> >
> >
> > I have 7 servers, each containing 60 x 6TB drives in jbod mode. When I first
> > started, I only activated a couple drives on 3 nodes as Ceph OSDs.
> > Yesterday, I went to expand to the remaining nodes as well as prepare and
> > activate all the drives.
> >
> > ceph-disk prepare worked just fine. However, ceph-disk activate-all managed
> > to only activate 33 drives and failed on the rest.  This is consistent across all 7
> > nodes (existing and newly installed). At the end of the day, I have 33 Ceph
> > OSDs activated per server and can't activate any more. I did have to bump up
> > the pg_num and pgp_num on the pool in order to accommodate the drives that
> > did activate. I don't know if having a low pg number during the mass influx
> > of OSDs caused an issue or not within the pool. I don't think so because I
> > can only set the pg_num to a maximum value determined by the number of known
> > OSDs. But maybe you have to expand gradually: add OSDs, increase PGs, add
> > more OSDs, increase PGs again.  I certainly have not seen anything to
> > suggest a magic "33/node limit", and I've seen references to servers with up
> > to 72 Ceph OSDs on them.
> >
> > I then attempted to activate individual Ceph OSDs and got the same set of
> > errors. I even wiped a drive, re-ran `ceph-disk prepare` and `ceph-disk
> > activate` to have it fail in the same way.
> >
> > status:
> > ```
> > root@ljb01:/home/ceph/rain-cluster# ceph status
> >     cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
> >      health HEALTH_OK
> >      monmap e5: 5 mons at
> > {hail02-r01-06=172.29.4.153:6789/0,hail02-r01-08=172.29.4.155:6789/0,rain02-r01-01=172.29.4.148:6789/0,rain02-r01-03=172.29.4.150:6789/0,rain02-r01-04=172.29.4.151:6789/0}
> >             election epoch 12, quorum 0,1,2,3,4
> > rain02-r01-01,rain02-r01-03,rain02-r01-04,hail02-r01-06,hail02-r01-08
> >      osdmap e1116: 420 osds: 232 up, 232 in
> >             flags sortbitwise
> >       pgmap v397198: 10872 pgs, 14 pools, 101 MB data, 8456 objects
> >             38666 MB used, 1264 TB / 1264 TB avail
> >                10872 active+clean
> > ```
> >
> >
> >
> > Here is what I get when I run ceph-disk prepare on a blank drive:
> >
> > ```
> > root@rain02-r01-01:/etc/ceph# ceph-disk  prepare  /dev/sdbh1
> > The operation has completed successfully.
> > The operation has completed successfully.
> > meta-data=/dev/sdbh1             isize=2048   agcount=6, agsize=268435455 blks
> >          =                       sectsz=512   attr=2, projid32bit=0
> > data     =                       bsize=4096   blocks=1463819665, imaxpct=5
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0
> > log      =internal log           bsize=4096   blocks=521728, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > The operation has completed successfully.
> >
> > root@rain02-r01-01:/etc/ceph# parted /dev/sdh print
> > Model: ATA HUS726060ALA640 (scsi)
> > Disk /dev/sdh: 6001GB
> > Sector size (logical/physical): 512B/512B
> > Partition Table: gpt
> >
> > Number  Start   End     Size    File system  Name          Flags
> >  2      1049kB  5369MB  5368MB               ceph journal
> >  1      5370MB  6001GB  5996GB  xfs          ceph data
> > ```
> >
> > And finally the errors from attempting to activate the drive.
> >
> > ```
> > root@rain02-r01-01:/etc/ceph# ceph-disk activate /dev/sdbh1
> > got monmap epoch 5
> > 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal FileJournal::_open:
> > unable to setup io_context (0) Success
> > 2016-02-12 12:53:43.340748 7f1493f83700 -1 journal io_submit to 0~4096 got
> > (22) Invalid argument
> > 2016-02-12 12:53:43.341186 7f149bc71940 -1
> > filestore(/var/lib/ceph/tmp/mnt.KRphD_) could not find
> > -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> > os/FileJournal.cc: In function 'int FileJournal::write_aio_bl(off64_t&,
> > ceph::bufferlist&, uint64_t)' thread 7f1493f83700 time 2016-02-12
> > 12:53:43.341355
> > os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> > error")
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x8b) [0x7f149b767f2b]
> >  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  6: (()+0x8182) [0x7f1499d87182]
> >  7: (clone()+0x6d) [0x7f14980ce47d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> > 2016-02-12 12:53:43.345434 7f1493f83700 -1 os/FileJournal.cc: In function
> > 'int FileJournal::write_aio_bl(off64_t&, ceph::bufferlist&, uint64_t)'
> > thread 7f1493f83700 time 2016-02-12 12:53:43.341355
> > os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> > error")
> >
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x8b) [0x7f149b767f2b]
> >  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  6: (()+0x8182) [0x7f1499d87182]
> >  7: (clone()+0x6d) [0x7f14980ce47d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> >     -4> 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal
> > FileJournal::_open: unable to setup io_context (0) Success
> >     -3> 2016-02-12 12:53:43.340748 7f1493f83700 -1 journal io_submit to
> > 0~4096 got (22) Invalid argument
> >     -1> 2016-02-12 12:53:43.341186 7f149bc71940 -1
> > filestore(/var/lib/ceph/tmp/mnt.KRphD_) could not find
> > -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> >      0> 2016-02-12 12:53:43.345434 7f1493f83700 -1 os/FileJournal.cc: In
> > function 'int FileJournal::write_aio_bl(off64_t&, ceph::bufferlist&,
> > uint64_t)' thread 7f1493f83700 time 2016-02-12 12:53:43.341355
> > os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> > error")
> >
> >
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x8b) [0x7f149b767f2b]
> >  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  6: (()+0x8182) [0x7f1499d87182]
> >  7: (clone()+0x6d) [0x7f14980ce47d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> > terminate called after throwing an instance of 'ceph::FailedAssertion'
> > *** Caught signal (Aborted) **
> >  in thread 7f1493f83700
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (()+0x7d02ca) [0x7f149b67b2ca]
> >  2: (()+0x10340) [0x7f1499d8f340]
> >  3: (gsignal()+0x39) [0x7f149800acc9]
> >  4: (abort()+0x148) [0x7f149800e0d8]
> >  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
> >  6: (()+0x5e6d6) [0x7f14989136d6]
> >  7: (()+0x5e703) [0x7f1498913703]
> >  8: (()+0x5e922) [0x7f1498913922]
> >  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x278) [0x7f149b768118]
> >  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  14: (()+0x8182) [0x7f1499d87182]
> >  15: (clone()+0x6d) [0x7f14980ce47d]
> > 2016-02-12 12:53:43.348498 7f1493f83700 -1 *** Caught signal (Aborted) **
> >  in thread 7f1493f83700
> >
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (()+0x7d02ca) [0x7f149b67b2ca]
> >  2: (()+0x10340) [0x7f1499d8f340]
> >  3: (gsignal()+0x39) [0x7f149800acc9]
> >  4: (abort()+0x148) [0x7f149800e0d8]
> >  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
> >  6: (()+0x5e6d6) [0x7f14989136d6]
> >  7: (()+0x5e703) [0x7f1498913703]
> >  8: (()+0x5e922) [0x7f1498913922]
> >  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x278) [0x7f149b768118]
> >  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  14: (()+0x8182) [0x7f1499d87182]
> >  15: (clone()+0x6d) [0x7f14980ce47d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> >      0> 2016-02-12 12:53:43.348498 7f1493f83700 -1 *** Caught signal
> > (Aborted) **
> >  in thread 7f1493f83700
> >
> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >  1: (()+0x7d02ca) [0x7f149b67b2ca]
> >  2: (()+0x10340) [0x7f1499d8f340]
> >  3: (gsignal()+0x39) [0x7f149800acc9]
> >  4: (abort()+0x148) [0x7f149800e0d8]
> >  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
> >  6: (()+0x5e6d6) [0x7f14989136d6]
> >  7: (()+0x5e703) [0x7f1498913703]
> >  8: (()+0x5e922) [0x7f1498913922]
> >  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x278) [0x7f149b768118]
> >  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> > long)+0x5ad) [0x7f149b5fe27d]
> >  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
> >  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
> >  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
> >  14: (()+0x8182) [0x7f1499d87182]
> >  15: (clone()+0x6d) [0x7f14980ce47d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> > ERROR:ceph-disk:Failed to activate
> > Traceback (most recent call last):
> >   File "/usr/sbin/ceph-disk", line 3576, in <module>
> >     main(sys.argv[1:])
> >   File "/usr/sbin/ceph-disk", line 3532, in main
> >     main_catch(args.func, args)
> >   File "/usr/sbin/ceph-disk", line 3554, in main_catch
> >     func(args)
> >   File "/usr/sbin/ceph-disk", line 2424, in main_activate
> >     dmcrypt_key_dir=args.dmcrypt_key_dir,
> >   File "/usr/sbin/ceph-disk", line 2197, in mount_activate
> >     (osd_id, cluster) = activate(path, activate_key_template, init)
> >   File "/usr/sbin/ceph-disk", line 2360, in activate
> >     keyring=keyring,
> >   File "/usr/sbin/ceph-disk", line 1950, in mkfs
> >     '--setgroup', get_ceph_user(),
> >   File "/usr/sbin/ceph-disk", line 349, in command_check_call
> >     return subprocess.check_call(arguments)
> >   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
> >     raise CalledProcessError(retcode, cmd)
> > subprocess.CalledProcessError: Command
> > '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i',
> > '165', '--monmap', '/var/lib/ceph/tmp/mnt.KRphD_/activate.monmap',
> > '--osd-data', '/var/lib/ceph/tmp/mnt.KRphD_', '--osd-journal',
> > '/var/lib/ceph/tmp/mnt.K
> >
> > root@rain02-r01-01:/etc/ceph# ls -l /var/lib/ceph/tmp/
> > total 0
> > -rw-r--r-- 1 root root 0 Feb 12 12:58 ceph-disk.activate.lock
> > -rw-r--r-- 1 root root 0 Feb 12 12:58 ceph-disk.prepare.lock
> > ```
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
