Hello list,
after a reboot of my node I see this on all OSDs of this node:
2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function
'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time
2012-12-14 09:03:20.392528
osd/OSD.cc: 4385: FAILED
same log more verbose:
11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0
inactive] read_log done
-11 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996
pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
les/c 3307/3307 3306/3306/3306)
On 14/12/12 04:38, Gary Lowell wrote:
I think that the --debbuildopts '-j8 -b' might be trouncing the
--binary-arch flag - I'll get pbuilder setup and give it a
test - I normally use sbuild (for which the packaging changes
did have the
Hi team,
I forgot to include a description (also cc-ing correctly the
synnefo-devel list).
I am a member of the Synnefo team, where we are experimenting with RADOS
as a storage backend to host blocks for our volume block storage named
archipelago.
In this patch I implement aio stat and
Hi Noah, Gregory and Sage,
first of all, thanks for your quick replies. Here are some answers to your
questions.
Gregory, I have got the output of ceph -s before and after this specific
TeraSort run, and to me it looks ok; all 30 osds are up:
health HEALTH_OK
monmap e1: 1 mons at
On 12/14/2012 10:14 AM, Stefan Priebe wrote:
One more IMPORTANT note. This might happen due to the fact that a disk was
missing (disk failure) after the reboot.
fstab and mountpoint are working with UUIDs so they match but the journal
block device:
osd journal = /dev/sde1
didn't match
On 12/13/2012 08:54 AM, Lachfeld, Jutta wrote:
Hi all,
Hi! Sorry to send this a bit late, it looks like the reply I authored
yesterday from my phone got eaten by vger.
I am currently doing some comparisons between CEPH FS and HDFS as a file system
for Hadoop using Hadoop's integrated
On 12/14/2012 08:52 AM, Dennis Jacobfeuerborn wrote:
On 12/14/2012 10:14 AM, Stefan Priebe wrote:
One more IMPORTANT note. This might happen due to the fact that a disk was
missing (disk failure) after the reboot.
fstab and mountpoint are working with UUIDs so they match but the journal
block
Hello Dennis,
Am 14.12.2012 15:52, schrieb Dennis Jacobfeuerborn:
didn't match anymore - as the device names got renumbered due to the failed disk.
Is there a way to use some kind of UUIDs here too for journal?
You should be able to use /dev/disk/by-uuid/* instead. That should give you
a stable view
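
One way to see which stable names exist for a given device is to walk the symlink directories under /dev/disk. This is a minimal sketch in Python, assuming a udev-managed Linux system where /dev/disk/by-uuid, by-partuuid, by-partlabel and by-id hold symlinks to the kernel device nodes; the helper name stable_aliases is hypothetical. Note that by-uuid entries come from filesystem UUIDs, so a raw journal partition without a filesystem may only show up under by-partuuid, by-partlabel or by-id.

```python
import os


def stable_aliases(device, base="/dev/disk"):
    """Return the symlinks under base/by-* that resolve to the given device node."""
    target = os.path.realpath(device)
    aliases = []
    if not os.path.isdir(base):
        return aliases
    for subdir in sorted(os.listdir(base)):
        directory = os.path.join(base, subdir)
        # Only the by-uuid, by-partuuid, by-partlabel, ... directories are of interest.
        if not (subdir.startswith("by-") and os.path.isdir(directory)):
            continue
        for name in sorted(os.listdir(directory)):
            link = os.path.join(directory, name)
            if os.path.islink(link) and os.path.realpath(link) == target:
                aliases.append(link)
    return aliases


if __name__ == "__main__":
    # /dev/sde1 is the journal device named in the thread.
    for alias in stable_aliases("/dev/sde1"):
        print(alias)
```

Any path it prints could then be tried as the "osd journal" value in place of /dev/sde1, since it should keep pointing at the same partition even if the sdX letters shift.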
Hi Stefan,
Here's what I often do when I have a journal and data partition sharing
a disk:
sudo parted -s -a optimal /dev/$DEV mklabel gpt
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%
Mark
On
Hi Mark,
Am 14.12.2012 16:20, schrieb Mark Nelson:
sudo parted -s -a optimal /dev/$DEV mklabel gpt
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%
My disks are gpt too and i'm also using parted. But
Hi,
I'm looking at commit e3ed28eb2 in the next branch,
and I have a question.
Shouldn't the limit be pg_num 65536, because
PGs are numbered 0 thru pg_num-1?
If not, what am I missing?
FWIW, up through yesterday I've been using the next branch and this:
ceph osd pool set data pg_num 65536
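
The off-by-one in the question above can be made concrete. This is a minimal sketch, assuming (the thread does not confirm it) that the limit comes from PG ids having to fit in a 16-bit unsigned field; both function names are hypothetical.

```python
def max_pg_id(bits=16):
    """Largest id representable in an unsigned field of the given width."""
    return (1 << bits) - 1


def max_pg_num(bits=16):
    """Largest pg_num whose ids 0 .. pg_num-1 all fit in that field."""
    return max_pg_id(bits) + 1


if __name__ == "__main__":
    print(max_pg_id())   # 65535
    print(max_pg_num())  # 65536
```

Under that assumption a pool with pg_num 65536 uses ids 0 through 65535, which all fit, so the count limit would be one higher than the largest id.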
Hello Mark,
Am 14.12.2012 16:20, schrieb Mark Nelson:
sudo parted -s -a optimal /dev/$DEV mklabel gpt
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%
Isn't that the part type you're using?
mkpart
On 12/13/2012 01:00 PM, Nick Bartos wrote:
Here's another log with the kernel debugging enabled:
https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log
Note that it hung on the 2nd try.
Just to make sure I'm working with the right code base, can
The kernel is 3.5.7 with the following patches applied (and in the
order specified below):
001-libceph_eliminate_connection_state_DEAD_13_days_ago.patch
002-libceph_kill_bad_proto_ceph_connection_op_13_days_ago.patch
003-libceph_rename_socket_callbacks_13_days_ago.patch
On 12/14/2012 03:41 PM, Jim Schutt wrote:
Hi,
I'm looking at commit e3ed28eb2 in the next branch,
and I have a question.
Shouldn't the limit be pg_num 65536, because
PGs are numbered 0 thru pg_num-1?
If not, what am I missing?
FWIW, up through yesterday I've been using the next branch and
On 12/14/2012 09:59 AM, Joao Eduardo Luis wrote:
On 12/14/2012 03:41 PM, Jim Schutt wrote:
Hi,
I'm looking at commit e3ed28eb2 in the next branch,
and I have a question.
Shouldn't the limit be pg_num 65536, because
PGs are numbered 0 thru pg_num-1?
If not, what am I missing?
FWIW, up
On Fri, 14 Dec 2012, Stefan Priebe wrote:
One more IMPORTANT note. This might happen due to the fact that a disk was
missing (disk failure) after the reboot.
fstab and mountpoint are working with UUIDs so they match but the journal
block device:
osd journal = /dev/sde1
didn't match
Hi Sage,
this was just an idea and I need to fix MY UUID problem. But the
crash is still a problem in Ceph. Have you looked into my log?
Am 14.12.2012 20:42, schrieb Sage Weil:
On Fri, 14 Dec 2012, Stefan Priebe wrote:
One more IMPORTANT note. This might happen due to the fact that a
I have updated the testing branch in the ceph-client git
repository again, and you'll find that a forced update is
needed to bring your own repository up to date.
This will probably be necessary again at some point once
we get some reviews done on commits still in this branch,
but we'll try not
Reviewed-by: Sage Weil s...@inktank.com
On Thu, 13 Dec 2012, Alex Elder wrote:
There is no check in rbd_remove() to see if anybody holds open the
image being removed. That's not cool.
Add a simple open count that goes up and down with opens and closes
(releases) of the device, and don't
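
The open-count idea described above can be modelled in a few lines. This is an illustrative Python sketch, not the kernel C code from the patch: the class and method names are hypothetical, and because the snippet is cut off before saying what happens when the count is nonzero, refusing removal with EBUSY is an assumption.

```python
import errno
import threading


class Device:
    """Toy model of a device that tracks how many holders have it open."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open_count = 0
        self.removed = False

    def open(self):
        with self._lock:
            if self.removed:
                raise OSError(errno.ENOENT, "device has been removed")
            self._open_count += 1

    def release(self):
        with self._lock:
            if self._open_count == 0:
                raise RuntimeError("release without a matching open")
            self._open_count -= 1

    def remove(self):
        with self._lock:
            # Assumed behaviour: refuse removal while anybody holds the device open.
            if self._open_count > 0:
                raise OSError(errno.EBUSY, "device is still open")
            self.removed = True
```

Taking the same lock in open(), release() and remove() keeps the count check and the removal atomic with respect to a concurrent open.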
Reviewed-by: Sage Weil s...@inktank.com
On Thu, 13 Dec 2012, Alex Elder wrote:
If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd(). That drops
a reference to the osd, and therefore the osd may have been freed
by the time
Reviewed-by: Sage Weil s...@inktank.com
On Thu, 13 Dec 2012, Alex Elder wrote:
RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object
name (i.e., one of the objects providing storage backing an rbd
image).
Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to
We should drop this one, I think. See upstream commit
4c199a93a2d36b277a9fd209a0f2793f8460a215. When we added the similar call
on the request tree it caused some noise in linux-next and then got
removed.
sage
On Thu, 13 Dec 2012, Alex Elder wrote:
It turns out to be harmless but the
When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically). We
currently get an error message on the log whenever this occurs.
A ceph connection will attempt to
Reviewed-by: Sage Weil s...@inktank.com
On Fri, 14 Dec 2012, Alex Elder wrote:
When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically). We
currently get an error
Reviewed-by: Sage Weil s...@inktank.com
On Thu, 13 Dec 2012, Alex Elder wrote:
In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer. It should be done whether or not
the request currently has an
Reviewed-by: Sage Weil s...@inktank.com
On Thu, 13 Dec 2012, Alex Elder wrote:
A connection's socket can close for any reason, independent of the
state of the connection (and irrespective of the connection
mutex). As a result, the connection can be in pretty much any state
at the