On Tue, Jul 10, 2012 at 10:36 AM, Yann Dupont wrote:
>> Fundamentally, it comes down to this: the two clusters will still have
>> the same fsid, and you won't be isolated from configuration errors or
> (CEPH-PROD is the old btrfs volume). /CEPH is the new xfs volume, completely
> redone & reformatted
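A quick way to see the overlap Tommi is describing is to compare the cluster id each set of OSDs carries on disk; a minimal sketch, assuming the usual osd data directory layout with a ceph_fsid file (the paths are the hypothetical mount points from this thread):

  cat /CEPH/osd.0/ceph_fsid        # new xfs cluster
  cat /CEPH-PROD/osd.0/ceph_fsid   # old btrfs cluster
  # identical UUIDs mean both sets of daemons still claim the same cluster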
On 10/07/2012 19:11, Tommi Virtanen wrote:
On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont wrote:
The cluster mechanism was never intended for moving existing osds to
other clusters. Trying that might not be a good idea.
Ok, good to know. I saw that the remaining maps could lead to problems, but [...]
On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont wrote:
>> The cluster mechanism was never intended for moving existing osds to
>> other clusters. Trying that might not be a good idea.
> Ok, good to know. I saw that the remaining maps could lead to problems, but
> in 2 words, what are the other associated [...]
On 10/07/2012 17:56, Tommi Virtanen wrote:
On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont wrote:
As I've kept the original broken btrfs volumes, I tried this morning to
run the old osds in parallel, using the $cluster variable. I only have
partial success.
The cluster mechanism was never intended for moving existing osds to other clusters. [...]
On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont wrote:
> As I've kept the original broken btrfs volumes, I tried this morning to
> run the old osds in parallel, using the $cluster variable. I only have
> partial success.
The cluster mechanism was never intended for moving existing osds to
other clusters. Trying that might not be a good idea. [...]
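For context, the $cluster mechanism mentioned above lets a second configuration file and daemon set coexist on the same hosts; a minimal sketch of starting the old OSDs under an alternate name, assuming the daemons accept --cluster and that $cluster expands in paths (file names and paths here are hypothetical):

  # /etc/ceph/CEPH-PROD.conf -- copy of the old ceph.conf, pointing at the old btrfs mounts
  [osd]
      osd data = /CEPH-PROD/osd.$id
      log file = /var/log/ceph/$cluster-$name.log

  # start one of the old osds under the alternate cluster name
  ceph-osd --cluster CEPH-PROD -i 0

As Tommi points out, though, none of this changes the fsid the old OSDs were created with.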
On 09/07/2012 19:14, Samuel Just wrote:
Can you restart the node that failed to complete the upgrade with [...]
Well, it's a little bit complicated; I now run those nodes with XFS,
and I've long-running jobs on it right now, so I can't stop the ceph
cluster at the moment.
As I've kept the original broken btrfs volumes [...]
On Mon, Jul 9, 2012 at 12:05 PM, Yann Dupont wrote:
>> The information here isn't enough to say whether the cause of the
>> corruption is btrfs or LevelDB, but the recovery needs to be handled by
>> LevelDB -- and upstream is working on making it more robust:
>> http://code.google.com/p/leveldb/issue
On 09/07/2012 19:43, Tommi Virtanen wrote:
On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont wrote:
Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
talking about ceph. The underlying FS (btrfs in that case) of 1 node (and
only one) has PROBABLY crashed in the past, causing corruption in ceph data on this node [...]
On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont wrote:
> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and
> only one) has PROBABLY crashed in the past, causing corruption in ceph data
> on this node, [...]
Can you restart the node that failed to complete the upgrade with
debug filestore = 20
debug osd = 20
and post the log after an hour or so of running? The upgrade process
might legitimately take a while.
-Sam
On Sat, Jul 7, 2012 at 1:19 AM, Yann Dupont wrote:
> On 06/07/2012 19:01, Gregory Farnum wrote: [...]
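Those debug settings normally go in the [osd] section of the configuration before restarting the affected daemon; a minimal sketch, assuming the sysvinit tooling of that era (the osd id and log path are hypothetical and depend on the local setup):

  # /etc/ceph/ceph.conf
  [osd]
      debug osd = 20
      debug filestore = 20

  # restart only the osd that failed to upgrade, then watch its log
  service ceph restart osd.3
  tail -f /var/log/ceph/ceph-osd.3.log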
On 06/07/2012 19:01, Gregory Farnum wrote:
On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont wrote:
On 05/07/2012 23:32, Gregory Farnum wrote:
[...]
ok, so as all nodes were identical, I have probably hit a btrfs bug (like an
erroneous out of space) at more or less the same time. And when 1 osd was out [...]
On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont wrote:
> On 05/07/2012 23:32, Gregory Farnum wrote:
>
> [...]
>
>>> ok, so as all nodes were identical, I have probably hit a btrfs bug (like an
>>> erroneous out of space) at more or less the same time. And when 1 osd was
>>> out,
>
> OH, [...]
On 05/07/2012 23:32, Gregory Farnum wrote:
[...]
ok, so as all nodes were identical, I have probably hit a btrfs bug (like an
erroneous out of space) at more or less the same time. And when 1 osd was
out,
OH, I didn't finish the sentence... When 1 osd was out, missing data
was copied on a [...]
On Wed, Jul 4, 2012 at 10:53 AM, Yann Dupont wrote:
> On 04/07/2012 18:21, Gregory Farnum wrote:
>
>> On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
>>>
>>> On 03/07/2012 23:38, Tommi Virtanen wrote:
>>> On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont
>>> (mailto:yann.dup...@univ-nantes.fr) wrote: [...]
On 04/07/2012 18:21, Gregory Farnum wrote:
On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
On 03/07/2012 23:38, Tommi Virtanen wrote:
On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont (mailto:yann.dup...@univ-nantes.fr) wrote:
In the case I could repair, do you think a crashed FS as it is right now [...]
On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
> On 03/07/2012 23:38, Tommi Virtanen wrote:
> > On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont (mailto:yann.dup...@univ-nantes.fr) wrote:
> > > In the case I could repair, do you think a crashed FS as it is right now
> > > is valuable [...]
On 03/07/2012 23:38, Tommi Virtanen wrote:
On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont wrote:
In the case I could repair, do you think a crashed FS as it is right now is
valuable for you, for future reference, as I saw you can't reproduce the
problem? I can make an archive (or a btrfs dump?), but it will be quite big.
On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont wrote:
> In the case I could repair, do you think a crashed FS as it is right now is
> valuable for you, for future reference, as I saw you can't reproduce the
> problem? I can make an archive (or a btrfs dump?), but it will be quite
> big.
At this point [...]
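On the size concern: the most faithful way to preserve the broken filesystem for later inspection would be a compressed block-level image taken while the volume is not mounted; a rough sketch, with a hypothetical device and destination:

  umount /CEPH-PROD/osd.0
  dd if=/dev/sdb1 bs=4M conv=noerror | gzip > /backup/osd0-btrfs.img.gz
  # the image can later be restored with dd, or loop-mounted, for inspection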
On 03/07/2012 21:42, Tommi Virtanen wrote:
On Tue, Jul 3, 2012 at 1:40 AM, Yann Dupont wrote:
Upgraded the kernel to 3.5.0-rc4 + some patches; btrfs seems to be OK right
now.
Tried to restart osd with 0.47.3, then the next branch, and today with 0.48.
4 of 8 nodes fail with the same message:
ceph version 0.48argonaut [...]
On Tue, Jul 3, 2012 at 1:40 AM, Yann Dupont wrote:
> Upgraded the kernel to 3.5.0-rc4 + some patches; btrfs seems to be OK right
> now.
>
> Tried to restart osd with 0.47.3, then the next branch, and today with 0.48.
>
> 4 of 8 nodes fail with the same message:
>
> ceph version 0.48argonaut (commit:c2b [...]
On 04/06/2012 19:40, Sam Just wrote:
Can you send the osd logs? The merge_log crashes are probably fixable
if I can see the logs.
Well, I'm sorry - as I said in a private mail, I was away from my computer for
a long time.
I can't send those logs anymore, they have been rotated now...
Anyway. Now that [...]
This is probably the same/similar to http://tracker.newdream.net/issues/2462,
no? There's a log there, though I've no idea how helpful it is.
On Monday, June 4, 2012 at 10:40 AM, Sam Just wrote:
> Can you send the osd logs? The merge_log crashes are probably fixable
> if I can see the logs.
>
Can you send the osd logs? The merge_log crashes are probably fixable
if I can see the logs.
The leveldb crash is almost certainly a result of memory corruption.
Thanks
-Sam
On Mon, Jun 4, 2012 at 9:16 AM, Tommi Virtanen wrote:
> On Mon, Jun 4, 2012 at 1:44 AM, Yann Dupont wrote:
>> Results: [...]
On Mon, Jun 4, 2012 at 1:44 AM, Yann Dupont wrote:
> Results: worked like a charm during two days, apart from btrfs warn messages;
> then the OSDs began to crash one after another, 'domino style'.
Sorry to hear that. Reading through your message, there seem to be
several problems; whether they are because of the [...]
Hello,
Besides the performance inconsistency (see the other thread titled "poor OSD
performance using kernel 3.4"), where I promised some tests (will run this
afternoon), we tried this weekend to stress-test ceph, making backups
with bacula on an rbd volume of 15T (8 osd nodes, using 8 physical machines) [...]
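For scale, a 15T RBD image like the one used for those bacula backups would be created with something along these lines (the image name is hypothetical; --size is given in megabytes here, and 15 TB is 15 x 1024 x 1024 = 15728640 MB):

  rbd create bacula-backup --size 15728640
  rbd map bacula-backup
  # then make a filesystem on the mapped device and point bacula at it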