Re: [ceph-users] replace dead SSD journal

2015-05-06 Thread Andrija Panic
Well, seems like they are on satellite :) On 6 May 2015 at 02:58, Matthew Monaco m...@monaco.cx wrote: On 05/05/2015 08:55 AM, Andrija Panic wrote: Hi, small update: in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days between each SSD death) - can't believe

Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Matthew Monaco
On 05/05/2015 08:55 AM, Andrija Panic wrote: Hi, small update: in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days between each SSD death) - can't believe it - NOT due to wearing out... I really hope we got a defective series from the supplier... That's ridiculous. Are

Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Andrija Panic
Hi, small update: in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days between each SSD death) - can't believe it - NOT due to wearing out... I really hope we got a defective series from the supplier... Regards On 18 April 2015 at 14:24, Andrija Panic andrija.pa...@gmail.com

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
Yes, I know, but too late now, I'm afraid :) On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote: Have you looked into the Samsung 845 DC? They were not that expensive last time I checked. /Josef On 18 Apr 2015 13:15, Andrija Panic andrija.pa...@gmail.com wrote: might be true,

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
Might be true, yes - we had Intel 128GB (Intel S3500 or S3700) - but these have horrible random/sequential speeds - Samsung 850 PROs are at least 3 times faster on sequential, and more than 3 times faster on random/IOPS measures. And of course modern enterprise drives = ... On 18 April 2015 at

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Mark Kirkwood
Yes, it sure is - my experience with 'consumer' SSDs is that they die of obscure firmware bugs (wrong capacity, zero capacity, no longer detected in the BIOS) rather than flash wearout. It seems that the 'enterprise' tagged drives are less inclined to suffer this fate. Regards Mark On

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Josef Johansson
Have you looked into the Samsung 845 DC? They were not that expensive last time I checked. /Josef On 18 Apr 2015 13:15, Andrija Panic andrija.pa...@gmail.com wrote: might be true, yes - we had Intel 128GB (Intel S3500 or S3700) - but these have horrible random/sequential speeds - Samsung 850 PROs

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Steffen W Sørensen
On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote: nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wear level is 96%, so only 4% used... (yes, I know these are not enterprise, etc.) Damn… but maybe your surname says it all - Don’t Panic :) But

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Josef Johansson
If the same chassis/chip/backplane is behind both drives, and maybe other drives in the chassis have trouble, it may be a defect there as well. On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com wrote: On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote: nah, Samsung 850

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
heh :) yes, interesting last name :) Anyway, all are the exact same age, we implemented the new Ceph nodes at exactly the same time - but it's not a wear problem - the dead SSDs were simply DEAD - smartctl -a showing nothing, except 600 PB space/size :) On 18 April 2015 at 09:41, Steffen W Sørensen

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
These 2 drives are on the regular SATA (on-board) controller, and besides this, there are 12 x 4TB on the front of the servers - normal backplane on the front. Anyway, we are going to check those dead SSDs on a PC/laptop or so, just to confirm they are really dead - but this is the way they die, not

[ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Hi guys, I have 1 SSD that hosted the journals of 6 OSDs and is now dead, so 6 OSDs are down, Ceph rebalanced, etc. Now I have a new SSD inside, and I will partition it etc. - but I would like to know how to proceed now with the journal recreation for those 6 OSDs that are down. Should I flush journal

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Steffen W Sørensen
I have 1 SSD that hosted the journals of 6 OSDs and is now dead, so 6 OSDs are down, Ceph rebalanced, etc. Now I have a new SSD inside, and I will partition it etc. - but I would like to know how to proceed now with the journal recreation for those 6 OSDs that are down. Well assuming the OSDs are

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
An SSD that hosted journals for 6 OSDs died - 2 x SSD died, so 12 OSDs are down, and rebalancing is about to finish... after which I need to fix the OSDs. On 17 April 2015 at 19:01, Josef Johansson jo...@oderland.se wrote: Hi, Did 6 other OSDs go down when re-adding? /Josef On 17 Apr 2015, at

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Robert LeBlanc
Delete and re-add all six OSDs. On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi guys, I have 1 SSD that hosted the journals of 6 OSDs and is now dead, so 6 OSDs are down, Ceph rebalanced, etc. Now I have a new SSD inside, and I will partition it etc. - but I would like to
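
The "delete" half of that advice can be sketched as a short Python helper that only prints the usual manual removal commands for each affected OSD. This is a minimal sketch, not the poster's procedure: the OSD IDs 0-5 are hypothetical placeholders, the script runs nothing itself, and each OSD daemon is assumed to be stopped already.

```python
def osd_removal_commands(osd_id):
    """Build the usual manual removal commands for one OSD (nothing is executed)."""
    name = f"osd.{osd_id}"
    return [
        ["ceph", "osd", "out", str(osd_id)],       # mark the OSD out of the cluster
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its authentication key
        ["ceph", "osd", "rm", str(osd_id)],        # remove the OSD entry itself
    ]


if __name__ == "__main__":
    # Hypothetical IDs of the six OSDs whose journals lived on the dead SSD.
    for osd_id in range(6):
        for command in osd_removal_commands(osd_id):
            print(" ".join(command))
```

Printing instead of executing lets the list be reviewed before anything touches the cluster; re-adding the OSDs afterwards follows whatever procedure was used to create them originally.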

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
Hi, Did 6 other OSDs go down when re-adding? /Josef On 17 Apr 2015, at 18:49, Andrija Panic andrija.pa...@gmail.com wrote: 12 OSDs down - I expect less work with removing and adding the OSDs? On Apr 17, 2015 6:35 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Thx guys, that's what I will be doing in the end. Cheers On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us wrote: Delete and re-add all six OSDs. On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi guys, I have 1 SSD that hosted the journals of 6 OSDs,

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
12 OSDs down - I expect less work with removing and adding the OSDs? On Apr 17, 2015 6:35 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote: Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the existing OSD UUID, copy the keyring and let it populate itself? Fri, 17 Apr

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wear level is 96%, so only 4% used... (yes, I know these are not enterprise, etc.) On 17 April 2015 at 21:01, Josef Johansson jose...@gmail.com wrote: tough luck, hope everything comes up ok afterwards. What models on

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
The massive rebalancing does not affect the SSDs in a good way either. But from what I've gathered the PRO should be fine. Massive amount of write errors in the logs? /Josef On 17 Apr 2015 21:07, Andrija Panic andrija.pa...@gmail.com wrote: nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
Tough luck, hope everything comes up ok afterwards. What models are the SSDs? /Josef On 17 Apr 2015 20:05, Andrija Panic andrija.pa...@gmail.com wrote: An SSD that hosted journals for 6 OSDs died - 2 x SSD died, so 12 OSDs are down, and rebalancing is about to finish... after which I need to fix the

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
I have had two of them in my cluster (plus one 256GB version) for about half a year now. So far so good. I'll be keeping a closer eye on them. On Fri, 17 Apr 2015, 21:07 Andrija Panic andrija.pa...@gmail.com wrote: nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died...

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
damn, good news for me, possibly bad news for you :) What is the wear level (smartctl -a /dev/sdX) - the attribute near the end of the attribute list... thx On 17 April 2015 at 21:12, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote: I have had two of them in my cluster (plus one 256GB version) for

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Checked the SMART status. All of the Samsungs have a Wear Leveling Count equal to 99 (raw values 29, 36 and 15). I'm going to have to monitor them - I could afford losing one of them, but losing two would mean loss of data. On Fri, 17 Apr 2015 at 21:22, Josef Johansson jose...@gmail.com
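
For monitoring the wear attribute as described, a minimal Python sketch can pull the normalized and raw values out of the attribute table that smartctl prints. The column layout and the Wear_Leveling_Count attribute name are assumed from typical smartctl output for Samsung drives, and the sample text is illustrative (it reuses the 99 / 29 pair reported above), not captured from a real drive.

```python
def parse_wear_leveling(smartctl_output):
    """Return (normalized_value, raw_value) for Wear_Leveling_Count, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed columns: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[1] == "Wear_Leveling_Count":
            return int(fields[3]), int(fields[9])
    return None


# Illustrative sample in the usual attribute-table layout (not real drive output).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       29
"""

if __name__ == "__main__":
    print(parse_wear_leveling(SAMPLE))
```

In practice the text would come from running smartctl -a /dev/sdX on each journal SSD. As noted elsewhere in this thread, the drives that died showed no wear problem beforehand, so this value alone would not have predicted those failures.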

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the existing OSD UUID, copy the keyring and let it populate itself? On Fri, 17 Apr 2015 at 18:31, Andrija Panic andrija.pa...@gmail.com wrote: Thx guys, that's what I will be doing in the end. Cheers On Apr 17, 2015
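
The ceph-osd --mkfs step in that suggestion can be sketched as a small Python helper that builds the command line for one OSD. This is a rough sketch under stated assumptions: the OSD ID and UUID below are hypothetical placeholders, the helper prints the command rather than running it, and wiping the filesystem and copying the keyring back are left as manual steps.

```python
import uuid


def mkfs_command(osd_id, osd_uuid):
    """Build the ceph-osd --mkfs command that reuses an existing OSD UUID."""
    # Fail early if the UUID is malformed; uuid.UUID raises ValueError.
    canonical = str(uuid.UUID(osd_uuid))
    return ["ceph-osd", "-i", str(osd_id), "--mkfs", "--osd-uuid", canonical]


if __name__ == "__main__":
    # Hypothetical OSD ID and UUID, for illustration only.
    command = mkfs_command(7, "01234567-89ab-cdef-0123-456789abcdef")
    print(" ".join(command))
```

Reusing the existing UUID is what lets the cluster treat the rebuilt OSD as the same OSD, so it is backfilled in place instead of being removed and re-created.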