Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-02 Thread Dejan Menges
Hi Shady, Great point, I didn't know that. Thanks a lot, I will definitely check whether this was only related to the HWX distribution. Thanks a lot, and sorry if I spammed this topic, it wasn't my intention at all. Dejan On Tue, Aug 2, 2016 at 9:37 AM Shady Xu wrote: > Hi Dejan, > > I

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-02 Thread Shady Xu
Hi Dejan, I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located in the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache version of Hadoop, whereas it is in hadoop-hdfs-project/hadoop-hdfs/ in the Hortonworks version. I am not sure if that means that parameter affects the performance

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Dejan Menges
Hi Shady, We did extensive tests on this and received a fix from Hortonworks, which we are probably the first and only ones to test, most likely tomorrow evening. If the Hortonworks guys are reading this, maybe they know the official HDFS ticket ID for this, if there is one, as I cannot find it in our

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Shady Xu
Thanks Allen. I am aware of the point you made and am wondering what the await and svctm values are on your cluster nodes. If there is no significant difference, maybe I should try other ways to tune my HBase. And Dejan, I've never heard of or noticed what you said. If that's true it's really disappointing

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Dejan Menges
Sorry for jumping in, but regarding performance... it took us a while to figure out why, whatever disk/RAID0 performance you have, when it comes to HDFS and a replication factor bigger than zero, disk write speed drops to 100Mbps... After long, long tests with Hortonworks they found that the issue is that

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Allen Wittenauer
On 2016-07-30 20:12 (-0700), Shady Xu wrote: > Thanks Andrew, I know about the disk failure risk and that it's one of the > reasons why we should use JBOD. But JBOD provides worse performance than > RAID 0. It's not about failure: it's about speed. RAID0 performance will

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Allen Wittenauer
On 2016-07-30 20:12 (-0700), Shady Xu wrote: > Thanks Andrew, I know about the disk failure risk and that it's one of the > reasons why we should use JBOD. But JBOD provides worse performance than > RAID 0. And take into account the fact that HDFS does have other >

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-07-31 Thread daemeon reiydelle
Have you considered the probability (mean time to failure - not mean time TO failure) of a disk failing, then factored in that failure is 12 times as likely with a RAID 0? Then compare that to the time to replicate in degraded mode where you have such a large number of drives on each node? Secondly, there
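As a rough illustration of the failure-probability argument above, here is a minimal sketch. It assumes disks fail independently and uses a made-up annual failure probability of 3% per disk; neither the figure nor the function name comes from the thread. Under that assumption, a 12-disk RAID 0 stripe is lost if any one disk fails, so its failure probability is 1 - (1 - p)^12, which is close to (and slightly below) 12 times the single-disk figure when p is small.

```python
def array_failure_probability(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n_disks independent disks fails.

    For a RAID 0 stripe this is the probability of losing the whole array,
    because a single disk failure destroys the stripe.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return 1.0 - (1.0 - p_disk) ** n_disks


if __name__ == "__main__":
    p = 0.03  # ASSUMPTION: illustrative per-disk annual failure probability
    n = 12    # number of disks in the stripe, as in the "12 times" remark

    p_single = array_failure_probability(p, 1)
    p_raid0 = array_failure_probability(p, n)

    print(f"single disk (JBOD, one directory lost): {p_single:.4f}")
    print(f"{n}-disk RAID 0 (whole array lost):      {p_raid0:.4f}")
    print(f"ratio RAID 0 / single disk:             {p_raid0 / p_single:.2f}")
```

With JBOD, a single disk failure costs only the blocks on that disk, so the amount of data HDFS has to re-replicate is also roughly one twelfth of what a failed 12-disk stripe would require.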

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-07-30 Thread Shady Xu
Thanks Andrew, I know about the disk failure risk and that it's one of the reasons why we should use JBOD. But JBOD provides worse performance than RAID 0. And take into account the fact that HDFS does have other replicas and it will make one more replica on another DataNode when disk

Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-07-30 Thread Andrew Wright
Yes you are. If you lose any one of your disks with a RAID 0 spanning all drives, you will lose all the data in that directory. And disks do die. Yes, you get better single-threaded performance, but you are putting that entire directory/data set at higher risk. Cheers On Saturday, July 30, 2016,

Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-07-30 Thread Shady Xu
Hi, It's widely known that we should mount disks to different directories without any RAID configuration because it provides the best I/O performance. However, lately I have done some tests with three different configurations and found this may not be true. Below are the configurations and