Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-20 Thread Mikaël Cluseau
Hi, On 07/11/13 12:23, ker can wrote: Unfortunately I currently do not have access to SSDs, so I had a separate disk for the journal for each data disk for now. You can try RAM as a journal (well... not in production of course), if you want an idea of the performance on SSDs. I tried this
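
For anyone wanting to try the same thing, a minimal sketch of a RAM-backed journal in ceph.conf, assuming a tmpfs mount such as /dev/shm (test-only, of course; the journal is lost on reboot, and the path and size here are only illustrative):

    [osd]
        # test-only: journal in RAM to approximate SSD journal latency
        osd journal = /dev/shm/osd.$id.journal
        osd journal size = 1024        # MB -- must fit comfortably in RAM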

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-11 Thread ker can
Yep - that's right. 3 OSD daemons per node. On Thu, Jul 11, 2013 at 9:16 AM, Noah Watkins wrote: > On Wed, Jul 10, 2013 at 6:23 PM, ker can wrote: > > > > Now separating out the journal from data disk ... > > > > HDFS write numbers (3 disks/data node) > > Average execution time: 466 > > Best ex

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-11 Thread Noah Watkins
On Wed, Jul 10, 2013 at 6:23 PM, ker can wrote: > > Now separating out the journal from data disk ... > > HDFS write numbers (3 disks/data node) > Average execution time: 466 > Best execution time : 426 > Worst execution time : 508 > > ceph write numbers (3 data disks/data node + 3 journal d

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-10 Thread ker can
Ran the DFS IO write tests: - Increasing the journal log size did not make any difference for me ... I guess the number I had set was sufficient. For the rest of the tests I kept it at a generous 10GB. - Separating out the journal from the data disk did make a difference as expected. Unfortunately
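
For reference, the two changes described above would look roughly like this in ceph.conf (device paths are illustrative; each journal partition sits on a different spindle than the matching data disk):

    [osd]
        osd journal size = 10240       # MB, i.e. the generous 10GB mentioned above
    [osd.0]
        osd journal = /dev/sdf1        # example: journal partition on a separate disk
    [osd.1]
        osd journal = /dev/sdf2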

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-10 Thread Noah Watkins
On Wed, Jul 10, 2013 at 9:17 AM, ker can wrote: > > Seems like a good read ahead value that the ceph hadoop client can use as a > default ! Great, I'll add this tunable to the list of changes to be pushed into next release. > I'll look at the DFS write tests later today any tuning suggest

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-10 Thread ker can
Hi Noah, Some results for the read tests: I set client_readahead_min=4193404, which is also the default for Hadoop's dfs.datanode.readahead.bytes. I ran the dfsio test 6 times each for HDFS, Ceph with default read ahead & Ceph with readahead=4193404. Setting read ahead in Ceph did give about a 10%
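
For anyone reproducing this, the two knobs being matched up look roughly like the following (the Ceph option goes in ceph.conf on the client side, the Hadoop one in hdfs-site.xml; the values are the ones quoted above):

ceph.conf:
    [client]
        client readahead min = 4193404     # bytes, matched to the Hadoop setting

hdfs-site.xml:
    <property>
      <name>dfs.datanode.readahead.bytes</name>
      <value>4193404</value>
    </property>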

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread Noah Watkins
Yes, the libcephfs client. You should be able to adjust the settings without changing any code. The settings should be adjustable either by setting the config options in ceph.conf, or using the "ceph.conf.options" settings in Hadoop's core-site.xml. On Tue, Jul 9, 2013 at 4:26 PM, ker can wrote:
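
A hedged example of what that core-site.xml entry might look like (the property name is the one quoted above; the option string itself is illustrative and the exact separator syntax may vary by plugin version):

    <property>
      <name>ceph.conf.options</name>
      <value>client_readahead_min=4193404</value>
    </property>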

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread ker can
Makes sense. I can try playing around with these settings. When you're saying client, would this be libcephfs.so? On Tue, Jul 9, 2013 at 5:35 PM, Noah Watkins wrote: > Greg pointed out the read-ahead client options. I would suggest > fiddling with these settings. If things improve, we

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread Noah Watkins
Greg pointed out the read-ahead client options. I would suggest fiddling with these settings. If things improve, we can put automatic configuration of these settings into the Hadoop client itself. At the very least, we should be able to see if it is the read-ahead that is causing performance proble
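
For anyone following along, the client read-ahead options being referred to are, to the best of my knowledge, roughly the following (values shown are only illustrative; check the defaults for your Ceph version):

    [client]
        client readahead min = 131072         # never read ahead fewer bytes than this
        client readahead max bytes = 0        # hard cap in bytes, 0 = no cap
        client readahead max periods = 4      # cap as a multiple of the file's striping period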

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread Noah Watkins
> Is the JNI interface still an issue or have we moved past that ? We haven't done much performance tuning with Hadoop, but I suspect that the JNI interface is not a bottleneck. My very first thought about what might be causing slow read performance is the read-ahead settings we use vs Hadoop. Ha

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread ker can
by the way ... here's the log of the write.
13/07/09 05:52:56 INFO fs.TestDFSIO: - TestDFSIO - : write (HDFS)
13/07/09 05:52:56 INFO fs.TestDFSIO: Date & time: Tue Jul 09 05:52:56 PDT 2013
13/07/09 05:52:56 INFO fs.TestDFSIO: Number of files: 300
13/07/09 05:52:56 INFO fs
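
For context, a write run like the one logged above would typically be launched with something along these lines (the jar name and file size are placeholders; only the 300-file count comes from the log):

    hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 300 -fileSize 1000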

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread ker can
For this particular test I turned off replication for both hdfs and ceph. So there is just one copy of the data lying around.
hadoop@vega7250:~$ ceph osd dump | grep rep
pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0 crash_rep
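
If anyone wants to reproduce the single-copy setup, something like the following should do it (pool name taken from the dump above; the HDFS side is the standard dfs.replication property):

    # Ceph: keep a single copy on the 'data' pool
    ceph osd pool set data size 1
    ceph osd pool set data min_size 1

hdfs-site.xml:
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>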

Re: [ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread Noah Watkins
On Tue, Jul 9, 2013 at 12:35 PM, ker can wrote: > hi Noah, > > while we're still on the hadoop topic ... I was also trying out the > TestDFSIO tests ceph vs hadoop. The read tests on ceph take about 1.5x > the hdfs time. The write tests are worse about ... 2.5x the time on hdfs, > but I guess

[ceph-users] Hadoop/Ceph and DFS IO tests

2013-07-09 Thread ker can
Hi Noah, while we're still on the hadoop topic ... I was also trying out the TestDFSIO tests ceph vs hadoop. The read tests on ceph take about 1.5x the hdfs time. The write tests are worse about ... 2.5x the time on hdfs, but I guess we have additional journaling overheads for the writes on ce
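
For reference, the read side of TestDFSIO is invoked much like the write side (jar name, file count and size are placeholders; the write pass has to run first so the files exist):

    hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 300 -fileSize 1000
    # remove the generated test files afterwards
    hadoop jar hadoop-*test*.jar TestDFSIO -clean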