Thanks, Larry, for sharing your findings. Your numbers look much better than mine. I
tried with 0.9.1. Should I upgrade to 0.10?
If possible, can you please share your exact command with its options? Did
you try with SSL on? My two hosts were different, and I ran it from the KNOX_HOST
box.
Any other ideas on how I can get better numbers?
Regards,
Mohammad
On Saturday, November 5, 2016 6:55 PM, larry mccay <[email protected]>
wrote:
Hi Mohammad -
I have played around with this a bit and haven't been able to reproduce your
results.
My environment is the downloadable sandbox VM, with an Apache Knox 0.10.0 test
instance running on the host machine. I put a ~8.5 GB file in HDFS and OPENed
it with and without Knox.
With Knox:
100 8470M    0 8470M    0     0  9.9M      0 --:--:--  0:14:09 --:--:--  9.9M
Direct to WebHDFS:
100 8470M    0 8470M    0     0 13.6M      0 --:--:--  0:10:20 --:--:-- 14.9M
While we are certainly not speeding things up, it isn't too bad. I believe
there is still room for some optimization in our rewrite process, as has been
discussed a bit on [1].
That would probably bring the numbers even closer together. However, even that
won't account for the difference that you are seeing.
I wonder what your test environment looks like, given that you are getting a
99.6M average speed direct and 4.8M through Knox. If KNOX_HOST and WEBHDFS_HOST
are different machines, you might try the direct curl command from the KNOX_HOST
to see whether the network, or something like that, is introducing the
difference.
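One way to try this is to run both downloads from the KNOX_HOST box and let curl report its own averages. A minimal sketch, assuming a POSIX shell and curl's --write-out variables; the measure function and the DIRECT_URL/KNOX_URL environment variables are hypothetical names, and the URLs would be the same WebHDFS and gateway URLs used in the original commands.

```shell
#!/bin/sh
# Sketch: compare the average download speed of a direct WebHDFS OPEN with the
# same OPEN through the Knox gateway, both run from the same box (e.g. KNOX_HOST).
# measure, DIRECT_URL and KNOX_URL are hypothetical names for this example.

measure() {
  local label="$1" url="$2"
  # -s hides the progress meter, -L follows the WebHDFS redirect to the
  # DataNode, -o /dev/null discards the body so local disk writes are not
  # part of the timing, and -w prints curl's own transfer statistics.
  curl -s -L -o /dev/null \
    -w "$label: %{size_download} bytes in %{time_total}s, avg %{speed_download} bytes/s\n" \
    "$url"
}

# Only run the comparison when both URLs are supplied, e.g.
#   DIRECT_URL='http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN'
#   KNOX_URL='http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN'
if [ -n "${DIRECT_URL:-}" ] && [ -n "${KNOX_URL:-}" ]; then
  measure direct "$DIRECT_URL"
  measure knox "$KNOX_URL"
fi
```

Discarding the body to /dev/null keeps local disk speed out of both numbers; any extra headers the gateway setup needs (the original Knox command also passed an X-Auth-Params-Email header) would have to be added to the curl call.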
thanks,
--larry
[1] https://issues.apache.org/jira/browse/KNOX-767
On Sat, Nov 5, 2016 at 6:41 PM, larry mccay <[email protected]> wrote:
Hi Mohammad -
Thanks for reporting this.
That is a big difference. Let me play around with it and see what I can
reproduce.
thanks,
--larry
On Sat, Nov 5, 2016 at 5:52 PM, Mohammad Islam <[email protected]> wrote:
Hi,
I did a very basic comparison of download speed. I used similar "curl ..."
commands to download a large file (13.6 GB) and gathered the numbers.
It looks like WebHDFS through Knox is very slow (at least 20x slower). I ran it
twice with similar numbers. For Knox, I turned off SSL, and in both cases I used
an unsecured (non-Kerberos) cluster.
The download through Knox took just over 49 minutes (0:49:12), whereas the direct
download took about 2 minutes (0:02:19).
The average download speed was 4811k for Knox and 99.6M for the direct download.
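As a quick sanity check of the "at least 20x" figure, the elapsed times curl reports can be turned into a slowdown factor. A minimal sketch, assuming a POSIX shell with awk and curl's H:MM:SS elapsed-time format; to_seconds and slowdown are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: turn the elapsed times curl reports (H:MM:SS) into a slowdown factor.
# to_seconds and slowdown are hypothetical helper names for this example.

# to_seconds H:MM:SS -> total seconds
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# slowdown DIRECT_TIME KNOX_TIME -> how many times longer the Knox run took
slowdown() {
  local direct knox
  direct=$(to_seconds "$1")
  knox=$(to_seconds "$2")
  awk -v d="$direct" -v k="$knox" 'BEGIN { printf "%.1fx\n", k / d }'
}

slowdown 0:02:19 0:49:12   # the 13.5G runs: direct vs Knox
slowdown 0:10:20 0:14:09   # the 8470M sandbox runs: direct vs Knox
```

This prints 21.2x and 1.4x. The first ratio agrees with the reported averages (99.6 × 1024 / 4811 ≈ 21.2), while the sandbox runs show Knox adding roughly 37% to the transfer time.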
I'm sure I have done something wrong. Do you see similar performance? Any help
would be really appreciated.
Regards,
Mohammad
Interactions:

curl -o t2.direct -L "http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 13.5G  100 13.5G    0     0  99.6M      0  0:02:19  0:02:19 --:--:--  117M

curl -H "X-Auth-Params-Email: [email protected]" -o t2 -L "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0 13.5G    0     0  4811k      0 --:--:--  0:49:12 --:--:-- 6121k