On 26 Oct 2015, at 11:21, Sean Owen <so...@cloudera.com> wrote:
> Yeah, are these stats actually reflecting data read locally, like through the
> loopback interface? I'm also no expert on the internals here but this may be
> measuring effectively local reads. Or are you sure it's not?
Hi, yes, it should be the same issue, but the solution doesn't apply in our
situation. Anyway, thanks a lot for your replies.
On Mon, Oct 26, 2015 at 7:44 PM Sean Owen wrote:
Hm, now I wonder if it's the same issue here:
https://issues.apache.org/jira/browse/SPARK-10149
Does the setting described there help?
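For context, SPARK-10149 concerns the reduce-task locality preferences added in Spark 1.5. If I recall correctly, the associated setting is `spark.shuffle.reduceLocality.enabled`; treat that exact key name as an assumption here, since the thread itself only links the ticket. A minimal sketch of disabling it in PySpark:

```python
# Hedged sketch: the config key below is assumed from SPARK-10149;
# verify it against the JIRA ticket before relying on it.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("locality-test")  # hypothetical app name for illustration
        # Turn off the 1.5 reduce-task locality preference suspected above
        .set("spark.shuffle.reduceLocality.enabled", "false"))
sc = SparkContext(conf=conf)
```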
On Mon, Oct 26, 2015 at 11:39 AM, Jinfeng Li wrote:
Hi, I have already tried the same code with Spark 1.3.1, there is no such
problem. The configuration files are all directly copied from Spark 1.5.1.
I feel it is a bug on Spark 1.5.1.
Thanks a lot for your response.
On Mon, Oct 26, 2015 at 7:21 PM Sean Owen wrote:
Yeah, are these stats actually reflecting data read locally, like through
the loopback interface? I'm also no expert on the internals here but this
may be measuring effectively local reads. Or are you sure it's not?
On Mon, Oct 26, 2015 at 11:14 AM, Steve Loughran wrote:
> On 26 Oct 2015, at 09:28, Jinfeng Li wrote:
>
> Replication factor is 3 and we have 18 data nodes. We check HDFS webUI, data
> is evenly distributed among 18 machines.
>
every block in HDFS (usually 64, 128 or 256 MB) is replicated across three
machines, meaning 3 machines have it local and the other 15 have to read it
over the network.
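The arithmetic behind that point can be made concrete with the numbers from this thread (a back-of-envelope sketch, not anything the participants ran):

```python
# Locality odds for the cluster described in this thread:
# 18 data nodes, HDFS replication factor 3.
nodes = 18
replication = 3

# For any given block, exactly `replication` machines hold a local copy.
local_machines = replication           # 3 machines can read it locally
remote_machines = nodes - replication  # 15 must pull it over the network

# If a task lands on a uniformly random node, the chance its input is local:
local_fraction = replication / nodes
print(local_machines, remote_machines)   # 3 15
print(round(local_fraction, 3))          # 0.167 -> only ~1 read in 6 is local
```

This is the worst case, where the scheduler ignores locality entirely; Spark normally tries to place tasks on nodes that hold a replica, which is why heavy remote reads suggest a scheduling or locality regression.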
I use standalone mode. Each machine has 4 workers. Spark is deployed
correctly, as the web UI and the jps command both show.
Actually, we are a team and have been using Spark for nearly half a year,
starting from Spark 1.3.1. We found this problem in one of our applications,
and I wrote a simple program to demonstrate it.
Hm, how about the opposite question -- do you have just 1 executor? Then
again, everything will be remote except for a small fraction of blocks.
On Mon, Oct 26, 2015 at 9:28 AM, Jinfeng Li wrote:
Replication factor is 3 and we have 18 data nodes. We check HDFS webUI,
data is evenly distributed among 18 machines.
On Mon, Oct 26, 2015 at 5:18 PM Sean Owen wrote:
Have a look at your HDFS replication, and where the blocks are for these
files. For example, if you had only 2 HDFS data nodes, then data would be
remote to 16 of 18 workers and always entail a copy.
On Mon, Oct 26, 2015 at 9:12 AM, Jinfeng Li wrote:
The input data is a set of 16 MB files.
On Mon, Oct 26, 2015 at 5:12 PM Jinfeng Li wrote:
I cat /proc/net/dev and take the difference of the received-bytes counters
before and after the job. I also see a long sustained peak (nearly 600 Mb/s)
in the nload interface. We have 18 machines and each machine receives 4.7 GB.
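A minimal sketch of that measurement approach — diffing the RX-bytes column of /proc/net/dev around a job. The counter values below are invented for illustration; on a real node you would read the file twice instead. Splitting out `lo` from `eth0` also answers the loopback question raised earlier in the thread:

```python
def rx_bytes(proc_net_dev_text):
    """Return {interface: cumulative received bytes} from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        iface, data = line.split(":", 1)
        counters[iface.strip()] = int(data.split()[0])  # first column is RX bytes
    return counters

# Invented sample snapshots; in practice: open("/proc/net/dev").read()
before = """Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
  eth0: 1000000     200    0    0    0     0          0         0
    lo:  400000      50    0    0    0     0          0         0"""
after = """Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
  eth0: 5700000     900    0    0    0     0          0         0
    lo:  400000      50    0    0    0     0          0         0"""

delta = {i: rx_bytes(after)[i] - rx_bytes(before)[i] for i in rx_bytes(before)}
print(delta)  # {'eth0': 4700000, 'lo': 0}
```

A zero `lo` delta with a large `eth0` delta would indicate the traffic is genuinely going over the wire rather than being local reads through loopback.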
On Mon, Oct 26, 2015 at 5:00 PM Sean Owen wrote:
-dev +user
How are you measuring network traffic?
It's not in general true that there will be zero network traffic, since not
all executors are local to all data. That can be the situation in many
cases but not always.
On Mon, Oct 26, 2015 at 8:57 AM, Jinfeng Li wrote:
> Hi, I find that loading