On 26 Oct 2015, at 11:21, Sean Owen <so...@cloudera.com> wrote:
> Yeah, are these stats actually reflecting data read locally, like through the
> loopback interface? I'm also no expert on the internals here but this may be
> measuring effectively local reads. Or are you sure it's not?
Hi, yes, it should be the same issue, but the solution doesn't apply in our
situation. Anyway, thanks a lot for your replies.
On Mon, Oct 26, 2015 at 7:44 PM Sean Owen wrote:
Hm, now I wonder if it's the same issue here:
https://issues.apache.org/jira/browse/SPARK-10149
Does the setting described there help?
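For context, SPARK-10149 concerns the reduce-task locality preferences added in Spark 1.5. If I recall correctly, the associated setting is `spark.shuffle.reduceLocality.enabled`; treat that exact key name as an assumption here, since the thread itself only links the ticket. A minimal sketch of disabling it in PySpark:

```python
# Hedged sketch: the config key below is assumed from SPARK-10149;
# verify it against the JIRA ticket before relying on it.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("locality-test")  # hypothetical app name for illustration
        # Turn off the 1.5 reduce-task locality preference suspected above
        .set("spark.shuffle.reduceLocality.enabled", "false"))
sc = SparkContext(conf=conf)
```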
On Mon, Oct 26, 2015 at 11:39 AM, Jinfeng Li wrote:
Hi, I have already tried the same code with Spark 1.3.1, there is no such
problem. The configuration files are all directly copied from Spark 1.5.1.
I feel it is a bug on Spark 1.5.1.
Thanks a lot for your response.
On Mon, Oct 26, 2015 at 7:21 PM Sean Owen wrote:
Yeah, are these stats actually reflecting data read locally, like through
the loopback interface? I'm also no expert on the internals here but this
may be measuring effectively local reads. Or are you sure it's not?
On Mon, Oct 26, 2015 at 11:14 AM, Steve Loughran wrote:
> On 26 Oct 2015, at 09:28, Jinfeng Li wrote:
>
> Replication factor is 3 and we have 18 data nodes. We check HDFS webUI, data
> is evenly distributed among 18 machines.
>
every block in HDFS (usually 64, 128 or 256 MB) is replicated across three
machines, meaning 3 machines have it local and the other 15 have to read it
over the network.
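The arithmetic behind that point can be made concrete with the numbers from this thread (a back-of-envelope sketch, not anything the participants ran):

```python
# Locality odds for the cluster described in this thread:
# 18 data nodes, HDFS replication factor 3.
nodes = 18
replication = 3

# For any given block, exactly `replication` machines hold a local copy.
local_machines = replication           # 3 machines can read it locally
remote_machines = nodes - replication  # 15 must pull it over the network

# If a task lands on a uniformly random node, the chance its input is local:
local_fraction = replication / nodes
print(local_machines, remote_machines)   # 3 15
print(round(local_fraction, 3))          # 0.167 -> only ~1 read in 6 is local
```

This is the worst case, where the scheduler ignores locality entirely; Spark normally tries to place tasks on nodes that hold a replica, which is why heavy remote reads suggest a scheduling or locality regression.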
I use standalone mode. Each machine has 4 workers. Spark is deployed
correctly, as the web UI and the jps command both show.
Actually, we are a team and have been using Spark for nearly half a year,
starting from Spark 1.3.1. We found this problem in one of our applications,
and I wrote a simple program to demonstrate it.
Hm, how about the opposite question -- do you have just 1 executor? Then
again, everything will be remote except for a small fraction of blocks.
On Mon, Oct 26, 2015 at 9:28 AM, Jinfeng Li wrote:
Replication factor is 3 and we have 18 data nodes. We check HDFS webUI,
data is evenly distributed among 18 machines.
On Mon, Oct 26, 2015 at 5:18 PM Sean Owen wrote:
Have a look at your HDFS replication, and where the blocks are for these
files. For example, if you had only 2 HDFS data nodes, then data would be
remote to 16 of 18 workers and always entail a copy.
On Mon, Oct 26, 2015 at 9:12 AM, Jinfeng Li wrote:
The input data is a set of 16 MB files.
On Mon, Oct 26, 2015 at 5:12 PM Jinfeng Li wrote:
I cat /proc/net/dev and take the difference of the received-bytes counters
before and after the job. I also see a long sustained peak (nearly 600 Mb/s)
in the nload interface. We have 18 machines and each machine receives 4.7 GB.
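A minimal sketch of that measurement approach — diffing the RX-bytes column of /proc/net/dev around a job. The counter values below are invented for illustration; on a real node you would read the file twice instead. Splitting out `lo` from `eth0` also answers the loopback question raised earlier in the thread:

```python
def rx_bytes(proc_net_dev_text):
    """Return {interface: cumulative received bytes} from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        iface, data = line.split(":", 1)
        counters[iface.strip()] = int(data.split()[0])  # first column is RX bytes
    return counters

# Invented sample snapshots; in practice: open("/proc/net/dev").read()
before = """Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
  eth0: 1000000     200    0    0    0     0          0         0
    lo:  400000      50    0    0    0     0          0         0"""
after = """Inter-|   Receive
 face |bytes    packets errs drop fifo frame compressed multicast
  eth0: 5700000     900    0    0    0     0          0         0
    lo:  400000      50    0    0    0     0          0         0"""

delta = {i: rx_bytes(after)[i] - rx_bytes(before)[i] for i in rx_bytes(before)}
print(delta)  # {'eth0': 4700000, 'lo': 0}
```

A zero `lo` delta with a large `eth0` delta would indicate the traffic is genuinely going over the wire rather than being local reads through loopback.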
On Mon, Oct 26, 2015 at 5:00 PM Sean Owen wrote:
-dev +user
How are you measuring network traffic?
It's not in general true that there will be zero network traffic, since not
all executors are local to all data. That can be the situation in many
cases but not always.
On Mon, Oct 26, 2015 at 8:57 AM, Jinfeng Li wrote:
> Hi, I find that loading