Hi Alexey,
You're looking in the right place in the first log, the one from the driver.
Specifically, the locality is logged at INFO level by the TaskSetManager and
looks like this:
14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0
(TID 10, 10.54.255.191, ANY, 1341 bytes)
The ANY there means
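Since the locality level is embedded in those TaskSetManager lines, one way to get an overview of a whole run is to tally them. A minimal sketch (my own helper, not part of Spark) that parses driver-log lines of the shape shown above:

```python
import re
from collections import Counter

# Matches TaskSetManager lines like:
#   ... INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10, 10.54.255.191, ANY, 1341 bytes)
TASK_RE = re.compile(
    r"TaskSetManager: Starting task \S+ in stage \S+ "
    r"\(TID \d+, (?P<host>[^,]+), (?P<locality>[A-Z_]+), \d+ bytes\)"
)

def locality_counts(log_lines):
    """Count how many tasks launched at each locality level."""
    counts = Counter()
    for line in log_lines:
        m = TASK_RE.search(line)
        if m:
            counts[m.group("locality")] += 1
    return counts
```

Feeding it the driver log should make it obvious at a glance whether tasks are mostly NODE_LOCAL or falling back to ANY.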
Hello Andrew!
Thanks for the reply. Which logs, and at what level, should I check? Driver,
master, or worker?
I found this on the master node, but it only shows the ANY locality level.
Here is the driver (spark sql) log -
https://gist.github.com/13h3r/c91034307caa33139001 and one of the worker
logs - h
Hi Alexey,
You should see in the logs a locality measure like NODE_LOCAL,
PROCESS_LOCAL, ANY, etc. If your Spark workers each have an HDFS data node
on them and you're reading out of HDFS, then you should be seeing almost
all NODE_LOCAL accesses. One cause I've seen for mismatches is if Spark
us
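One knob worth knowing about in this situation is `spark.locality.wait`, which controls how long the scheduler holds a task hoping for a slot at a better locality level before falling back (PROCESS_LOCAL, then NODE_LOCAL, then ANY). Whether raising it helps depends on the actual cause of the mismatch, but as a sketch:

```
# spark-defaults.conf (sketch)
# Default is 3s; a longer wait trades scheduling latency for locality.
spark.locality.wait    10s
```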
Hello again spark users and developers!
I have a standalone Spark cluster (1.1.0) with Spark SQL running on it. My
cluster consists of 4 datanodes, and the replication factor of the files is 3.
I use the Thrift server to access Spark SQL and have one table with 30+
partitions. When I run a query on the whole table (some