Re: Log hdfs blocks sending

2014-09-27 Thread Andrew Ash
Hi Alexey, You're looking in the right place in the first log from the driver. Specifically, the locality is reported at the TaskSetManager INFO log level and looks like this: 14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10, 10.54.255.191, ANY, 1341 bytes) The ANY there mean…
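For anyone inspecting their own driver logs, the locality level can be pulled out of lines like the one quoted above. A minimal sketch in Python; the regex is keyed to the Spark 1.x log format shown in this single sample line, so the exact field layout is an assumption beyond it:

```python
import re

# Sample driver log line from the thread (Spark 1.x TaskSetManager format).
line = ("14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0 "
        "(TID 10, 10.54.255.191, ANY, 1341 bytes)")

# Pull the task ID, executor host, and locality level out of the
# parenthesized part of the message.
pattern = re.compile(r"\(TID (\d+), ([\d.]+), (\w+), \d+ bytes\)")
match = pattern.search(line)
tid, host, locality = match.group(1), match.group(2), match.group(3)
print(tid, host, locality)  # → 10 10.54.255.191 ANY
```

The third captured field is the locality level (PROCESS_LOCAL, NODE_LOCAL, ANY, ...) that the rest of the thread discusses.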

Re: Log hdfs blocks sending

2014-09-26 Thread Alexey Romanchuk
Hello Andrew! Thanks for the reply. Which logs, and at what level, should I check? Driver, master, or worker? I found this on the master node, but there is only the ANY locality requirement. Here is the driver (Spark SQL) log - https://gist.github.com/13h3r/c91034307caa33139001 and one of the worker logs - h…

Re: Log hdfs blocks sending

2014-09-25 Thread Andrew Ash
Hi Alexey, You should see in the logs a locality measure like NODE_LOCAL, PROCESS_LOCAL, ANY, etc. If your Spark workers each have an HDFS data node on them and you're reading out of HDFS, then you should be seeing almost all NODE_LOCAL accesses. One cause I've seen for mismatches is if Spark us…
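A practical way to check the "almost all NODE_LOCAL" expectation is to tally the locality levels over the whole driver log. A minimal sketch, assuming the Spark 1.x TaskSetManager line format quoted elsewhere in this thread (the in-memory sample lines here are illustrative; in practice you would read them from your driver log file):

```python
import re
from collections import Counter

# Illustrative driver log excerpt; replace with lines read from the real log.
log_lines = [
    "14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0 "
    "(TID 10, 10.54.255.191, ANY, 1341 bytes)",
    "14/09/26 16:57:31 INFO TaskSetManager: Starting task 8.0 in stage 1.0 "
    "(TID 9, 10.54.255.191, NODE_LOCAL, 1341 bytes)",
    "14/09/26 16:57:32 INFO TaskSetManager: Starting task 7.0 in stage 1.0 "
    "(TID 8, 10.54.255.191, NODE_LOCAL, 1341 bytes)",
]

# The third field in the parentheses is the locality level.
locality_re = re.compile(r"\(TID \d+, [\w.\-]+, (\w+), \d+ bytes\)")
counts = Counter(
    m.group(1)
    for line in log_lines
    for m in [locality_re.search(line)]
    if m
)
print(counts)
```

If a cluster that colocates HDFS data nodes with Spark workers shows mostly ANY rather than NODE_LOCAL in such a tally, that is the mismatch Andrew is describing.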

Log hdfs blocks sending

2014-09-25 Thread Alexey Romanchuk
Hello again, Spark users and developers! I have a standalone Spark cluster (1.1.0) with Spark SQL running on it. My cluster consists of 4 datanodes, and the replication factor of the files is 3. I use the Thrift server to access Spark SQL and have 1 table with 30+ partitions. When I run a query on the whole table (some…