> Why I think it's the number of files: I believe that all of those
> files, or a large part of them, are read when you run
> sqlContext.parquetFile(), and in S3 that takes long enough that
> something internally is timing out.
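
For context, the call in question is just the plain load. Here's a minimal
sketch against the Spark 1.x API this thread is using (the bucket and path
are placeholders, not my real ones):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-s3"))
    val sqlContext = new SQLContext(sc)

    // parquetFile() touches the footer of every part-file under the prefix
    // to discover the schema, so thousands of small slices mean thousands
    // of round trips to S3 before any query even starts.
    val events = sqlContext.parquetFile("s3n://my-bucket/events.parquet")
    events.registerTempTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)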

I'll create the parquet files with Drill instead of Spark which will give me
(somewhat) better control over the slice sizes and see what happens.
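If the Drill route works, the Spark-side equivalent would presumably be to
rewrite the table into fewer, larger slices before querying. A rough sketch,
assuming a Spark version where SchemaRDD supports coalesce, with hypothetical
paths and an arbitrary target of 64 part-files:

    // Read the many-sliced table once, then rewrite it as ~64 larger
    // part-files so the reader has far fewer slice footers to fetch.
    val wide = sqlContext.parquetFile("s3n://my-bucket/many-small-slices")
    wide.coalesce(64).saveAsParquetFile("s3n://my-bucket/fewer-larger-slices")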

That said, this behavior seems wrong to me. First, exiting due to inactivity
on a job seems like (perhaps?) the wrong fix for whatever the original
problem was. Second, there IS activity if it's reading the slice headers,
yet the job exits anyway. So if this fixes the problem, the measure of
"activity" seems wrong.

Ian and Manu, thanks for your help. I'll post back and let you know if that
fixes it.

Jim




