You may need to add a createDataFrame() call (or, for Python, inferSchema()) before registerTempTable().



On Thu, Jan 7, 2016 at 12:53 PM, Henrik Baastrup <> wrote:

> Hi All,
> I have a small Hadoop cluster where I have stored a lot of data in Parquet
> files. I have installed a Spark master service on one of the nodes and now
> would like to query my Parquet files from a Spark client. When I run the
> following program from the spark-shell on the Spark Master node, everything
> works correctly:
> # val sqlCont = new org.apache.spark.sql.SQLContext(sc)
> # val reader =
> # val dataFrame = reader.parquet("/user/hdfs/parquet-multi/BICC")
> # dataFrame.registerTempTable("BICC")
> # val recSet = sqlCont.sql("""SELECT protocolCode, beginTime, endTime, called, calling
> #   FROM BICC
> #   WHERE endTime >= 1449421800000000 AND endTime <= 1449422400000000
> #     AND calling = '6287870642893' AND p_endtime = 1449422400000000""")
> #
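
The endTime bounds in this query are 16-digit numbers, which suggests microseconds since the Unix epoch. The thread does not show the schema, so that unit is an assumption. Under it, a quick Java check (the `TimeWindow` class and `fromMicros` helper are hypothetical names) shows the query selects a 10-minute UTC window:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TimeWindow {

    // Assumption: endTime and p_endtime store microseconds since the Unix epoch.
    static Instant fromMicros(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        Instant lower = fromMicros(1449421800000000L); // endTime lower bound
        Instant upper = fromMicros(1449422400000000L); // endTime upper bound (same value as p_endtime)

        System.out.println(lower);                                       // 2015-12-06T17:10:00Z
        System.out.println(upper);                                       // 2015-12-06T17:20:00Z
        System.out.println(Duration.between(lower, upper).toMinutes()); // 10
    }
}
```
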
> But when I run the Java program below from my client, I get:
> Exception in thread "main" java.lang.AssertionError: assertion failed: No 
> predefined schema found, and no Parquet data files or summary files found 
> under file:/user/hdfs/parquet-multi/BICC.
> The exception occurs at the line:
> DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
> On the Master node I can see the client connect when the SparkContext is
> instantiated, as the following lines appear in the Spark log:
> 16/01/07 18:27:47 INFO Master: Registering app SparkTest
> 16/01/07 18:27:47 INFO Master: Registered app SparkTest with ID app-20160107182747-00801
> If I create a local directory with the given path, my program goes into an
> endless loop, with the following warning on the console:
> WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; 
> check your cluster UI to ensure that workers are registered and have 
> sufficient resources
> To me it seems that my SQLContext does not connect to the Spark Master, but
> tries to work locally on the client, where the requested files do not exist.
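
The exception message supports that reading: the scheme-less path was resolved as `file:/user/hdfs/parquet-multi/BICC`, i.e. against the client's local filesystem. A rough stdlib approximation of that qualification step follows, assuming the client has no cluster configuration and so falls back to Hadoop's default `fs.defaultFS` value of `file:///`. The `qualify` helper and the `hdfs://namenode:8020/` URI are hypothetical; Hadoop does this in its own `Path`/`FileSystem` classes, not with `java.net.URI`.

```java
import java.net.URI;

public class PathQualification {

    // Hypothetical helper: qualifies a path against a default filesystem URI,
    // roughly what happens when Spark/Hadoop is handed a path with no scheme.
    // Simplification: relative paths resolve against defaultFs's path, whereas
    // Hadoop resolves them against a working directory.
    static URI qualify(String path, URI defaultFs) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already fully qualified (e.g. hdfs://...), keep as-is
        }
        // Take scheme and authority from defaultFs, keep the absolute path from p.
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        String path = "/user/hdfs/parquet-multi/BICC";

        // Client without cluster config: Hadoop's default fs.defaultFS is file:///
        System.out.println(qualify(path, URI.create("file:///")));
        // prints file:/user/hdfs/parquet-multi/BICC

        // Client whose default filesystem points at a (hypothetical) namenode
        System.out.println(qualify(path, URI.create("hdfs://namenode:8020/")));
        // prints hdfs://namenode:8020/user/hdfs/parquet-multi/BICC
    }
}
```

The first result matches the path in the exception, which is consistent with the client never consulting HDFS at all.
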
> Java program:
>       SparkConf conf = new SparkConf()
>               .setAppName("SparkTest")
>               .setMaster("spark://");
>       JavaSparkContext sc = new JavaSparkContext(conf);
>       SQLContext sqlContext = new SQLContext(sc);
>       DataFrameReader reader =;
>       DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
>       DataFrame filtered = df.filter("endTime>=1449421800000000 AND endTime<=1449422400000000"
>               + " AND calling='6287870642893' AND p_endtime=1449422400000000");
> Can anyone help me?
> Henrik
