You may need to add a createDataFrame call (for Python, inferSchema) before registerTempTable.
Thanks,
Prem

On Thu, Jan 7, 2016 at 12:53 PM, Henrik Baastrup <henrik.baast...@netscout.com> wrote:

> Hi All,
>
> I have a small Hadoop cluster where I have stored a lot of data in Parquet
> files. I have installed a Spark master service on one of the nodes and now
> would like to query my Parquet files from a Spark client. When I run the
> following program from the spark-shell on the Spark master node, everything
> works correctly:
>
> # val sqlCont = new org.apache.spark.sql.SQLContext(sc)
> # val reader = sqlCont.read
> # val dataFrame = reader.parquet("/user/hdfs/parquet-multi/BICC")
> # dataFrame.registerTempTable("BICC")
> # val recSet = sqlCont.sql("SELECT protocolCode,beginTime,endTime,called,calling FROM BICC WHERE endTime>=1449421800000000 AND endTime<=1449422400000000 AND calling='6287870642893' AND p_endtime=1449422400000000")
> # recSet.show()
>
> But when I run the Java program below from my client, I get:
>
> Exception in thread "main" java.lang.AssertionError: assertion failed: No predefined schema found, and no Parquet data files or summary files found under file:/user/hdfs/parquet-multi/BICC.
>
> The exception occurs at the line:
>
> DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
>
> On the master node I can see the client connect when the SparkContext is
> instantiated, as I get the following lines in the Spark log:
>
> 16/01/07 18:27:47 INFO Master: Registering app SparkTest
> 16/01/07 18:27:47 INFO Master: Registered app SparkTest with ID app-20160107182747-00801
>
> If I create a local directory with the given path, my program goes into an
> endless loop, with the following warning on the console:
>
> WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>
> To me it seems that my SQLContext does not connect to the Spark master, but
> tries to work locally on the client, where the requested files do not exist.
>
> Java program:
>
> SparkConf conf = new SparkConf()
>     .setAppName("SparkTest")
>     .setMaster("spark://172.27.13.57:7077");
> JavaSparkContext sc = new JavaSparkContext(conf);
> SQLContext sqlContext = new SQLContext(sc);
>
> DataFrameReader reader = sqlContext.read();
> DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
> DataFrame filtered = df.filter("endTime>=1449421800000000 AND endTime<=1449422400000000 AND calling='6287870642893' AND p_endtime=1449422400000000");
> filtered.show();
>
> Is there anyone who can help me?
>
> Henrik
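The `file:` prefix in the assertion message suggests that the scheme-less path `/user/hdfs/parquet-multi/BICC` was qualified against the client's default filesystem. That default falls back to the local disk (`file:///`) when no Hadoop configuration that sets `fs.defaultFS` (typically `core-site.xml`) is visible to the client. The minimal sketch below illustrates that resolution. It uses plain `java.net.URI` as a stand-in for Hadoop's own path qualification, so it is not Hadoop's API. The address `hdfs://namenode:8020/` is a hypothetical NameNode address and does not come from this thread.

```java
import java.net.URI;

public class PathQualification {
    // Resolve a path against a default-filesystem URI. An absolute path with
    // no scheme inherits the scheme (and authority, if any) of the default.
    static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        String path = "/user/hdfs/parquet-multi/BICC";

        // Client without cluster config: default filesystem is the local disk.
        // Prints file:/user/hdfs/parquet-multi/BICC, the path in the error.
        System.out.println(qualify("file:///", path));

        // Hypothetical NameNode address; substitute the cluster's fs.defaultFS.
        // Prints hdfs://namenode:8020/user/hdfs/parquet-multi/BICC
        System.out.println(qualify("hdfs://namenode:8020/", path));
    }
}
```

If this is the cause, two fixes would be worth trying first. One is to pass a fully qualified URI such as `hdfs://<namenode>:<port>/user/hdfs/parquet-multi/BICC` to `reader.parquet(...)`. The other is to make the cluster's `core-site.xml` visible to the client, for example via `HADOOP_CONF_DIR`.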