Spark 1.3 UDF ClassNotFoundException

2015-04-02 Thread ganterm
Hello, I started to use the DataFrame API in Spark 1.3 with Scala. I am trying to implement a UDF and am following the sample here: https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.UserDefinedFunction, meaning val predict = udf((score: Double) => if (score > 0.5) tr

Re: Tableau beta connector

2015-02-18 Thread ganterm
Ashutosh, were you able to figure this out? I have the exact same question. I think the answer is to use Spark SQL to create/load a table in Hive (e.g. execute the HiveQL CREATE TABLE statement), but I am not sure. I am hoping for something simpler than that. Anybody? Thanks! -- View

Re: Spark streaming - tracking/deleting processed files

2015-02-04 Thread ganterm
on npp = (NewHadoopPartition) upp.parentPartition();
> String fPath = npp.serializableHadoopSplit().value().toString();
> String[] nT = tmpName.split(":");
> String name = nT[0]; // name is the path of the file picked for processing. the processing logic can be

Spark streaming - tracking/deleting processed files

2015-01-30 Thread ganterm
We are running a Spark streaming job that retrieves files from a directory (using textFileStream). One concern is the case where the job is down but files are still being added to the directory. Once the job starts up again, those files are not picked up (since they are not new
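
One way to reason about the backlog described above is to compare the directory listing against a record of files that were already handled. The following is only a minimal sketch in plain Java, not something taken from the thread: the class name BacklogScan, both helper methods, and the idea that the job keeps a set of processed file names are all assumptions made for illustration. How the pending files would then be fed back into the streaming job is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: finds files that arrived while the job was down.
class BacklogScan {

    // Returns the names that appear in the listing but not in the processed set, sorted.
    public static List<String> unprocessed(Collection<String> listed, Set<String> processed) {
        TreeSet<String> pending = new TreeSet<>(listed);
        pending.removeAll(processed);
        return new ArrayList<>(pending);
    }

    // Lists the names of the regular files directly inside dir (non-recursive).
    public static List<String> listFileNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .map(p -> p.getFileName().toString())
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate an input directory holding three files.
        Path dir = Files.createTempDirectory("incoming");
        for (String name : Arrays.asList("a.txt", "b.txt", "c.txt")) {
            Files.createFile(dir.resolve(name));
        }
        // Assumption: the job recorded a.txt as processed before it went down.
        Set<String> processed = new HashSet<>(Arrays.asList("a.txt"));
        System.out.println(unprocessed(listFileNames(dir), processed));
    }
}
```

Running such a scan once at startup, before the stream begins, would surface the files that were added during the outage. It depends on the processed set being stored somewhere that survives a restart, which this sketch does not cover.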