Hello,
I started to use the dataframe API in Spark 1.3 with Scala.
I am trying to implement a UDF and am following the sample here:
https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.UserDefinedFunction
meaning
val predict = udf((score: Double) => if (score > 0.5) true else false)
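For anyone hitting the same question: a minimal sketch of how a UDF can be defined and applied in Spark 1.3, assuming a `SQLContext` named `sqlContext` and a DataFrame `df` with a numeric `score` column (both hypothetical names):

```scala
import org.apache.spark.sql.functions.udf

// Wrap an ordinary Scala function; the return type (Boolean) is inferred.
val predict = udf((score: Double) => if (score > 0.5) true else false)

// Apply it to a column of an assumed DataFrame df.
val withPrediction = df.withColumn("prediction", predict(df("score")))

// Alternatively, register it for use inside SQL strings:
sqlContext.udf.register("predict", (score: Double) => score > 0.5)
sqlContext.sql("SELECT score, predict(score) FROM my_table")
```

The `udf(...)` wrapper returns a `UserDefinedFunction` that can be called with `Column` arguments, which is what lets it appear in `select`/`withColumn` expressions.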
Ashutosh,
Were you able to figure this out? I am having the exact same question.
I think the answer is to use Spark SQL to create/load a table in Hive (e.g.
execute the HiveQL CREATE TABLE statement), but I am not sure. Hoping for
something simpler than that.
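In case it helps, the HiveContext route looks roughly like this in Spark 1.3; the table name, columns, and DataFrame `df` below are made-up examples:

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext
val hiveContext = new HiveContext(sc)

// Option 1: issue plain HiveQL through Spark SQL
hiveContext.sql("CREATE TABLE IF NOT EXISTS events (id INT, name STRING)")

// Option 2: persist an existing DataFrame directly as a Hive table
// (df is a hypothetical DataFrame built elsewhere)
df.saveAsTable("events_copy")
```

`saveAsTable` writes the data and registers the table in the Hive metastore in one step, which may be the "simpler than raw HiveQL" option being asked for.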
Anybody?
Thanks!
> NewHadoopPartition npp = (NewHadoopPartition) upp.parentPartition();
> String fPath = npp.serializableHadoopSplit().value().toString();
> String[] nT = tmpName.split(":");
> String name = nT[0]; // name is the path of the file picked for
> processing. the processing logic can be
We are running a Spark streaming job that retrieves files from a directory
(using textFileStream).
One concern we are having is the case where the job is down but files are
still being added to the directory.
Once the job starts up again, those files are not being picked up (since
they are not considered new after the restart).
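One workaround, if I understand the API correctly, is the more general `fileStream`, which takes a `newFilesOnly` flag; `textFileStream` is shorthand for it with `newFilesOnly = true`. Setting the flag to `false` lets the first batch pick up files already present in the directory (subject to the stream's remember window). A rough sketch, with made-up paths:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// sc is an existing SparkContext
val ssc = new StreamingContext(sc, Seconds(30))

// newFilesOnly = false: also process files that were already in the
// directory when the stream starts, not just ones added afterwards.
val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "/data/incoming",                               // hypothetical directory
  (path: Path) => !path.getName.startsWith("."),  // skip hidden/temp files
  newFilesOnly = false
).map(_._2.toString)
```

Another common pattern is to have producers write into a staging directory and atomically rename files into the watched directory only once the streaming job is up, so nothing lands while the job is down.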