Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Muthu Jayakumar
Thanks Michael. Let me test it with a recent master code branch. Also, for every mapping step do I have to create a new case class? I cannot use a Tuple, as I have ~130 columns to process. Earlier I had used a Seq[Any] (actually an Array[Any], to optimize serialization) but processed it using RDD
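For readers following along, a minimal sketch of the per-step case class pattern being discussed; RawRecord, EnrichedRecord, and their two columns are hypothetical stand-ins for the ~130-column schema mentioned above.

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical narrow example of one case class per mapping step
// (the real schema in the thread has ~130 columns).
case class RawRecord(id: Long, name: String)
case class EnrichedRecord(id: Long, name: String, nameLength: Int)

def enrich(sqlContext: SQLContext): Unit = {
  import sqlContext.implicits._
  val raw = Seq(RawRecord(1L, "a"), RawRecord(2L, "bb")).toDS()
  // Each typed map step produces a new element type, which is where
  // the "new case class per step" overhead comes from.
  val enriched = raw.map(r => EnrichedRecord(r.id, r.name, r.name.length))
  enriched.show()
}
```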

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
> > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > > This looks like a bug. What is the error? It might be fixed in branch-1.6/master if you can test there. > Please advise on what I may be missing here? > > > Also for join, may I suggest to have a custom encoder / transformation to >
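For context, a sketch of what the failing snippet quoted above might look like; the fields of TestCaseClass and the body of toMyMap are hypothetical (only the names appear in the thread), and on 1.6 the final call was reported to throw rather than run cleanly.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SQLContext}

// Hypothetical reconstruction of the class under discussion.
case class TestCaseClass(key: String, value: Long) {
  def toMyMap: Map[String, Long] = Map(key -> value)
}

def reproduce(sqlContext: SQLContext, df1: DataFrame): Unit = {
  import sqlContext.implicits._
  // Explicit encoder for the Map result, in case the built-in implicits
  // do not cover Map types on this Spark version (an assumption).
  implicit val mapEncoder = Encoders.kryo[Map[String, Long]]
  df1.as[TestCaseClass].map(_.toMyMap).show()  // the call reported to fail on 1.6
}
```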

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Muthu Jayakumar
I tried to rerun the same code with the current snapshot versions of 1.6 and 2.0 from https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/ but I still see an exception around the same line. Here is the exception below. Filed an issue against the same
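For anyone wanting to repeat this test, a minimal build.sbt sketch for pulling a snapshot build from the repository linked above; the exact version strings are assumptions and change as snapshots roll over.

```scala
// build.sbt -- sketch for testing against an Apache snapshot build
resolvers += "Apache Snapshots" at
  "https://repository.apache.org/content/repositories/snapshots/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.0-SNAPSHOT" % "provided"
)
```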

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
Awesome, thanks for opening the JIRA! We'll take a look. On Tue, Jan 12, 2016 at 1:53 PM, Muthu Jayakumar wrote: > I tried to rerun the same code with current snapshot version of 1.6 and > 2.0 from >

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Michael Armbrust
> > Also, while extracting a value into a Dataset using the as[U] method, how could > I specify a custom encoder/translation to a case class (where I don't have > the same column-name mapping or the same data-type mapping)? > There is no public API yet for defining your own encoders. You change the column
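The reply above is truncated; a workaround consistent with it (the details here are an assumption on my part) is to rename and cast columns with select before calling as[U]. Person and the column names below are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SQLContext}

case class Person(personId: Long, personName: String)

def toTypedDataset(sqlContext: SQLContext, df: DataFrame): Dataset[Person] = {
  import sqlContext.implicits._
  df.select(
      $"id".cast("long").as("personId"),   // fix the data type
      $"name".as("personName"))            // fix the column name
    .as[Person]
}
```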

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Arkadiusz Bicz
Hi, There is some documentation at https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html and you can also check out the tests in DatasetSuite in the Spark sources. BR, Arkadiusz Bicz On Mon, Jan 11, 2016 at 5:37 AM, Muthu Jayakumar
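Since the link above points at the Dataset Aggregator example, here is a minimal sketch of that pattern; Sale and SumAmount are hypothetical, and the shape shown follows the Spark 1.6-era Aggregator API (later releases rename groupBy to groupByKey and add encoder methods).

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.expressions.Aggregator

case class Sale(item: String, amount: Double)

// Typed aggregation: sums the amount field of Sale records.
object SumAmount extends Aggregator[Sale, Double, Double] {
  def zero: Double = 0.0                                 // initial buffer
  def reduce(b: Double, a: Sale): Double = b + a.amount  // fold one element in
  def merge(b1: Double, b2: Double): Double = b1 + b2    // combine partial sums
  def finish(reduction: Double): Double = reduction      // final result
}

def totalPerItem(sqlContext: SQLContext, sales: Seq[Sale]): Unit = {
  import sqlContext.implicits._
  val ds = sales.toDS()
  ds.groupBy(_.item).agg(SumAmount.toColumn).show()
}
```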

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Muthu Jayakumar
Hello Michael, Thank you for the suggestion. This should do the trick for column names. But how could I transform a column's value type? Do I have to use a UDF? If I use a UDF, then the other question I have pertains to the map step on a Dataset, where I am running into an error when I
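A quick sketch of the UDF route being asked about, for changing a column's value type before going typed; the "amount" column and the conversion logic are hypothetical. For a plain type change, Column.cast is usually enough and avoids the UDF entirely.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.udf

// Convert a string column to Long with a UDF before calling .as[SomeCaseClass].
def withLongAmount(sqlContext: SQLContext, df: DataFrame): DataFrame = {
  import sqlContext.implicits._
  val toLong = udf((s: String) => s.toLong)
  df.withColumn("amount", toLong($"amount"))
}
```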