Awesome, thanks for opening the JIRA! We'll take a look.
On Tue, Jan 12, 2016 at 1:53 PM, Muthu Jayakumar wrote:
> I tried to rerun the same code with current snapshot version of 1.6 and
> 2.0 from
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/
I tried to rerun the same code with current snapshot version of 1.6 and 2.0
from
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/
But I still see an exception around the same line. The exception is included
below. I filed an issue for the same: SPARK-1278
Thanks Michael. Let me test it with a recent master code branch.
Also, for every mapping step, do I have to create a new case class? I
cannot use a Tuple, as I have ~130 columns to process. Earlier I had used a
Seq[Any] (actually an Array[Any], to optimize serialization) but processed
it using an RDD (
>
> df1.as[TestCaseClass].map(_.toMyMap).show() //fails
>
> This looks like a bug. What is the error? It might be fixed in
> branch-1.6/master if you can test there.
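For context on the failing `map(_.toMyMap)` line, the pattern might look like the following minimal sketch. This is hypothetical (field names are invented; the real class reportedly has ~130 columns), and as far as I know Spark 1.6 had no Dataset encoder for `Map[String, Any]`, which may be related to the failure:

```scala
// Hypothetical sketch of the toMyMap pattern (field names invented):
// a wide case class flattened into an untyped Map keyed by column name.
case class TestCaseClass(col1: String, col2: Long, col3: Double) {
  def toMyMap: Map[String, Any] =
    Map("col1" -> col1, "col2" -> col2, "col3" -> col3)
}

object ToMyMapDemo {
  def main(args: Array[String]): Unit = {
    val row = TestCaseClass("a", 1L, 2.5)
    // Look up one flattened field by column name.
    println(row.toMyMap("col3"))
  }
}
```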
> Please advise on what I may be missing here?
>
>
> Also, for join, may I suggest having a custom encoder / transformation to
> say
Hello Michael,
Thank you for the suggestion. This should do the trick for column names.
But how could I transform a column's value type? Do I have to use a UDF? In
case I use a UDF, the other question I have pertains to the map step on a
Dataset, where I am running into an error when I run the map step.
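For the value-type part of the question, one option that avoids a UDF entirely (a sketch against the 1.6-era DataFrame API; the column name "age" is hypothetical) is `cast`:

```scala
import org.apache.spark.sql.DataFrame

// Sketch: change a column's type with cast instead of a UDF.
// withColumn with an existing name replaces that column in place.
def normalizeAge(df: DataFrame): DataFrame =
  df.withColumn("age", df("age").cast("long"))
```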
>
> Also, while extracting a value into Dataset using as[U] method, how could
> I specify a custom encoder/translation to case class (where I don't have
> the same column-name mapping or same data-type mapping)?
>
There is no public API yet for defining your own encoders. You change the
column names
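A minimal sketch of the column-name approach (1.6-era API; the file path, column names, and case class are all hypothetical): rename and cast the DataFrame columns so they line up with the case class fields before calling `as[U]`:

```scala
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Long)

// Sketch: align DataFrame column names/types with the case class fields
// so the built-in product encoder applies. "people.json", "full_name",
// and "years" are hypothetical.
def toTyped(sqlContext: SQLContext) = {
  import sqlContext.implicits._
  sqlContext.read.json("people.json")
    .select($"full_name".as("name"), $"years".cast("long").as("age"))
    .as[Person]
}
```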
Hi,
There is some documentation at
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
and you can also check out the tests in DatasetSuite in the Spark sources.
BR,
Arkadiusz Bicz
On Mon, Jan 11, 2016 at 5:37 AM, Muthu Jayakumar wrote:
> Hello there,
>
Hello there,
While looking at the features of Dataset, it seems to provide an alternative
to UDFs and UDAFs. Any documentation or sample code snippet showing how to
write these would be helpful for rewriting existing UDFs as Dataset mapping
steps.
Also, while extracting a value into a Dataset using the as[U] method, how
could I specify a custom encoder/translation to a case class (where I don't
have the same column-name mapping or same data-type mapping)?
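As a concrete comparison for the UDF-versus-Dataset question (a sketch only, against the 1.6-era API; all names are hypothetical), here is the same logic written once as a DataFrame UDF and once as a typed Dataset map step:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

case class Order(id: Long, amount: Double)

// Sketch: apply a 20% markup both ways. Names are hypothetical.
def demo(sqlContext: SQLContext): Unit = {
  import sqlContext.implicits._
  val ds = sqlContext.createDataset(Seq(Order(1, 10.0), Order(2, 20.0)))

  // DataFrame style: register the logic as a UDF.
  val withTax = udf((amount: Double) => amount * 1.2)
  ds.toDF().select($"id", withTax($"amount").as("amount")).show()

  // Dataset style: the same logic as a plain function in a map step,
  // keeping the typed Order rows.
  ds.map(o => o.copy(amount = o.amount * 1.2)).show()
}
```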