Thanks Michael. Let me test it with a recent master code branch.
Also, for every mapping step do I have to create a new case class? I
cannot use a Tuple, as I have ~130 columns to process. Earlier I had used a
Seq[Any] (actually an Array[Any], to optimize serialization), but processed
it using the RDD API.
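For reference, a minimal sketch of that Array[Any]-over-RDD approach (df1 is
the DataFrame from the snippet below; the transformation itself is an
assumption):

import org.apache.spark.sql.Row

// Pull each Row out as an Array[Any]: one slot per column (~130 here).
// This avoids a case class per mapping step, at the cost of type safety.
val wide: org.apache.spark.rdd.RDD[Array[Any]] =
  df1.rdd.map((row: Row) => row.toSeq.toArray)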
>
> df1.as[TestCaseClass].map(_.toMyMap).show() //fails
>
> This looks like a bug. What is the error? It might be fixed in
> branch-1.6/master if you can test there.
> Please advise on what I may be missing here?
>
>
> Also, for join, may I suggest having a custom encoder / transformation to
>
I tried to rerun the same code with the current snapshot versions of 1.6 and 2.0
from
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/
but I still see an exception around the same line. Here is the exception
below. I have filed a JIRA issue for the same.
Awesome, thanks for opening the JIRA! We'll take a look.
On Tue, Jan 12, 2016 at 1:53 PM, Muthu Jayakumar wrote:
> I tried to rerun the same code with the current snapshot versions of 1.6
> and 2.0 from
>
>
> Also, while extracting a value into a Dataset using the as[U] method, how
> could I specify a custom encoder/translation to a case class (where I don't
> have the same column-name mapping or the same data-type mapping)?
>
There is no public API yet for defining your own encoders. You can change
the column names and types with select, alias, and cast on the DataFrame
before calling as[U].
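For instance, a sketch along those lines (the case class and column names
here are hypothetical):

case class Person(name: String, age: Long)

import sqlContext.implicits._ // $-syntax and the Person encoder

// Rename with alias and convert types with cast so the columns line up
// with the case class, then convert to a Dataset.
val people = df
  .select($"full_name".as("name"), $"age_str".cast("long").as("age"))
  .as[Person]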
Hi,
There is some documentation at
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
and you can also check out the tests in DatasetSuite in the Spark sources.
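For example, the linked page builds a custom Aggregator along these lines (a
minimal sketch against the Spark 1.6 API; the sqlContext implicits supply
the encoders):

import org.apache.spark.sql.expressions.Aggregator
import sqlContext.implicits._

val simpleSum = new Aggregator[Int, Int, Int] with Serializable {
  def zero: Int = 0                          // initial buffer value
  def reduce(b: Int, a: Int): Int = b + a    // fold an input into the buffer
  def merge(b1: Int, b2: Int): Int = b1 + b2 // combine partial buffers
  def finish(b: Int): Int = b                // produce the final result
}.toColumn

val total = Seq(1, 2, 3, 4).toDS().select(simpleSum).collect()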
BR,
Arkadiusz Bicz
On Mon, Jan 11, 2016 at 5:37 AM, Muthu Jayakumar wrote:
Hello Michael,
Thank you for the suggestion. This should do the trick for column names.
But how could I transform a column's value type? Do I have to use a UDF? If
I use a UDF, then the other question I have pertains to the map step on a
Dataset, where I am running into an error when I
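In case it helps frame the question, here is a hedged sketch of the UDF
route for the type change (column names and the conversion are assumptions;
for simple conversions a plain $"ts_str".cast("long") avoids the UDF
entirely):

import org.apache.spark.sql.functions.udf
import sqlContext.implicits._

// Convert a String column to a Long (epoch millis here) before as[U].
val toMillis = udf((s: String) => s.toLong * 1000L)

val fixed = df
  .withColumn("ts", toMillis($"ts_str"))
  .drop("ts_str")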