Re: Dataset and lambdas

Michael Armbrust Mon, 07 Dec 2015 12:09:04 -0800

These specific JIRAs don't exist yet, but watch SPARK-9999 as we'll make
sure everything shows up there.


On Sun, Dec 6, 2015 at 10:06 AM, Koert Kuipers <ko...@tresata.com> wrote:

> that's good news about plans to avoid unnecessary conversions, and allow
> access to more efficient internal types. could you point me to the jiras,
> if they exist already? i just tried to find them but had little luck.
> best, koert
>
> On Sat, Dec 5, 2015 at 4:09 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> hello all,
>>> DataFrame internally uses a different encoding for values than what the
>>> user sees. i assume the same is true for Dataset?
>>>
>>
>> This is true.  We encode objects in the Tungsten binary format using
>> code-generated serializers.
>>
>>
>>> if so, does this mean that a function like Dataset.map needs to convert
>>> all the values twice (once to user format and then back to internal
>>> format)? or is it perhaps possible to write scala functions that operate on
>>> internal formats and avoid this?
>>>
>>
>> Currently this is true, but there are plans to avoid unnecessary
>> conversions (back to back maps / filters, etc) and only convert when we
>> need to (shuffles, sorting, hashing, SQL operations).
>>
>> There are also plans to allow you to directly access some of the more
>> efficient internal types by using them as fields in your classes (mutable
>> UTF8 String instead of the immutable java.lang.String).
>>
>>
>
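
For readers following the thread, here is a minimal sketch of the two styles being contrasted. It is not code from either poster. It assumes a Spark 2.x or later build on the classpath (SparkSession postdates this thread), and the Person case class, object name, and sample data are hypothetical. The lambda-based filter/map takes and returns user objects, which is the round trip through the user format that the question is about. The Column-based version is expressed against the encoded columns, so Spark can evaluate it without building Person objects.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Int)

object DatasetLambdaSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session; the thread itself predates SparkSession.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-lambda-sketch")
      .getOrCreate()
    import spark.implicits._ // brings the Person encoder and the $"col" syntax into scope

    val people: Dataset[Person] = Seq(Person("ann", 30), Person("bob", 17)).toDS()

    // Lambda style: each function receives a Person object, so rows are
    // decoded from the internal format and the results are encoded again.
    val viaLambdas: Dataset[Person] =
      people
        .filter(p => p.age >= 18)
        .map(p => p.copy(name = p.name.toUpperCase))

    // Column style: the predicate is a Catalyst expression over the encoded
    // columns, so no Person object is needed to evaluate it.
    val viaColumns: Dataset[Person] = people.filter($"age" >= 18)

    // explain() prints the physical plan for each query; comparing the two
    // plans is one way to see where object conversion is involved.
    viaLambdas.explain()
    viaColumns.explain()

    spark.stop()
  }
}
```

Whether back-to-back lambdas such as the filter and map above share a single conversion depends on the Spark version; the thread describes that as planned work, not as existing behaviour.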
