Re: Dataset and lambdas

Koert Kuipers Mon, 07 Dec 2015 12:30:12 -0800

great thanks

On Mon, Dec 7, 2015 at 3:02 PM, Michael Armbrust <mich...@databricks.com>
wrote:


> These specific JIRAs don't exist yet, but watch SPARK-9999 as we'll make
> sure everything shows up there.
>
> On Sun, Dec 6, 2015 at 10:06 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> that's good news about plans to avoid unnecessary conversions, and allow
>> access to more efficient internal types. could you point me to the jiras,
>> if they exist already? i just tried to find them but had little luck.
>> best, koert
>>
>> On Sat, Dec 5, 2015 at 4:09 PM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> hello all,
>>>> DataFrame internally uses a different encoding for values than what the
>>>> user sees. i assume the same is true for Dataset?
>>>>
>>>
>>> This is true.  We encode objects in the Tungsten binary format using
>>> code-generated serializers.
>>>
>>>
>>>> if so, does this mean that a function like Dataset.map needs to
>>>> convert all the values twice (once to user format and then back to internal
>>>> format)? or is it perhaps possible to write scala functions that operate on
>>>> internal formats and avoid this?
>>>>
>>>
>>> Currently this is true, but there are plans to avoid unnecessary
>>> conversions (back to back maps / filters, etc) and only convert when we
>>> need to (shuffles, sorting, hashing, SQL operations).
>>>
>>> There are also plans to allow you to directly access some of the more
>>> efficient internal types by using them as fields in your classes (mutable
>>> UTF8 String instead of the immutable java.lang.String).
>>>
>>>
>>
>

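For readers of the archive, the double conversion discussed above can be sketched with a toy model. This is not Spark code and not the Tungsten format: `Person`, `encode`, `decode`, `mapRows` and `mapRowsFused` are hypothetical names invented for illustration, and the byte layout is an assumption chosen only to keep the example small. It shows why a map over encoded rows pays one decode and one encode per row, and how fusing back to back maps could save the intermediate round trip.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object EncodingSketch {
  // Hypothetical user-facing type; not taken from the thread.
  final case class Person(name: String, age: Int)

  // Counts every encode and decode call so the conversion cost is visible.
  var conversions = 0

  // Toy stand-in for a generated serializer.
  // Assumed layout: [age: 4 bytes][name: UTF-8 bytes].
  def encode(p: Person): Array[Byte] = {
    conversions += 1
    val nameBytes = p.name.getBytes(UTF_8)
    ByteBuffer.allocate(4 + nameBytes.length).putInt(p.age).put(nameBytes).array()
  }

  // Inverse of encode: rebuild the user-facing object from the binary row.
  def decode(bytes: Array[Byte]): Person = {
    conversions += 1
    val buf = ByteBuffer.wrap(bytes)
    val age = buf.getInt()
    val nameBytes = new Array[Byte](buf.remaining())
    buf.get(nameBytes)
    Person(new String(nameBytes, UTF_8), age)
  }

  // A map over encoded rows: decode, apply the user function, re-encode.
  def mapRows(rows: Seq[Array[Byte]])(f: Person => Person): Seq[Array[Byte]] =
    rows.map(r => encode(f(decode(r))))

  // Two back to back maps fused into one pass: a single decode and a
  // single encode per row instead of two of each.
  def mapRowsFused(rows: Seq[Array[Byte]])(
      f: Person => Person,
      g: Person => Person): Seq[Array[Byte]] =
    rows.map(r => encode(g(f(decode(r)))))

  // Example user functions.
  val birthday: Person => Person = p => p.copy(age = p.age + 1)
  val shout: Person => Person = p => p.copy(name = p.name.toUpperCase)
}
```

In this toy model, running `mapRows(mapRows(rows)(birthday))(shout)` over two rows performs eight conversions, while `mapRowsFused(rows)(birthday, shout)` performs four and yields the same rows. How Spark itself avoids the extra conversions is not specified in the thread.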