These specific JIRAs don't exist yet, but watch SPARK-9999 as we'll make sure everything shows up there.
On Sun, Dec 6, 2015 at 10:06 AM, Koert Kuipers <ko...@tresata.com> wrote: > that's good news about plans to avoid unnecessary conversions, and allow > access to more efficient internal types. could you point me to the jiras, > if they exist already? i just tried to find them but had little luck. > best, koert > > On Sat, Dec 5, 2015 at 4:09 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipers <ko...@tresata.com> wrote: >> >>> hello all, >>> DataFrame internally uses a different encoding for values then what the >>> user sees. i assume the same is true for Dataset? >>> >> >> This is true. We encode objects in the tungsten binary format using code >> generated serializers. >> >> >>> if so, does this means that a function like Dataset.map needs to convert >>> all the values twice (once to user format and then back to internal >>> format)? or is it perhaps possible to write scala functions that operate on >>> internal formats and avoid this? >>> >> >> Currently this is true, but there are plans to avoid unnecessary >> conversions (back to back maps / filters, etc) and only convert when we >> need to (shuffles, sorting, hashing, SQL operations). >> >> There are also plans to allow you to directly access some of the more >> efficient internal types by using them as fields in your classes (mutable >> UTF8 String instead of the immutable java.lang.String). >> >> >