I don't know about the second one but for question #1:
When you convert a cached DF to an RDD (via a map function or the
"rdd" value), the data is converted from its off-heap representation to
on-heap objects. If your rows are fairly large or complex, this can have a
pretty big performance impact, so I
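A minimal sketch of the conversion being described (assumes a running SQLContext named `sqlContext`; Spark 1.6-era API):

```scala
// Sketch only: assumes an existing SQLContext `sqlContext`.
import org.apache.spark.sql.Row

val df = sqlContext.range(0, 1000000)  // one LongType column, "id"
df.cache()   // cached in the columnar, off-heap-friendly representation
df.count()   // materialize the cache

// Dropping to the RDD API forces each row back into on-heap Row objects;
// this boundary is where the conversion cost described above is paid.
val rdd: org.apache.spark.rdd.RDD[Row] = df.rdd
rdd.map(r => r.getLong(0) + 1).count()
```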
Hi all,
I've been looking at some of my query plans and noticed that pretty much
every explode that I run (which is always over a column with ArrayData) is
prefixed with a ConvertToSafe call in the physical plan. Looking at
Generate.scala it looks like it doesn't override canProcessUnsafeRows in
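For reference, the pattern that surfaces this looks roughly like the following (hypothetical column names; assumes an existing SQLContext). On 1.6, `explain()` on a query like this shows the Generate node with a ConvertToSafe step feeding it:

```scala
// Sketch only: assumes an existing SQLContext `sqlContext` (Spark 1.6-era API).
import org.apache.spark.sql.functions.explode

val df = sqlContext.createDataFrame(Seq(
  (1, Seq("a", "b")),
  (2, Seq("c"))
)).toDF("id", "items")

// explode is planned as a Generate node; inspecting the physical plan
// shows the conversion step inserted in front of it.
df.select(explode(df("items"))).explain()
```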
Hi all,
So we have these UDFs which take <1ms to operate and we're seeing pretty
poor performance around them in practice, the overhead being >10ms for the
projections (this data is deeply nested with ArrayTypes and MapTypes so
that could be the cause). Looking at the logs and code for ScalaUDF,
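One way to separate the cost of the UDF body from the surrounding projection overhead is to time the body itself from inside the function (hypothetical names; `extract` stands in for the real logic):

```scala
// Sketch only: assumes an existing SQLContext; `extract` is hypothetical.
import org.apache.spark.sql.functions.udf
import java.util.concurrent.atomic.AtomicLong

val nanosInBody = new AtomicLong()

def extract(xs: Seq[String]): Int = xs.size  // stand-in for the real logic

val timedExtract = udf { (xs: Seq[String]) =>
  val t0 = System.nanoTime()
  val out = extract(xs)
  nanosInBody.addAndGet(System.nanoTime() - t0)
  out
}
// Any gap between the stage's task time and nanosInBody is spent outside
// the UDF body, i.e. in conversion and projection.
```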
Hi Michael,
Thanks for the response. I am just extracting part of the nested structure
and returning only a piece of that same structure.
I haven't looked at Encoders or Datasets since we're bound to 1.6 for now
but I'll look at encoders to see if that covers it. Datasets seems like it
would solve
30, 2016 at 11:47 AM Hamel Kothari <hamelkoth...@gmail.com>
wrote:
> Hi all,
>
> I've been trying for the last couple of days to define a UDF which takes
> in a deeply nested Row object and performs some extraction to pull out a
> portion of the Row and return it. This
Hi all,
I've been trying for the last couple of days to define a UDF which takes in
a deeply nested Row object and performs some extraction to pull out a
portion of the Row and return it. This row object is nested not just
with StructTypes but a bunch of ArrayTypes and MapTypes. From this
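To make the shape of the problem concrete, a UDF of the kind being described looks roughly like this (the schema here is hypothetical: a struct column "payload" holding an array of structs "events", each with a "name" field):

```scala
// Sketch only: hypothetical nested schema.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Struct columns arrive in a Scala UDF as Row; nested arrays of structs
// arrive as Seq[Row], so extraction is done with getAs calls.
val pullNames = udf { (payload: Row) =>
  val events = payload.getAs[Seq[Row]]("events")
  events.map(_.getAs[String]("name"))
}
// df.select(pullNames(df("payload")))
```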
Dan,
You could probably just register a JVM shutdown hook yourself:
https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
This at least would let you close the connections when the application as a
whole has completed (in standalone) or when your
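A minimal sketch of the suggestion (pure JVM API; `closeConnections` is a hypothetical cleanup function):

```scala
// Register a JVM shutdown hook that runs when the application exits.
def closeConnections(): Unit = { /* hypothetical: close pooled connections */ }

val hook = new Thread(new Runnable {
  override def run(): Unit = closeConnections()
})
Runtime.getRuntime.addShutdownHook(hook)

// If you end up closing the connections yourself before exit, the hook can
// be deregistered; removeShutdownHook returns true if it was still registered.
val removed = Runtime.getRuntime.removeShutdownHook(hook)
```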
Java, Scala)
>
> 2. API stability (both backward and forward)
>
>
>
> On Fri, Feb 26, 2016 at 8:44 AM, Hamel Kothari <hamelkoth...@gmail.com>
> wrote:
>
>> Hi devs,
>>
>> Has there been any discussion around changing the DataSource parameters
Hi devs,
Has there been any discussion around changing the DataSource parameters
argument to be something more sophisticated than Map[String, String]? As you
write more complex DataSources there are likely to be a variety of
parameters of varying formats which are needed, and having to coerce them
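For context, the interface in question (Spark 1.x `RelationProvider`) only hands the source a string map, so every typed option has to be coerced by hand; the option names below are hypothetical:

```scala
// The createRelation signature is the real Spark 1.x API; the option
// parsing below illustrates the string coercion being discussed.
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

class MyProvider extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // Every non-string parameter arrives as a string and must be coerced.
    val batchSize = parameters.get("batchSize").map(_.toInt).getOrElse(1000)
    val retries   = parameters.get("retries").map(_.toInt).getOrElse(3)
    ??? // build and return the BaseRelation (omitted)
  }
}
```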
and reservation.
>
> The question is how aggressively preemption will preempt queue A if it
> holds the entire resource without releasing it? We could not share the
> actual configuration, but the answer to the question here will help us.
>
>
> Thanks,
> Prabhu Joseph
If all queues are identical, this behavior should not be happening.
Preemption as designed in fair scheduler (IIRC) takes place based on the
instantaneous fair share, not the steady state fair share. The fair
scheduler docs
I noticed that the Jackson dependency was bumped to 2.5 in master for
something spark-streaming related. Is there any reason that this upgrade
can't be included with 1.6.1?
According to later comments on this thread:
https://issues.apache.org/jira/browse/SPARK-8332 and my personal experience
Hey spark-devs,
I'm in the process of writing a DataSource for what is essentially a java
web service. Each relation which we create will consist of a series of
queries to this webservice which returns a pretty much known amount of data
(eg. 2000 rows, 5 string columns or similar which we can
Are you running on YARN? Another possibility here is that your shuffle
managers are facing GC pain and becoming less responsive, thus missing
timeouts. Can you try increasing the memory on the node managers and see if
that helps?
On Sun, Jan 24, 2016 at 4:58 PM Ted Yu wrote:
The "Too Many Files" part of the exception is just indicative of the fact
that when that call was made, too many files were already open. It doesn't
necessarily mean that that line is the source of all of the open files;
that's just the point at which the limit was hit.
What I would recommend is
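As a general diagnostic, you can count how many descriptors the process actually holds (Linux-specific sketch; `/proc` does not exist on other platforms):

```scala
// Count this process's open file descriptors via /proc (Linux only).
import java.io.File

val fdDir = new File("/proc/self/fd")
val openFds = Option(fdDir.list()).map(_.length).getOrElse(-1)
println(s"open file descriptors: $openFds")
```

Comparing that number against `ulimit -n` over time shows whether descriptors are leaking before the exception is thrown.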