Both groupByKey() and join() accept a Partitioner as a parameter.
Maybe you can specify a custom Partitioner so that the amount of shuffle is
reduced.
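Something along these lines might work (an untested sketch; `vectors` and `index` are placeholder RDDs and 8 is an arbitrary partition count):

    import scala.reflect.ClassTag
    import org.apache.spark.{HashPartitioner, Partitioner}
    import org.apache.spark.rdd.RDD

    // Pass the same Partitioner to both operations; the join then reuses the
    // layout that groupByKey produced instead of shuffling a second time.
    def groupThenJoin[V: ClassTag, W: ClassTag](
        vectors: RDD[(Int, V)],
        index: RDD[(Int, W)],
        p: Partitioner = new HashPartitioner(8)): RDD[(Int, (Iterable[V], W))] = {
      val grouped = vectors.groupByKey(p) // one shuffle to lay the data out by key
      grouped.join(index, p)              // same partitioner, so grouped is not re-shuffled
    }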
On Sat, Jan 16, 2016 at 9:39 AM, Daniel Imberman wrote:
> Hi Ted,
>
> I think I might have figured something out! (Though I haven't tested it at
> scale yet)
Thanks Cody.
One reason I was thinking of using Akka is that some of the copies take much
longer than others (or get stuck). We've seen this with our current streaming
job. This can cause the entire streaming micro-batch to take longer.
If we had a set of Akka actors then each copy would b
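Roughly the shape I have in mind (just an untested sketch; `CopyWorker` and the copy logic are made up for illustration):

    import akka.actor.{Actor, ActorSystem, Props}

    // One actor per copy: a copy that stalls only delays its own message,
    // not the whole micro-batch.
    class CopyWorker extends Actor {
      def receive = {
        case path: String =>
          // ... perform the actual copy for `path` here (placeholder) ...
          sender() ! s"done: $path"
      }
    }

    val system = ActorSystem("copies")
    val workers = (1 to 4).map(i => system.actorOf(Props[CopyWorker], s"worker-$i"))
    workers.zipWithIndex.foreach { case (w, i) => w ! s"/data/part-$i" }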
Hi Koert,
So I actually just mentioned something somewhat similar in the thread (your
email actually came through as I was sending it :) ).
One question I have is: if I do a groupByKey and I have been smart about my
partitioning up to this point, would I have the benefit of not needing to
shuffle
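In other words, something like this (a sketch; `pairs` is a placeholder pair RDD), where I'd hope the groupByKey adds no second shuffle:

    import scala.reflect.ClassTag
    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // Two HashPartitioners with the same partition count compare equal, so
    // groupByKey should be able to reuse the layout partitionBy created.
    def groupPrePartitioned[V: ClassTag](pairs: RDD[(Int, V)]) = {
      val partitioned = pairs.partitionBy(new HashPartitioner(8)).cache()
      partitioned.groupByKey(new HashPartitioner(8)) // hopefully no second shuffle
    }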
Hi Ted,
I think I might have figured something out! (Though I haven't tested it at
scale yet.)
My current thought is that I can do a groupByKey on the RDD of vectors and
then do a join with the invertedIndex.
It would look something like this:
val InvIndexes: RDD[(Int, InvertedIndex)]
val partitione
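Filling in the rest of the idea (untested at scale; `InvertedIndex` here is just a stand-in case class, and the vector type is assumed):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    case class InvertedIndex(postings: Map[String, Seq[Long]]) // stand-in type

    def joinWithIndex(vectors: RDD[(Int, Seq[Double])],
                      invIndexes: RDD[(Int, InvertedIndex)]) = {
      val partitioner = new HashPartitioner(invIndexes.partitions.length)
      // Group the vectors and co-partition the index the same way, so the
      // join lines up partition-for-partition without an extra shuffle.
      vectors.groupByKey(partitioner)
        .join(invIndexes.partitionBy(partitioner))
    }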
Just doing a join is not an option? If you carefully manage your
partitioning then this can be pretty efficient (meaning no extra shuffle,
basically a map-side join).
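For example (an untested sketch; `left` and `right` are placeholder pair RDDs), you can confirm the join itself added no shuffle stage by inspecting the lineage:

    import scala.reflect.ClassTag
    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    def coPartitionedJoin[V: ClassTag, W: ClassTag](left: RDD[(Int, V)],
                                                    right: RDD[(Int, W)]) = {
      val p = new HashPartitioner(8)
      // Both sides share one partitioner, so join() builds narrow dependencies:
      // rows move during partitionBy, not during the join itself.
      val joined = left.partitionBy(p).join(right.partitionBy(p))
      println(joined.toDebugString) // no new ShuffledRDD above the partitionBy stages
      joined
    }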
On Jan 13, 2016 2:30 PM, "Daniel Imberman" wrote:
> I'm looking for a way to send structures to pre-determined partitions so
> that
>
Hi,
I am using "ooyala/spark-jobserver".
Regards,
Rajesh
On Sat, Jan 16, 2016 at 8:36 PM, Ted Yu wrote:
> Which distro are you using ?
>
> From the error message, compute-classpath.sh was not found.
> I searched Spark 1.6 built for hadoop 2.6 but didn't find
> either compute-classpath.sh or server_start.sh
Hi all,
I have some data on the driver side, and I will broadcast it to all workers so
that each worker has the same data. Since there is no RDD in memory, I don't
know how to make the workers start tasks that do some transformation based on
the data. I have tried to write code like this:
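Something along these lines is what I mean (a sketch with stand-in data; `sc` is the SparkContext):

    // Broadcast the driver-side data once; every executor gets a read-only copy.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // With no existing RDD, parallelize some keys so Spark schedules tasks on
    // the workers; each task then reads the broadcast value locally.
    val result = sc.parallelize(Seq("a", "b", "c"), 2)
      .map(k => k -> lookup.value.getOrElse(k, 0))
      .collect()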
Which distro are you using ?
From the error message, compute-classpath.sh was not found.
I searched Spark 1.6 built for hadoop 2.6 but didn't find
either compute-classpath.sh or server_start.sh
Cheers
On Sat, Jan 16, 2016 at 5:33 AM, Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote:
> Hi
Hi everyone,
I’m trying to use the Scala interpreter, IMain, to interpret some Scala code
that executes a job with Spark:
@Test
public void countToFive() throws ScriptException {
    SparkConf conf = new SparkConf().setAppName("Spark interpreter").setMaster("local[2]");
    SparkContext sc = new SparkContext(conf);
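For reference, the same wiring in plain Scala (a sketch, assuming the Spark jars are already on the JVM classpath):

    import scala.tools.nsc.Settings
    import scala.tools.nsc.interpreter.IMain

    val settings = new Settings
    settings.usejavacp.value = true // let the interpreter see the host classpath

    val interpreter = new IMain(settings)
    interpreter.interpret("""
      import org.apache.spark.{SparkConf, SparkContext}
      val conf = new SparkConf().setAppName("Spark interpreter").setMaster("local[2]")
      val sc = new SparkContext(conf)
      val count = sc.parallelize(1 to 5).count()
      sc.stop()
    """)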
Hi,
I am not able to start the Spark job server; I am seeing the error below. Please
let me know how to resolve this issue.
I have configured one master and two workers in cluster mode.
./server_start.sh
./server_start.sh: line 52: kill: (19621) - No such process
./server_start.sh: line 78: /home/spar
Thanks for your response.
One thing I noticed is that with Spark 1.4.1 that kind of error would not cause
the driver to stop, whereas Spark 1.5.2 does stop the driver, so I think there
must have been some change. Looking at the code in Spark 1.5.2,
JobScheduler.scala: jobScheduler.reportError("Err