Thanks for the clarifications Mridul.
Thanks
Best Regards
On Fri, Aug 14, 2015 at 1:04 PM, Mridul Muralidharan wrote:
> What I understood from Imran's mail (and what was referenced in his
> mail), the RDD mentioned seems to be violating some basic contracts on
> how partitions are used in Spark [1]. They cannot be arbitrarily
> numbered, have duplicates, etc.
> Extending RDD to add functionality is typically for niche cases.
Yep, and it works fine for operations which do not involve any shuffle
(like foreach, count, etc.), but those which involve a shuffle end up in
an infinite loop. Spark should somehow indicate this instead of going
into an infinite loop.
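For context, a toy illustration (plain Scala, no Spark dependency) of why only shuffle operations trip over a broken partition numbering. The `targetPartition` function here is a simplification of what Spark's `HashPartitioner` does; the names are illustrative, not Spark's actual internals:

```scala
// During a shuffle, the map side routes each record to the reduce
// partition numbered by the partitioner, roughly hash(key) % numPartitions.
// The scheduler then addresses partitions by that number, so partition
// indices must be exactly 0..n-1 with no gaps or duplicates.
def targetPartition(key: String, numPartitions: Int): Int =
  ((key.hashCode % numPartitions) + numPartitions) % numPartitions

val routed: Map[String, Int] =
  Seq("a", "b", "c").map(k => k -> targetPartition(k, 2)).toMap

// Non-shuffle operations (count, foreach) never route records by
// partition number; they just iterate each partition's records in
// place, which is why the bug stays hidden until a shuffle happens.
```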
Thanks
Best Regards
On Thu, Aug 13, 2015 at 11:37 P
oh I see, you are defining your own RDD & Partition types, and you had a
bug where partition.index did not line up with the partitions slot in
rdd.getPartitions. Is that correct?
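A sketch of the invariant Imran is describing, using stand-in types rather than Spark's actual classes (`MyPartition` and `checkInvariant` are hypothetical): each partition's `index` must equal the array slot it occupies in `getPartitions`.

```scala
// Minimal stand-in for org.apache.spark.Partition
trait Partition { def index: Int }
final case class MyPartition(index: Int) extends Partition

// Buggy layout like the one in this thread: indices start at 1,
// so partitions(0).index == 1 instead of 0.
val buggy: Array[Partition] = Array(MyPartition(1), MyPartition(2))

// Correct layout: partitions(i).index == i for every slot i.
val fixed: Array[Partition] = Array(MyPartition(0), MyPartition(1))

// Spark effectively assumes this invariant. Checking it inside a
// custom RDD turns a silent hang into an immediate, visible failure.
def checkInvariant(parts: Array[Partition]): Boolean =
  parts.zipWithIndex.forall { case (p, slot) => p.index == slot }
```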
On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das wrote:
> I figured that out, And these are my findings:
>
> -> It just en
yikes.
Was this a one-time thing? Or does it happen consistently? Can you turn
on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...)
On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das wrote:
Hi

My Spark job (running in local[*] with Spark 1.4.1) reads data from a
thrift server. (I created a custom RDD: it computes the partitions in the
getPartitions() call, and the iterator returned by compute() serves
records from these partitions via hasNext.) count() and foreach() are
working fine and return the correct number of records
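For readers following along, a minimal skeleton of a custom RDD like the one described above, assuming spark-core 1.4.x on the classpath. `ThriftRDD` and `ThriftPartition` are hypothetical names, and the actual thrift-server fetch is elided; the point is the shape of `getPartitions` and `compute`:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type; per the rest of this thread, `index`
// must equal the slot this partition occupies in getPartitions.
class ThriftPartition(val index: Int) extends Partition

// Sketch of a custom RDD with no parent dependencies (Nil).
class ThriftRDD(sc: SparkContext, numParts: Int)
    extends RDD[String](sc, Nil) {

  // Indices generated with tabulate are 0..numParts-1, matching slots.
  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numParts)(i => new ThriftPartition(i))

  // compute() must return an iterator over this partition's records;
  // the thrift-server fetch would go here.
  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[ThriftPartition]
    Iterator.empty // placeholder: fetch records for partition p.index
  }
}
```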