Re: Concurrent execution of actions within a driver

2015-10-26 Thread Rishitesh Mishra
ion 2 > > } > > Does Spark run Action 1 & 2 in parallel (some kind of a pass through > the driver code and then start the execution)? > > If not, then is using threads safe for independent actions/red's? > > > -- Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra

Re: Node affinity for Kafka-Direct Stream

2015-10-14 Thread Rishitesh Mishra
to be restarted somewhere else. > > As I understand, the direct-kafka streaming model just computes offsets > and relays the work to a KafkaRDD. How is the execution locality compared > to the receiver-based approach? > > thanks, Gerard. > -- Regards, Rishitesh Mishra, Sn

Re: Spark DataFrame GroupBy into List

2015-10-13 Thread Rishitesh Mishra
duceByKey(reduceF) rdd3.foreach(r => println(r)) You can always reconvert the obtained RDD after transformation and reduce to a DataFrame. Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://www.linkedin.com/profile/view?id=AAIAAAIFdkMB_v-nolCrFH6_pKf9oH6tZD8Qlgo&tr

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
gards, > Satish Chandra > > On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra < > rishi80.mis...@gmail.com> wrote: > >> Which version of Spark are you using? I can get correct results using >> JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuit

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
Which version of Spark are you using? I can get correct results using JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuite). I changed it according to your input and got correct results from this test suite. On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j wrote: > Hi All, >

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra
in > key in order to perform the join. > > > > On Sat, Sep 19, 2015 at 12:55 PM, Rishitesh Mishra < > rishi80.mis...@gmail.com> wrote: > >> Hi Reynold, >> Can you please elaborate on this? I thought RDD also opens only an >> iterator. Does it get materialize

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra
Hi Reynold, Can you please elaborate on this? I thought an RDD also opens only an iterator. Does it get materialized for joins? Rishi On Saturday, September 19, 2015, Reynold Xin wrote: > Yes for RDD -- both are materialized. No for DataFrame/SQL - one side > streams. > > > On Thu, Sep 17, 2015 at

Re: RDD from partitions

2015-08-28 Thread Rishitesh Mishra
Hi Jem, A simple way to get this is to use MapPartitionedRDD. Please see the code below. For this you need to know the partition numbers of your parent RDD that you want to exclude. One drawback here is that the new RDD will also invoke the same number of tasks as the parent RDD, as both the RDDs have the same numbe

Re: Spark driver locality

2015-08-28 Thread Rishitesh Mishra
get assigned to worker node to read data from > remote hadoop cluster? I am more interested to know how mapr NFS layer is > accessed in parallel. > > - > Swapnil > > > On Thu, Aug 27, 2015 at 2:53 PM, Rishitesh Mishra < > rishi80.mis...@gmail.com> wrote: > >>

Re: Spark driver locality

2015-08-27 Thread Rishitesh Mishra
Hi Swapnil, Let me try to answer some of the questions. Answers inline. Hope it helps. On Thursday, August 27, 2015, Swapnil Shinde wrote: > Hello > I am new to spark world and started to explore recently in standalone > mode. It would be great if I get clarifications on below doubts- > > 1. Dri

Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Rishitesh Mishra
Hi Sateesh, It would be interesting to know how you determined that the DStream runs on a single core. Did you mean receivers? Coming back to your question, could you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks? On 21 Aug 2015 16:06, "Sateesh Ka

Re: How to list all dataframes and RDDs available in current session?

2015-08-20 Thread Rishitesh Mishra
I am not sure if you can view all RDDs in a session. Tables are maintained in a catalog, so listing them is easier. However, you can see the DAG representation, which lists all the RDDs in a job, in the Spark UI. On 20 Aug 2015 22:34, "Dhaval Patel" wrote: > Apologies > > I accidentally included Sp

Subscribe

2015-08-16 Thread Rishitesh Mishra