from:"Rishitesh Mishra"

Re: Concurrent execution of actions within a driver

2015-10-26 Thread Rishitesh Mishra

ion 2
> }
> Does Spark run Action 1 & 2 in parallel (some kind of a pass through
> the driver code and then start the execution)?
> If not, then is using threads safe for independent actions/RDDs?
-- Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra
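
Each action (`count`, `reduce`, ...) blocks the driver thread that calls it, so two actions written one after the other run as two sequential jobs. Spark's job-scheduling documentation states that multiple jobs can run simultaneously within one SparkContext if they are submitted from separate threads, and that the scheduler is thread-safe for this use. A minimal sketch, assuming Spark 2.x or later with a local master; the RDD, the two actions and the timeout are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentActions {
  // Runs two independent actions on the same RDD from two driver threads.
  def run(spark: SparkSession): (Long, Long) = {
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)

    // Each action blocks its calling thread until its job finishes, so each
    // one is wrapped in a Future; both jobs are then submitted at once.
    val evensJob: Future[Long] = Future { rdd.filter(_ % 2 == 0).count() }
    val sumJob: Future[Long] = Future { rdd.map(_.toLong).reduce(_ + _) }

    // Block the main driver thread until both jobs have completed.
    Await.result(evensJob.zip(sumJob), 10.minutes)
  }
}

val spark = SparkSession.builder().master("local[4]").appName("concurrent-actions").getOrCreate()
val (evens, total) = ConcurrentActions.run(spark)
println(s"evens = $evens, total = $total")
spark.stop()
```

With the default FIFO scheduler the second job only gets the cores the first one leaves free; setting `spark.scheduler.mode` to `FAIR` shares them more evenly between concurrent jobs.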

Re: Node affinity for Kafka-Direct Stream

2015-10-14 Thread Rishitesh Mishra

to be restarted somewhere else.
>
> As I understand, the direct-kafka streaming model just computes offsets
> and relays the work to a KafkaRDD. How is the execution locality compared
> to the receiver-based approach?
>
> thanks, Gerard.
-- Regards, Rishitesh Mishra, Sn

Re: Spark DataFrame GroupBy into List

2015-10-13 Thread Rishitesh Mishra

duceByKey(reduceF)
rdd3.foreach(r => println(r))
You can always convert the RDD obtained after the transformation and reduce back to a DataFrame.
Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/) https://www.linkedin.com/profile/view?id=AAIAAAIFdkMB_v-nolCrFH6_pKf9oH6tZD8Qlgo&tr
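
The excerpt shows the tail of an RDD-based approach: build a pair RDD, reduce it with `reduceByKey`, then convert the result back to a DataFrame. A minimal sketch of that round trip, assuming Spark 2.x or later and a hypothetical two-column DataFrame (`key`, `value`); here `reduceF` concatenates the per-key sequences, and the order inside each sequence is not guaranteed:

```scala
import org.apache.spark.sql.SparkSession

object GroupIntoList {
  def run(spark: SparkSession): Map[String, Seq[Int]] = {
    import spark.implicits._
    // Hypothetical input: one row per (key, value) pair.
    val df = Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)).toDF("key", "value")

    // Drop to the RDD API and wrap each value in a one-element Seq.
    val pairs = df.rdd.map(row => (row.getString(0), Seq(row.getInt(1))))

    // Concatenate the per-key sequences; order within a key is not guaranteed.
    val reduceF = (x: Seq[Int], y: Seq[Int]) => x ++ y
    val grouped = pairs.reduceByKey(reduceF)

    // Convert the reduced RDD back to a DataFrame.
    val groupedDf = grouped.toDF("key", "values")
    groupedDf.show()

    groupedDf.collect().map(r => r.getString(0) -> r.getSeq[Int](1).toSeq).toMap
  }
}

val spark = SparkSession.builder().master("local[2]").appName("group-into-list").getOrCreate()
val result = GroupIntoList.run(spark)
spark.stop()
```

On Spark 1.6 and later, `groupBy("key").agg(collect_list("value"))` (with `collect_list` from `org.apache.spark.sql.functions`) should give the same grouping without leaving the DataFrame API.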

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra

gards,
> Satish Chandra
>
> On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
>> Which version of Spark are you using? I can get correct results using
>> JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuit

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra

Which version of Spark are you using? I can get correct results using JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuite). I changed it according to your input and got correct results from this test suite.
On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j wrote:
> Hi All,
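
The excerpts do not show what the original problem was. One detail that often explains surprising JdbcRDD results is how it partitions the query: per the JdbcRDD scaladoc, the SQL must contain two `?` placeholders (for example `select title, author from books where ? <= id and id <= ?`), and `lowerBound` and `upperBound` are both inclusive. The plain-Scala sketch reproduces, as far as we can tell from the Spark source, how that inclusive range is split into one `(start, end)` pair per partition; `partitionBounds` is a hypothetical helper, not a Spark API:

```scala
object JdbcBounds {
  // Splits the inclusive range [lowerBound, upperBound] into numPartitions
  // contiguous, non-overlapping inclusive ranges. In JdbcRDD each (start, end)
  // pair is bound to the two '?' placeholders of the query for one partition.
  def partitionBounds(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
    val lo = BigInt(lowerBound)
    val n = BigInt(numPartitions)
    // Both bounds are inclusive, hence the + 1 here and the - 1 on each end.
    // BigInt avoids Long overflow for very wide ranges.
    val length = BigInt(upperBound) - lo + BigInt(1)
    (0 until numPartitions).map { i =>
      val start = lo + BigInt(i) * length / n
      val end = lo + BigInt(i + 1) * length / n - BigInt(1)
      (start.toLong, end.toLong)
    }
  }
}

// Ids 1..100 spread over 3 partitions.
JdbcBounds.partitionBounds(1, 100, 3).foreach(println)
```

For ids 1 to 100 over 3 partitions this yields (1,33), (34,66) and (67,100), so with a `? <= id and id <= ?` predicate no row is read twice or skipped. A query using `<` on either side, or only one placeholder, would break that guarantee.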

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra

in
> key in order to perform the join.
>
> On Sat, Sep 19, 2015 at 12:55 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
>> Hi Reynold,
>> Can you please elaborate on this. I thought RDD also opens only an
>> iterator. Does it get materialize

Re: in joins, does one side stream?

2015-09-19 Thread Rishitesh Mishra

Hi Reynold,
Can you please elaborate on this? I thought an RDD also opens only an iterator. Does it get materialized for joins?
Rishi
On Saturday, September 19, 2015, Reynold Xin wrote:
> Yes for RDD -- both are materialized. No for DataFrame/SQL - one side
> streams.
>
> On Thu, Sep 17, 2015 at
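
Reynold's answer contrasts an RDD join, where both inputs are materialized, with DataFrame/SQL joins, where one side streams. A plain-Scala sketch of the streaming idea, using a hash join as the illustration (an assumption; Spark SQL also has other join strategies): the build side is loaded into an in-memory table, while the stream side is only ever consumed through its iterator, one row at a time. `StreamedHashJoin` is hypothetical and is not Spark's implementation:

```scala
object StreamedHashJoin {
  // Inner equi-join in which only the build side is materialized.
  def join[K, A, B](build: Iterator[(K, A)], stream: Iterator[(K, B)]): Iterator[(K, (A, B))] = {
    // Materialize the build side: key -> all build-side values for that key.
    val table: Map[K, Vector[A]] =
      build.toVector.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    // Consume the stream side lazily, probing the table for each row.
    stream.flatMap { case (k, b) =>
      table.getOrElse(k, Vector.empty).iterator.map(a => (k, (a, b)))
    }
  }
}

val users = Iterator((1, "alice"), (2, "bob"))
val orders = Iterator((1, "book"), (3, "pen"), (1, "lamp"))
val joined = StreamedHashJoin.join(users, orders).toList
joined.foreach(println)
```

Memory use is bounded by the build side; the stream side can be arbitrarily large because each row is probed and then discarded.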

Re: RDD from partitions

2015-08-28 Thread Rishitesh Mishra

Hi Jem,
A simple way to get this is to use a MapPartitionsRDD. Please see the code below. For this you need to know the partition numbers of your parent RDD that you want to exclude. One drawback here is that the new RDD will also launch the same number of tasks as the parent RDD, as both RDDs have the same numbe
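
The code the message refers to is cut off in this excerpt. A minimal sketch of the approach it describes, assuming Spark 2.x or later: `mapPartitionsWithIndex` (which produces a `MapPartitionsRDD`) returns an empty iterator for the partition indices to exclude. The set of excluded indices is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object ExcludePartitions {
  def run(spark: SparkSession): (Int, Seq[Int]) = {
    // Parent RDD with 4 partitions: [1..25], [26..50], [51..75], [76..100].
    val parent = spark.sparkContext.parallelize(1 to 100, numSlices = 4)
    val excluded = Set(1, 3) // hypothetical partition indices to drop

    // Same partition count as the parent; excluded partitions yield no rows,
    // but a (now empty) task is still launched for each of them.
    val pruned = parent.mapPartitionsWithIndex(
      (idx, iter) => if (excluded.contains(idx)) Iterator.empty else iter,
      preservesPartitioning = true)

    (pruned.getNumPartitions, pruned.collect().toSeq)
  }
}

val spark = SparkSession.builder().master("local[2]").appName("exclude-partitions").getOrCreate()
val (numPartitions, rows) = ExcludePartitions.run(spark)
spark.stop()
```

`numPartitions` stays at 4, which is the drawback mentioned above. Spark also has a developer-API `PartitionPruningRDD` (`org.apache.spark.rdd.PartitionPruningRDD.create`) that drops partitions outright instead of running empty tasks; check that it is available in your version before relying on it.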

Re: Spark driver locality

2015-08-28 Thread Rishitesh Mishra

get assigned to worker node to read data from
> remote hadoop cluster? I am more interested to know how mapr NFS layer is
> accessed in parallel.
>
> -
> Swapnil
>
> On Thu, Aug 27, 2015 at 2:53 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
>>

Re: Spark driver locality

2015-08-27 Thread Rishitesh Mishra

Hi Swapnil,
Let me try to answer some of the questions. Answers inline. Hope it helps.
On Thursday, August 27, 2015, Swapnil Shinde wrote:
> Hello
> I am new to spark world and started to explore recently in standalone
> mode. It would be great if I get clarifications on below doubts-
>
> 1. Dri

Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Rishitesh Mishra

Hi Sateesh,
It would be interesting to know how you determined that the DStream runs on a single core. Did you mean receivers? Coming back to your question, could you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks?
On 21 Aug 2015 16:06, "Sateesh Ka
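
The suggestion is to run blocking disk I/O on a separate thread so the thread that issued it can move on. A generic JVM sketch of that pattern, not Spark-specific: a hypothetical `AsyncDiskWriter` runs writes on a small dedicated thread pool and hands back a `Future`. Whether this actually lets Spark schedule other work depends on where the write runs (driver-side output code versus inside a task); inside a task the write still has to complete, or be awaited, before the task's output can be trusted, so treat this as an assumption to validate in your job:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncDiskWriter {
  // A small, dedicated pool for blocking disk I/O, kept separate from the
  // threads that do computation.
  private val ioPool = Executors.newFixedThreadPool(2)
  private implicit val ioContext: ExecutionContext =
    ExecutionContext.fromExecutorService(ioPool)

  // Starts the write on the I/O pool and returns immediately.
  def writeAsync(path: Path, lines: Seq[String]): Future[Path] = Future {
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  def shutdown(): Unit = ioPool.shutdown()
}

val target = Files.createTempFile("batch-", ".txt")
val pendingWrite = AsyncDiskWriter.writeAsync(target, Seq("record-1", "record-2"))

// The calling thread is free to do other work while the write runs.
val otherWork = (1 to 1000).sum

// Wait for the write before treating the batch as finished.
Await.result(pendingWrite, 30.seconds)
AsyncDiskWriter.shutdown()
```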

Re: How to list all dataframes and RDDs available in current session?

2015-08-20 Thread Rishitesh Mishra

I am not sure you can view all RDDs in a session. Tables are maintained in a catalogue, so listing them is easier. However, the Spark UI shows the DAG representation, which lists all the RDDs in a job.
On 20 Aug 2015 22:34, "Dhaval Patel" wrote:
> Apologies
>
> I accidentally included Sp
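
A minimal sketch of the two things that can be listed programmatically, assuming Spark 2.x or later (on the 1.x releases of the time, `sqlContext.tableNames()` plays the catalog role): tables and temporary views through `spark.catalog.listTables()`, and RDDs that have been persisted through `SparkContext.getPersistentRDDs`. RDDs that were never persisted are not tracked there, which fits the point that there is no general registry of RDDs. The view name `people` and the RDD name `numbers` are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object ListSessionObjects {
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    // Temporary views are registered in the session catalog...
    Seq((1, "a"), (2, "b")).toDF("id", "name").createOrReplaceTempView("people")
    // ...so they can be listed by name.
    val tables = spark.catalog.listTables().collect().map(_.name).toSeq

    // Plain RDDs are not registered anywhere. Persisting one makes the
    // SparkContext track it, so it shows up in getPersistentRDDs.
    val numbers = spark.sparkContext.parallelize(1 to 10).setName("numbers").cache()
    numbers.count() // materialize the cached data
    val persisted = spark.sparkContext.getPersistentRDDs.values.map(_.name).toSeq

    (tables, persisted)
  }
}

val spark = SparkSession.builder().master("local[2]").appName("list-session").getOrCreate()
val (tableNames, persistedRdds) = ListSessionObjects.run(spark)
println(s"tables: $tableNames, persisted RDDs: $persistedRdds")
spark.stop()
```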

2015-08-16 Thread Rishitesh Mishra



13 matches
