Thanks! I am using Spark Streaming 1.3. If a post fails for any reason, I will store the offset of that message in another Kafka topic. I then want to read these offsets in a separate Spark job and fetch the original Kafka topic's messages at those offsets. So is it possible in a Spark job to get Kafka messages at arbitrary offsets? Or is there a better alternative for handling failure of the post request?
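The 1.3 direct API does support this: KafkaUtils.createRDD takes an explicit array of OffsetRange values, so a batch job can read back arbitrary offsets. A minimal sketch, assuming the failed offsets have already been read out of the side topic; the broker list, topic name, partitions, and offsets below are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val sc = new SparkContext(new SparkConf().setAppName("replay-failed-posts"))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

// One OffsetRange per failed message: a single offset o becomes the
// half-open range [o, o + 1).
val offsetRanges = Array(
  OffsetRange("original-topic", 0, 1500L, 1501L),
  OffsetRange("original-topic", 2, 4200L, 4201L)
)

// Reads exactly the requested ranges from the brokers, nothing else.
val replayed = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)

replayed.foreach { case (_, value) => println(s"retrying post for: $value") }

Contiguous failed offsets in the same partition can be merged into a single wider range before building the array, which cuts the number of Kafka fetches.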
On Wed, Jul 22, 2015 at 11:31 AM, Tathagata Das <t...@databricks.com> wrote:

> Yes, you could unroll from the iterator in batches of 100-200 and then
> post them in multiple rounds.
> If you are using the Kafka receiver-based approach (not Direct), then the
> raw Kafka data is stored in executor memory. If you are using Direct
> Kafka, then it is read from Kafka directly at the time of filtering.
>
> TD
>
> On Tue, Jul 21, 2015 at 9:34 PM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
>> I can post multiple items at a time.
>>
>> Data is being read from Kafka and filtered; after that it is posted. Does
>> foreachPartition load the complete partition into memory, or does it use
>> an iterator over the batch under the hood? If the complete batch is not
>> loaded, will posting custom-sized batches of 100-200 requests at a time
>> help, instead of posting the whole partition?
>>
>> On Wed, Jul 22, 2015 at 12:18 AM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> If you can post multiple items at a time, then use foreachPartition to
>>> post the whole partition in a single request.
>>>
>>> On Tue, Jul 21, 2015 at 9:35 AM, Richard Marscher <
>>> rmarsc...@localytics.com> wrote:
>>>
>>>> You can certainly create threads in a map transformation. We do this
>>>> to do concurrent DB lookups during one stage, for example. I would
>>>> recommend, however, that you switch from map to mapPartitions, as this
>>>> allows you to create a fixed-size thread pool to share across items in a
>>>> partition, as opposed to spawning a future per record in the RDD.
>>>>
>>>> On Tue, Jul 21, 2015 at 4:11 AM, Shushant Arora <
>>>> shushantaror...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Can I create user threads in executors?
>>>>> I have a streaming app where, after processing, I have a requirement
>>>>> to push events to an external system. Each post request costs ~90-100 ms.
>>>>>
>>>>> To make the posts parallel, I cannot use the same thread, because that
>>>>> is limited by the number of cores available in the system. Can I use
>>>>> user threads in a Spark app? I tried to create 2 threads in a map task
>>>>> and it worked.
>>>>>
>>>>> Is there any upper limit on the number of user threads in a Spark
>>>>> executor? Is it a good idea to create user threads in a Spark map task?
>>>>>
>>>>> Thanks
>>>>
>>>>
>>>> --
>>>> *Richard Marscher*
>>>> Software Engineer
>>>> Localytics
>>>> Localytics.com <http://localytics.com/> | Our Blog
>>>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics>
>>>> | Facebook <http://facebook.com/localytics> | LinkedIn
>>>> <http://www.linkedin.com/company/1148792?trk=tyah>
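To make TD's iterator point concrete: Scala's Iterator.grouped consumes the partition lazily, so only one batch of records is materialized at a time, never the whole partition. A minimal sketch, assuming the filtered stream carries String events and postBatch is a hypothetical helper wrapping the HTTP POST:

import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: posts up to 200 events in a single HTTP request.
def postBatch(batch: Seq[String]): Unit = ???

def postInBatches(filtered: DStream[String]): Unit =
  filtered.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      // grouped() pulls lazily from the iterator: at most 200 records
      // are in memory at once.
      iter.grouped(200).foreach(batch => postBatch(batch))
    }
  }

And Richard's fixed-size-pool idea might look roughly like the following; since posting is a pure side effect, foreachPartition plays the role of mapPartitions here. The pool size (8) and timeout are arbitrary placeholders:

import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

def postConcurrently(filtered: DStream[String]): Unit =
  filtered.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      // One fixed pool shared across the whole partition, rather than
      // one thread or future per record.
      val pool = Executors.newFixedThreadPool(8)
      implicit val ec = ExecutionContext.fromExecutorService(pool)
      try {
        val inFlight = iter.grouped(200).map(b => Future(postBatch(b))).toList
        inFlight.foreach(f => Await.result(f, 5.minutes))
      } finally {
        pool.shutdown()
      }
    }
  }

With ~90-100 ms per request, 8 threads posting batches of 200 would push a 160,000-record partition (800 batches) in roughly 10 seconds instead of about 80 seconds single-threaded.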