Thanks! I am using Spark Streaming 1.3. If a post fails for any reason, I will store the offset of that message in another Kafka topic. I then want to read these offsets in a separate Spark job and fetch the original Kafka topic's messages at those offsets. So is it possible in a Spark job to get Kafka messages at arbitrary offsets? Or is there a better alternative for handling failure of the post request?
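The 1.3 direct API does support this: KafkaUtils.createRDD takes an explicit array of OffsetRange values, so a batch job can read back arbitrary offsets. A minimal sketch, assuming the failed offsets have already been read out of the side topic; the broker list, topic name, partitions, and offsets below are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val sc = new SparkContext(new SparkConf().setAppName("replay-failed-posts"))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

// One OffsetRange per failed message: a single offset o becomes the
// half-open range [o, o + 1).
val offsetRanges = Array(
  OffsetRange("original-topic", 0, 1500L, 1501L),
  OffsetRange("original-topic", 2, 4200L, 4201L)
)

// Reads exactly the requested ranges from the brokers, nothing else.
val replayed = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)

replayed.foreach { case (_, value) => println(s"retrying post for: $value") }

Contiguous failed offsets in the same partition can be merged into a single wider range before building the array, which cuts the number of Kafka fetches.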
On Wed, Jul 22, 2015 at 11:31 AM, Tathagata Das <t...@databricks.com> wrote:

> Yes, you could unroll from the iterator in batches of 100-200 and then
> post them in multiple rounds.
> If you are using the Kafka receiver-based approach (not Direct), then the
> raw Kafka data is stored in executor memory. If you are using Direct
> Kafka, then it is read from Kafka directly at the time of filtering.
>
> TD
>
> On Tue, Jul 21, 2015 at 9:34 PM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
>> I can post multiple items at a time.
>>
>> Data is being read from Kafka and filtered; after that it is posted. Does
>> foreachPartition load the complete partition into memory, or does it use
>> an iterator over the batch under the hood? If the complete batch is not
>> loaded, will posting custom-sized batches of 100-200 requests at a time
>> help, instead of posting the whole partition?
>>
>> On Wed, Jul 22, 2015 at 12:18 AM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> If you can post multiple items at a time, then use foreachPartition to
>>> post the whole partition in a single request.
>>>
>>> On Tue, Jul 21, 2015 at 9:35 AM, Richard Marscher <
>>> rmarsc...@localytics.com> wrote:
>>>
>>>> You can certainly create threads in a map transformation. We do this
>>>> to do concurrent DB lookups during one stage, for example. I would
>>>> recommend, however, that you switch from map to mapPartitions, as this
>>>> allows you to create a fixed-size thread pool to share across items in a
>>>> partition, as opposed to spawning a future per record in the RDD.
>>>>
>>>> On Tue, Jul 21, 2015 at 4:11 AM, Shushant Arora <
>>>> shushantaror...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Can I create user threads in executors?
>>>>> I have a streaming app where, after processing, I have a requirement
>>>>> to push events to an external system. Each post request costs ~90-100 ms.
>>>>>
>>>>> To make the posts parallel, I cannot use the same thread, because that
>>>>> is limited by the number of cores available in the system. Can I use
>>>>> user threads in a Spark app? I tried to create 2 threads in a map task
>>>>> and it worked.
>>>>>
>>>>> Is there any upper limit on the number of user threads in a Spark
>>>>> executor? Is it a good idea to create user threads in a Spark map task?
>>>>>
>>>>> Thanks
>>>>
>>>>
>>>> --
>>>> *Richard Marscher*
>>>> Software Engineer
>>>> Localytics
>>>> Localytics.com <http://localytics.com/> | Our Blog
>>>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics>
>>>> | Facebook <http://facebook.com/localytics> | LinkedIn
>>>> <http://www.linkedin.com/company/1148792?trk=tyah>
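To make TD's iterator point concrete: Scala's Iterator.grouped consumes the partition lazily, so only one batch of records is materialized at a time, never the whole partition. A minimal sketch, assuming the filtered stream carries String events and postBatch is a hypothetical helper wrapping the HTTP POST:

import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: posts up to 200 events in a single HTTP request.
def postBatch(batch: Seq[String]): Unit = ???

def postInBatches(filtered: DStream[String]): Unit =
  filtered.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      // grouped() pulls lazily from the iterator: at most 200 records
      // are in memory at once.
      iter.grouped(200).foreach(batch => postBatch(batch))
    }
  }

And Richard's fixed-size-pool idea might look roughly like the following; since posting is a pure side effect, foreachPartition plays the role of mapPartitions here. The pool size (8) and timeout are arbitrary placeholders:

import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

def postConcurrently(filtered: DStream[String]): Unit =
  filtered.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      // One fixed pool shared across the whole partition, rather than
      // one thread or future per record.
      val pool = Executors.newFixedThreadPool(8)
      implicit val ec = ExecutionContext.fromExecutorService(pool)
      try {
        val inFlight = iter.grouped(200).map(b => Future(postBatch(b))).toList
        inFlight.foreach(f => Await.result(f, 5.minutes))
      } finally {
        pool.shutdown()
      }
    }
  }

With ~90-100 ms per request, 8 threads posting batches of 200 would push a 160,000-record partition (800 batches) in roughly 10 seconds instead of about 80 seconds single-threaded.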