You can use Spark Jobserver (or any other solution that provides a
long-running Spark context) so that you won't pay this bootstrap time on
each query.
In addition: if you have an RDD that you want your queries to run against,
you can cache that RDD in memory (depending on your cluster's memory size)
so that you won't pay the disk-read time either.
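For illustration, the caching idea looks roughly like this (a minimal sketch, not a definitive setup: the `local[*]` master and the `parallelize` data source are placeholders — in practice the data might come from a JDBC read against MySQL, and the context would be hosted by something like Spark Jobserver):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedQueryExample {
  def main(args: Array[String]): Unit = {
    // A long-running context; with Jobserver this would be managed for you,
    // so each query skips the multi-second context bootstrap.
    val conf = new SparkConf().setAppName("cached-queries").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Placeholder data source (assumption): stands in for e.g. a MySQL read.
    val rdd = sc.parallelize(1 to 1000000)

    // Mark the RDD for in-memory caching; it is materialized on first use.
    rdd.cache()

    val first  = rdd.filter(_ % 2 == 0).count() // pays the load/compute cost
    val second = rdd.filter(_ % 3 == 0).count() // served from memory, faster

    println(s"even=$first, multiplesOfThree=$second")
    sc.stop()
  }
}
```

Note that `cache()` only keeps what fits in memory; partitions that don't fit are recomputed on demand, which is why cluster memory size matters.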


On 22 July 2015 at 14:46, Louis Hust <louis.h...@gmail.com> wrote:

> I did a simple test using Spark in standalone mode (not cluster),
> and found a simple action takes a few seconds, even though the data size
> is small, just a few rows.
> So will each Spark job cost some time for init or prepare work, no matter
> what the job is?
> I mean, will the basic framework of a Spark job always cost seconds?
>
> 2015-07-22 19:17 GMT+08:00 Robin East <robin.e...@xense.co.uk>:
>
>> Real-time is, of course, relative but you’ve mentioned microsecond level.
>> Spark is designed to process large amounts of data in a distributed
>> fashion. No distributed system I know of could give any kind of guarantees
>> at the microsecond level.
>>
>> Robin
>>
>> > On 22 Jul 2015, at 11:14, Louis Hust <louis.h...@gmail.com> wrote:
>> >
>> > Hi, all
>> >
>> > I am using a Spark jar in standalone mode, fetching data from different
>> mysql instances and doing some actions, but I found the time is at the second level.
>> >
>> > So I want to know if a Spark job is suitable for real-time queries that need
>> microsecond latency?
>>
>>
>
