You could build a REST API, but you may have issues if you want to return arbitrary binary data. A more complex but robust alternative is to use an RPC library like Akka, Thrift, etc.
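For the HTTP route, here is a rough, untested sketch using only the JDK's built-in com.sun.net.httpserver, so it runs without any extra dependencies. The QueryServer and runQuery names are made up for illustration; in the real driver, runQuery would call something like sqlContext.sql(queryText) against a table the streaming job registers, instead of the stub below:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class QueryServer {

    // Stub standing in for sqlContext.sql(query) in the actual Spark driver.
    // Here it just echoes the query back in a JSON-ish envelope.
    static String runQuery(String query) {
        return "{\"query\": \"" + query + "\", \"rows\": []}";
    }

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/query", exchange -> {
            // The dashboard POSTs the SQL text as the raw request body.
            InputStream in = exchange.getRequestBody();
            String query = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            byte[] response = runQuery(query).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, response.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(response);
            }
        });
        server.start();
        return server;
    }
}
```

The catch with this approach is exactly the one above: returning arbitrary binary data cleanly over HTTP takes more care (content types, encoding), which is where Thrift or similar RPC frameworks earn their complexity.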
TD

On Mon, Feb 23, 2015 at 12:45 AM, Nikhil Bafna <nikhil.ba...@flipkart.com> wrote:

> Tathagata - Yes, I'm thinking along that line.
>
> The problem is how to send the query to the backend? Bundle an HTTP
> server into the Spark Streaming job, that will accept the parameters?
>
> --
> Nikhil Bafna
>
> On Mon, Feb 23, 2015 at 2:04 PM, Tathagata Das <t...@databricks.com> wrote:
>
>> You will have to build a split infrastructure - a front end that takes the
>> queries from the UI and sends them to the backend, and the backend (running
>> the Spark Streaming app) will actually run the queries on tables created in
>> the contexts. The RPCs necessary between the frontend and backend will need
>> to be implemented by you.
>>
>> On Sat, Feb 21, 2015 at 11:57 PM, Nikhil Bafna <nikhil.ba...@flipkart.com> wrote:
>>
>>> Yes. As I understand it, it would allow me to write SQL to query a
>>> Spark context. But the query needs to be specified within a job & deployed.
>>>
>>> What I want is to be able to run multiple dynamic queries specified at
>>> runtime from a dashboard.
>>>
>>> --
>>> Nikhil Bafna
>>>
>>> On Sat, Feb 21, 2015 at 8:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Have you looked at
>>>> http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>>>> ?
>>>>
>>>> Cheers
>>>>
>>>> On Sat, Feb 21, 2015 at 4:24 AM, Nikhil Bafna <nikhil.ba...@flipkart.com> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> My use case is building a realtime monitoring system over
>>>>> multi-dimensional data.
>>>>>
>>>>> The way I'm planning to go about it is to use Spark Streaming to store
>>>>> aggregated counts over all dimensions at 10-second intervals.
>>>>>
>>>>> Then, from a dashboard, I would be able to specify a query over some
>>>>> dimensions, which will need re-aggregation from the already computed job.
>>>>>
>>>>> My question is, how can I run dynamic queries over data in SchemaRDDs?
>>>>>
>>>>> --
>>>>> Nikhil Bafna