Mark, we have an application that uses data from different kinds of sources,
and we built an engine able to handle that, but it can't scale to big data
(we could make it scale, but that would be too time-expensive), it doesn't
have a machine learning module, etc. Then we came across Spark, and it looks
like it has everything we need; actually, it does. But our latency is very
low right now, and when we ran some tests, Spark took much longer for the
same kind of results, always against an RDBMS, which is our primary source.

So, we want to expand our sources to CSV, web services, big data, etc. We
can either extend our engine or use something like Spark, which gives us the
power of clustering, access to different kinds of sources, streaming,
machine learning, easy extensibility, and so on.
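To make the kind of query concrete: what we benchmark is essentially a
rollup over dimension columns with a summed measure. Here is a plain-Python
sketch with made-up rows, just to show the expected output shape (in Spark
this would presumably be a `DataFrame.rollup` over the JDBC source):

```python
from collections import defaultdict

# Made-up rows, for illustration only: (product, product_family, cost).
rows = [
    ("P1", "FamA", 10.0),
    ("P2", "FamA", 5.0),
    ("P3", "FamB", 7.0),
]

def rollup_cost(rows):
    """Sum cost at three grouping levels, like SQL GROUP BY
    ROLLUP(family, product): (family, product) detail rows,
    (family, None) per-family subtotals, (None, None) grand total."""
    totals = defaultdict(float)
    for product, family, cost in rows:
        totals[(family, product)] += cost  # detail level
        totals[(family, None)] += cost     # per-family subtotal
        totals[(None, None)] += cost       # grand total
    return dict(totals)

totals = rollup_cost(rows)
print(totals[("FamA", None)])  # 15.0 (subtotal for FamA)
print(totals[(None, None)])    # 22.0 (grand total)
```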

On Tue, Dec 1, 2015 at 9:36 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> I'd ask another question first: If your SQL query can be executed in a
> performant fashion against a conventional (RDBMS?) database, why are you
> trying to use Spark?  How you answer that question will be the key to
> deciding among the engineering design tradeoffs to effectively use Spark or
> some other solution.
>
> On Tue, Dec 1, 2015 at 4:23 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>
>> OK, so the latency problem arises because I'm using SQL as the source?
>> What about CSV, Hive, or another source?
>>
>> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> It is not designed for interactive queries.
>>>
>>>
>>> You might want to ask the designers of Spark, Spark SQL, and
>>> particularly some things built on top of Spark (such as BlinkDB) about
>>> their intent with regard to interactive queries.  Interactive queries are
>>> not the only designed use of Spark, but it is going too far to claim that
>>> Spark is not designed at all to handle interactive queries.
>>>
>>> That being said, I think that you are correct to question the wisdom of
>>> expecting lowest-latency query response from Spark using SQL (sic,
>>> presumably an RDBMS is intended) as the datastore.
>>>
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> Hmm, it will never be faster than SQL if you use an SQL database as the
>>>> underlying storage. Spark is (currently) an in-memory batch engine for
>>>> iterative machine learning workloads. It is not designed for interactive
>>>> queries. Currently, Hive is moving in the direction of interactive
>>>> queries. Alternatives are Phoenix on HBase, or Impala.
>>>>
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>
>>>> Yes, the use case would be: have Spark running as a service (I didn't
>>>> investigate this yet); through API calls to this service we perform
>>>> some aggregations over data in SQL. We are already doing this with an
>>>> internal development.
>>>>
>>>> Nothing complicated. For instance, a table with Product, Product
>>>> Family, cost, price, etc., with columns acting as dimensions and
>>>> measures.
>>>>
>>>> I want Spark to query that table and perform a kind of rollup, with
>>>> cost as the measure and Product, Product Family as the dimensions.
>>>>
>>>> With only 3 columns, it takes about 20s to perform that query and the
>>>> aggregation; the same query run directly against the database, grouping
>>>> by those columns, takes about 1s.
>>>>
>>>> regards
>>>>
>>>>
>>>>
>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Can you elaborate more on the use case?
>>>>>
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data
>>>>> > stored in SQL, but I need low latency. I'm doing some tests, and
>>>>> > I've run into the problem that Spark context creation and querying
>>>>> > data over SQL take too long.
>>>>> >
>>>>> > Any ideas for speeding up the process?
>>>>> >
>>>>> > regards.
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ing. Ivaldi Andres
>>>>
>>>>
>>>
>>
>>
>> --
>> Ing. Ivaldi Andres
>>
>
>


-- 
Ing. Ivaldi Andres
