Thank you, Sanchay, and thank you for the blog.

On Tuesday, 19 July 2022, sanchay javeria <sanchay.jave...@gmail.com> wrote:

> I ran into this issue and solved it roughly the way you described your
> second approach. You can modify the SQLInterpreter
> <https://github.com/apache/incubator-livy/blob/master/repl/src/main/scala/org/apache/livy/repl/SQLInterpreter.scala#L97>
> to write the output DataFrame to S3 instead, and on the client you can
> retrieve the results from S3 in a paginated manner. I wrote about this
> problem in a blog post
> <https://medium.com/pinterest-engineering/interactive-querying-with-apache-spark-sql-at-pinterest-2a3eaf60ac1b>
> last year (see "Large Result Handling and Status Tracking") if you're
> interested.
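>
> For illustration, a minimal sketch of the idea, assuming Spark's
> DataFrame writer (ResultSink, writeResult, and s3Prefix are
> illustrative names, not anything in Livy):
>
> import org.apache.spark.sql.SparkSession
>
> // Sketch only: divert the full result set to S3 instead of collecting
> // it with take(maxResult) inside the interpreter.
> object ResultSink {
>   def writeResult(spark: SparkSession, code: String, s3Prefix: String): String = {
>     val df = spark.sql(code)
>     // One JSON file per partition; the client lists the prefix and
>     // pages through the files.
>     val outputPath = s"$s3Prefix/${java.util.UUID.randomUUID()}"
>     df.write.json(outputPath)
>     // Return the path in the statement output instead of the rows.
>     outputPath
>   }
> }
>
> The actual change in SQLInterpreter's execute would return this path
> (plus the schema) in the statement output so the client knows where to
> page from.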
>
> Best
>
> On Mon, 18 Jul 2022 at 11:38, Gos Os <goosro...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I am new to Apache Livy and am currently trying to understand how
>> feasible Livy would be for interactive queries in a 300-user app.
>>
>> Latency to first results is critical for customer experience.
>>
>> The biggest concern I have is the 1,000-row limit associated with
>> take/collect. Most of the ad-hoc queries will easily return more than 10k
>> rows.
>>
>> In my view there are two options:
>>
>> 1- Livy batch submission with S3 as the destination, then read the
>> results from S3 in the app. This would not be the best experience, as
>> customers can't see results right away.
>>
>> 2- Interactive query submission via Livy. Then add a mechanism to perform
>> pagination, or write the results to S3 if more than 1,000 rows are
>> returned. The app would know the query has more than 1,000 rows and would
>> automatically start paginating from S3 after the first 1,000 (rough
>> sketch below).
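>>
>> For what it's worth, a rough sketch of the submission side of option 2,
>> using Livy's REST API (POST /sessions/{id}/statements); the S3 paging
>> part is left out, and livyUrl and sessionId are assumed to already
>> exist:
>>
>> import java.net.URI
>> import java.net.http.{HttpClient, HttpRequest, HttpResponse}
>>
>> object Option2Sketch {
>>   private val http = HttpClient.newHttpClient()
>>
>>   // Submit a SQL statement to an existing interactive session. The
>>   // sql string is assumed to be JSON-safe for this sketch.
>>   def submitSql(livyUrl: String, sessionId: Int, sql: String): String = {
>>     val body = s"""{"kind": "sql", "code": "$sql"}"""
>>     val req = HttpRequest
>>       .newBuilder(URI.create(s"$livyUrl/sessions/$sessionId/statements"))
>>       .header("Content-Type", "application/json")
>>       .POST(HttpRequest.BodyPublishers.ofString(body))
>>       .build()
>>     // The response carries the statement id; the app would poll it
>>     // and, once the output signals an overflow to S3, start paging
>>     // from there.
>>     http.send(req, HttpResponse.BodyHandlers.ofString()).body()
>>   }
>> }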
>>
>> My question: how have other Livy users with a requirement of low latency
>> to first results solved this?
>>
>> Thank you,
>> Gos.
>>
>
