I ran into this issue and solved it roughly the way you described your
second approach. You can modify the SQLInterpreter
<https://github.com/apache/incubator-livy/blob/master/repl/src/main/scala/org/apache/livy/repl/SQLInterpreter.scala#L97>
to write the output DataFrame to S3 instead, and on your client you can
retrieve the results from S3 in a paginated manner. I wrote about this
problem in a blog post
<https://medium.com/pinterest-engineering/interactive-querying-with-apache-spark-sql-at-pinterest-2a3eaf60ac1b>
last year (see "Large Result Handling and Status Tracking") if you're
interested.
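
In case it helps, here is a minimal sketch of the idea. This is not
Livy's actual code: the ResultHandler object, the InlineRowLimit
threshold, and the bucket name are all made up for illustration, and the
real SQLInterpreter also deals with statement state and response
serialization.

import java.util.UUID
import org.apache.spark.sql.{DataFrame, SparkSession}

object ResultHandler {
  // Hypothetical threshold: results at or under this size come back
  // inline, anything larger is spilled to S3.
  val InlineRowLimit = 1000

  def execute(spark: SparkSession, query: String): Either[Array[String], String] = {
    val df: DataFrame = spark.sql(query)
    // Cache so the query isn't recomputed if we end up writing it out.
    df.cache()
    // Take one row past the limit so we can tell "exactly 1000" apart
    // from "more than 1000".
    val firstRows = df.toJSON.take(InlineRowLimit + 1)
    if (firstRows.length <= InlineRowLimit) {
      // Small result: return the rows inline, as the stock interpreter does.
      Left(firstRows)
    } else {
      // Large result: write the full DataFrame out and hand back the S3
      // prefix instead of the rows.
      val location = s"s3://my-results-bucket/livy-results/${UUID.randomUUID()}"
      df.write.json(location)
      Right(location)
    }
  }
}

On the client side, each part file under that prefix becomes a natural
page: list the objects under the prefix and fetch them one at a time as
the user pages through. Repartitioning the DataFrame before the write is
one way to control the page size.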

Best

On Mon, 18 Jul 2022 at 11:38, Gos Os <goosro...@gmail.com> wrote:

> Hello folks,
>
> I am new to Apache Livy and currently trying to understand how feasible
> Livy would be for interactive queries in a 300-user app.
>
> Latency to first results is critical for customer experience.
>
> The biggest concern I have is the 1000-row limit associated with
> take/collect. Most of the ad-hoc queries will easily return more than 10k
> rows.
>
> In my view there are two options:
>
> 1- Livy batch submission with S3 as the destination, then have the app
> read the results from S3. This will not be the best experience, as
> customers can’t see results right away.
>
> 2- Interactive query submission via Livy, plus a mechanism to paginate or
> write results to S3 if more than 1000 rows are returned. The app would
> know the query has more than 1000 rows and automatically start paginating
> from S3 after the first 1000.
>
> My question: how have other Livy users with a requirement of low latency
> to first results solved this?
>
> Thank you,
> Gos.
>
