I found this:

https://drill.apache.org/docs/drop-table/ :

"Currently, Drill does not have a mechanism in place, such as read locks on
files, to address concurrency issues. For example, if one user runs a query
that references a table that another user simultaneously issues the DROP
TABLE command against, there is no mechanism in place to prevent a
collision of the two processes. In such a scenario, Drill may return
partial query results or a system error to the user running the query when
the table is dropped."

One solution would be to delete the folder in HDFS only when no queries are
in the "running" state, or to schedule the delete outside business hours.
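
A minimal sketch of the "delete only when nothing is running" check, in
Python. It assumes the Drill web interface is reachable on the default port
8047 and that its /profiles.json reply contains a "runningQueries" list. The
drillbit address, the usage threshold, and the partition folder naming (names
that sort chronologically) are all hypothetical placeholders, not something
taken from your setup.

```python
import json
import subprocess
import urllib.request

# Assumed drillbit address; adjust to your cluster.
DRILL_PROFILES_URL = "http://localhost:8047/profiles.json"
# Hypothetical threshold: fraction of HDFS capacity in use.
USAGE_THRESHOLD = 0.80


def running_query_count(profiles):
    """Count entries in the 'runningQueries' list of a parsed profiles reply."""
    return len(profiles.get("runningQueries", []))


def oldest_partition(partition_dirs):
    """Return the oldest folder, assuming names sort chronologically
    (e.g. 2016-03-01-11). Returns None for an empty list."""
    return min(partition_dirs) if partition_dirs else None


def should_delete(usage_fraction, profiles, threshold=USAGE_THRESHOLD):
    """Delete only when usage is over the threshold and no query is running."""
    return usage_fraction >= threshold and running_query_count(profiles) == 0


def fetch_profiles(url=DRILL_PROFILES_URL):
    """Fetch and parse the query profiles from the Drill web interface."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def delete_partition(path):
    """Remove one partition folder through the hdfs command-line client."""
    subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path], check=True)
```

A cron job could call fetch_profiles(), pass the result to should_delete(),
and then call delete_partition() on the folder picked by oldest_partition().
This narrows the window but does not close it: a query can still start
between the check and the delete, so the job should tolerate an occasional
failed query.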



On Tue, Mar 1, 2016 at 11:25 AM, François Méthot <[email protected]>
wrote:

> Hi,
>
>   We need to manage a rolling window of Parquet data within Drill.
>
> Our Parquet files are partitioned by hour. Once HDFS reaches a certain
> usage threshold, we want to delete the oldest partition folder.
>
> A simple approach would be to run a cron job that checks the HDFS usage
> and deletes the oldest partition folder if necessary. Would that cause
> issues if the delete occurs while a query is running on those files?
>
> Would you instead recommend writing a script/app that submits a "drop
> table" on the oldest partition folder through the ODBC interface?
>
> Any other ideas are welcome.
>
> Thanks a lot!
> François