[ 
https://issues.apache.org/jira/browse/HIVE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-14468.
--------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Pushed in HIVE-14217.

> Implement Druid query based input format
> ----------------------------------------
>
>                 Key: HIVE-14468
>                 URL: https://issues.apache.org/jira/browse/HIVE-14468
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 2.2.0
>
>
> The input format is responsible for generating the splits and creating the record readers.
> * For *Timeseries*, *TopN*, and *GroupBy* queries, create a single split 
> containing the broker address and the query. The record reader then 
> submits the query to the broker, retrieves the results, and parses them 
> into records.
> * For *Select* queries, Druid has the concept of a threshold (limit), which 
> is used for retrieving the query results in multiple requests. Hence, we 
> emit a Druid Segment Metadata query to obtain the number of rows in the 
> datasource, and then create _number of rows / default\_threshold_ splits; 
> _default\_threshold_ is a Hive configuration property defined as 
> {{hive.druid.select.threshold}}. Each split contains the broker address 
> and a Select JSON query restricted to a _start_ and _end_ date range 
> (currently we assume a uniform distribution of records across the time 
> dimension). The record readers handle the splits independently: each 
> submits its query to the broker, retrieves the results, and parses them 
> into records. This way we can parallelize the retrieval of results for 
> these queries.
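
The Select split computation described above can be sketched as follows. This is a minimal illustration in Python, not the Hive implementation: the function name `select_splits`, its parameters, and the rounding-up of the split count are assumptions made here, and the Select query body is abbreviated to the fields needed to show the idea.

```python
import json
import math
from datetime import datetime, timezone


def select_splits(broker, datasource, start, end, num_rows, threshold):
    """Return one (broker_address, select_query_json) pair per split.

    Hypothetical helper: the time range [start, end) is cut into equal
    slices, which matches the uniform-distribution assumption in the
    description.
    """
    # Assumption: round up so a trailing partial chunk still gets a split,
    # and always emit at least one split.
    num_splits = max(1, math.ceil(num_rows / threshold))
    step = (end - start) / num_splits

    splits = []
    for i in range(num_splits):
        lo = start + i * step
        # Pin the last slice to the exact end to avoid drift from division.
        hi = end if i == num_splits - 1 else start + (i + 1) * step
        query = {
            "queryType": "select",
            "dataSource": datasource,
            "granularity": "all",
            "intervals": [f"{lo.isoformat()}/{hi.isoformat()}"],
            "pagingSpec": {"pagingIdentifiers": {}, "threshold": threshold},
        }
        splits.append((broker, json.dumps(query)))
    return splits


if __name__ == "__main__":
    # Illustrative values only; num_rows would come from the
    # Segment Metadata query and threshold from hive.druid.select.threshold.
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)
    end = datetime(2016, 1, 11, tzinfo=timezone.utc)
    for broker, query in select_splits(
        "localhost:8082", "wikipedia", start, end, 25000, 10000
    ):
        print(broker, query)
```

Each pair would be handed to its own record reader, which is what allows the results to be retrieved in parallel. If records are not spread evenly over time, equal-width slices like these would produce unbalanced splits.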



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
