Re: Array in Drill

Paul Rogers Wed, 09 Jan 2019 11:30:38 -0800

Hi Ben,

Drill is a SQL relational engine that was enhanced to support extended JSON 
types, including arrays. The need to use only SQL to work with array imposes 
some major constraints.


As you noted, there is no way to project specific array items while leaving 
them in an array. When you project array elements, as in your example, you 
create new, top-level scalar columns. So:

SELECT columns[0] AS `name`, columns[1] AS `address` ...

Is a common use case: it not only projects certain array entries, it projects 
them as new, named, top-level columns. There is no way to say: "give me a new 
array with just elements 0 and 1 from the old array." Note that this problem 
also occurs for maps. 

Now, is this a problem? Not really. The primary clients for Drill are JDBC and 
OBDC, neither of which can handle arrays. (The native Drill client, which 
almost no one uses, can handle arrays and maps.) So, somewhere in the query, 
maps and arrays must be "flattened" to scalar columns. So, it is actually handy 
that the projection shown above both identifies the array items you want, AND 
puts them into a form that ODBC/JDBC can use.


Your example, by the way, is specific to CSV (really, any delimited text file) 
that does not have headers. It is natural to want to project those columns into 
something named. SQL is much easier to read with "`name` = 'Bob'" than 
"columns[0] = 'Bob'". As you noted, Parquet also supports arrays as does JSON.

As Charles pointed out, there are a number of array-aware functions available. 
If you can't find what you need, recall that Drill is open: you can write your 
own UDF. Then, you can do as Charles and others have done: contribute them to 
Drill to enrich the set of operations available to others. We Drill core 
developers are good at the internals, but you, Charles and others better 
understand the kind of operations you want to apply to your data. We need your 
help to build out the set of functions to solve your needs.

All that said, what are you trying to accomplish? Perhaps we can help you solve 
your specific challenge.

Thanks,
- Paul

 

    On Wednesday, January 9, 2019, 7:07:05 AM PST, [email protected] 
<[email protected]> wrote:  
 
 Hi,

Although it's possible to work with ARRAY in DRILL :
SELECT columns[0], columns[1]... FROM ...;

I don't find any array function to know length or dimension of array or any 
other useful fct

I don't find any possibility to "build" my array with different data like :
SELECT ARRAY[data1, data2, data3] FROM ...;

Aggregate function that build array doesn't exists
Example : SELECT mycolumn1, array_agg(mycolumn2) FROM mytable GROUP BY 
mycolomn1;

Or would I miss something ?

I know that array type doesn't exists for Parquet files but these capacities 
will be very useful
and the limitation of Parquet should not limit the possibilities of DRILL.

Appreciate any info or return on this subject ?

Regards,

Re: Array in Drill

Reply via email to