Hi Chris,

Thank you for this.  In fact, doing a join will probably suffice for now,
and there are surely many 'join tricks' I ought to learn about before I
reject this approach. (aka Sadly lacking in SQL skills)

I see a list of reserved SQL keywords in the Drill documentation; can I
assume that although reserved, not all these have been implemented?

https://drill.apache.org/docs/reserved-keywords/

Re. Pig:
A typical problem I encounter in data is, for example, splitting fields
that have bilingual names into separate french and english columns,
converting unknown non-UTF characters to utf-8, and exiting when this
process fails to adhere to some rule.   I just assumed that this is beyond
the scope of SQL in the context of drill.  I wonder if this is the type of
thing Pig might be useful for, before I invest time in learning about it.

Thank you  :)

P.


On Wed, Jan 6, 2016 at 8:31 AM, Christopher Matta <[email protected]> wrote:

> Are you asking about a simple JOIN?
>
> Below I have a simple list of country_data.csv that I’m joining to an iso
> dataset:
>
> 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv`
> limit 10;
> +---------+
> | columns |
> +---------+
> | ["Sort Order","Common Name","Formal Name","Type","Sub
> Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217
> Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO
> 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] |
> | ["1","Afghanistan","Islamic State of Afghanistan","Independent
> State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] |
> | ["2","Albania","Republic of Albania","Independent
> State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] |
> | ["3","Algeria","People's Democratic Republic of
> Algeria","Independent
> State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] |
> | ["4","Andorra","Principality of Andorra","Independent
> State","","","Andorra la
> Vella","EUR","Euro","376","AD","AND","20",".ad\r"] |
> | ["5","Angola","Republic of Angola","Independent
> State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] |
> | ["6","Antigua and Barbuda","","Independent State","","","Saint
> John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] |
> | ["7","Argentina","Argentine Republic","Independent
> State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"]
> |
> | ["8","Armenia","Republic of Armenia","Independent
> State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] |
> | ["9","Australia","Commonwealth of Australia","Independent
> State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] |
> +---------+
> 10 rows selected (0.905 seconds)
> 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10;
> +--------------------------------------------------------------------+
> |                              columns                               |
> +--------------------------------------------------------------------+
> | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"]            |
> | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"]              |
> | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"]          |
> | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"]  |
> | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"]          |
> +--------------------------------------------------------------------+
> 5 rows selected (0.35 seconds)
> 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as
> country from maprfs.cmatta.`country_data.csv` c JOIN
> maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] =
> i.`columns`[12];
> +-------------------+--------------+
> |      capital      |   country    |
> +-------------------+--------------+
> | Kabul             | Afghanistan  |
> | Tirana            | Albania      |
> | Algiers           | Algeria      |
> | Andorra la Vella  | Andorra      |
> | Luanda            | Angola       |
> +-------------------+--------------+
> 5 rows selected (0.82 seconds)
> 0: jdbc:drill:>
>
> What do you hope to accomplish with Pig? A lot of manipulation of data can
> be accomplished with Drill and SQL.
>
> On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected]>
> wrote:
>
> Does drill have something analogous  to  SQL  CONSTRAINT and REFERENCES for
> > doing table type  lookups?
> >
> > I have a large data that uses numerical ISO country codes; they need to
> be
> > replaced with country names that are mapped in a small CSV file
> (iso_code,
> > country_name).
> >
> > Any suggestions?
> >
> > Thanks you,
> >
> > Peder
> >
> > P.S. Is it worth learning Pig Latin or something similar for more complex
> > queries and data manipulation in Drill?
> >
> ​
>

Reply via email to