Hi Chris, Thank you for this. In fact, doing a join will probably suffice for now, and there are surely many 'join tricks' I ought to learn about before I reject this approach. (aka Sadly lacking in SQL skills)
I see a list of reserved SQL keywords in the Drill documentation; can I assume that although reserved, not all these have been implemented? https://drill.apache.org/docs/reserved-keywords/ Re. Pig: A typical problem I encounter in data is, for example, splitting fields that have bilingual names into separate french and english columns, converting unknown non-UTF characters to utf-8, and exiting when this process fails to adhere to some rule. I just assumed that this is beyond the scope of SQL in the context of drill. I wonder if this is the type of thing Pig might be useful for, before I invest time in learning about it. Thank you :) P. On Wed, Jan 6, 2016 at 8:31 AM, Christopher Matta <[email protected]> wrote: > Are you asking about a simple JOIN? > > Below I have a simple list of country_data.csv that I’m joining to an iso > dataset: > > 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv` > limit 10; > +---------+ > | columns | > +---------+ > | ["Sort Order","Common Name","Formal Name","Type","Sub > Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 > Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO > 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] | > | ["1","Afghanistan","Islamic State of Afghanistan","Independent > State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] | > | ["2","Albania","Republic of Albania","Independent > State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] | > | ["3","Algeria","People's Democratic Republic of > Algeria","Independent > State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] | > | ["4","Andorra","Principality of Andorra","Independent > State","","","Andorra la > Vella","EUR","Euro","376","AD","AND","20",".ad\r"] | > | ["5","Angola","Republic of Angola","Independent > State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] | > | ["6","Antigua and Barbuda","","Independent State","","","Saint > John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] | > | ["7","Argentina","Argentine Republic","Independent > State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"] > | > | ["8","Armenia","Republic of Armenia","Independent > State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] | > | ["9","Australia","Commonwealth of Australia","Independent > State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] | > +---------+ > 10 rows selected (0.905 seconds) > 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10; > +--------------------------------------------------------------------+ > | columns | > +--------------------------------------------------------------------+ > | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"] | > | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"] | > | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"] | > | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"] | > | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"] | > +--------------------------------------------------------------------+ > 5 rows selected (0.35 seconds) > 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as > country from maprfs.cmatta.`country_data.csv` c JOIN > maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] = > i.`columns`[12]; > +-------------------+--------------+ > | capital | country | > +-------------------+--------------+ > | Kabul | Afghanistan | > | Tirana | Albania | > | Algiers | Algeria | > | Andorra la Vella | Andorra | > | Luanda | Angola | > +-------------------+--------------+ > 5 rows selected (0.82 seconds) > 0: jdbc:drill:> > > What do you hope to accomplish with Pig? A lot of manipulation of data can > be accomplished with Drill and SQL. > > On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected]> > wrote: > > Does drill have something analogous to SQL CONSTRAINT and REFERENCES for > > doing table type lookups? > > > > I have a large data that uses numerical ISO country codes; they need to > be > > replaced with country names that are mapped in a small CSV file > (iso_code, > > country_name). > > > > Any suggestions? > > > > Thanks you, > > > > Peder > > > > P.S. Is it worth learning Pig Latin or something similar for more complex > > queries and data manipulation in Drill? > > > >
