Hi Jason, Makes sense. Use Drill is to extract a manageable view in smaller size and/ or better format, and then work with the data elsewhere. (?)
Thanks, P. On Wed, Jan 6, 2016 at 9:29 AM, Jason Altekruse <[email protected]> wrote: > Peter, > > If you are looking for a way to constrain inserts into your dataset, Drill > isn't going to manage this for you. Using CREATE TABLE, you can create a > new directory to add data to a dataset (effectively a bulk insert). > > Drill is focused on analytical workloads, and supports CREATE TABLE to > enable the creation of materialized views, or for ETLing into a more > efficient format like parquet. > > If you are receiving data in a format that is processable by Drill, like > delimited text, you could run a query against the raw data to check for > your constraint and only insert data that meets the constraint. For your > example here, you could use an inner join to find all of the records that > have valid county codes, and only insert those into your table. > > - Jason Altekruse > > On Wed, Jan 6, 2016 at 7:31 AM, Christopher Matta <[email protected]> wrote: > > > Are you asking about a simple JOIN? > > > > Below I have a simple list of country_data.csv that I’m joining to an iso > > dataset: > > > > 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv` > > limit 10; > > +---------+ > > | columns | > > +---------+ > > | ["Sort Order","Common Name","Formal Name","Type","Sub > > Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 > > Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO > > 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] | > > | ["1","Afghanistan","Islamic State of Afghanistan","Independent > > State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] | > > | ["2","Albania","Republic of Albania","Independent > > State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] | > > | ["3","Algeria","People's Democratic Republic of > > Algeria","Independent > > State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] | > > | ["4","Andorra","Principality of Andorra","Independent > > State","","","Andorra la > > Vella","EUR","Euro","376","AD","AND","20",".ad\r"] | > > | ["5","Angola","Republic of Angola","Independent > > State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] | > > | ["6","Antigua and Barbuda","","Independent State","","","Saint > > John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] | > > | ["7","Argentina","Argentine Republic","Independent > > State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"] > > | > > | ["8","Armenia","Republic of Armenia","Independent > > State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] | > > | ["9","Australia","Commonwealth of Australia","Independent > > State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] | > > +---------+ > > 10 rows selected (0.905 seconds) > > 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10; > > +--------------------------------------------------------------------+ > > | columns | > > +--------------------------------------------------------------------+ > > | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"] | > > | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"] | > > | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"] | > > | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"] | > > | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"] | > > +--------------------------------------------------------------------+ > > 5 rows selected (0.35 seconds) > > 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as > > country from maprfs.cmatta.`country_data.csv` c JOIN > > maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] = > > i.`columns`[12]; > > +-------------------+--------------+ > > | capital | country | > > +-------------------+--------------+ > > | Kabul | Afghanistan | > > | Tirana | Albania | > > | Algiers | Algeria | > > | Andorra la Vella | Andorra | > > | Luanda | Angola | > > +-------------------+--------------+ > > 5 rows selected (0.82 seconds) > > 0: jdbc:drill:> > > > > What do you hope to accomplish with Pig? A lot of manipulation of data > can > > be accomplished with Drill and SQL. > > > > On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected] > > > > wrote: > > > > Does drill have something analogous to SQL CONSTRAINT and REFERENCES > for > > > doing table type lookups? > > > > > > I have a large data that uses numerical ISO country codes; they need to > > be > > > replaced with country names that are mapped in a small CSV file > > (iso_code, > > > country_name). > > > > > > Any suggestions? > > > > > > Thanks you, > > > > > > Peder > > > > > > P.S. Is it worth learning Pig Latin or something similar for more > complex > > > queries and data manipulation in Drill? > > > > > > > >
