You can also create UDFs in Drill which would allow you to do more custom processing like you're describing. That way you can use a single framework for everything.
On Wed, Jan 6, 2016 at 8:25 AM, Peder Jakobsen | gmail <[email protected]> wrote: > Hi Jason, > > Makes sense. Use Drill is to extract a manageable view in smaller size > and/ or better format, and then work with the data elsewhere. (?) > > Thanks, > > P. > > > > > > > > On Wed, Jan 6, 2016 at 9:29 AM, Jason Altekruse <[email protected]> > wrote: > > > Peter, > > > > If you are looking for a way to constrain inserts into your dataset, > Drill > > isn't going to manage this for you. Using CREATE TABLE, you can create a > > new directory to add data to a dataset (effectively a bulk insert). > > > > Drill is focused on analytical workloads, and supports CREATE TABLE to > > enable the creation of materialized views, or for ETLing into a more > > efficient format like parquet. > > > > If you are receiving data in a format that is processable by Drill, like > > delimited text, you could run a query against the raw data to check for > > your constraint and only insert data that meets the constraint. For your > > example here, you could use an inner join to find all of the records that > > have valid county codes, and only insert those into your table. > > > > - Jason Altekruse > > > > On Wed, Jan 6, 2016 at 7:31 AM, Christopher Matta <[email protected]> > wrote: > > > > > Are you asking about a simple JOIN? > > > > > > Below I have a simple list of country_data.csv that I’m joining to an > iso > > > dataset: > > > > > > 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv` > > > limit 10; > > > +---------+ > > > | columns | > > > +---------+ > > > | ["Sort Order","Common Name","Formal Name","Type","Sub > > > Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 > > > Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO > > > 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] | > > > | ["1","Afghanistan","Islamic State of Afghanistan","Independent > > > State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] | > > > | ["2","Albania","Republic of Albania","Independent > > > State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] | > > > | ["3","Algeria","People's Democratic Republic of > > > Algeria","Independent > > > State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] | > > > | ["4","Andorra","Principality of Andorra","Independent > > > State","","","Andorra la > > > Vella","EUR","Euro","376","AD","AND","20",".ad\r"] | > > > | ["5","Angola","Republic of Angola","Independent > > > State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] | > > > | ["6","Antigua and Barbuda","","Independent State","","","Saint > > > John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] | > > > | ["7","Argentina","Argentine Republic","Independent > > > State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"] > > > | > > > | ["8","Armenia","Republic of Armenia","Independent > > > State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] | > > > | ["9","Australia","Commonwealth of Australia","Independent > > > State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] | > > > +---------+ > > > 10 rows selected (0.905 seconds) > > > 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit > 10; > > > +--------------------------------------------------------------------+ > > > | columns | > > > +--------------------------------------------------------------------+ > > > | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"] | > > > | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"] | > > > | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"] | > > > | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"] | > > > | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"] | > > > +--------------------------------------------------------------------+ > > > 5 rows selected (0.35 seconds) > > > 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as > > > country from maprfs.cmatta.`country_data.csv` c JOIN > > > maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] = > > > i.`columns`[12]; > > > +-------------------+--------------+ > > > | capital | country | > > > +-------------------+--------------+ > > > | Kabul | Afghanistan | > > > | Tirana | Albania | > > > | Algiers | Algeria | > > > | Andorra la Vella | Andorra | > > > | Luanda | Angola | > > > +-------------------+--------------+ > > > 5 rows selected (0.82 seconds) > > > 0: jdbc:drill:> > > > > > > What do you hope to accomplish with Pig? A lot of manipulation of data > > can > > > be accomplished with Drill and SQL. > > > > > > On Tuesday, January 5, 2016, Peder Jakobsen | gmail < > [email protected] > > > > > > wrote: > > > > > > Does drill have something analogous to SQL CONSTRAINT and REFERENCES > > for > > > > doing table type lookups? > > > > > > > > I have a large data that uses numerical ISO country codes; they need > to > > > be > > > > replaced with country names that are mapped in a small CSV file > > > (iso_code, > > > > country_name). > > > > > > > > Any suggestions? > > > > > > > > Thanks you, > > > > > > > > Peder > > > > > > > > P.S. Is it worth learning Pig Latin or something similar for more > > complex > > > > queries and data manipulation in Drill? > > > > > > > > > > > > > -- Tomer Shiran CEO and Co-Founder, Dremio
