You can also create UDFs in Drill which would allow you to do more custom
processing like you're describing. That way you can use a single framework
for everything.

On Wed, Jan 6, 2016 at 8:25 AM, Peder Jakobsen | gmail <[email protected]>
wrote:

> Hi Jason,
>
> Makes sense.  Use Drill is to extract a manageable view in smaller size
> and/ or better format, and then work with the data elsewhere.  (?)
>
> Thanks,
>
> P.
>
>
>
>
>
>
>
> On Wed, Jan 6, 2016 at 9:29 AM, Jason Altekruse <[email protected]>
> wrote:
>
> > Peter,
> >
> > If you are looking for a way to constrain inserts into your dataset,
> Drill
> > isn't going to manage this for you. Using CREATE TABLE, you can create a
> > new directory to add data to a dataset (effectively a bulk insert).
> >
> > Drill is focused on analytical workloads, and supports CREATE TABLE to
> > enable the creation of materialized views, or for ETLing into a more
> > efficient format like parquet.
> >
> > If you are receiving data in a format that is processable by Drill, like
> > delimited text, you could run a query against the raw data to check for
> > your constraint and only insert data that meets the constraint. For your
> > example here, you could use an inner join to find all of the records that
> > have valid county codes, and only insert those into your table.
> >
> > - Jason Altekruse
> >
> > On Wed, Jan 6, 2016 at 7:31 AM, Christopher Matta <[email protected]>
> wrote:
> >
> > > Are you asking about a simple JOIN?
> > >
> > > Below I have a simple list of country_data.csv that I’m joining to an
> iso
> > > dataset:
> > >
> > > 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv`
> > > limit 10;
> > > +---------+
> > > | columns |
> > > +---------+
> > > | ["Sort Order","Common Name","Formal Name","Type","Sub
> > > Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217
> > > Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO
> > > 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] |
> > > | ["1","Afghanistan","Islamic State of Afghanistan","Independent
> > > State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] |
> > > | ["2","Albania","Republic of Albania","Independent
> > > State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] |
> > > | ["3","Algeria","People's Democratic Republic of
> > > Algeria","Independent
> > > State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] |
> > > | ["4","Andorra","Principality of Andorra","Independent
> > > State","","","Andorra la
> > > Vella","EUR","Euro","376","AD","AND","20",".ad\r"] |
> > > | ["5","Angola","Republic of Angola","Independent
> > > State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] |
> > > | ["6","Antigua and Barbuda","","Independent State","","","Saint
> > > John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] |
> > > | ["7","Argentina","Argentine Republic","Independent
> > > State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"]
> > > |
> > > | ["8","Armenia","Republic of Armenia","Independent
> > > State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] |
> > > | ["9","Australia","Commonwealth of Australia","Independent
> > > State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] |
> > > +---------+
> > > 10 rows selected (0.905 seconds)
> > > 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit
> 10;
> > > +--------------------------------------------------------------------+
> > > |                              columns                               |
> > > +--------------------------------------------------------------------+
> > > | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"]            |
> > > | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"]              |
> > > | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"]          |
> > > | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"]  |
> > > | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"]          |
> > > +--------------------------------------------------------------------+
> > > 5 rows selected (0.35 seconds)
> > > 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as
> > > country from maprfs.cmatta.`country_data.csv` c JOIN
> > > maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] =
> > > i.`columns`[12];
> > > +-------------------+--------------+
> > > |      capital      |   country    |
> > > +-------------------+--------------+
> > > | Kabul             | Afghanistan  |
> > > | Tirana            | Albania      |
> > > | Algiers           | Algeria      |
> > > | Andorra la Vella  | Andorra      |
> > > | Luanda            | Angola       |
> > > +-------------------+--------------+
> > > 5 rows selected (0.82 seconds)
> > > 0: jdbc:drill:>
> > >
> > > What do you hope to accomplish with Pig? A lot of manipulation of data
> > can
> > > be accomplished with Drill and SQL.
> > >
> > > On Tuesday, January 5, 2016, Peder Jakobsen | gmail <
> [email protected]
> > >
> > > wrote:
> > >
> > > Does drill have something analogous  to  SQL  CONSTRAINT and REFERENCES
> > for
> > > > doing table type  lookups?
> > > >
> > > > I have a large data that uses numerical ISO country codes; they need
> to
> > > be
> > > > replaced with country names that are mapped in a small CSV file
> > > (iso_code,
> > > > country_name).
> > > >
> > > > Any suggestions?
> > > >
> > > > Thanks you,
> > > >
> > > > Peder
> > > >
> > > > P.S. Is it worth learning Pig Latin or something similar for more
> > complex
> > > > queries and data manipulation in Drill?
> > > >
> > > ​
> > >
> >
>



-- 
Tomer Shiran
CEO and Co-Founder, Dremio

Reply via email to