Various ramblings of a newbie

2015-07-11 Thread Stefán Baxter
Hi, I'm new to Drill and Parquet, and the following are questions/observations I made during my initial discovery phase. I'm sharing them here for other newbies, but also to see if some of these concerns are invalid or based on a misunderstanding. I made no list of the things that I like of what I h

Re: Various ramblings of a newbie

2015-07-11 Thread andrew
Hi, Thanks for this list. I think it would be helpful if you could create some JIRA tickets around these bugs at https://issues.apache.org/jira/browse/DRILL/ . It’s easier to track issues that way than by email. - A > On Jul 11, 2015, at 11:40 AM

Re: Various ramblings of a newbie

2015-07-11 Thread Jacques Nadeau
Hi Stefán, great to hear your thoughts. I'll try to shed some light where I can. > *Misc. observations:* > >- *Foreign key lookups (joins)* >- Coming from the traditional RDBMS world I have a hard time wrapping my >head around how this can be efficient/fast > Drill is primarily focused

Re: Various ramblings of a newbie

2015-07-11 Thread Stefán Baxter
Hi Jacques, and thank you for answering swiftly and clearly :). Some additional questions did arise (see inline): >- *Foreign key lookups (joins)* > I'm guessing my fk_lookup scenario would/could benefit from using other storage options for that. Currently most of this is in Postgres and a

Re: Various ramblings of a newbie

2015-07-11 Thread Ted Dunning
The problem being referred to was one where the type of the data changed and the order in which it was encountered made a difference. For files where the schema is known early (the only thing that ordinary SQL can handle), this won't happen. Also, the problem only occurred in nested data in which

Re: Various ramblings of a newbie

2015-07-13 Thread Jacques Nadeau
> I'm guessing my fk_lookup scenario would/could benefit from using other > storage options for that. > Currently most of this is in Postgres and I think I saw some mention of > supporting traditional data sources soon :) Agreed. > Yeah, I saw the one involving metadata caching. That seem qui

Re: Various ramblings of a newbie

2015-07-13 Thread Stefán Baxter
Hi and thanks, Regarding "/part2": I think that append table would allow for a "cleaner" setup. Adding data once a day would lead to a fairly messy directory structure (perhaps irrelevant). We are dealing with multi-tenancy, and Partition by sounds like a good way to handle that. I'm guessing Partition