Re: Tricks for Copying Data where Drill is actively querying

Vince Gonzalez Thu, 30 Jun 2016 05:09:07 -0700

I know this doesn't directly answer the question of how to make Drill ignore
things, but could you copy the data into a parallel tree, then rename it
into the appropriate directory once the copy is done?


Or could that still cause a running query to fail?
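
A minimal sketch of the copy-then-rename idea, for illustration only. It
assumes the table directory is reachable through an ordinary file system path
(for example an NFS mount) and that the staging directory lives on the same
file system, so the final step is a rename rather than a second copy. The
function name publish_via_rename and the ".staging" directory are
hypothetical. On a cluster accessed through the Hadoop FS client, the
equivalent would presumably be a put into the staging tree followed by
"hadoop fs -mv". The sketch does not settle whether a query that is already
running could still fail.

```python
import os
import shutil
import tempfile


def publish_via_rename(src_path, table_dir, staging_dir):
    """Copy src_path into staging_dir, then rename it into table_dir.

    The slow copy happens outside the directory being queried, so a
    partially written file never appears there. Only the final rename
    touches table_dir.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(table_dir, exist_ok=True)

    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(table_dir, name)

    # Step 1: the long-running copy goes to the parallel (staging) tree.
    shutil.copyfile(src_path, staged)

    # Step 2: rename into place. On a POSIX file system this is atomic
    # when both paths are on the same file system (an assumption here).
    os.replace(staged, final)
    return final


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    src = os.path.join(root, "file1.parquet")
    with open(src, "wb") as f:
        f.write(b"placeholder bytes, not a real Parquet file")

    # Layout taken from the thread: tablename/p_day=.../p_hour=.../file
    table = os.path.join(root, "tablename", "p_day=2016-05-01", "p_hour=1")
    staging = os.path.join(root, ".staging")  # hypothetical staging tree
    print(publish_via_rename(src, table, staging))
```

Keeping the staging tree under a name that starts with "." relies on the
behavior John mentions below, that files and directories starting with "."
can be ignored.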

On Thursday, June 30, 2016, John Omernik <[email protected]> wrote:

> I am querying source data that is two levels deep:
>
> tablename/p_day=2016-05-01/p_hour=1/file1.parquet
>
> I wasn't able to get wildcards at that level to work with dir0 etc.
>
>
>
>
> On Thu, Jun 30, 2016 at 12:39 AM, Ted Dunning <[email protected]> wrote:
>
> > Does it work to provide a wild card in your source spec?
> >
> > a la dfs.tdunning.`/user/tdunning/foo/data/*.parquet`
> >
> > ?
> >
> >
> >
> > On Wed, Jun 29, 2016 at 1:06 PM, John Omernik <[email protected]> wrote:
> >
> > > When the Hadoop FS client copies files (say, Parquet files), it appends
> > > ._COPYING_ to the file name until the copy is complete. If such a file
> > > is present, Drill fails (partial files etc.).
> > >
> > > I know I can ignore files (or directories) that start with ".", but is
> > > there a good way to tell Drill to ignore files that are not *.parquet,
> > > or that have ._COPYING_ at the end of their names?
> > >
> > > Thanks!
> > >
> > > John
> > >
> >
>


-- 
 ----
 Vince Gonzalez
 Systems Engineer
 212.694.3879

 mapr.com
