Hi Daniel,

This looks like an issue caused by a schema change in the data. You should get
the same error with the uncompressed data as well. Check whether the JSON
structure changes between records, and see if setting the property below helps
(it is an experimental feature, per the docs).


ALTER SESSION SET `exec.enable_union_type` = true;
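
To check the data for a schema change before querying it, a small script along these lines may help. It is only a minimal sketch, not part of Drill: it assumes the log holds one JSON object per line, it only looks at top-level fields, and the function names (`json_kind`, `find_mixed_fields`, `scan_file`) are made up for illustration. It reports every field that shows up with more than one non-null JSON type.

```python
import gzip
import json
from collections import defaultdict


def json_kind(value):
    """Map a parsed JSON value to a coarse JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def find_mixed_fields(lines):
    """Return {field: set of kinds} for top-level fields seen with more than one non-null kind.

    Assumes one JSON object per line; blank lines and non-object records are skipped.
    """
    kinds = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        for key, value in record.items():
            kind = json_kind(value)
            if kind != "null":  # nulls alone are not counted as a type change here
                kinds[key].add(kind)
    return {key: seen for key, seen in kinds.items() if len(seen) > 1}


def scan_file(path):
    """Scan a plain or gzip-compressed file of newline-delimited JSON."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return find_mixed_fields(handle)
```

For example, `print(scan_file("abc.log.gz"))` run against a local copy of the file would list any field that is, say, a string in some records and an object in others. An empty result only rules out top-level changes; nested fields would need a recursive version.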


The links below may be helpful.


http://drill.apache.org/docs/json-data-model/#limitations-and-workarounds


https://issues.apache.org/jira/browse/DRILL-4520


Thanks,


Arjun


________________________________
From: Daniel McQuillen <daniel.mcquil...@gmail.com>
Sent: Friday, October 20, 2017 2:27 PM
To: user@drill.apache.org
Subject: Re: S3 with mixed files

Hi Arjun,

Yes! Thanks. I didn't have my "log" storage plugin defined correctly (It
was missing the "extensions" key set to value "log".)

However, when I try to query a file like abc.log.gz

select * from ibios3.root.`/tracking/abc.log.gz`;


I get a different error

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalStateException: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl. Fragment 0:0 [Error Id:
33dedb5f-2e3d-4e54-a918-0ad3553436ce on
ip-10-0-0-24.us-west-1.compute.internal:31010]

I've followed the docs and have my storage plugin defined as:

    "log": {
      "type": "json",
      "extensions": [
        "gz"
      ]
    },

I also tried (thinking maybe I'm misreading the docs and .gz support is
built in)...

    "log": {
      "type": "json",
      "extensions": [
        "log"
      ]
    },

and

    "log": {
      "type": "json",
      "extensions": [
        "log", "gz"
      ]
    },

with no luck.

Thanks for any further direction you can provide!

Best Regards,

Daniel





On Fri, Oct 20, 2017 at 6:52 PM, Arjun kr <arjun...@outlook.com> wrote:

> Hi Daniel,
>
> This error may occur if you don't have a format defined in the S3 storage
> plugin that handles the ".log" extension.
>
> For example:
>
> -- I have file input.csv and have csv format defined in s3 storage plugin.
>
> 2 rows selected (1.233 seconds)
> 0: jdbc:drill:schema=dfs> select * from s3.root.`test-dir/input.csv`;
> +--------------------------------------------------+
> |                     columns                      |
> +--------------------------------------------------+
> | ["\"Pespsi,Pepsi\",\"Pespsi,Pepsi [100.00]",""]  |
> | ["Pespsi,Pepsi\",\"Pespsi,Pepsi [100.00]",""]    |
> | ["Pespsi,Pepsi","Pespsi,Pepsi [100.00]"]         |
> +--------------------------------------------------+
> 3 rows selected (3.418 seconds)
>
> -- Renamed S3 file input.csv to input.log
>
> 0: jdbc:drill:schema=dfs> select * from s3.root.`test-dir/input.log`;
> Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 16:
> Table 's3.root.test-dir/input.log' not found
>
> SQL Query null
>
> [Error Id: 5996db7d-c886-45a8-bddf-99f11159db66 on arjun-lab-73:31010]
> (state=,code=0)
> 0: jdbc:drill:schema=dfs>
>
> Thanks,
>
> Arjun
>
>
> ________________________________
> From: Divya Gehlot <divya.htco...@gmail.com>
> Sent: Friday, October 20, 2017 12:50 PM
> To: user@drill.apache.org
> Subject: Re: S3 with mixed files
>
> Hi Daniel,
> Can you try select * from ibios3.root.`./tracking/tracking.log`;
> instead of
> select * from ibios3.root.`tracking/tracking.log`;
>
> Thanks,
> Divya
>
>
> On 20 October 2017 at 13:13, Daniel McQuillen <daniel.mcquil...@gmail.com>
> wrote:
>
> > Thanks for your help, Padma!
> >
> > Just tried the following, per your suggestion:
> >
> > select * from ibios3.root.`tracking/tracking.log`;
> >
> > Still getting an error (although as I mentioned before I can do a 'show
> > files;' ok so the credentials must be working):
> >
> >  "org.apache.drill.common.exceptions.UserRemoteException: VALIDATION
> > ERROR:
> > From line 1, column 15 to line 1, column 20: Table
> > 'ibios3.root.tracking/tracking.log' not found SQL Query null [Error Id:
> > fbd59cf8-d6ec-4022-b682-9b51d33f8302 on
> > ip-10-0-0-24.us-west-1.compute.internal:31010]
> >
> >
> > I tried from both the embedded command line and the web interface. Do you
> > have any other suggestions? Thanks in advance.
> >
> > Best Regards,
> >
> > Daniel
> >
> >
> >
> > On Fri, Oct 20, 2017 at 12:25 PM, Padma Penumarthy <ppenumar...@mapr.com
> >
> > wrote:
> >
> > > From your error log, it seems like you may be specifying the table
> > > incorrectly.
> > > > Instead of 'ibios3.root.tracking/tracking.log', can you try
> > > ibios3.root.`tracking/tracking.log`
> > >
> > > i.e. for example, select * from ibios3.root.`tracking/tracking.log`
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > > > On Oct 18, 2017, at 7:15 PM, Daniel McQuillen <
> > > daniel.mcquil...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Attempting to use Apache Drill to parse Open edX tracking log files I
> > > have
> > > > stored on S3.
> > > >
> > > > I've successfully set up an S3 connection and I can see my different
> > > > directories in the target S3 bucket when I type `show files;` in
> > embedded
> > > > drill. Hooray!
> > > >
> > > > However, I can't seem to do a query. I keep getting a "not found"
> error
> > > >
> > > > SEVERE: org.apache.calcite.runtime.CalciteContextException: From
> line
> > 1,
> > > > column 15 to line 1, column 20: Table 'ibios3.root.tracking/
> > > tracking.log'
> > > > not found
> > > >
> > > > The "tracking" subdirectory has a most recent `tracking.log` file as
> > well
> > > > as a bunch of  gzipped older files, e.g. `tracking-log-20170518-1234.
> > gz`
> > > > ... could this be confusing Drill? I've tried querying an individual
> > file
> > > > > (tracking.log) as well as the directory itself, but no luck.
> > > >
> > > > Thanks for any thoughts!
> > > >
> > > >
> > > > - Daniel
> > >
> > >
> >
>
