I have opened a JIRA for this. You can track it here:
https://issues.apache.org/jira/browse/DRILL-4520

For the time being I am using the older approach, which is *Hive*.

*Below are the steps I am following:*
1. Drill: creating a flattened table from the HDFS JSON logs. But as I
mentioned in the trailing mail, there are bugs/exceptions, so I am SKIPPING
this step.
2. Repeating step 1 using Hive. Here I am using Hive 1.2.1 and
json-serde-1.3.7.jar.

   - a) Define the schema for the input JSON by creating a Hive external
   table (using the JSON SerDe).
   - b) Define the Hive Parquet table schema, which is populated from the
   external table. Here all fields in the Parquet table are marked as
   STRING. This step is similar to creating a Parquet table in Drill.
   - c) Hive creates the table and stores all the data in HDFS in Parquet
   format.
   - d) Later, use this HDFS path in Drill.

3. From here onward, using *Drill* to process further queries, which works
fine.
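The Hive steps above could be sketched roughly like this (the table names, columns, and paths are illustrative assumptions, not my actual schema; the SerDe class is the one shipped in json-serde-1.3.7.jar):

```sql
-- Register the JSON SerDe jar (path is an assumption)
ADD JAR /opt/jars/json-serde-1.3.7.jar;

-- (a) External table defining the schema over the raw JSON logs
CREATE EXTERNAL TABLE logs_json (
  `timestamp` STRING,
  sessionid   STRING,
  gameid      STRING,
  ajaxurl     STRING,
  ajaxdata    STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/logs/json';

-- (b)+(c) Parquet table with all fields as STRING,
-- filled from the external table and stored on HDFS
CREATE TABLE logs_parquet STORED AS PARQUET AS
SELECT * FROM logs_json;
```

After that, step (d) is just querying the Parquet table's HDFS path (e.g. its warehouse directory) from Drill.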

*Here my questions are:*
1. Is this the right approach: creating the Parquet table using Hive and
then using it from Drill?
2. Will a Parquet table created by Hive, versus one created by Drill, make
any difference or cause inconsistency?


We know that Drill discovers the schema on the fly, whereas Hive does not;
in Hive, we need to explicitly define the schema. So I can say here that:
1. Hive explicitly converts data into the predefined data types, whereas
Drill doesn't unless we cast.
For example: a column with mixed data types is explicitly converted into
the predefined data type in Hive, but in Drill it doesn't work, either
normally or by casting.
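To make the casting point concrete, here is a sketch of the kind of cast I mean in Drill (the column name is taken from the sample data; the query is illustrative, not a fix):

```sql
-- ajaxUrl is sometimes a VARCHAR ("/player/updatebonus1") and sometimes a
-- list of maps, so even an explicit cast fails in Drill:
SELECT CAST(ajaxUrl AS VARCHAR) AS ajaxUrl
FROM dfs.`/tmp/test.json`;
```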

Please suggest any alternative way or workaround.
Regards,
Shankar



For CTAS, I also hit an NPE when the storage format was Parquet (the default).

With the storage format as JSON, I hit this error:
Error: SYSTEM ERROR: IllegalArgumentException: You tried to read a
[readText()] type when you are using a field reader of type
[UnionListReader].
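For reference, this is the shape of the CTAS I tried (the paths and table name are illustrative):

```sql
-- Write CTAS output as JSON instead of the default Parquet
ALTER SESSION SET `store.format` = 'json';

CREATE TABLE dfs.tmp.`test_ctas` AS
SELECT `timestamp`, sessionid, gameid, ajaxUrl, ajaxData
FROM dfs.`/tmp/test1.json`;
```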

Since this is still an experimental feature, I'm not sure if someone tried
out CTAS previously. Could you open a JIRA for this? Or let me know if you
want me to open one instead.

And since you mention that queries without CTAS work fine, can you create
views instead and query those (I tried this and it works fine)?
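A view over the same query would look something like this (the view name and workspace are illustrative; dfs.tmp must be a writable workspace):

```sql
CREATE OR REPLACE VIEW dfs.tmp.`test_view` AS
SELECT `timestamp`, sessionid, gameid, ajaxUrl, ajaxData
FROM dfs.`/tmp/test1.json`;

-- Then query the view instead of running CTAS
SELECT * FROM dfs.tmp.`test_view`;
```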

On Fri, Mar 18, 2016 at 1:29 PM, Shankar Mane <shankar.m...@games24x7.com>
wrote:

> @Abhishek:
>
> Some events in the 150 GB JSON file are like this, differing in
> structure. I could say only about 0.1% of events (per 150 GB JSON file)
> are such events.
>
> And yes, union works perfectly, but only when we use SELECT statements.
>
> Could you please change your SELECT query to a CTAS? I am getting
> NullPointerExceptions.
> On 19 Mar 2016 01:35, "Abhishek Girish" <abhishek.gir...@gmail.com> wrote:
>
> > Hello Shankar,
> >
> > From the sample data you shared, it looks like you have JSON documents
> > which differ considerably in schema / structure. This isn't supported
> > by default.
> >
> > You could try turning on UNION type (an experimental feature).
> >
> > > set `exec.enable_union_type` = true;
> > +-------+----------------------------------+
> > |  ok   |             summary              |
> > +-------+----------------------------------+
> > | true  | exec.enable_union_type updated.  |
> > +-------+----------------------------------+
> > 1 row selected (0.193 seconds)
> >
> > > select
> > > `timestamp`,
> > > sessionid,
> > > gameid,
> > > ajaxUrl,
> > > ajaxData
> > > from dfs.`/tmp/test1.json` t;
> >
> > +----------------+---------------------------------+---------------------------------------------------------------------------------------------+-----------------------+-----------+
> > |   timestamp    |            sessionid            |                                           gameid                                            |        ajaxUrl        | ajaxData  |
> > +----------------+---------------------------------+---------------------------------------------------------------------------------------------+-----------------------+-----------+
> > | 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                                        | /player/updatebonus1  | null      |
> > | 1457771458873  | D18104E8CA3071C7A8F4E141B127    | https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043 | []                    | null      |
> > | 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                                        | /player/updatebonus2  | null      |
> > +----------------+---------------------------------+---------------------------------------------------------------------------------------------+-----------------------+-----------+
> >
> > 3 rows selected (0.36 seconds)
> >
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Mar 18, 2016 at 12:02 PM, Shankar Mane <shankar.m...@games24x7.com> wrote:
> >
> > > Guys,
> > >
> > >
> > >    1. I am stuck in the middle of this. Could you please help me
> > >    resolve the error below.
> > >    2. I am running a query on Drill 1.6.0 in a cluster on JSON log
> > >    data (a 150 GB log file, 1 JSON document per line).
> > >
> > >
> > > I have extracted 3 lines from the logs for test purposes. Please find
> > > those lines below.
> > >
> > >
> > > -- ------------------------------------------- *test.json*
> > > -------------------------------------------------
> > >
> > >
> > >
> > >
> > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> > > {"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot Qualifier 1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> > >
> > > -- ------------------------------------------------ *Query*
> > > --------------------------------------------
> > >
> > >
> > > select
> > > `timestamp`,
> > > sessionid,
> > > gameid,
> > > ajaxUrl,
> > > ajaxData
> > > from dfs.`/tmp/test.json` t
> > > ;
> > >
> > >
> > >
> > > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl.
> > >
> > > File  /tmp/test.json
> > > Record  2
> > > Fragment 0:0
> > >
> >
>
