The link Kristine posted gave me a 404; here is the corrected link, hosted on the Apache server.
http://drill.apache.org/docs/json-data-model/#handling-type-differences

On Fri, Apr 3, 2015 at 7:49 AM, Kristine Hahn <kh...@maprtech.com> wrote:

> You solve the "Needed to be in state INIT or IN_VARCHAR but in mode
> IN_BIGINT" error by using all_text_mode to resolve the schema differences,
> as described in
> http://apache.github.io/drill/docs/json-data-model/handling-type-differences
>
> On my jdbc connection, for example:
>
>     select * from dfs.`/Users/opendata.json` limit 1;
>
>     Query failed: Query stopped., Needed to be in state INIT or IN_VARCHAR
>     but in mode IN_BIGINT [ da707fe9-e62c-4e9b-a62a-49b7cab37dfd on
>     10.0.0.6:31010 ]
>     . . .
>
>     0: jdbc:drill:zk=local> ALTER SYSTEM SET `store.json.all_text_mode` = true;
>     +------------+-----------------------------------+
>     |     ok     |              summary              |
>     +------------+-----------------------------------+
>     | true       | store.json.all_text_mode updated. |
>     +------------+-----------------------------------+
>     1 row selected (0.047 seconds)
>
>     0: jdbc:drill:zk=local> select * from dfs.`/Users/opendata.json` limit 1;
>     +------------+------------+
>     |    meta    |    data    |
>     +------------+------------+
>     | {"view":{"id":"n2rk-fwkj","name":"Unclaimed bank
>       accounts","averageRating":"0","category":"Government","
>     . . .
>
> Now, how exactly to flatten that big array is another question; answer TBD.
>
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn
>
> On Fri, Apr 3, 2015 at 5:41 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
>
>> Tried with Flatten but the result is the same. Kindly help with pointers.
>>
>> "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
>> [30024]Query execution error.
>> Details:[
>> Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
>> IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
>> ] "
>>
>> *Regards, Muthupandi.K*
>>
>> Think before you print.
>>
>> On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
>>
>>> Thank you Jason for your detailed answer.
>>>
>>> Will try to use Flatten on the data column and let you know the status.
>>>
>>> The error message from ODBC is:
>>>
>>> "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>> SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
>>> [30024]Query execution error. Details:[
>>> Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
>>> IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
>>> ] "
>>>
>>> Is there any way to normalise or convert this nested data to simpler JSON
>>> so that I can play with Drill?
>>>
>>> *Regards, Muthupandi.K*
>>>
>>> Think before you print.
>>>
>>> On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
>>>
>>>> To answer Andries' question, with an enhancement in the 0.8 release,
>>>> there should be no hard limit on the size of Drill records supported.
>>>> That being said, Drill is not fundamentally set up for processing
>>>> enormous rows, so we do not have a clear idea of the performance impact
>>>> of working with such datasets.
>>>>
>>>> This document is going to be read as a single record originally, and I
>>>> think the 0.8 release should be able to read it in. From there, flatten
>>>> should be able to produce individual records suitable for further
>>>> analysis; these records will be a more reasonable size and give you good
>>>> performance for further analysis.
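Muthu's question about normalising the nested data can also be handled before the file ever reaches Drill: pull each element of the top-level "data" array out into its own newline-delimited JSON record, so the file matches the one-object-per-record format Jason describes below. This is a minimal sketch; the meta.view/data layout follows Kristine's query output earlier in the thread, but the sample values and the helper name are hypothetical, and mixed-type columns may still need all_text_mode on the Drill side.

```python
import json

def to_ndjson(doc):
    # Column names live under meta.view.columns in this assumed layout;
    # each entry of "data" is a positional row matching those columns.
    columns = [c["name"] for c in doc["meta"]["view"]["columns"]]
    return [json.dumps(dict(zip(columns, row))) for row in doc["data"]]

sample = {
    "meta": {"view": {"id": "n2rk-fwkj",
                      "columns": [{"name": "id"}, {"name": "owner"}]}},
    # The same position holds an int in one row and a string in the next --
    # the kind of mid-read type change behind "Needed to be in state INIT
    # or IN_VARCHAR but in mode IN_BIGINT".
    "data": [[1, "Alice"], ["unknown", "Bob"]],
}

for line in to_ndjson(sample):
    print(line)
# → {"id": 1, "owner": "Alice"}
# → {"id": "unknown", "owner": "Bob"}
```

Writing each record on its own line gives Drill one record per account, so flatten is no longer required for row-level analysis.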
>>>>
>>>> -Jason
>>>>
>>>> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <altekruseja...@gmail.com> wrote:
>>>>
>>>>> Hi Muthu,
>>>>>
>>>>> Welcome to the Drill community!
>>>>>
>>>>> Unfortunately the mailing list does not allow attachments; please send
>>>>> along the error log copied into a mail message.
>>>>>
>>>>> If you are working with the 0.7 version of Drill, I would recommend
>>>>> upgrading to the new 0.8 release that just came out; there were a lot
>>>>> of bug fixes and enhancements in the release.
>>>>>
>>>>> We're glad to hear you have been successful with your previous efforts
>>>>> with Drill. Unfortunately Drill is not well suited for exploring
>>>>> datasets like the one you have linked to. By default Drill supports
>>>>> records of the format accepted by Mongo DB for bulk import, where
>>>>> individual records take the form of a JSON object.
>>>>>
>>>>> Looking at this dataset, it follows a pattern we have seen before, but
>>>>> one that Drill is currently not well suited to work with. All of the
>>>>> data is in a single JSON object; at the top of the object are a number
>>>>> of dataset-wide metadata fields. These are all nested under a field
>>>>> "view", with the main data I am guessing you want to analyze nested
>>>>> under the field "data" in an array. While this format is not ideal for
>>>>> Drill, with the size of the dataset you might be able to get it working
>>>>> with an operator in Drill that could help make the data more accessible.
>>>>>
>>>>> The operator is called flatten, and is designed to take an array and
>>>>> produce individual records for each element in the array.
>>>>> Optionally, other fields from the record can be included alongside each
>>>>> of the newly spawned records to maintain a relationship between the
>>>>> incoming fields in the output of flatten.
>>>>>
>>>>> For more info on flatten, see this page in the wiki:
>>>>> https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
>>>>>
>>>>> For this dataset, you might be able to get access to the data simply by
>>>>> running the following:
>>>>>
>>>>>     select flatten(data) from dfs.`/path/to/file.json`;
>>>>>
>>>>> If you need to have access to some of the other fields from the top of
>>>>> the dataset, you can include them alongside flatten and they will be
>>>>> copied into each record produced by the flatten operation:
>>>>>
>>>>>     select flatten(data), view.id, view.category from
>>>>>     dfs.`/path/to/file.json`;
>>>>>
>>>>> On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1...@gmail.com> wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> Am new to the JSON format and exploring the same. I had used Drill to
>>>>>> analyse simple JSON files, which worked like a charm, but am not able
>>>>>> to load this "
>>>>>> https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD
>>>>>> " JSON file for analysis.
>>>>>>
>>>>>> Am using the ODBC connector to connect to the 0.8 Drill. Kindly find
>>>>>> the attachment for the error.
>>>>>>
>>>>>> *Regards, Muthupandi.K*
>>>>>>
>>>>>> Think before you print.
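Jason's two flatten queries can be mirrored in plain Python to make the semantics concrete: one output record per array element, with any requested scalar fields copied into every record the flatten produces. The record shape below is a hypothetical stand-in for the dataset discussed in the thread, not the real payload.

```python
record = {
    "view": {"id": "n2rk-fwkj", "category": "Government"},
    "data": [["row1-a", "row1-b"], ["row2-a", "row2-b"]],
}

# select flatten(data) from dfs.`/path/to/file.json`;
# -> one output record per element of the data array
flattened = [elem for elem in record["data"]]

# select flatten(data), view.id, view.category from dfs.`/path/to/file.json`;
# -> the scalar metadata fields are duplicated into every flattened record
with_meta = [
    {"data": elem,
     "id": record["view"]["id"],
     "category": record["view"]["category"]}
    for elem in record["data"]
]

print(len(flattened), len(with_meta))  # → 2 2
```

The duplication of view.id and view.category across rows is what "maintain a relationship between the incoming fields" means in practice.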