You can resolve the "Needed to be in state INIT or IN_VARCHAR but in mode IN_BIGINT" error by enabling all_text_mode, which resolves the schema type differences, as described in http://apache.github.io/drill/docs/json-data-model/handling-type-differences. For example, on my jdbc connection:
0: jdbc:drill:zk=local> select * from dfs.`/Users/opendata.json` limit 1;

Query failed: Query stopped., Needed to be in state INIT or IN_VARCHAR
but in mode IN_BIGINT [ da707fe9-e62c-4e9b-a62a-49b7cab37dfd on 10.0.0.6:31010 ]
. . .

0: jdbc:drill:zk=local> ALTER SYSTEM SET `store.json.all_text_mode` = true;

+------------+-----------------------------------+
|     ok     |              summary              |
+------------+-----------------------------------+
| true       | store.json.all_text_mode updated. |
+------------+-----------------------------------+
1 row selected (0.047 seconds)

0: jdbc:drill:zk=local> select * from dfs.`/Users/opendata.json` limit 1;

+------------+------------+
|    meta    |    data    |
+------------+------------+
| {"view":{"id":"n2rk-fwkj","name":"Unclaimed bank accounts","averageRating":"0","category":"Government","
. . .

Now, how exactly to flatten that big array is another question; answer TBD.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn

On Fri, Apr 3, 2015 at 5:41 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
> I tried with Flatten but the result is the same. Kindly help with pointers.
>
> "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
> [30024]Query execution error. Details:[
> Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> ] "
>
> Regards
> Muthupandi.K
>
> Think before you print.
>
> On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
>
> > Thank you, Jason, for your detailed answer.
> >
> > I will try to use Flatten on the data column and let you know the status.
> >
> > The error message I got from ODBC is:
> >
> > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
> > [30024]Query execution error.
> > Details:[
> > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> > ] "
> >
> > Is there any way to normalise or convert this nested data to simpler
> > JSON so that I can play with Drill?
> >
> > Regards
> > Muthupandi.K
> >
> > Think before you print.
> >
> > On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> >
> > > To answer Andries' question: with an enhancement in the 0.8 release,
> > > there should be no hard limit on the size of Drill records supported.
> > > That being said, Drill is not fundamentally set up for processing
> > > enormous rows, so we do not have a clear idea of the performance
> > > impact of working with such datasets.
> > >
> > > This document is going to be read as a single record initially, and I
> > > think the 0.8 release should be able to read it in. From there,
> > > flatten should be able to produce individual records suitable for
> > > further analysis; these records will be a more reasonable size and
> > > will give you good performance for further analysis.
> > >
> > > -Jason
> > >
> > > On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >
> > > > Hi Muthu,
> > > >
> > > > Welcome to the Drill community!
> > > >
> > > > Unfortunately the mailing list does not allow attachments; please
> > > > send along the error log copied into a mail message.
> > > >
> > > > If you are working with the 0.7 version of Drill, I would recommend
> > > > upgrading to the new 0.8 release that just came out; there were a
> > > > lot of bug fixes and enhancements in the release.
> > > >
> > > > We're glad to hear you have been successful with your previous
> > > > efforts with Drill. Unfortunately, Drill is not well suited for
> > > > exploring datasets like the one you have linked to.
> > > > By default, Drill supports records of the format accepted by
> > > > MongoDB for bulk import, where individual records take the form of
> > > > a JSON object.
> > > >
> > > > Looking at this dataset, it follows a pattern we have seen before
> > > > but are not currently well suited to working with in Drill. All of
> > > > the data is in a single JSON object; at the top of the object are a
> > > > number of dataset-wide metadata fields. These are all nested under
> > > > a field "view", with the main data I am guessing you want to
> > > > analyze nested in an array under the field "data". While this
> > > > format is not ideal for Drill, given the size of the dataset you
> > > > might be able to get it working with an operator in Drill that can
> > > > help make the data more accessible.
> > > >
> > > > The operator is called flatten, and it is designed to take an array
> > > > and produce an individual record for each element in the array.
> > > > Optionally, other fields from the record can be included alongside
> > > > each of the newly spawned records, to maintain a relationship
> > > > between the incoming fields in the output of flatten.
> > > > For more info on flatten, see this page in the wiki:
> > > > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
> > > >
> > > > For this dataset, you might be able to get access to the data
> > > > simply by running the following:
> > > >
> > > > select flatten(data) from dfs.`/path/to/file.json`;
> > > >
> > > > If you need access to some of the other fields from the top of the
> > > > dataset, you can include them alongside flatten and they will be
> > > > copied into each record produced by the flatten operation:
> > > >
> > > > select flatten(data), view.id, view.category from
> > > > dfs.`/path/to/file.json`;
> > > >
> > > > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1...@gmail.com> wrote:
> > > >
> > > > > Hi All
> > > > >
> > > > > I am new to the JSON format and exploring it. I had used Drill to
> > > > > analyse simple JSON files, which worked like a charm, but I am
> > > > > not able to load this
> > > > > https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD
> > > > > JSON file for analysis.
> > > > >
> > > > > I am using the ODBC connector to connect to Drill 0.8. Kindly
> > > > > find the attachment for the error.
> > > > >
> > > > > Regards
> > > > > Muthupandi.K
> > > > >
> > > > > Think before you print.
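[Editor's note] Muthu also asked above whether the nested document can be normalised into simpler JSON outside of Drill. The sketch below shows one way to do that in Python, assuming the Socrata rows.json layout described in Jason's reply: column names under `meta.view.columns` and one positional value array per row under `data`. The function name and the tiny stand-in document are illustrative, not part of the real dataset; check the actual file's structure before relying on this.

```python
import json

def socrata_rows_to_records(rows_doc):
    """Convert a Socrata rows.json-style document into a list of flat
    dicts, one per data row.

    Assumed layout (per the thread's description of the dataset):
      {"meta": {"view": {"columns": [{"fieldName": ...}, ...]}},
       "data": [[v1, v2, ...], ...]}
    """
    # Column names come from the dataset-wide metadata under meta.view.
    columns = [c["fieldName"] for c in rows_doc["meta"]["view"]["columns"]]
    # Pair each positional value array with the column names.
    return [dict(zip(columns, row)) for row in rows_doc["data"]]

# Tiny stand-in document (hypothetical, not the real unclaimed-accounts data):
doc = {
    "meta": {"view": {"columns": [{"fieldName": "id"},
                                  {"fieldName": "owner"}]}},
    "data": [[1, "Alice"], [2, "Bob"]],
}

# Emit newline-delimited JSON, one record per line, which Drill reads natively.
for rec in socrata_rows_to_records(doc):
    print(json.dumps(rec))
```

Writing the output as newline-delimited JSON gives each row its own top-level object, which matches the bulk-import record format Jason describes and avoids the single-giant-record problem entirely.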