Thank you Kristine and Jason for the pointers. I am able to flatten the data and begin to query.
One thing: after setting ALTER SYSTEM SET `store.json.all_text_mode` = true; I got this error:

"ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: SELECT * FROM `HDFS`.`default`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
[30024]Query execution error. Details:[
Query stopped., Invalid UTF-8 start byte 0x96 at [Source: org.apache.drill.exec.vector.complex.fn.JsonReader@3e1a1569; line: 5221, column: 120] [ 6eaaccba-6992-43ec-8e4a-016d8e3c486f on nn01:31010 ]
]"

When I dug into that data I found a character that is not valid UTF-8. After removing the line below, I am able to use the data:

[ 4527, "5F67D9EC-51E3-46D6-8384-3107170CE049", 4527, 1279730940, "388535", 1279730940, "388535", null, "WILLINGDON � 1980 HOMECOMING ", null, "217.83", "C O K EWONIAK 12527 52A AVE ", "EDMONTON ", "1983-03-01T12:00:00", "CANADIAN IMPERIAL BANK OF COMMERCE" ]

Is there anything I can do to include this data as well?

Regards
Muthupandi.K

Think before you print.

On Fri, Apr 3, 2015 at 11:00 PM, Kristine Hahn <kh...@maprtech.com> wrote:

> Here are some examples of queries on the actual data you are using:
>
> This query extracts some data from the "meta" map:
>
> select t.meta.`view`.`id` from
> dfs.`/Users/khahn/Documents/test_files_source/opendata.json` t;
>
> +------------+
> |   EXPR$0   |
> +------------+
> | n2rk-fwkj  |
> +------------+
> 1 row selected (0.102 seconds)
>
> As shown here, you need to use a table alias when querying complex data,
> per the docs. The back ticks are used to escape a reserved word, such as
> view. Maybe id is reserved too.
>
> This query extracts some data from the "data" array at the bottom of the
> file:
>
> select data[2][1] from
> dfs.`/Users/khahn/Documents/test_files_source/opendata.json`;
>
> +--------------------------------------+
> |                EXPR$0                |
> +--------------------------------------+
> | 8CDB805D-1C8D-434D-AC5B-2D8130F48841 |
> +--------------------------------------+
>
> Kristine Hahn
> Sr.
> Technical Writer
> 415-497-8107 @krishahn
>
> On Fri, Apr 3, 2015 at 5:41 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
>
> > Tried with flatten but the result is the same. Kindly help with pointers.
> >
> > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
> > [30024]Query execution error. Details:[
> > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> > ]"
> >
> > Regards
> > Muthupandi.K
> >
> > Think before you print.
> >
> > On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1...@gmail.com> wrote:
> >
> > > Thank you Jason for your detailed answer.
> > >
> > > I will try to use flatten on the data column and let you know the status.
> > >
> > > The error message from ODBC is:
> > >
> > > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
> > > [30024]Query execution error. Details:[
> > > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> > > ]"
> > >
> > > Is there any way to normalise or convert this nested data to simpler JSON
> > > so that I can play with Drill?
> > >
> > > Regards
> > > Muthupandi.K
> > >
> > > Think before you print.
> > >
> > > On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <altekruseja...@gmail.com>
> > > wrote:
> > >
> > >> To answer Andries' question: with an enhancement in the 0.8 release, there
> > >> should be no hard limit on the size of Drill records supported. That being
> > >> said, Drill is not fundamentally set up for processing enormous rows, so we
> > >> do not have a clear idea of the performance impact of working with such
> > >> datasets.
> > >>
> > >> This document is going to be read as a single record originally, and I
> > >> think the 0.8 release should be able to read it in. From there, flatten
> > >> should be able to produce individual records suitable for further analysis.
> > >> These records will be a more reasonable size and get you good
> > >> performance for further analysis.
> > >>
> > >> -Jason
> > >>
> > >> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <altekruseja...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Muthu,
> > >> >
> > >> > Welcome to the Drill community!
> > >> >
> > >> > Unfortunately the mailing list does not allow attachments; please send
> > >> > along the error log copied into a mail message.
> > >> >
> > >> > If you are working with the 0.7 version of Drill, I would recommend
> > >> > upgrading to the new 0.8 release that just came out; there were a lot of
> > >> > bug fixes and enhancements in the release.
> > >> >
> > >> > We're glad to hear you have been successful with your previous efforts
> > >> > with Drill. Unfortunately Drill is not well suited for exploring datasets
> > >> > like the one you have linked to. By default Drill supports records of the
> > >> > format accepted by MongoDB for bulk import, where individual records take
> > >> > the form of a JSON object.
> > >> >
> > >> > Looking at this dataset, it follows a pattern we have seen before but
> > >> > are currently not well suited to work with in Drill. All of the data is
> > >> > in a single JSON object; at the top of the object are a number of
> > >> > dataset-wide metadata fields. These are all nested under a field "view",
> > >> > with the main data I am guessing you want to analyze nested under the field
> > >> > "data" in an array.
> > >> > While this format is not ideal for Drill, with the size
> > >> > of the dataset you might be able to get it working with an operator in
> > >> > Drill that could help make the data more accessible.
> > >> >
> > >> > The operator is called flatten, and it is designed to take an array and
> > >> > produce individual records for each element in the array. Optionally,
> > >> > other fields from the record can be included alongside each of the newly
> > >> > spawned records to maintain a relationship between the incoming fields in
> > >> > the output of flatten.
> > >> >
> > >> > For more info on flatten, see this page in the wiki:
> > >> > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
> > >> >
> > >> > For this dataset, you might be able to get access to the data simply by
> > >> > running the following:
> > >> >
> > >> > select flatten(data) from dfs.`/path/to/file.json`;
> > >> >
> > >> > If you need access to some of the other fields from the top of the
> > >> > dataset, you can include them alongside flatten and they will be copied
> > >> > into each record produced by the flatten operation:
> > >> >
> > >> > select flatten(data), view.id, view.category from
> > >> > dfs.`/path/to/file.json`;
> > >> >
> > >> > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi All
> > >> >>
> > >> >> I am new to the JSON format and exploring the same. I have used
> > >> >> Drill to analyse simple JSON files, which works like a charm, but I am
> > >> >> not able to load this
> > >> >> https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD
> > >> >> JSON file for analysis.
> > >> >>
> > >> >> I am using the ODBC connector to connect to the 0.8 Drill. Kindly find
> > >> >> the attachment for the error.
> > >> >>
> > >> >> Regards
> > >> >> Muthupandi.K
> > >> >>
> > >> >> Think before you print.
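
For the invalid-byte question at the top of the thread, here is a minimal cleanup sketch, not from the thread itself. It assumes the stray bytes are Windows-1252 punctuation (0x96 is an en dash in that code page) and that the rest of the file is plain ASCII; if the file mixes real multi-byte UTF-8 with cp1252 bytes, a blanket re-decode like this would mangle the UTF-8 parts.

```python
def clean_to_utf8(raw: bytes) -> bytes:
    """Return valid UTF-8 bytes, falling back to a Windows-1252
    interpretation when the input is not valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, pass through untouched
    except UnicodeDecodeError:
        # Assumption: stray bytes such as 0x96 are Windows-1252
        # punctuation; re-decode and re-encode so Drill's JSON
        # reader sees only valid UTF-8.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


# Illustrative only: 0x96 is the byte Drill rejected in the thread.
sample = b'"WILLINGDON \x96 1980 HOMECOMING"'
print(clean_to_utf8(sample).decode("utf-8"))
```

Run over the whole file before loading it into HDFS, this keeps the offending row instead of deleting it.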
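
To make Jason's description of flatten concrete: the sketch below mimics, in plain Python, what the operator does to this dataset's shape, one output record per element of the top-level "data" array, with a metadata field from the "view" map carried alongside each one. This is not Drill code; the sample document is invented, with field names taken from the thread.

```python
import json

# Invented sample in the shape described in the thread: dataset-wide
# metadata under meta.view, the rows of interest in a "data" array.
doc = json.loads("""
{
  "meta": {"view": {"id": "n2rk-fwkj", "category": "dataset"}},
  "data": [
    [4526, "8CDB805D-1C8D-434D-AC5B-2D8130F48841", "217.83"],
    [4527, "5F67D9EC-51E3-46D6-8384-3107170CE049", "388535"]
  ]
}
""")

view = doc["meta"]["view"]

# Equivalent in spirit to Drill's
#   select flatten(data), t.meta.`view`.`id` from dfs.`...` t;
# one record per array element, metadata copied into each record.
records = [{"view_id": view["id"], "row": row} for row in doc["data"]]

for rec in records:
    print(rec["view_id"], rec["row"][1])
```

The relational effect is the same one Jason describes: the single giant JSON object becomes many reasonably sized records that can be analysed row by row.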