Thanks Stephen, Let me explore options. I will let you all know once I am successful.
regards Sunita On Wed, Jun 19, 2013 at 3:08 PM, Stephen Sprague <sprag...@gmail.com> wrote: > try_parsed_json is not trivial imho :) > > start with the very, very basic, for example, { "jobs" : "foo" }. Get > that to work first. :) When that works add a level of nesting and see > what happens. Keep building on it until you either break it (and then you > know that last thing you added broke it and can concentrate on that) or > you'll have worked out all the bugs and your final example will work. > Nothing fancy here except old school trial and error. > > An alternative I keep bringing up when native semantics don't go one's way > is the transform() function. use python, perl, ruby or whatever to parse > the json and go nuts with the rich features of said language. just write > your output to stdout as a delimited serialization of what you want to > store and that's it. That would be another way to get your scalars, arrays > and structs to work. > > Don't give up yet though on the JsonSerde! :) Its probably something very > easy that we just can't see. > > > > On Wed, Jun 19, 2013 at 10:00 AM, Sunita Arvind <sunitarv...@gmail.com>wrote: > >> Thanks for looking into it Ramki. >> Yes I had tried these options. Here is what I get (renamed the table to >> have a meaningful name): >> >> hive> select jobs.values[1].id from linkedinjobsearch; >> ......mapreduce task details.... >> OK >> NULL >> Time taken: 9.586 seconds >> >> >> hive> select jobs.values[0].position.title from linkedinjobsearch; >> Total MapReduce jobs = 1 >> Launching Job 1 out of 1 >> >> OK >> NULL >> Time taken: 9.617 seconds >> >> >> I am trying to connect btrace to the process to be able to trace the code >> but cant get it to respond. Here is what I tried: >> >> [sunita@node01 ~]$ hive --debug, recursive=y, port=7000,mainSuspend=y, >> childSuspend=y >> ERROR: Cannot load this JVM TI agent twice, check your java command line >> for duplicate jdwp options. >> Error occurred during initialization of VM >> agent library failed to init: jdwp >> >> Tried changing the port also. Any idea regarding the debuggers that can >> be used. I also tried explain query and that does not show any issues >> either. >> >> regards >> Sunita >> >> >> >> >> >> >> >> On Wed, Jun 19, 2013 at 12:11 PM, Ramki Palle <ramki.pa...@gmail.com>wrote: >> >>> Can you run some other queries from job1 table and see if any query >>> returns some data? >>> >>> I am guessing your query "select jobs.values.position.title from >>> jobs1;" may have some issue. May be it should be as >>> >>> select jobs.values[0].position.title from jobs1; >>> >>> >>> Regards, >>> Ramki. >>> >>> >>> On Wed, Jun 19, 2013 at 8:24 AM, Sunita Arvind <sunitarv...@gmail.com>wrote: >>> >>>> Thanks Stephen, >>>> >>>> That's just what I tried with the try_parsed table. It is exactly same >>>> data with lesser nesting in the structure and lesser number of entries. >>>> Do you mean to say that highly nested jsons can lead to issues? What >>>> are typical solution to such issues? Write UDFs in hive or parse the JSON >>>> into a delimited file? >>>> I have heard of custom serdes also. Not sure if UDFs and custom serdes >>>> are one and the same. >>>> >>>> regards >>>> Sunita >>>> >>>> >>>> On Wed, Jun 19, 2013 at 10:38 AM, Stephen Sprague >>>> <sprag...@gmail.com>wrote: >>>> >>>>> I think you might have to start small here instead of going for the >>>>> home run on the first swing. when all else fails start with a trivial >>>>> json >>>>> object and then build up from there and see what additional step breaks >>>>> it. that way you know if the trivial example fails is something >>>>> fundamental and not the complexity of your json object that's throwing >>>>> things off. >>>>> >>>>> >>>>> On Wed, Jun 19, 2013 at 4:34 AM, Sunita Arvind >>>>> <sunitarv...@gmail.com>wrote: >>>>> >>>>>> Thanks for sharing your experience Richa. >>>>>> I do have timestamps but in the format of year : INT, day : INT, >>>>>> month : INT. >>>>>> As per your suggestion, I changed them all to string, but still get >>>>>> null as the output. >>>>>> >>>>>> regards >>>>>> Sunita >>>>>> >>>>>> >>>>>> On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma < >>>>>> mailtorichasha...@gmail.com> wrote: >>>>>> >>>>>>> Do you have any timestamp fields in the table that might contain >>>>>>> null value ? >>>>>>> >>>>>>> I faced a similar situation sometime back - changing the data type >>>>>>> to string made it work. >>>>>>> >>>>>>> But I was working on delimited text files. >>>>>>> Not sure if it applies to JSON .. but its still worth giving a try !! >>>>>>> >>>>>>> Richa >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind < >>>>>>> sunitarv...@gmail.com> wrote: >>>>>>> >>>>>>>> Having the a column name same as the table name, is a problem due >>>>>>>> to which I was not able to reference jobs.values.id from jobs. >>>>>>>> Changing the table name to jobs1 resolved the semantic error. >>>>>>>> However, the query still returns null >>>>>>>> >>>>>>>> hive> select jobs.values.position.title from jobs1; >>>>>>>> Total MapReduce jobs = 1 >>>>>>>> Launching Job 1 out of 1 >>>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator >>>>>>>> Starting Job = job_201306080116_0036, Tracking URL = >>>>>>>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036 >>>>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill >>>>>>>> job_201306080116_0036 >>>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of >>>>>>>> reducers: 0 >>>>>>>> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0% >>>>>>>> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, >>>>>>>> Cumulative CPU 0.88 sec >>>>>>>> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, >>>>>>>> Cumulative CPU 0.88 sec >>>>>>>> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, >>>>>>>> Cumulative CPU 0.88 sec >>>>>>>> MapReduce Total cumulative CPU time: 880 msec >>>>>>>> Ended Job = job_201306080116_0036 >>>>>>>> MapReduce Jobs Launched: >>>>>>>> Job 0: Map: 1 Cumulative CPU: 0.88 sec HDFS Read: 35374 HDFS >>>>>>>> Write: 3 SUCCESS >>>>>>>> Total MapReduce CPU Time Spent: 880 msec >>>>>>>> OK >>>>>>>> null >>>>>>>> Time taken: 9.591 seconds >>>>>>>> >>>>>>>> >>>>>>>> regards >>>>>>>> Sunita >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind < >>>>>>>> sunitarv...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Ok. >>>>>>>>> The data files are quite small. Around 35 KB and 1 KB each. >>>>>>>>> >>>>>>>>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs >>>>>>>>> Found 1 items >>>>>>>>> -rw-r--r-- 3 sunita hdfs 35172 2013-06-18 18:31 >>>>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json >>>>>>>>> >>>>>>>>> >>>>>>>>> [sunita@node01 tables]$ hadoop fs -text >>>>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json |more >>>>>>>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": >>>>>>>>> [{"company": {"i >>>>>>>>> d": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, >>>>>>>>> "day": 10, "mo >>>>>>>>> nth": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, >>>>>>>>> HBase, Pig- Ve >>>>>>>>> rtica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, >>>>>>>>> HDFS, HBase, >>>>>>>>> Pig, SQL, Industrial Software Development, System Integration, >>>>>>>>> Java, high perf >>>>>>>>> ormance, multi-threading, VerticaWe are a well known consumer >>>>>>>>> product developme >>>>>>>>> nt company and we are looking to add a Hadoop Engineer to our >>>>>>>>> Engineering team. >>>>>>>>> You will be working with the latest ", "expirationDate": >>>>>>>>> {"year": 2013, "day" >>>>>>>>> >>>>>>>>> Its a single line, so used 'more' rather than 'head'. But >>>>>>>>> effectively, the file exists and has the data. >>>>>>>>> >>>>>>>>> regards >>>>>>>>> Sunita >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague < >>>>>>>>> sprag...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> As Nitin alluded to its best to confirm the data is definitely in >>>>>>>>>> hdfs using hdfs semantics rather than hive for the first step. >>>>>>>>>> >>>>>>>>>> 1. how big is it? hadoop fs -ls <your hdfs dir> >>>>>>>>>> 2. cat a bit of it and see if anything is there. hadoop fs >>>>>>>>>> -text <your hdfs dir>/<filename> | head -10 >>>>>>>>>> >>>>>>>>>> do you see any data from step #2? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind < >>>>>>>>>> sunitarv...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I ran some complex queries. Something to the extent of >>>>>>>>>>> select jobs from jobs; >>>>>>>>>>> which triggers map reduce jobs but does not show errors and >>>>>>>>>>> produces the same output "null". If I try referencing the struct >>>>>>>>>>> elements, >>>>>>>>>>> I get error which seems to be the root cause. >>>>>>>>>>> >>>>>>>>>>> Attached are the select statement outputs with the corresponding >>>>>>>>>>> hive logs. >>>>>>>>>>> >>>>>>>>>>> I have also attached my usage details of another table - >>>>>>>>>>> try_parsed which has a subset of the same data which seems to work >>>>>>>>>>> fine. >>>>>>>>>>> Also attached is the input file for this table - try_parsed.json >>>>>>>>>>> Thanks for your help >>>>>>>>>>> >>>>>>>>>>> Sunita >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar < >>>>>>>>>>> nitinpawar...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> can you run a little more complex query >>>>>>>>>>>> >>>>>>>>>>>> select uniq across columns or do some maths. so we know when it >>>>>>>>>>>> fires up a mapreduce >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind < >>>>>>>>>>>> sunitarv...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks for responding Nitin. Yes I am sure that serde is >>>>>>>>>>>>> working fine and json file is being picked based on all the >>>>>>>>>>>>> errors that >>>>>>>>>>>>> showed up till this stage. What sort of error are you suspecting. >>>>>>>>>>>>> File not >>>>>>>>>>>>> present or serde not parsing it ? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> select * from table is as good as hdfs -cat >>>>>>>>>>>>>> >>>>>>>>>>>>>> are you sure there is any data in the table? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind < >>>>>>>>>>>>>> sunitarv...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am able to parse the input JSON file and load it into >>>>>>>>>>>>>>> hive. I do not see any errors with create table, so I am >>>>>>>>>>>>>>> assuming that. But >>>>>>>>>>>>>>> when I try to read the data, I get null >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> hive> select * from jobs; >>>>>>>>>>>>>>> OK >>>>>>>>>>>>>>> null >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have validated the JSON with JSONLint and Notepad++ JSON >>>>>>>>>>>>>>> plugin and it is a valid JSON. Here is my create table >>>>>>>>>>>>>>> statement and attached is the json input file. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> create external table jobs ( >>>>>>>>>>>>>>> jobs STRUCT< >>>>>>>>>>>>>>> values : ARRAY<STRUCT< >>>>>>>>>>>>>>> company : STRUCT< >>>>>>>>>>>>>>> id : STRING, >>>>>>>>>>>>>>> name : STRING>, >>>>>>>>>>>>>>> postingDate : STRUCT< >>>>>>>>>>>>>>> year : INT, >>>>>>>>>>>>>>> day : INT, >>>>>>>>>>>>>>> month : INT>, >>>>>>>>>>>>>>> descriptionSnippet : STRING, >>>>>>>>>>>>>>> expirationDate : STRUCT< >>>>>>>>>>>>>>> year : INT, >>>>>>>>>>>>>>> day : INT, >>>>>>>>>>>>>>> month : INT>, >>>>>>>>>>>>>>> position : STRUCT< >>>>>>>>>>>>>>> title : STRING, >>>>>>>>>>>>>>> jobFunctions : ARRAY<STRUCT< >>>>>>>>>>>>>>> code : STRING, >>>>>>>>>>>>>>> name : STRING>>, >>>>>>>>>>>>>>> industries : ARRAY<STRUCT< >>>>>>>>>>>>>>> code : STRING, >>>>>>>>>>>>>>> id : STRING, >>>>>>>>>>>>>>> name : STRING>>, >>>>>>>>>>>>>>> jobType : STRUCT< >>>>>>>>>>>>>>> code : STRING, >>>>>>>>>>>>>>> name : STRING>, >>>>>>>>>>>>>>> experienceLevel : STRUCT< >>>>>>>>>>>>>>> code : STRING, >>>>>>>>>>>>>>> name : STRING>>, >>>>>>>>>>>>>>> id : STRING, >>>>>>>>>>>>>>> customerJobCode : STRING, >>>>>>>>>>>>>>> skillsAndExperience : STRING, >>>>>>>>>>>>>>> salary : STRING, >>>>>>>>>>>>>>> jobPoster : STRUCT< >>>>>>>>>>>>>>> id : STRING, >>>>>>>>>>>>>>> firstName : STRING, >>>>>>>>>>>>>>> lastName : STRING, >>>>>>>>>>>>>>> headline : STRING>, >>>>>>>>>>>>>>> referralBonus : STRING, >>>>>>>>>>>>>>> locationDescription : STRING>>> >>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' >>>>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs'; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The table creation works fine, but when I attempt to query, >>>>>>>>>>>>>>> I get null as the result. >>>>>>>>>>>>>>> I tried adding Input/Output formats, Serde Properties, >>>>>>>>>>>>>>> nothing seems to impact. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am of the opinion that the libraries cannot handle this >>>>>>>>>>>>>>> level of nesting and I probably will have to write a custom >>>>>>>>>>>>>>> serde or a >>>>>>>>>>>>>>> parser myself. Just wanted to seek guidance before I get into >>>>>>>>>>>>>>> that. >>>>>>>>>>>>>>> Appreciate your help and guidance. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> regards >>>>>>>>>>>>>>> Sunita >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Nitin Pawar >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nitin Pawar >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >