Hello Sunita,

If your data is JSON, try adding the jar 'hive-json-serde.jar' before you
load your data into the final table. Also try declaring your date
attributes as String first, to check whether they are the cause.

Are you using an external table with regular expressions (regexp) to parse
your data? If so, can you send us the table definition and the structure
of a row from your data?

The last thing I can suggest is to run a MapReduce operation over the
table (select count(1) from your_table) and then look at the JobTracker log
to debug the issue.
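
To rule out the data itself, here is a minimal sketch in Python (I am
assuming your script is Python, since you mention strip() and a dictionary).
As far as I know, Hive's default text input format splits records on
newlines, so each line of the file should be one complete JSON object.
check_json_lines is a hypothetical helper name of mine, not part of Hive or
of any SerDe, and the sample records are made up for illustration.

```python
import json


def check_json_lines(lines):
    """Return (line_number, reason) for each line that is not a standalone JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(text)
        except ValueError as exc:  # json decoding errors are ValueError subclasses
            problems.append((number, "not valid JSON: %s" % exc))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Made-up sample records, for illustration only.
# json.dumps escapes the embedded newline, so the record stays on one line.
good = json.dumps({"year": 2013, "note": "first\nsecond"})
# Pretty-printing (indent=...) spreads a single record over several lines.
bad = json.dumps({"year": 2013, "note": "x"}, indent=2)

print(check_json_lines(good.splitlines()))  # no problems expected
print(check_json_lines(bad.splitlines()))   # every physical line is reported
```

If the checker reports problems on a file you pulled back from HDFS, the
usual suspects are pretty-printed output or raw newlines written inside a
value. A "\n" that json.dumps has escaped inside a string should be harmless.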

Hope this helps ;)




2013/7/30 Sunita Arvind <sunitarv...@gmail.com>

> Hi,
>
> I have written a script which generates JSON files, loads them into a
> dictionary, adds a few attributes and uploads the modified files to HDFS.
> After the files are generated, if I perform a select * from..; on the table
> which points to this location, I get "null, null...." as the result. I also
> tried without the added attributes and it did not make a difference. I
> strongly suspect the data.
> Currently I am using strip() to eliminate trailing and leading whitespace
> and newlines. I am wondering whether embedded "\n", that is, JSON string
> values containing "\n", could cause such issues.
> There are no parsing errors, so I am not able to debug this issue. Are
> there any flags that I can set to figure out what is happening within the
> parser code?
>
> I set this:
> hive -hiveconf hive.root.logger=DEBUG,console
>
> But the output is not really useful:
>
> blocks=[LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
> getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
> 192.168.1.66:50010, 192.168.1.63:50010]}]
>
> lastLocatedBlock=LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
> getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
> 192.168.1.66:50010, 192.168.1.63:50010]}
>   isLastBlockComplete=true}
> 13/07/30 11:49:41 DEBUG hdfs.DFSClient: Connecting to datanode
> 192.168.1.61:50010
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> 13/07/30 11:49:41 INFO exec.
>
> Also, the attributes I am adding are current year, month, day and time. So
> they are not null for any record. I even moved existing files which did not
> have these fields set so that there are no records with these fields as
> null. However, I don't think this is an issue, as the advantage of the
> JSON/Hive JSON serde is that it allows the object structure to be dynamic.
> Right?
>
> Any suggestion regarding debugging would be very helpful.
>
> thanks
> Sunita
>
