OK, I'll go back to my original question (although this time I know what
tools I'm using).
I am using Pig + ElephantBird.
I have JSON documents with the following structure:
{
g : some-group-identifier,
sg: some-subgroup-identifier,
j : some-job-identifier,
page :
OK, I found that the elephant-bird JsonLoader cannot handle pretty-printed
JSON documents (documents that span multiple lines). The entire JSON
document has to be on a single line.
After I reformatted some of the source files, I am now getting the expected
output.
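
For anyone hitting the same single-line restriction, here is a minimal
sketch in Python of one way to collapse pretty-printed documents into one
document per line before loading them. It is not part of Pig or
elephant-bird. The function names iter_json_documents and to_single_line
are hypothetical, and the sketch assumes the input is strictly valid JSON
(quoted keys and string values), unlike the abbreviated structure shown in
this thread.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value found in a string that holds one or
    more documents back to back (pretty-printed or not)."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def to_single_line(text):
    """Return a list of compact, single-line JSON strings, one per document."""
    return [
        json.dumps(doc, separators=(",", ":"))
        for doc in iter_json_documents(text)
    ]


if __name__ == "__main__":
    # Made-up sample input: two pretty-printed documents.
    pretty = (
        '{\n  "g": "group-1",\n  "page": 23\n}\n'
        '{\n  "g": "group-2",\n  "page": 7\n}\n'
    )
    for line in to_single_line(pretty):
        print(line)
```

Writing each returned string on its own line gives a file in which every
JSON document occupies exactly one line, which is the layout described
above as working.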
On Wed, Jun 19, 2013 at 2:47
Hi Mike,
Yes, I have also thought about HBase or Cassandra, but my data is pretty
much a snapshot; it does not require updates. Most of my aggregations will
also need to be computed only once and won't change over time, with the
exception of some aggregations that are based on the last N days of data.
Hello,
I'm new to Hadoop.
I have a large quantity of JSON documents with a structure similar to
what is shown below.
{
g : some-group-identifier,
sg: some-subgroup-identifier,
j : some-job-identifier,
page : 23,
... // other fields omitted