Re: Aggregating data nested into JSON documents

2013-06-20 Thread Tecno Brain
OK, I'll go back to my original question (although this time I know which tools I'm using). I am using Pig + ElephantBird. I have JSON documents with the following structure: { g : some-group-identifier, sg: some-subgroup-identifier, j : some-job-identifier, page :

Re: Aggregating data nested into JSON documents

2013-06-19 Thread Tecno Brain
OK, I found that the elephant-bird JsonLoader cannot handle pretty-printed JSON documents (ones that span multiple lines). The entire JSON document has to be on a single line. After I reformatted some of the source files, I am now getting the expected output. On Wed, Jun 19, 2013 at 2:47
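
For readers hitting the same problem, here is a minimal sketch of one way to collapse pretty-printed JSON into one document per line before loading it. It is not the script used in this thread. It assumes the input is valid JSON (quoted keys and values) and that a file may hold several documents back to back; the function name `compact_json_documents` and the sample values are hypothetical.

```python
import json


def compact_json_documents(text):
    """Yield every JSON document found in `text` as a single-line string.

    Assumes `text` holds zero or more valid JSON documents separated
    only by whitespace (for example, pretty-printed objects in a row).
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip whitespace (including newlines) between documents.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode parses one document and returns the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        # Compact separators guarantee there is no newline in the output.
        yield json.dumps(obj, separators=(",", ":"))
```

Writing each yielded string followed by a newline to a new file gives input in which every document sits on its own line, which is the layout described above as working with the loader.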

Re: Aggregating data nested into JSON documents

2013-06-13 Thread Tecno Brain
Hi Mike, Yes, I have also thought about HBase or Cassandra, but my data is pretty much a snapshot; it does not require updates. Most of my aggregations will also need to be computed only once and won't change over time, with the exception of some aggregations that are based on the last N days of data.

Aggregating data nested into JSON documents

2013-06-12 Thread Tecno Brain
Hello, I'm new to Hadoop. I have a large quantity of JSON documents with a structure similar to what is shown below. { g : some-group-identifier, sg: some-subgroup-identifier, j : some-job-identifier, page : 23, ... // other fields omitted