Re: Pig and MongoDB

2011-12-31 Thread Russell Jurney
Submitted a pull request: https://github.com/mongodb/mongo-hadoop/pull/29 On Sat, Dec 31, 2011 at 7:50 PM, Russell Jurney wrote: > I fixed MongoStorage.java to work with bags of tuples. > https://gist.github.com/1546174 > > I'll do a pull request on github tomorrow, and try to get it in the next

Re: Pig and MongoDB

2011-12-31 Thread Russell Jurney
I fixed MongoStorage.java to work with bags of tuples. https://gist.github.com/1546174 I'll do a pull request on github tomorrow, and try to get it in the next release. In the meantime, replacing MongoStorage.java with that gist and rebuilding should work. High five! Russ On Thu, Dec 29, 2011 a

Re: Partition keys in LoadMetadata is broken in 0.10?

2011-12-31 Thread Stan Rosenberg
Just to be clear, the concrete syntax had a typo; should have been: A = load 'daily_activity' USING HiveLoader WHERE date_partition >= 20110101 and date_partition <= 20110201; On Sat, Dec 31, 2011 at 10:34 PM, Stan Rosenberg wrote: > > A = load 'daily_activity' from HiveLoader where date_partiti

Re: Partition keys in LoadMetadata is broken in 0.10?

2011-12-31 Thread Stan Rosenberg
Hi Daniel, Thanks for pointing out PIG-2346. However, what happens if the user decides to rename some of the fields using the 'as' statement? We have the same problem, i.e., a 'foreach' is generated. As a heuristic, perhaps synthesized operators should be marked as such. This way, pig can skip sy

Re: Partition keys in LoadMetadata is broken in 0.10?

2011-12-31 Thread Daniel Dai
Hi, Stan, Foreach is inserted only if you have "as" in the "load" statement. This is to ensure the data loaded conforms to the "as" clause. At one point there was a bug in the implementation; this should be fixed in PIG-2346 and will be included in all subsequent releases. Thanks, Daniel On Fri, Dec 30, 20

Re:Re: trunk is 3 times slower than 0.9?

2011-12-31 Thread Yang Ling
Thanks for the reply. I spent yesterday investigating and found that my 40 minutes are spent in JsonMetadata.findMetaFile. This seems to be new in trunk. In my setup, I have several thousand files/folders in my input; findMetaFile reads them one by one, which takes a long time. I also see there is an option in PigSto