You would have to write a UDF that takes the bag and calculates what you want. I'd use the Accumulator interface. It is a bit annoying to learn at first, but worth it, as it will turn Pig from useful to very powerful.
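To make the idea concrete, here is a rough sketch of the state such a UDF would carry from record to record. It is plain Java and does not use any Pig classes. The class name UpTrendAccumulator is made up for illustration. The method names mirror Pig's Accumulator interface (accumulate, getValue, cleanup), but a real UDF would receive Tuples wrapping a bag rather than bare doubles. It assumes prices arrive in date order and that an "up" day means a strictly higher close than the previous record.

```java
// Plain-Java sketch of the running state; no Pig dependency.
// Assumptions: prices are fed in date order, and "up" means strictly
// higher than the previous record. A streak is counted in days (records).
class UpTrendAccumulator {
    private boolean hasPrevious = false; // false until the first price is seen
    private double previousPrice;        // last price seen
    private int currentRun = 0;          // length of the streak ending at the last price
    private int longestRun = 0;          // best streak seen so far

    // Called once per record (a real Pig UDF would unpack a batch of tuples here).
    void accumulate(double price) {
        if (hasPrevious && price > previousPrice) {
            currentRun++;                // streak continues
        } else {
            currentRun = 1;              // first record, or streak broken: start over
        }
        longestRun = Math.max(longestRun, currentRun);
        previousPrice = price;
        hasPrevious = true;
    }

    // Final answer for the current group.
    int getValue() {
        return longestRun;
    }

    // Reset state so the same instance can process the next group.
    void cleanup() {
        hasPrevious = false;
        currentRun = 0;
        longestRun = 0;
    }

    public static void main(String[] args) {
        UpTrendAccumulator acc = new UpTrendAccumulator();
        for (double p : new double[] {10.1, 10.2, 9.9, 10.0, 11.0}) {
            acc.accumulate(p);
        }
        System.out.println(acc.getValue()); // prints 3 for the sample data
    }
}
```

Because the state lives in fields rather than in a loop, it does not matter how many records are handed over per call, which is the point of the accumulator style.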
Your implementation would be quite similar to the accumulator implementation of MAX, except that the updating conditions would be trickier. One nice thing about the UDF is that it could very easily handle a file laid out as "stock, date, price".

Sent via BlackBerry

-----Original Message-----
From: Todd Lee <[email protected]>
Date: Sat, 15 Jan 2011 01:52:56
To: <[email protected]>
Reply-To: [email protected]
Subject: Loop through records row by row?

Hi,

Newbie here.

So let's say I have a file which contains the closing market price of a stock in 2010, i.e.

<Date>, <Price>
===================
2010-1-1, 10.1
2010-1-2, 10.2
2010-1-3, 9.9
2010-1-4, 10.0
2010-1-7, 11.0
...

and all I want is to find out the max number of consecutive days in which the stock has been in an UP trend. (For the above example, the result should be 3.)

It is fairly simple to solve in other programming languages using a for-loop and a couple of temp variables, but is this possible to do in Pig? The dataset is pretty big.

None of the examples and tutorials I found online had this kind of data relationship between rows, so I really could use your help.

Thanks a lot,
T
