Hello, I have some log files coming in, and they are named like my.log.1, my.log.2, etc.
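For context, here is a minimal sketch of the kind of script in question, not the exact one. It assumes a tab-separated log with one hit per line, holding an item and a visitor id. The field names and the LOGS, BYITEM and visitors aliases are only for illustration; HITS and UNQVISITS are the relations that get stored.

```pig
-- Assumed input layout (hypothetical): one hit per line, tab-separated item and visitor id.
LOGS = LOAD 'my.log.1' USING PigStorage('\t') AS (item:chararray, visitor:chararray);

-- Group all hits by item.
BYITEM = GROUP LOGS BY item;

-- Total number of hits per item.
HITS = FOREACH BYITEM GENERATE group AS item, COUNT(LOGS) AS hits;

-- Number of distinct visitors per item.
UNQVISITS = FOREACH BYITEM {
    visitors = DISTINCT LOGS.visitor;
    GENERATE group AS item, COUNT(visitors) AS visits;
};
```

Both counts are computed from a single log file only, so each run knows nothing about the files processed before it.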
When I run the Pig script, I store the results like this:

    STORE HITS INTO '/var/results/site.hits' USING PigStorage();
    STORE UNQVISITS INTO '/var/results/site.visits' USING PigStorage();

This creates two directories, site.hits and site.visits, each containing a file named part-r-00000. When I run the script a second time with different data loaded (for example my.log.2), Pig fails with an error saying that the directories site.hits and site.visits already exist.

What I need is a cumulative count of hits and unique visitors per item. If the second file contains a hit on an item that was already counted in part-r-00000, it would require reprocessing the first log file as well.

How can I do this counting incrementally?

Best Regards,
C.B.
