Hi, I'm trying out the following use case: run a simple SELECT on an existing table and pass the results through a reduce script to do some analysis. The table holds web logs, so the SELECT uses a pseudo user ID as the key and the rest of the data as the values. My expectation is that a single invocation of the reduce script should receive all the logs for a given user, so that I can do some path-based analysis. Are there any issues with this idea so far?
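To make that expectation concrete, here is a rough sketch of the kind of reduce script I have in mind. It is written in Python rather than being the actual counter.pl, and the names count_per_user and main are made up for illustration. It assumes rows arrive on stdin as tab-separated userid, time, url, and that all rows for one user are adjacent, which is exactly the grouping I'm hoping the reduce phase provides.

```python
import sys
from itertools import groupby


def count_per_user(lines):
    """Yield (userid, count) for each run of consecutive rows sharing a userid.

    Assumption: each line is tab-separated (userid, time, url) and all rows
    for one user are adjacent. If the input is not grouped by userid, the
    same user shows up in several runs and gets several partial counts.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for userid, group in groupby(rows, key=lambda row: row[0]):
        yield userid, sum(1 for _ in group)


def main(stream=sys.stdin, out=sys.stdout):
    # Emit one tab-separated output row per user: user, count.
    for userid, count in count_per_user(stream):
        out.write(f"{userid}\t{count}\n")

# When saved as a standalone script, end the file with a call to main().
```

If the rows for one user are split across separate invocations of the script, or arrive interleaved with other users' rows, this produces several partial counts per user, and that is the situation I'm trying to avoid.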
When I try it, though, Hive does not do what I'd expect: this particular query does not generate any reduce tasks at all. Here's a sample query:

    FROM (
      SELECT userid, time, url
      FROM weblogs
    ) weblogs
    REDUCE weblogs.userid, weblogs.time, weblogs.url
    USING 'counter.pl' AS user, count;

Thanks,
Vijay