I'm using a TRANSFORM mapper script to expand web logs, and I'm wondering whether there is a recommended way to capture side-channel stats accurately, for example counting the number of 404 entries.
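To make this concrete, here is a minimal sketch of the kind of mapper I mean. It assumes tab-separated input with the HTTP status code in the third column (a made-up layout), and it reports the count with the `reporter:counter:<group>,<counter>,<amount>` stderr line that Hadoop Streaming understands. The group and counter names (`WebLogs`, `Http404`) are placeholders, and whether Hive's TRANSFORM actually picks up such lines is an assumption on my part, not something I have confirmed.

```python
import sys


def process(lines, out=sys.stdout, err=sys.stderr):
    """Pass every row through unchanged and report how many rows were 404s."""
    not_found = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: the HTTP status code is the third column.
        if len(fields) > 2 and fields[2] == "404":
            not_found += 1
        out.write("\t".join(fields) + "\n")
    if not_found:
        # Hadoop Streaming counter convention, written to stderr.
        # "WebLogs" and "Http404" are hypothetical names.
        err.write("reporter:counter:WebLogs,Http404,%d\n" % not_found)
    return not_found


if __name__ == "__main__":
    process(sys.stdin)
```

The point of going through a framework counter rather than my own file or table is that the framework would be the one deciding which task attempts count.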
What I want to avoid is writing this data out to some side channel where a failed (and then retried) map attempt would result in inflated counts. I found https://issues.apache.org/jira/browse/HIVE-939. Would that be the best way? Is anyone using some other mechanism for this purpose?

Thanks in advance for any suggestions.

yun