UDFs are instantiated a couple of times at job construction time so that Pig can inspect various properties about them. This is suboptimal, but alas. I generally initialize lazily in exec, since that is only called on the mapper/reducer. The lifecycle of UDFs can be a bit confusing in this way.
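To make the lazy-initialization idea concrete, here is a rough sketch. It is not a real Pig UDF: `EvalFuncStandIn` is a hypothetical stand-in for `org.apache.pig.EvalFunc` so that the snippet compiles without Pig on the classpath, and `FileCheckingUdf`, `requiredPath`, and `initCalls` are made-up names for illustration. In an actual UDF, exec would take a Tuple, but the shape of the pattern should be the same: keep the constructor cheap and do the file check on the first exec call.

```java
import java.io.File;
import java.io.IOException;
import java.util.Locale;

// Hypothetical stand-in for org.apache.pig.EvalFunc, so this compiles without Pig.
abstract class EvalFuncStandIn<I, O> {
    abstract O exec(I input) throws IOException;
}

class FileCheckingUdf extends EvalFuncStandIn<String, String> {
    private final String requiredPath; // hypothetical file that only exists on task nodes
    private boolean initialized = false;
    private int initCalls = 0;         // only here to make the behavior observable

    FileCheckingUdf(String requiredPath) {
        // Keep the constructor cheap: it may run several times at job construction.
        this.requiredPath = requiredPath;
    }

    // Runs the expensive or node-specific work once, on the first exec call.
    private void initIfNeeded() throws IOException {
        if (initialized) {
            return;
        }
        initCalls++;
        if (!new File(requiredPath).exists()) {
            throw new IOException("Required file missing: " + requiredPath);
        }
        initialized = true;
    }

    @Override
    String exec(String input) throws IOException {
        initIfNeeded();
        return input == null ? null : input.toUpperCase(Locale.ROOT);
    }

    int initCalls() {
        return initCalls;
    }
}

class LazyInitDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("udf-demo", ".txt");
        tmp.deleteOnExit();
        FileCheckingUdf udf = new FileCheckingUdf(tmp.getAbsolutePath());
        System.out.println("init calls after constructor: " + udf.initCalls());
        udf.exec("abc");
        udf.exec("def");
        System.out.println("init calls after two exec calls: " + udf.initCalls());
    }
}
```

With this shape, constructing the object any number of times never touches the file system; the check happens only where exec actually runs.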
2012/7/3 Yang <teddyyyy...@gmail.com>

> normally job tracker and task tracker is on different nodes.
>
> when I submit a pig script using UDF. I think the UDF constructor is first
> run (several times, don't know why)
> on the job tracker, and then it's run on each of the task trackers.
>
> now I want to do some custom work inside the constructor, such as checking
> the existence of certain files
> which are specific to only task trackers. such work only needs to be done
> on task trackers.
> So , is there a way to figure out whether the UDF is being run on task
> tracker or job tracker?
>
> Thanks!
> yang