Hi,
I’m new to Pig and Hadoop, and I would like someone to explain some behavior to me. When I run htop (just a fancy version of top that lets me see which threads/processes spawned which other ones) while a Pig job is running, I get the following tree (use a wide screen to view this):

  PID USER     PRI NI  VIRT  RES   SHR S CPU% MEM%    TIME+ Command
 1279 root      20  0 66604  520   420 S  0.0  0.0  0:00.47 ├─ /usr/sbin/sshd
24438 root      20  0  100M 4224  3260 S  0.0  0.1  0:00.01 │  ├─ sshd: mschultz [priv]
24440 mschultz  20  0  100M 1796   800 S  0.0  0.0  0:00.26 │  │  └─ sshd: mschultz@pts/0
24441 mschultz  20  0  105M 2036  1540 S  0.0  0.1  0:00.37 │  │     └─ -bash
25143 mschultz  20  0 1820M 143M 19920 S 23.0  3.7  0:19.80 │  │        └─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop.lo
25206 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25205 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25196 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25194 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25193 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.94 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25192 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.03 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25191 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.09 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25189 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25187 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25186 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25185 mschultz  20  0 1820M 143M 19920 S  1.0  3.7  0:01.62 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25184 mschultz  20  0 1820M 143M 19920 S 17.0  3.7  0:07.29 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25183 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25182 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.01 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25181 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.03 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25180 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.16 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25179 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.17 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25178 mschultz  20  0 1820M 143M 19920 S  0.0  3.7  0:00.18 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop

The Pig script that I ran was pretty simple: just a LOAD, SAMPLE, and DUMP. No fancy UDFs or series of MapReduce plans. So why does a single pig call end up generating so many Java threads/processes? What are they for? And how can I limit the number of them per pig command?

Regards,
Michael Schultz
Data Engineer
(650) 381-6062
http://www.nominum.com/
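One way to tell whether these entries are separate processes or threads of a single JVM is to ask a JVM directly. The sketch below is a hypothetical standalone program (ListJvmThreads is an illustrative name, not part of Pig or Hadoop); it assumes Java 5 or later for Thread.getAllStackTraces() and prints the Java-level threads of the JVM it runs in. Native threads, such as HotSpot's garbage-collector workers, are generally not java.lang.Thread objects, so they may show up in htop without appearing in this list.

```java
import java.util.Map;

// Hypothetical standalone example: lists the Java-level threads of the JVM it runs in.
public class ListJvmThreads {
    public static void main(String[] args) {
        // Snapshot of every live java.lang.Thread in this JVM (a single OS process).
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        System.out.println("Java-level threads in this JVM: " + traces.size());
        for (Thread t : traces.keySet()) {
            // Print name, daemon flag and state; native-only threads (e.g. GC workers) are not included.
            System.out.printf("  %-25s daemon=%-5b state=%s%n", t.getName(), t.isDaemon(), t.getState());
        }
    }
}
```

Compiled and run with `javac ListJvmThreads.java && java ListJvmThreads`, even this trivial program typically reports several threads besides main. That is consistent with the identical VIRT/RES/SHR figures on every java row above: rows that share one address space like that are usually threads of one process, and htop's option to hide userland threads should collapse them into the single parent entry.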