Hi,

I’m new to Pig and Hadoop and would like someone to explain some behavior to
me.

When I run htop (just a fancy version of top that lets me see which
threads/processes spawned which other ones) while a Pig job is running, I get
the following tree (use a wide screen to view this):

 1279 root       20   0 66604   520   420 S  0.0  0.0  0:00.47 ├─ /usr/sbin/sshd
24438 root       20   0  100M  4224  3260 S  0.0  0.1  0:00.01 │  ├─ sshd: mschultz [priv]
24440 mschultz   20   0  100M  1796   800 S  0.0  0.0  0:00.26 │  │  └─ sshd: mschultz@pts/0
24441 mschultz   20   0  105M  2036  1540 S  0.0  0.1  0:00.37 │  │     └─ -bash
25143 mschultz   20   0 1820M  143M 19920 S 23.0  3.7  0:19.80 │  │        └─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop.lo
25206 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25205 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25196 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25194 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25193 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.94 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25192 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.03 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25191 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.09 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25189 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25187 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25186 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25185 mschultz   20   0 1820M  143M 19920 S  1.0  3.7  0:01.62 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25184 mschultz   20   0 1820M  143M 19920 S 17.0  3.7  0:07.29 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25183 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.00 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25182 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.01 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25181 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.03 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25180 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.16 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25179 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.17 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop
25178 mschultz   20   0 1820M  143M 19920 S  0.0  3.7  0:00.18 │  │           ├─ /usr/java/default//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dhadoop.log.dir=/var/log/hadoop-mschultz -Dhadoop

The Pig script that I ran was pretty simple: just a LOAD, SAMPLE, and DUMP. No
fancy UDFs or series of map-reduce plans.
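
For reference, a script of that shape would look something like the sketch
below. The alias names, input path, loader, and sample fraction are
placeholders for illustration, not my actual values:

    -- Hypothetical path and loader; stands in for the real input.
    raw = LOAD 'input/data.txt' USING PigStorage('\t');
    -- Keep a random fraction of the rows (0.01 is a placeholder).
    sampled = SAMPLE raw 0.01;
    -- Print the sampled rows to the console.
    DUMP sampled;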

So why does a single Pig call end up generating so many Java threads/processes?
What are they for?
And how can I limit the number of them per Pig command?


Regards,
Michael Schultz
Data Engineer
(650) 381 - 6062
http://www.nominum.com/

