Re: message transmission in Hadoop

2011-01-30 Thread Da Zheng
Jeff, thanks for your reply. Unfortunately, the website is under maintenance. The reason I monitored the system calls of HDFS is to find out which activities cause so much system CPU time. Other than writing to the disk and sending and receiving packets, I cannot think of anything else that can

Re: message transmission in Hadoop

2011-01-30 Thread Jeff Hammerbacher
Hey Da, You may have observed https://issues.apache.org/jira/browse/HDFS-1601. Regards, Jeff On Fri, Jan 28, 2011 at 7:08 PM, Da Zheng wrote: > Hello, > > I monitored system calls of HDFS with systemtap and found HDFS actually > sends > many 1-byte data to the network. I could also see many 8-

Re: Click Stream Data

2011-01-30 Thread Bruce Williams
Thanks Aaron, it has to be click stream and the more the better. Thanks everyone. Bruce Williams Concepts, like individuals, have their histories and are just as incapable of withstanding the ravages of time as are individuals. But in and through all this they retain a kind of homesickness for t

Re: Click Stream Data

2011-01-30 Thread Aaron Kimball
Start with the student's CS department's web server? I believe the Wikimedia Foundation also makes the access logs for Wikipedia et al. publicly available. That is quite a lot of data, though. - Aaron On Sun, Jan 30, 2011 at 10:54 AM, Bruce Williams wrote: > Does anyone know of a source of click s

Re: Click Stream Data

2011-01-30 Thread brien colwell
Forgot to mention sheet music or tabs as another good source of sequence data ;) On Jan 30, 2011 2:45 PM, "brien colwell" wrote: > You might consider starting with other sequence data like file bytes or DNA. > The main difference between those and click stream is how you model the > steps. > On Ja

Re: Click Stream Data

2011-01-30 Thread brien colwell
You might consider starting with other sequence data like file bytes or DNA. The main difference between those and click stream is how you model the steps. On Jan 30, 2011 1:55 PM, "Bruce Williams" wrote:

Click Stream Data

2011-01-30 Thread Bruce Williams
Does anyone know of a source of click stream data for a student research project? Bruce Williams Concepts, like individuals, have their histories and are just as incapable of withstanding the ravages of time as are individuals. But in and through all this they retain a kind of homesickness for th

Re: Draining/Decommissioning a tasktracker

2011-01-30 Thread phil young
This is the specific information I referred to in my post: http://hadoop.apache.org/common/docs/r0.20.0/fair_scheduler.html mapred.fairscheduler.loadmanager: An extensibility point that lets you specify a class that determines how many maps and reduces can run on a given TaskTracker. This class sh