Re: HDFS File Read

2007-11-16 Thread Raghu Angadi
Taj, I don't know what you are trying to do, but simultaneous write and read won't work on any filesystem (unless the reader is more complicated than what you had). For now, I think you will get the most predictable behaviour if you read after the writer has closed the file. Raghu. j2eeiscool wrote:
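
The advice above (read only after the writer has closed the file) can be sketched as follows. This is a minimal illustration, not code from the thread: it uses the local filesystem through java.nio rather than HDFS, and the class name ReadAfterClose and the method writeThenRead are hypothetical. The reader thread waits on a latch that the writer releases only once its stream is closed.

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadAfterClose {
    // Hypothetical helper: writes payload in one thread and reads it back in
    // another, but only after the writer has closed the file.
    public static String writeThenRead(Path file, String payload) throws Exception {
        CountDownLatch writerClosed = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> writer = pool.submit(() -> {
                try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    w.write(payload);
                } finally {
                    // try-with-resources closes the writer before this runs,
                    // so the latch is released only after close().
                    writerClosed.countDown();
                }
                return null;
            });
            Future<String> reader = pool.submit(() -> {
                writerClosed.await(); // block until the writer is done
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            });
            writer.get(); // surface any write failure first
            return reader.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The same coordination would apply if the two threads used an HDFS client instead of local files; only the open, write and read calls would differ.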

RE: Test Data for Hadoop Student

2007-11-16 Thread Bruce Williams
When I mentioned Creative Commons materials, I had the University of Washington materials in mind. Thank you for your response Bruce Williams -Original Message- From: Aaron Kimball [mailto:[EMAIL PROTECTED] Sent: Friday, November 16, 2007 6:21 PM To: hadoop-user@lucene.apache.org Subje

Re: Test Data for Hadoop Student

2007-11-16 Thread Ted Dunning
The recent O'Reilly book "Programming Collective Intelligence" might be an interesting resource for problems and data sources as well. On 11/16/07 6:21 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote: > Bruce, > > I helped design and teach an undergrad course based on Hadoop last year. > Along

RE: HDFS File Read

2007-11-16 Thread j2eeiscool
Hi Dhruba, For my test I do have a Reader and a Writer thread. The Reader blocks until the InputStream is available. The Reader gets the following exception until the Writer is done: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt at org.ap

Re: Test Data for Hadoop Student

2007-11-16 Thread Aaron Kimball
Bruce, I helped design and teach an undergrad course based on Hadoop last year. Along with some folks at Google, we then made the resources available together to distribute to other universities and the public at large (via Creative Commons license, actually). All the materials are available

RE: HDFS File Read

2007-11-16 Thread dhruba Borthakur
This could happen if one of your threads was reading a file when another thread deleted the file and created another new file with the same name. The first reader wants to fetch more blocks for the file but detects that the file has a different blocklist. One option for you is to re-open the file

Re: HDFS File Read

2007-11-16 Thread j2eeiscool
Thanks for your reply, Ted. I get this in the middle of a file read (towards the end, actually). No change to the cluster config during this operation. Programmatically, what would be the best way to recover from this: open the input stream again and seek to the failure position? Thanks, Taj Te
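
The recovery idea asked about above (reopen the stream and continue from the failure position) might look roughly like the sketch below. It is an assumption-laden illustration, not an answer given in the thread: ResumingRead, StreamOpener and readWithResume are hypothetical names, and the opener is a stand-in for whatever actually opens the file. With HDFS, the opener would presumably wrap FileSystem.open followed by FSDataInputStream.seek. Resuming by offset only makes sense if the file contents have not changed; if the file was deleted and recreated under the same name, the bytes already read belong to the old file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResumingRead {
    /** Hypothetical opener: returns a fresh stream positioned at the given byte offset. */
    public interface StreamOpener {
        InputStream openAt(long offset) throws IOException;
    }

    /**
     * Reads the whole stream, reopening at the last good offset after an
     * IOException, up to maxReopens times.
     */
    public static byte[] readWithResume(StreamOpener opener, int maxReopens) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        long pos = 0;      // number of bytes successfully read so far
        int reopens = 0;
        while (true) {
            try (InputStream in = opener.openAt(pos)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    pos += n; // advance only after the bytes are safely copied
                }
                return out.toByteArray();
            } catch (IOException e) {
                if (++reopens > maxReopens) {
                    throw e; // give up and report the last failure
                }
                // otherwise loop: reopen and continue from pos
            }
        }
    }
}
```

Bounding the number of reopens keeps a persistent fault, such as lost blocks, from turning into an endless retry loop.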

Re: HDFS File Read

2007-11-16 Thread Ted Dunning
Run "hadoop fsck /". It sounds like you have some blocks that have been lost somehow. This is pretty easy to do as you reconfigure a new cluster. On 11/16/07 12:21 PM, "j2eeiscool" <[EMAIL PROTECTED]> wrote: > > Raghu/Ted, > > This turned out to be a sub-optimal network pipe between client and

Re: HDFS File Read

2007-11-16 Thread j2eeiscool
Raghu/Ted, This turned out to be a sub-optimal network pipe between client and data-node. Now the average read time is around 35 secs (for 68 megs). On to the next issue: 07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed! at

Re: 答复: HBase PerformanceEvaluation failing

2007-11-16 Thread Kareem Dana
I am using Xen with Linux 2.6.18. dfs -put works fine. I can read data I have put and all other dfs operations work. They work before I run the PE test, and after the PE test fails dfs still works fine on its own. However, I found some more DFS errors in the logs that happen right before the PE

performance test tips?

2007-11-16 Thread jonathan doklovic
Hi, We've finally got our hadoop cluster up, some data to crunch and a map/reduce job. After running a few configurations, I'm not sure about our performance and would like to get some advice. We have a 20 node ec2 cluster. We have 750MB of data. Currently our job seems to be doing 1%/min on

Re: performance test tips?

2007-11-16 Thread Arun C Murthy
Jonathan On Fri, Nov 16, 2007 at 12:00:21PM -0600, jonathan doklovic wrote: >Hi, > >We've finally got our hadoop cluster up, some data to crunch and a >map/reduce job. > >After running a few configurations, i'm not sure about our performance >and would like to get some advice > >We have a 20 n

Re: Removing nodes from the cluster?

2007-11-16 Thread Nate Carlson
On Fri, 16 Nov 2007, Doug Cutting wrote: I'm testing out a Hadoop cluster on EC2.. we've currently got 20 nodes, and for some silly reason, I started the dfs daemon on all of the nodes. I'd like to drop back down to 3 nodes after we've finished testing the apps; is there any way to pull the oth

Re: Removing nodes from the cluster?

2007-11-16 Thread Doug Cutting
Nate Carlson wrote: I'm testing out a Hadoop cluster on EC2.. we've currently got 20 nodes, and for some silly reason, I started the dfs daemon on all of the nodes. I'd like to drop back down to 3 nodes after we've finished testing the apps; is there any way to pull the other nodes from dfs wit

Test Data for Hadoop Student

2007-11-16 Thread Edward Bruce Williams
Hello, I am a student doing an independent study project investigating the possibility of teaching large scale computing on a small scale budget. My thought is to use available Open Source (Hadoop) and Creative Commons and other materials as the text. A student could then do significan

Removing nodes from the cluster?

2007-11-16 Thread Nate Carlson
I'm testing out a Hadoop cluster on EC2.. we've currently got 20 nodes, and for some silly reason, I started the dfs daemon on all of the nodes. I'd like to drop back down to 3 nodes after we've finished testing the apps; is there any way to pull the other nodes from dfs without breaking anythi

Re: map/reduce with hbase

2007-11-16 Thread Billy
Thanks for the patches; they seem to be working atm. I will use it for a while and see what suggestions or bugs I can find and let you know. Right now the only thing I can think of is to return 1 for a successful insert/delete and 0 for a failed insert/delete. Thanks again. Billy "edward yoon" <[EMAI