RE: Platform reliability with Hadoop

2008-01-21 Thread Jeff Eastman
(/users/jeastman/...) but that throws 'input path does not exist' errors. Jeff -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Monday, January 21, 2008 11:15 AM To: hadoop-user@lucene.apache.org Subject: RE: Platform reliability with Hadoop Is it really t

Re: hadoop file system browser

2008-01-21 Thread Ted Dunning
There has been significant work on building a WebDAV interface for HDFS. I haven't heard any news for some time, however. On 1/21/08 11:32 AM, "Dawid Weiss" <[EMAIL PROTECTED]> wrote: > >> The Eclipse plug-in also features a DFS browser. > > Yep. That's all true, I don't mean to self-promot

Re: hadoop file system browser

2008-01-21 Thread Dawid Weiss
The Eclipse plug-in also features a DFS browser. Yep. That's all true, I don't mean to self-promote, because there really isn't that much to advertise ;) I was just quite attached to a file manager-like user interface; the mucommander clone I posted served me as a browser, but also for rudime

RE: Platform reliability with Hadoop

2008-01-21 Thread Jeff Eastman
unday, January 20, 2008 11:44 AM To: hadoop-user@lucene.apache.org Subject: Re: Platform reliability with Hadoop you might want to change hadoop.tmp.dir entry alone. since others are derived out of this, everything should be fine. i am wondering if hadoop.tmp.dir might be used elsewhere thanks, lohi

Re: hadoop file system browser

2008-01-21 Thread Christophe Taton
ttps://issues.apache.org/jira/browse/HADOOP-2534 Works well for me, read the docs -- you'll need to update JARs (and re-sign them) if you work with the HEAD. Dawid

Re: hadoop file system browser

2008-01-21 Thread Johannes Zillmann
l need to update JARs (and re-sign them) if you work with the HEAD. Dawid -- ~~~ 101tec GmbH Halle (Saale), Saxony-Anhalt, Germany http://www.101tec.com

Re: how to terminate a program in hadoop?

2008-01-21 Thread Ted Dunning
The web interface can also be used. This is handy if you are following the progress of the job via the web. Scroll to the bottom of the page. On 1/20/08 11:39 PM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote: > ./bin/hadoop job -Dmapred.job.tracker=: > -kill > > you can find the required c

Re: hadoop file system browser

2008-01-21 Thread Dawid Weiss
I posted something like this a while ago. See here. https://issues.apache.org/jira/browse/HADOOP-2534 Works well for me, read the docs -- you'll need to update JARs (and re-sign them) if you work with the HEAD. Dawid

Re: write files to HDFS under Windows environment

2008-01-21 Thread Ryan Wang
bin/hadoop dfs -copyFromLocal (Command Line) or read the hadoop wiki (Programming) On Jan 21, 2008 3:31 PM, wang daming <[EMAIL PROTECTED]> wrote: > I have set up the HDFS cluster under a Linux environment, but I need to write > files to the DFS from a Windows application, how can I achieve this? thanks
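For the programmatic route mentioned above, a minimal Java sketch using the FileSystem API of that era (the class name, paths, and setup are illustrative; the client just needs the cluster's hadoop-site.xml on its classpath and network access to the namenode):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsUpload {
        public static void main(String[] args) throws Exception {
            // Reads fs.default.name from the hadoop-site.xml on the classpath,
            // so a Windows client only needs the cluster config and connectivity.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Source and destination paths are illustrative.
            fs.copyFromLocalFile(new Path("C:/data/input.txt"),
                                 new Path("/user/demo/input.txt"));
            fs.close();
        }
    }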

Re: hadoop file system browser

2008-01-21 Thread Cam Bazz
Hello, I meant like a GUI. I found out this is possible through the web interface, however. thanks. On Jan 21, 2008 3:17 PM, Miles Osborne <[EMAIL PROTECTED]> wrote: > hadoop dfs -ls DIR > > (etc) allows you to see the file system > > Miles > > On 21/01/2008, Cam Bazz <[EMAIL PROTECTED]> wrote: > > >

Re: hadoop file system browser

2008-01-21 Thread Miles Osborne
hadoop dfs -ls DIR (etc) allows you to see the file system Miles On 21/01/2008, Cam Bazz <[EMAIL PROTECTED]> wrote: > > hello > is there any utility to browse hadoop fs? > Best Regards, > -C.B. >

Re: how to read a table in hbase from the first row to the last?

2008-01-21 Thread edward yoon
Use the scanner as described below: HTable table = new HTable(HBaseConfiguration conf, Text tableName); HScannerInterface scan = table.obtainScanner(Text[] columns, new Text("")); HStoreKey key = new HStoreKey(); TreeMap results = new TreeMap(); while (scan.next(key, results)) { .. } On 1/21/0
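A fuller version of that scan loop, as a sketch against the HBase client API of that era (the table and column names are illustrative, and the scanner is closed when done):

    import java.util.TreeMap;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HScannerInterface;
    import org.apache.hadoop.hbase.HStoreKey;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    public class ScanWholeTable {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, new Text("webtable"));
            // Starting the scanner at the empty row key means "from the first row".
            HScannerInterface scan =
                table.obtainScanner(new Text[] { new Text("content:") }, new Text(""));
            try {
                HStoreKey key = new HStoreKey();
                TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
                while (scan.next(key, results)) {
                    // key.getRow() is the current row; results maps column name to value.
                    results.clear();
                }
            } finally {
                scan.close();
            }
        }
    }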

Re: how to terminate a program in hadoop?

2008-01-20 Thread Jeff Hammerbacher
no, the jobtracker will take care of that for you. On Jan 20, 2008 11:44 PM, ma qiang <[EMAIL PROTECTED]> wrote: > So I must terminate these jobs using this method you mentioned on all > the computers in my cluster, am I right? > > > > > On Jan 21, 2008 3:39 PM, Jeff Hammerbacher <[EMAIL PROTECTE

Re: how to terminate a program in hadoop?

2008-01-20 Thread ma qiang
So I must terminate these jobs using this method you mentioned on all the computers in my cluster, am I right? On Jan 21, 2008 3:39 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: > ./bin/hadoop job -Dmapred.job.tracker=: > -kill > > you can find the required constants by pointing your browse

Re: how to terminate a program in hadoop?

2008-01-20 Thread Jeff Hammerbacher
./bin/hadoop job -Dmapred.job.tracker=: -kill you can find the required constants by pointing your browser to your jobtracker. On Jan 20, 2008 11:36 PM, ma qiang <[EMAIL PROTECTED]> wrote: > Dear colleagues: > I use eclipse to develop some programs in hadoop, when > exceptions took plac
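The same kill can be issued programmatically through the JobClient API, which is roughly what the command above wraps; a sketch assuming the 0.15-era API (the jobtracker address is a placeholder and the job id is the one shown on the jobtracker web UI):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class KillJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            // Host and port are placeholders; normally they come from hadoop-site.xml.
            conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
            JobClient client = new JobClient(conf);
            // Pass the job id from the jobtracker web UI, e.g. as args[0].
            RunningJob job = client.getJob(args[0]);
            if (job != null) {
                job.killJob();
            }
        }
    }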

Re: How to use hadoop with tomcat!!

2008-01-20 Thread Ted Dunning
I would say that it is generally better practice to deploy hadoop.jar in the lib directory of the war file that you are deploying so that you can change versions of hadoop more easily. Your problem is that you have dropped the tomcat support classes from your CLASSPATH in the process of getting h

Re: Does anyone have any experience with running a Map/Red node that is not also a DFS node?

2008-01-20 Thread Ted Dunning
We effectively have this situation on a significant fraction of our work-load as well. Much of our data is summarized hourly and is encrypted and compressed which makes it unsplittable. This means that the map processes are often not local to the data since the data is typically spread only to

Re: Platform reliability with Hadoop

2008-01-20 Thread lohit . vijayarenu
g Sent: Sunday, January 20, 2008 11:05:28 AM Subject: RE: Platform reliability with Hadoop I am almost operational again but something in my configuration is still not quite right. Here's what I did: - I created a directory /u1/cloud-data on every machine's local disk - I created a new u

RE: Platform reliability with Hadoop

2008-01-20 Thread Jeff Eastman
ilto:[EMAIL PROTECTED] Sent: Wednesday, January 16, 2008 10:04 AM To: hadoop-user@lucene.apache.org Subject: Re: Platform reliability with Hadoop The /tmp default has caught us once or twice too. Now we put the files elsewhere. [EMAIL PROTECTED] wrote: >> The DFS is stored in /tmp o

Re: Does anyone have any experience with running a Map/Red node that is not also a DFS node?

2008-01-20 Thread Allen Wittenauer
On 1/18/08 3:29 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote: > We were thinking of doing this with some machines that do not have > decent disks but have plenty of net bandwidth. We were doing it for a while, in particular for our data loaders* but that was months and months ago.

Re: without insert method in HTable class

2008-01-20 Thread edward yoon
Try like this: HTable table = new HTable(conf, Text tableName); long lockId = table.startUpdate(Text row); table.put(lockId, Text columnName, byte[] value); table.commit(lockId); On 1/20/08, ma qiang <[EMAIL PROTECTED]> wrote: > Dear colleagues: > I don't know how to insert new rows
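Filled out with concrete arguments, the update sequence above looks roughly like this under the old HBase client API (table, row, and column names are illustrative):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    public class InsertRow {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, new Text("webtable"));
            // Start an update on a row, write one cell, then commit.
            long lockId = table.startUpdate(new Text("row1"));
            table.put(lockId, new Text("content:"), "hello".getBytes());
            table.commit(lockId);
        }
    }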

Re: without insert method in HTable class

2008-01-20 Thread edward yoon
Try like this : HTable table = new HTable(conf, Text tableName); long lockId = table.startUpdate(getRow()); table.put(lockId, Text columnName, byte[] value); table.commit(lockId); On 1/20/08, ma qiang <[EMAIL PROTECTED]> wrote: > Dear colleagues: > I don't know how to insert new

Re: Hbase FATAL error

2008-01-19 Thread stack
HADOOP-2343 describes regionservers 'hanging' inexplicably. Do you think you are experiencing a similar phenomenon? St.Ack Billy wrote: I think it might be related to something in the region server as it never happens to more than one region at a time but they all have failed over time even t

Re: Hbase FATAL error

2008-01-19 Thread Billy
I think it might be related to something in the region server, as it never happens to more than one region at a time, but they all have failed over time, even the one on the same node as the master, so that rules out network/switch problems. If it was the master, then all the region servers would go

Re: Hbase FATAL error

2008-01-19 Thread stack
regionservers will shut themselves down if they are unable to contact the master. Can you figure out what the master was doing such that it became non-responsive during this time? St.Ack Billy wrote: I've been getting these errors from time to time, seems like when the region servers are under a load

RE: Reduce hangs

2008-01-18 Thread Yunhong Gu1
Oh, so it is the task running on the other node (ncdm-15) that fails, and Hadoop re-runs the task on the local node (ncdm-8). (I only have two nodes, ncdm-8 and ncdm-15. Both namenode and jobtracker are running on ncdm-8. The program is also started on ncdm-8). 08/01/18 19:08:27 INFO

RE: Reduce hangs

2008-01-18 Thread Devaraj Das
Hi Yunhong, As per the output it seems the job ran to successful completion (albeit with some failures)... Devaraj > -Original Message- > From: Yunhong Gu1 [mailto:[EMAIL PROTECTED] > Sent: Saturday, January 19, 2008 8:56 AM > To: hadoop-user@lucene.apache.org > Subj

Re: Reduce hangs

2008-01-18 Thread Yunhong Gu1
Yes, it looks like HADOOP-1374 The program actually failed after a while: [EMAIL PROTECTED]:~/hadoop-0.15.2$ ./bin/hadoop jar hadoop-0.15.2-test.jar mrbench MRBenchmark.0.0.2 08/01/18 18:53:08 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder 08/01/18 18:53:08 INFO

Re: Reduce hangs

2008-01-18 Thread Konstantin Shvachko
Looks like we still have this unsolved mysterious problem: http://issues.apache.org/jira/browse/HADOOP-1374 Could it be related to HADOOP-1246? Arun? Thanks, --Konstantin Yunhong Gu1 wrote: Hi, If someone knows how to fix the problem described below, please help me out. Thanks! I am test

Re: Reduce hangs

2008-01-18 Thread Yunhong Gu1
The program "mrbench" takes 1 second on a single node, so I think waiting for 1 minute should be long enough. And I also restarted Hadoop after I updated the config file. Yunhong On Fri, 18 Jan 2008, Miles Osborne wrote: I think it takes a while to actually work, so be patient! Miles On

Re: Reduce hangs

2008-01-18 Thread Yunhong Gu1
I am using 0.15.2, and in my case, the CPUs on both nodes are idle. It looks like the program is trapped in a synchronization deadlock or some waiting state that will never be woken. Yunhong On Fri, 18 Jan 2008, Jason Venner wrote: When this was happening to us, there was a block replication

Re: Reduce hangs

2008-01-18 Thread Jason Venner
When this was happening to us, there was a block replication error and one node was in an endless loop trying to replicate a block to another node which would not accept it. In our case most of the cluster was idle but a CPU on the machine trying to send the block was heavily used. We never were

Re: Reduce hangs

2008-01-18 Thread Miles Osborne
I think it takes a while to actually work, so be patient! Miles On 18/01/2008, Yunhong Gu1 <[EMAIL PROTECTED]> wrote: > > > Hi, Miles, > > Thanks for your information. I applied this but the problem still exists. > By the way, when this happens, the CPUs are idle and doing nothing. > > Yunhong >

Re: Reduce hangs

2008-01-18 Thread Yunhong Gu1
Hi, Miles, Thanks for your information. I applied this but the problem still exists. By the way, when this happens, the CPUs are idle and doing nothing. Yunhong On Fri, 18 Jan 2008, Miles Osborne wrote: I had the same problem. If I recall, the fix is to add the following to your hadoop-si

Re: Reduce hangs

2008-01-18 Thread Miles Osborne
I had the same problem. If I recall, the fix is to add the following to your hadoop-site.xml file: mapred.reduce.copy.backoff 5 See hadoop-1984 Miles On 18/01/2008, Yunhong Gu1 <[EMAIL PROTECTED]> wrote: > > > Hi, > > If someone knows how to fix the problem described below, please help me >
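The flattened property above, written out as it would appear in a hadoop-site.xml entry (name and value as given in the message), would be:

    <property>
      <name>mapred.reduce.copy.backoff</name>
      <value>5</value>
    </property>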

Re: Hadoop only processing the first 64 meg block of a 2 gig file

2008-01-18 Thread lohit . vijayarenu
may be running on different input tell us if this is map reduce / data problem thanks, lohit - Original Message From: Ted Dunning <[EMAIL PROTECTED]> To: hadoop-user@lucene.apache.org Sent: Friday, January 18, 2008 9:04:37 AM Subject: Re: Hadoop only processing the first 64 meg bl

RE: Hadoop only processing the first 64 meg block of a 2 gig file

2008-01-18 Thread Matt Herndon
more into this when I'm back next week. Thanks for your help so far. If anyone else has suggestions over the weekend feel free to share. --Matt -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Friday, January 18, 2008 12:05 PM To: hadoop-user@lucene.apache.org

Re: Hadoop only processing the first 64 meg block of a 2 gig file

2008-01-18 Thread Ted Dunning
3 AM > To: hadoop-user@lucene.apache.org > Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig > file > > > Go into the web interface and look at the file. > > See if you can see all of the blocks. > > > On 1/18/08 7:46 AM, "Matt Herndon"

RE: Hadoop only processing the first 64 meg block of a 2 gig file

2008-01-18 Thread Matt Herndon
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file Go into the web interface and look at the file. See if you can see all of the blocks. On 1/18/08 7:46 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote: > Hello, > > > > I'm trying to ge

Re: Hadoop only processing the first 64 meg block of a 2 gig file

2008-01-18 Thread Ted Dunning
Go into the web interface and look at the file. See if you can see all of the blocks. On 1/18/08 7:46 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote: > Hello, > > > > I'm trying to get Hadoop to process a 2 gig file but it seems to only be > processing the first block. I'm running the exact

Re: the "127.0.1.1:60020" in HBase

2008-01-18 Thread Mafish Liu
It seems that your hbase master did not start up properly. By default, hadoop and hbase should run correctly without any changes. To locate your problem: 1. run HADOOP_HOME/bin/start-dfs.sh 2. run HADOOP_HBASE/bin/start-hbase.sh 3. run HADOOP_HBASE/bin/hbase shell 4. in hbase shell, run "show tables

Re: the "127.0.1.1:60020" in HBase

2008-01-18 Thread ma qiang
Dear Mafish Liu: I haven't changed anything in my hbase-default.xml. I only changed hbase-site.xml. I don't know what happened in my hadoop. I deleted all of hadoop to reinstall the whole thing, and then the error disappeared. I don't know what was wrong. Thanks very much!

Re: the "127.0.1.1:60020" in HBase

2008-01-18 Thread Mafish Liu
Port 60020 is the default hbase.regionserver port in my hbase-default.xml: hbase.regionserver 0.0.0.0:60020 The host and port an HBase region server runs at. How about yours? On Jan 18, 2008 4:04 PM, Mafish Liu <[EMAIL PROTECTED]> wrote: > Hi, ma qiang: > It seems that you

Re: the "127.0.1.1:60020" in HBase

2008-01-18 Thread Mafish Liu
Hi, ma qiang: It seems that you had modified hadoop-site.xml to assign a new ip/port. Try the following (I'm not quite sure it will work): 1. Stop your hadoop and hbase. 2. In your HADOOP_HOME, run "ant jar" 3. Restart hadoop and hbase. On Jan 18, 2008 3:28 PM, ma qiang <[EMAIL PROTECTED]> wro

Re: the "127.0.1.1:60020" in HBase

2008-01-17 Thread ma qiang
Yes, I'm sure I had my hadoop running first, and I can run another test in my hadoop. On Jan 18, 2008 3:20 PM, Mafish Liu <[EMAIL PROTECTED]> wrote: > Are you sure that you have had your hadoop running before you run your test? > 127.0.1.1:60020 is the namenode's ip:port, hbase is based on hadoop

Re: the "127.0.1.1:60020" in HBase

2008-01-17 Thread Mafish Liu
Are you sure that you have had your hadoop running before you run your test? 127.0.1.1:60020 is the namenode's ip:port; hbase is based on hadoop, so it needs to connect to hadoop. On Jan 18, 2008 12:16 PM, ma qiang <[EMAIL PROTECTED]> wrote: > Dear colleagues; > I run the code as described below: >

Re: how to deploy hadoop on many PCs quickly?

2008-01-17 Thread Bin YANG
thanks, Russell Smith. I fixed GRUB, but the ext3 file system restored from Norton Ghost does not seem to have been preserved correctly. In the end, I used G4L (Ghost for Linux) to copy the whole hard disk from the source drive to the destination drive, and it works very well. Both GRUB and the ext3 file system work corre

Re: Problem with autogenerated C++ Records using bin/rcc

2008-01-17 Thread Ned Rockson
Okay, that makes sense. It seems strange to return a reference to a string object that's not const. Doesn't this break some C++ coding standards? It seems it should return a pointer or have a set method to maintain symmetry between the two languages. The counter argument, of course, is that it'

Re: how to deploy hadoop on many PCs quickly?

2008-01-17 Thread Russell Smith
Bin, Did you try using dd from the source to the destination drive? That should preserve GRUB. Russell Smith UKD1 Limited Bin YANG wrote: I used Norton Ghost 8.0 to ghost a whole ubuntu hard disk to an image, and restored another hard disk from the image, but the restored hard disk cannot s

RE: about using HBase?

2008-01-17 Thread edward yoon
Dear ma qiang. Firstly, please check the hbase status using the hbase shell or the hbase daemon logs. Also, if you didn't restart hadoop, please restart it (otherwise the configuration file doesn't get loaded). B. Regards, Edward yoon @ NHN, corp. > Date: Thu, 17 Jan 2008 20:21:47

Re: about using HBase?

2008-01-17 Thread ma qiang
rom: [EMAIL PROTECTED] > > To: hadoop-user@lucene.apache.org > > Subject: Re: about using HBase? > > > > I have met this problem, When I run the code HBaseAdmin admin = new > > HbaseAdmin(conf); the console print these messeages as below: > > 08/01/17 18:46:46 INFO ipc.Client: R

RE: about using HBase?

2008-01-17 Thread edward yoon
hbase-site.xml should be located in ${hadoop-home}/conf. B. Regards, Edward yoon @ NHN, corp. > Date: Thu, 17 Jan 2008 18:51:11 +0800 > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: Re: about using HBase? > > I have met this problem, when I run the cod

Re: about using HBase?

2008-01-17 Thread ma qiang
> > > Date: Thu, 17 Jan 2008 16:50:29 +0800 > > From: [EMAIL PROTECTED] > > To: hadoop-user@lucene.apache.org > > Subject: Re: about using HBase? > > > > Thank you for your help! > > > > You mentioned hadoop-0.16.*, but I still use hadoop-0.15, I can

RE: about using HBase?

2008-01-17 Thread edward yoon
Thu, 17 Jan 2008 16:50:29 +0800 > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: Re: about using HBase? > > Thank you for your help! > > You mentioned hadoop-0.16.*, but I still use hadoop-0.15, I can't see > hadoop-0.16.* in the http:

Re: about using HBase?

2008-01-17 Thread ma qiang
. > > > > From: [EMAIL PROTECTED] > > To: hadoop-user@lucene.apache.org > > Subject: RE: about using HBase? > > Date: Thu, 17 Jan 2008 08:33:58 + > > > > > > > It's a org.apache.hadoop.hbase.hql. > > > > Or, Simply just us

RE: about using HBase?

2008-01-17 Thread edward yoon
If you want to retrieve the cell values, you should use the HTable class, because the ResultSet return policy isn't implemented yet. Thanks. B. Regards, Edward yoon @ NHN, corp. > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: RE: abo

RE: about using HBase?

2008-01-17 Thread edward yoon
It's in org.apache.hadoop.hbase.hql. Or simply use Ctrl+Shift+O in Eclipse. (smile) Thanks. B. Regards, Edward yoon @ NHN, corp. > Date: Thu, 17 Jan 2008 16:29:28 +0800 > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Sub

Re: about using HBase?

2008-01-17 Thread ma qiang
"); > > > B. Regards, > > Edward yoon @ NHN, corp. > > > > Date: Thu, 17 Jan 2008 15:58:24 +0800 > > From: [EMAIL PROTECTED] > > To: hadoop-user@lucene.apache.org > > Subject: Re: about using HBase? > > > > Thanks very much!

RE: about using HBase?

2008-01-17 Thread edward yoon
"create table webtable('content','title');"); B. Regards, Edward yoon @ NHN, corp. > Date: Thu, 17 Jan 2008 15:58:24 +0800 > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: Re: about using HBase? > > Thanks very much! > When

Re: about using HBase?

2008-01-16 Thread ma qiang
t want the info server to run. > > > > hbase.rootdir > /tmp/hbase > location of HBase instance in dfs > > > > > > > > B. Regards, > > Edward yoon @ NHN, corp. > > > > From: [EMAIL PROTECTED] > > To: hadoop-user@lu

RE: about using HBase?

2008-01-16 Thread edward yoon
o run. hbase.rootdir /tmp/hbase location of HBase instance in dfs B. Regards, Edward yoon @ NHN, corp. > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: RE: about using HBase? > Date: Thu, 17 Jan 2008 07:46:00 + > > &

RE: about using HBase?

2008-01-16 Thread edward yoon
Please copy the hadoop-0.16.*-hbase.jar to the ${hadoop_home}/lib folder. And here's an example of hadoop-site.xml: hbase.master a51066.nhncorp.com:6 The port for the hbase master web UI. Set to -1 if you do not want the info server to run. hbase
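The flattened configuration in this and the preceding message corresponds to properties along these lines (the master host and port are placeholders, since the original values are truncated in the preview; hbase.rootdir /tmp/hbase is as given above):

    <property>
      <name>hbase.master</name>
      <value>master.example.com:60000</value>
      <description>The host and port the HBase master runs at.</description>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>/tmp/hbase</value>
      <description>Location of the HBase instance in DFS.</description>
    </property>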

Re: Single output file per reduce key?

2008-01-16 Thread Amar Kamat
Myles Grant wrote: I would like the values for a key to exist in a single file, and only the values for that key. Reducer.reduce() gets invoked once per key, i.e. just once per key along with all the values associated with it, so what I suggested should help you generate one f

Re: Single output file per reduce key?

2008-01-16 Thread Myles Grant
I would like the values for a key to exist in a single file, and only the values for that key. Each reduced key/value would get its own file. If I understand correctly, all output of the reducers is written to a single file. -Myles On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote: Hi, Why

Re: Single output file per reduce key?

2008-01-16 Thread Amar Kamat
Hi, Why couldn't you just write this logic in your reducer class? The reduce [reduceClass.reduce()] method is invoked with a key and an iterator over the values associated with the key. You can simply dump the values into a file. Since the input to the reducer is sorted, you can simply dump the
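A rough sketch of that approach with the old mapred API, writing one HDFS file per key from inside the reducer (the side-output directory property and path are made up for illustration):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class FilePerKeyReducer extends MapReduceBase implements Reducer {
        private FileSystem fs;
        private Path outDir;

        public void configure(JobConf job) {
            try {
                fs = FileSystem.get(job);
                // Side-output directory; the property name and default are made up.
                outDir = new Path(job.get("perkey.output.dir", "/user/demo/per-key"));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public void reduce(WritableComparable key, Iterator values,
                           OutputCollector output, Reporter reporter) throws IOException {
            // One file per key, named after the key; the normal output collector is unused.
            FSDataOutputStream out = fs.create(new Path(outDir, key.toString()));
            while (values.hasNext()) {
                out.write(values.next().toString().getBytes());
                out.write('\n');
            }
            out.close();
        }
    }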

RE: copyFromLocal bug?

2008-01-16 Thread edward yoon
I filed an issue for this problem: https://issues.apache.org/jira/browse/HADOOP-2635 Thanks. B. Regards, Edward yoon @ NHN, corp. > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: RE: copyFromLocal bug? > Date: Thu, 17 Jan 2008 04:55:20 + > >

RE: copyFromLocal bug?

2008-01-16 Thread edward yoon
It seems to be a bug in the RawLocalFileSystem class. I'll file an issue. B. Regards, Edward yoon @ NHN, corp. > Date: Wed, 16 Jan 2008 23:03:03 -0500 > From: [EMAIL PROTECTED] > To: hadoop-user@lucene.apache.org > Subject: copyFromLocal bug? > > I was trying to copy a bunch of data over to my hadoop in

RE: Platform reliability with Hadoop

2008-01-16 Thread Jeff Eastman
Thanks, I will try a safer place for the DFS. Jeff -Original Message- From: Jason Venner [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 16, 2008 10:04 AM To: hadoop-user@lucene.apache.org Subject: Re: Platform reliability with Hadoop The /tmp default has caught us once or twice too

Re: Platform reliability with Hadoop

2008-01-16 Thread Jason Venner
The /tmp default has caught us once or twice too. Now we put the files elsewhere. [EMAIL PROTECTED] wrote: The DFS is stored in /tmp on each box. The developers who own the machines occasionally reboot and reprofile them Wont you lose your blocks after reboot since /tmp gets cleaned up?

Re: Platform reliability with Hadoop

2008-01-16 Thread lohit . vijayarenu
>The DFS is stored in /tmp on each box. > The developers who own the machines occasionally reboot and reprofile them Won't you lose your blocks after a reboot since /tmp gets cleaned up? Could this be the reason you see data corruption? A good idea is to configure DFS to be in any place other than /tmp

Re: a question on number of parallel tasks

2008-01-16 Thread Jim the Standing Bear
Thanks, Miles. On Jan 16, 2008 11:51 AM, Miles Osborne <[EMAIL PROTECTED]> wrote: > The number of reduces should be a function of the amount of data needing > reducing, not the number of mappers. > > For example, your mappers might delete 90% of the input data, in which > case you should only ne

Re: a question on number of parallel tasks

2008-01-16 Thread Miles Osborne
The number of reduces should be a function of the amount of data needing reducing, not the number of mappers. For example, your mappers might delete 90% of the input data, in which case you should only need 1/10 of the number of reducers as mappers. Miles On 16/01/2008, Jim the Standing Bear <

Re: writing output files in hadoop streaming

2008-01-16 Thread John Heidemann
>On 1/15/08 12:54 PM, "Miles Osborne" <[EMAIL PROTECTED]> wrote: > >> surely the clean way (in a streaming environment) would be to define a >> representation of some kind which serialises the output. >> >> http://en.wikipedia.org/wiki/Serialization >> >> after your mappers and reducers have comp

Re: a question on number of parallel tasks

2008-01-16 Thread Jim the Standing Bear
hmm.. interesting... these are supposed to be the output from mappers (and default reducers since I didn't specify any for those jobs)... but shouldn't the number of reducers match the number of mappers? If there was only one reducer, it would mean I only had one mapper task running?? That is why

Re: a question on number of parallel tasks

2008-01-16 Thread Ted Dunning
The part nomenclature does not refer to splits. It refers to how many reduce processes were involved in actually writing the output file. Files are split at read-time as necessary. You will get more of them if you have more reducers. On 1/16/08 8:25 AM, "Jim the Standing Bear" <[EMAIL PROTECT

Re: a question on number of parallel tasks

2008-01-16 Thread Jim the Standing Bear
Thanks Ted. I just didn't ask it right. Here is a stupid 101 question, which I am sure the answer lies in the documentation somewhere, just that I was having some difficulties in finding it... when I do an "ls" on the dfs, I would see this: /user/bear/output/part-0 I probably got confused

Re: a question on number of parallel tasks

2008-01-16 Thread Ted Dunning
Parallelizing the processing of data occurs at two steps. The first is during the map phase where the input data file is (hopefully) split across multiple tasks. This should happen transparently most of the time unless you have a perverse data format or use unsplittable compression on your file

Re: how to deploy hadoop on many PCs quickly?

2008-01-16 Thread Ted Dunning
This isn't really a question about Hadoop, but is about system administration basics. You are probably missing a master boot record (MBR) on the disk. Ask a local linux expert to help you or look at the Norton documentation. On 1/16/08 4:59 AM, "Bin YANG" <[EMAIL PROTECTED]> wrote: > I use th

Re: Hadoop overhead

2008-01-16 Thread Ted Dunning
There is some considerable and very understandable confusion about map tasks, mappers and input splits. It is true that for large inputs the input should ultimately be split into chunks so that each core that you have has to process 10-100 pieces of data. To do that, however, you only need one m

Re: unable to figure out this exception from reduce task

2008-01-16 Thread Jim the Standing Bear
wse/HADOOP-2164 > > Runping > > > > > -Original Message- > > From: Vadim Zaliva [mailto:[EMAIL PROTECTED] > > Sent: Tuesday, January 15, 2008 9:59 PM > > To: hadoop-user@lucene.apache.org > > Subject: Re: unable to figure out this exception from red

Re: how to deploy hadoop on many PCs quickly?

2008-01-16 Thread Bin YANG
I used Norton Ghost 8.0 to ghost a whole ubuntu hard disk to an image, and restored another hard disk from the image, but the restored hard disk cannot start up ubuntu successfully. GRUB reports error 22. Does somebody know how to fix the problem? Thanks. Bin YANG On Jan 16, 2008 4:54 AM, Sagar

Re: Hadoop overhead

2008-01-16 Thread Johan Oskarsson
I simply followed the wiki: "The right level of parallelism for maps seems to be around 10-100 maps/node", http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces We have 8 cores in each machine, so perhaps 100 mappers ought to be right; it's set to 157 in the config but hadoop used ~200 for

RE: unable to figure out this exception from reduce task

2008-01-15 Thread Runping Qi
I encountered a similar case. Here is the Jira: https://issues.apache.org/jira/browse/HADOOP-2164 Runping > -Original Message- > From: Vadim Zaliva [mailto:[EMAIL PROTECTED] > Sent: Tuesday, January 15, 2008 9:59 PM > To: hadoop-user@lucene.apache.org > Subject: Re: u

Re: unable to figure out this exception from reduce task

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 22:09, Jim the Standing Bear wrote: The only thing I noticed (compared to my code) is that client.setConf(conf); is missing before client.run(conf). In that case the default input format is used, which uses LongWritable as the key. Not sure if this is the case, but something wo

Re: unable to figure out this exception from reduce task

2008-01-15 Thread Jim the Standing Bear
Well, I also wish it was this simple, but as I said in the original message, I never wanted to use LongWritable at all. Here is how I set the job conf, and after that, is the reduce task. Also, if I got the incorrect output key/value type, shouldn't it always fail as soon as the reduce task is ru

Re: unable to figure out this exception from reduce task

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 21:53, Jim the Standing Bear wrote: I was asking a lot of questions today, so I am glad to contribute at least one answer. I had this problem when there was a type mismatch for keys or values. You need to set up the right type in your JobConf, like this: conf.setOutputKeyCl

Re: single output file

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 18:08, Ted Dunning wrote: One option that should work reasonably well is to have each mapper output with a constant key (as Rui suggests) and use a combiner to pre- select the top N elements. Communication between mappers and combiners is very fast, so this will be just

Re: single output file

2008-01-15 Thread Ted Dunning
Output a constant key in the map function. On 1/15/08 9:31 PM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote: > On Jan 15, 2008, at 17:56, Peter W. wrote: > > That would output last 10 values for each key. I need > to do this across all the keys in the set. > > Vadim > >> Hello, >> >> Try using

Re: single output file

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 17:56, Peter W. wrote: That would output the last 10 values for each key. I need to do this across all the keys in the set. Vadim Hello, Try using a Java collection. Untested code follows... public static class R extends MapReduceBase implements Reducer { public void reduc

Re: single output file

2008-01-15 Thread Ted Dunning
op-user@lucene.apache.org > Sent: Tuesday, January 15, 2008 4:13:11 PM > Subject: Re: single output file > > > > On Jan 15, 2008, at 13:57, Ted Dunning wrote: > >> This is happening because you have many reducers running, only one >> of which >> gets any data

Re: single output file

2008-01-15 Thread lohit . vijayarenu
>the question remains, how to return, say, last 10 records from Reducer. >I need to know when last record is processed. How about storing the last 10 records seen so far? Each time you see another record, discard the oldest and keep a fresh copy of the last 10 seen. Now, to know the end, how about s

Re: single output file

2008-01-15 Thread Peter W.
Hello, Try using a Java collection. Untested code follows... public static class R extends MapReduceBase implements Reducer { public void reduce(WritableComparable wc, Iterator it, OutputCollector out, Reporter r) throws IOException { Stack s = new Stack(); int cnt = 0;

Re: single output file

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 17:02, Rui Shi wrote: As far as I understand, letting each mapper produce the top N records does not work, as each mapper only has partial knowledge of the data, which will not lead to a global optimum... I think your mapper needs to output all records (combined) and let the reducer to

Re: single output file

2008-01-15 Thread Rui Shi
- Original Message From: Vadim Zaliva <[EMAIL PROTECTED]> To: hadoop-user@lucene.apache.org Sent: Tuesday, January 15, 2008 4:13:11 PM Subject: Re: single output file On Jan 15, 2008, at 13:57, Ted Dunning wrote: > This is happening because you have many reducers running, only one &

Re: single output file

2008-01-15 Thread Vadim Zaliva
On Jan 15, 2008, at 13:57, Ted Dunning wrote: This is happening because you have many reducers running, only one of which gets any data. Since you have combiners, this probably isn't a problem. That reducer should only get as many records as you have maps. It would be a problem if your

RE: Fsck?

2008-01-15 Thread Jeff Eastman
"Use the code, Jeff" 1) Missing blocks are reported only when all replicas are missing 2) The files are history 3) The dfs won't actually do anything in safe mode 4) Try creating /lost+found first Jeff -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Tuesday, Januar

Re: single output file

2008-01-15 Thread Ted Dunning
This is happening because you have many reducers running, only one of which gets any data. Since you have combiners, this probably isn't a problem. That reducer should only get as many records as you have maps. It would be a problem if your reducer were getting lots of input records. You can

Re: writing output files in hadoop streaming

2008-01-15 Thread Ted Dunning
Also, this gives you a solution to your race condition (by using hadoop's mechanisms) and it also gives you much higher throughput/reliability/scalability than writing to NFS can possibly give you. On 1/15/08 12:54 PM, "Miles Osborne" <[EMAIL PROTECTED]> wrote: > surely the clean way (in a str

Re: writing output files in hadoop streaming

2008-01-15 Thread Miles Osborne
surely the clean way (in a streaming environment) would be to define a representation of some kind which serialises the output. http://en.wikipedia.org/wiki/Serialization after your mappers and reducers have completed, you would then have some code which deserialises (unpacks) the output as desir

Re: how to deploy hadoop on many PCs quickly?

2008-01-15 Thread Sagar Naik
Hi, We at Visvo have developed a small script for command processing on a cluster. We would like to share it with you and have it reviewed. It is available under the APL. We would like to make a project so that we can all contribute to this script. For now, you can download this script from http:/
