Re: ID Service with HBase?

2008-04-17 Thread Thomas Thevis
Hello Jim, Row locks do not apply to reads, only updates. They prevent two applications from updating the same row simultaneously. There is no other locking mechanism in HBase. (It follows Bigtable in this regard. See http://labs.google.com/papers/bigtable.html ) Thank you for

Re: Aborting Map Function

2008-04-17 Thread Andrzej Bialecki
Owen O'Malley wrote: On Apr 16, 2008, at 8:28 AM, Chaman Singh Verma wrote: I am developing one application with MapReduce and in that whenever some MapTask condition is met, I would like to broadcast to all other MapTask to abort their work. I am not quite sure whether such broadcasting

Re: how to set loglevel to debug for DFSClient

2008-04-17 Thread André Martin
Hi Cagdas, simply adjust your log4j.properties (it needs to be in the CLASSPATH of your DFSClient app): log4j.logger.org.apache.hadoop=DEBUG Cu on the 'net, Bye - bye, André Cagdas Gerede wrote: How do you set DFSClient's log to
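For reference, a minimal log4j.properties fragment along the lines André describes — the appender layout here is illustrative; only the last line is the one he quotes:

```properties
# Root logger stays at INFO so third-party libraries don't flood the output.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c: %m%n

# Raise all Hadoop classes, including the DFSClient, to DEBUG.
log4j.logger.org.apache.hadoop=DEBUG
```

The file must be on the client application's CLASSPATH ahead of any other log4j.properties, or log4j will silently use the first one it finds.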

getting files in hdfs

2008-04-17 Thread Garri Santos
Good Day, I successfully installed Hadoop and copied a test file to HDFS. I was wondering whether it is possible to directly access the file without first getting it out of HDFS. Regards, Garri

Re: getting files in hdfs

2008-04-17 Thread Thomas Thevis
What do you mean by 'directly access the file'? HDFS provides several file operations. Type '${PATH_TO_HADOOP_INSTALL}/bin/hadoop fs' to see an appropriate usage message. Regards, Thomas Garri Santos wrote: Good Day, I successfully installed and copied a test file to HDFS. I was wondering
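The usage message Thomas mentions covers these; for instance, to read the file in place rather than copying it out first (paths here are illustrative):

```shell
# Read a file directly, without getting it out of HDFS first:
bin/hadoop fs -cat /user/garri/test.txt

# List files and check sizes:
bin/hadoop fs -ls /user/garri
bin/hadoop fs -du /user/garri

# Copy out only if a local copy is really needed:
bin/hadoop fs -get /user/garri/test.txt /tmp/test.txt
```

These run against whatever cluster the client configuration points at, so they require a running HDFS.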

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
One more thing: will the HashMap that I am generating in the reduce phase be on a single node or on multiple nodes in the distributed environment? If my dataset is large, will this approach work? If not, what can I do about it? The same question applies to the file that I am writing in the run function (simple

Re: Hadoop summit video capture?

2008-04-17 Thread wuqi
Are the videos and slides available now? - Original Message - From: Jeremy Zawodny [EMAIL PROTECTED] To: core-user@hadoop.apache.org Cc: [EMAIL PROTECTED] Sent: Thursday, March 27, 2008 11:01 AM Subject: Re: Hadoop summit video capture? Slides and video go up next week. It just

Interleaving maps/reduces from multiple jobs on the same tasktracker

2008-04-17 Thread Jiaqi Tan
Hi, Will Hadoop ever interleave multiple maps/reduces from different jobs on the same tasktracker? Suppose I have 2 jobs submitted to a jobtracker, one after the other. Must all maps/reduces from the first submitted job be completed before the tasktrackers will run any of the maps/reduces from

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
My latest problem is that I cannot always rely on writing the HashMap to a file like this: FileOutputStream fout = new FileOutputStream(f); ObjectOutputStream objStream = new ObjectOutputStream(fout); objStream.writeObject(myHashMap); I am doing this writing in the same run() of the outer class. The
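For what it's worth, the snippet as quoted needs a map instance (not the class name) passed to writeObject, and the stream has to be closed so the object is fully flushed to disk. A self-contained sketch with made-up names, using only the JDK:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class HashMapStore {

    // Serialize the map to a file. Closing the stream (here via
    // try-with-resources) flushes it; without that, a reader may
    // see a truncated object.
    static void save(HashMap<String, Integer> map, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(map);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (HashMap<String, Integer>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> counts = new HashMap<>();
        counts.put("hadoop", 42);
        File f = File.createTempFile("counts", ".ser");
        save(counts, f);
        System.out.println(load(f).get("hadoop")); // prints 42
        f.delete();
    }
}
```

Note this only round-trips the map on one machine; it does not address where the file lives in a distributed run.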

RE: Counters giving double values

2008-04-17 Thread rude
Hi Devaraj, I researched the counters topic further, with some success. For one, I can now reproduce it with a test. I am waiting for the password for my JIRA account to get started there; somehow I didn't get the password after registration, so I sent a mail to Owen. I am not familiar

Re: How to set up rack awareness?

2008-04-17 Thread Rong-en Fan
On Thu, Apr 17, 2008 at 2:41 AM, Nate Carlson [EMAIL PROTECTED] wrote: I'm setting up a hadoop cluster across two data centers (with gig bandwidth between them).. I'd like to use the rack awareness features to help Hadoop know which nodes are local.. I see that it's possible, but haven't found
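The rack-awareness hook is a user-supplied script that maps datanode addresses to rack paths. Assuming a Hadoop version that supports the topology.script.file.name property (check your version's hadoop-default.xml), the configuration looks roughly like:

```xml
<!-- hadoop-site.xml: point Hadoop at a script that maps each
     datanode IP/hostname (passed as arguments) to a rack path,
     printed one per line, e.g. /dc1/rack1 or /dc2/rack1 -->
<property>
  <name>topology.script.file.name</name>
  <value>/path/to/rack-map.sh</value>
</property>
```

For a two-datacenter setup, the script would return a different top-level path per datacenter so the namenode prefers same-datacenter replicas.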

Re: Map reduce classes

2008-04-17 Thread Ted Dunning
I am not quite sure what you mean by this. If you mean that the second approach is only an approximation, then you are correct. The only simple correct algorithm that I know of is to do the counts (correctly) and then do the main show (processing with a kill list). On 4/16/08 9:04 PM, Amar

Re: Urgent

2008-04-17 Thread Norbert Burger
Yes -- you need to set your DynDNS record to point at the host (IP address) printed after the text 'Master is'. That's the master node of your EC2 cluster. The EC2UI Firefox plugin can be useful here to verify (independently) that EC2 instances have started correctly. Norbert On Wed, Apr 16,

Re: getting files in hdfs

2008-04-17 Thread Ted Dunning
You can also get to the file via HTTP. On 4/17/08 2:43 AM, Thomas Thevis [EMAIL PROTECTED] wrote: What do you mean by 'directly access the file'? HDFS provides several file operations. Type '${PATH_TO_HADOOP_INSTALL}/bin/hadoop fs' to see an appropriate usage message. Regards, Thomas

Re: Map reduce classes

2008-04-17 Thread Ted Dunning
Don't assume that any variables are shared between reducers or between maps, or between maps and reducers. If you want to share data, put it into HDFS. On 4/17/08 4:01 AM, Aayush Garg [EMAIL PROTECTED] wrote: One more thing::: The HashMap that I am generating in the reduce phase will be on

Re: Urgent

2008-04-17 Thread Norbert Burger
You need to create a DynDNS account and then add host records to this account. On Thu, Apr 17, 2008 at 12:03 PM, Prerna Manaktala [EMAIL PROTECTED] wrote: Hi How do we authenticate on dyndns site? I tried d dyndns query tool but failed when I entered prerna.dyndns.org and class as key

RE: Counters giving double values

2008-04-17 Thread Devaraj Das
Thanks! Will take a look at the jira issue 3267 _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, April 17, 2008 7:09 PM To: core-user@hadoop.apache.org Subject: RE: Counters giving double values hi devraj, so, i researched the topic with the counters further

Re: Urgent

2008-04-17 Thread Prerna Manaktala
Already done that, but to ssh to that particular host, shouldn't some key be generated on DynDNS? I am not able to ssh to that host. On Thu, Apr 17, 2008 at 12:17 PM, Norbert Burger [EMAIL PROTECTED] wrote: You need to create a DynDNS account and then add host records to this account.

Re: Urgent

2008-04-17 Thread Norbert Burger
The 'hadoop-ec2 run' script will ssh you automatically into your master node after you complete the launch process. If you're manually ssh'ing into the host, you need to specify the private key generated during the EC2 signup process. For command-line ssh (inside Cygwin), use the -i argument

RE: Help: When is it safe to discard a block in the application layer

2008-04-17 Thread dhruba Borthakur
The DFSClient caches small packets (e.g. 64K write buffers), and they are lazily flushed to the datanodes in the pipeline. So, when an application completes an out.write() call, it is definitely not guaranteed that the data has been sent to even one datanode. One option would be to retrieve cache hints
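The same effect is easy to see with plain JDK streams — this is only an analogy for the DFSClient's client-side buffering, not Hadoop code: a completed write() says nothing about what has reached the other end until a flush.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("buf", ".dat");
        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream(f), 64 * 1024)) {
            out.write(new byte[1024]); // write() returns, but the bytes sit in the buffer
            System.out.println(f.length()); // prints 0 -- nothing reached the file yet
            out.flush();
            System.out.println(f.length()); // prints 1024 -- visible only after the flush
        }
        f.delete();
    }
}
```

In HDFS the "other end" is a pipeline of datanodes rather than a local file, so even a flush is not by itself a durability guarantee across replicas.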

Re: Urgent

2008-04-17 Thread Prerna Manaktala
I tried it: ssh -i &lt;path to id_rsa&gt; 128.205.234.20 (the path is correct), but again the same result: ssh: connect to host 128.205.234.20 port 22: Connection refused. I tried disabling the firewall as well. On Thu, Apr 17, 2008 at 12:56 PM, Norbert Burger [EMAIL PROTECTED] wrote: The 'hadoop-ec2 run'

Re: Map reduce classes

2008-04-17 Thread Aayush Garg
Current structure of my program is: Upper class { class Reduce { reduce(K1, V1, K2, V2) { // I count the frequency of each key and add the output to a HashMap(key, value) instead of output.collect() } } void run() { runJob(); // Now eliminate top-frequency keys in
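The "eliminate top-frequency keys" step itself is simple once the counts exist somewhere reliable. A single-JVM sketch with hypothetical names, plain Java only — in a distributed run the counts could not live in one reducer's HashMap, as other replies in this thread point out:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopKFilter {

    // Return a copy of the count map with the k most frequent keys removed.
    static Map<String, Integer> dropTopK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue()); // most frequent first
        Map<String, Integer> result = new HashMap<>(counts);
        for (int i = 0; i < k && i < entries.size(); i++) {
            result.remove(entries.get(i).getKey());
        }
        return result;
    }
}
```

In a real job this filtering would typically be a second MapReduce pass that reads the counts from HDFS, rather than post-processing in run().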

Re: Map reduce classes

2008-04-17 Thread Amar Kamat
Ted Dunning wrote: I am not quite sure what you mean by this. If you mean that the second approach is only an approximation, then you are correct. Yes. The only simple correct algorithm that I know of is to do the counts (correctly) and then do the main show (processing with a kill

Re: Not able to back up to S3

2008-04-17 Thread mohamedhafez
If I try to specify the ID and Secret as part of the S3 URL, I get the following error: [EMAIL PROTECTED]:~# hadoop distcp /dijkstra.log s3://1W27ZBE2AKDVVFZB9T02:[EMAIL PROTECTED]/ With failures, global counters are inaccurate; consider running with -i Copy failed:
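One workaround, assuming the property names used by Hadoop's S3 filesystem at the time (verify against your version's hadoop-default.xml), is to put the credentials in the configuration instead of the URL — which also avoids trouble when the secret key contains a '/':

```xml
<!-- hadoop-site.xml: keep AWS credentials out of the s3:// URL -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With the credentials configured, the distcp destination becomes simply s3://bucket-name/.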

Re: Query

2008-04-17 Thread Prerna Manaktala
Thanks, already installed, but ssh to the newly created host on the DynDNS site is still not working: ssh -i &lt;path to id_rsa&gt; prerna.dyndns.org. It says connection to port 22 refused. Prerna On Wed, Apr 16, 2008 at 9:17 PM, Edward J. Yoon [EMAIL PROTECTED] wrote: I didn't try to run on cygwin, but you

Re: Query

2008-04-17 Thread Edward J. Yoon
[EMAIL PROTECTED] ~ $ ssh usage: ssh [-1246AaCfgKkMNnqsTtVvXxY] [-b bind_address] [-c cipher_spec] [-D [bind_address:]port] [-e escape_char] [-F configfile] [-i identity_file] [-L [bind_address:]port:host:hostport] [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o

Re: Query

2008-04-17 Thread Prerna Manaktala
how to do that? On Thu, Apr 17, 2008 at 7:35 PM, Edward J. Yoon [EMAIL PROTECTED] wrote: Your ISP may be blocking access to critical ports behind their routers. Can you access any ports on your router? You may want to try setting up your router to forward some other port (80?) to your

Re: Query

2008-04-17 Thread Edward J. Yoon
It wasn't too difficult, though I do not remember the details. Maybe searching the WWW will reveal them: http://www.google.com/search?q=dyndns+and+ssh -Edward On Fri, Apr 18, 2008 at 8:43 AM, Prerna Manaktala [EMAIL PROTECTED] wrote: how to do that? On Thu, Apr 17, 2008 at 7:35 PM,

Experiences from setting up core source in my development environment

2008-04-17 Thread Karl Wettin
It took me three days and three times as many short attempts to get the test sources to compile. It was all about sitting down and reading the Ant build to figure out that RecInt, RecString, etc. have to be built using the target generate-test-records, and then adding the output in build/ to my

Re: Query

2008-04-17 Thread Prerna Manaktala
Hey, thanks, but I am still not able to figure it out. Please help. On Thu, Apr 17, 2008 at 8:00 PM, Edward J. Yoon [EMAIL PROTECTED] wrote: It wasn't too difficult though I do not remember the details. Maybe searching the WWW does reveal the details http://www.google.com/search?q=dyndns+and+ssh

Re: Reusing jobs

2008-04-17 Thread Ted Dunning
Hadoop has enormous startup costs that are relatively inherent in the current design. Most notably, mappers and reducers are executed in a standalone JVM (ostensibly for safety reasons). On 4/17/08 6:00 PM, Karl Wettin [EMAIL PROTECTED] wrote: Is it possible to execute a job more than once?

Re: Reusing jobs

2008-04-17 Thread Karl Wettin
Ted Dunning wrote: Hadoop has enormous startup costs that are relatively inherent in the current design. Most notably, mappers and reducers are executed in a standalone JVM (ostensibly for safety reasons). Is it possible to hack in support for reusing JVMs? Keep them alive until timed out and

Re: Reusing jobs

2008-04-17 Thread Spiros Papadimitriou
Hi -- Not really sure that JVM startup is the main overhead -- you could take a look at the logfiles of the individual TIPs and compare the timestamp of the first log message to the time the jobtracker reports that TIP was started. In my experience, that is well under a second (once the cluster