Re: Error in Cluster Startup: NameNode is not formatted

2009-06-26 Thread Amandeep Khurana
your metadata and that could be causing the system to not come up. Specify that parameter in the config file. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Jun 26, 2009 at 2:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Matt
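
The parameter name is cut off in the excerpt above; in this era the NameNode metadata location is usually controlled by dfs.name.dir in hadoop-site.xml (an assumption here, not confirmed by the truncated message). A minimal sketch:

    <property>
      <name>dfs.name.dir</name>
      <value>/var/hadoop/dfs/name</value>  <!-- hypothetical path; pick a stable, non-/tmp location -->
    </property>

After pointing dfs.name.dir at a new, empty location it must be formatted once (bin/hadoop namenode -format) before the NameNode will come up.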

Hadoop0.20 - Class Not Found exception

2009-06-26 Thread Amandeep Khurana
with 0.19.. Not sure what could have changed or what I broke to cause this error... Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Amandeep Khurana
I've been working on some graph stuff using MR as well. I'd be more than interested to chip in as well.. I remember exchanging a few mails with Paolo about having an RDF store over HBase and developing graph algorithms over it. Amandeep Khurana Computer Science Graduate Student University

map.input.file in hadoop0.20

2009-06-25 Thread Amandeep Khurana
How do I read the map.input.file parameter in the mapper class in Hadoop 0.20? In earlier versions, this would work: public void configure(JobConf conf) { filename = conf.get("map.input.file"); } What about 0.20? Amandeep Amandeep Khurana Computer Science Graduate Student
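
A minimal sketch of one way to get the same information under the 0.20 "new" API (org.apache.hadoop.mapreduce); this is an assumption based on that API rather than an answer from the thread — the file name comes from the task's input split instead of a conf key:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class InputFileMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String filename;

        @Override
        protected void setup(Context context) {
            // Equivalent of reading map.input.file in configure() under the old API
            filename = ((FileSplit) context.getInputSplit()).getPath().toString();
        }
    }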

FairScheduler class not found

2009-06-05 Thread Amandeep Khurana
) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718) ... 4 more Where do I look to solve this? Amandeep Amandeep Khurana Computer Science Graduate

HBase v0.19.3 with Hadoop v0.19.1?

2009-06-04 Thread Amandeep Khurana
I have a couple of questions: 1. Is the HBase 0.19.3 release stable for a production cluster? 2. Can it be deployed over Hadoop v0.19.1? ..amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Global object for a map task

2009-05-03 Thread Amandeep Khurana
a null pointer exception. Where else can this be created? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Global object for a map task

2009-05-03 Thread Amandeep Khurana
Thanks Jason. My object is relatively small. But how do I pass it via the JobConf object? Can you elaborate a bit... Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sat, May 2, 2009 at 11:53 PM, jason hadoop jason.had...@gmail.com wrote
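
A minimal sketch of the usual pattern (assumed here, since Jason's reply is truncated): the driver serializes the small object into a JobConf property, and each task rebuilds it in configure(). The property key and string encoding are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SharedObjectMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private String shared;  // the "global" object, rebuilt once per task

        @Override
        public void configure(JobConf job) {
            // The driver side did: conf.set("my.shared.object", object.toString());
            shared = job.get("my.shared.object");
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // use 'shared' here instead of constructing it inside map()
        }
    }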

Re: Multiple k,v pairs from a single map - possible?

2009-04-02 Thread Amandeep Khurana
Here's the JIRA for the Oracle fix. https://issues.apache.org/jira/browse/HADOOP-5616 Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Mar 27, 2009 at 5:18 AM, Brian MacKay brian.mac...@medecision.com wrote: Amandeep, Add

Re: Please help

2009-03-31 Thread Amandeep Khurana
Have you read the MapReduce paper? You might be able to find some pointers there for your analysis. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Tue, Mar 31, 2009 at 4:28 PM, Hadooper kusanagiyang.had...@gmail.com wrote: Dear developers

Multiple k,v pairs from a single map - possible?

2009-03-27 Thread Amandeep Khurana
Is it possible to output multiple key-value pairs from a single map function run? For example, the mapper outputting (name, phone) and (name, address) simultaneously... Can I write multiple output.collect(...) calls? Amandeep Amandeep Khurana Computer Science Graduate Student University
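
What is being asked works directly in the old (0.19-style) API: a mapper may call output.collect() any number of times per input record. A small sketch, assuming a comma-separated name,phone,address record layout:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ContactMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String[] fields = value.toString().split(",");  // name,phone,address (assumed)
            Text name = new Text(fields[0]);
            output.collect(name, new Text(fields[1]));  // (name, phone)
            output.collect(name, new Text(fields[2]));  // (name, address)
        }
    }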

Reducer hanging at 66%

2009-03-27 Thread Amandeep Khurana
I have an MR job running on approximately 15 lines of data in a text file. The reducer hangs at 66% and at that point the CPU usage is at 100%. After 600 seconds, the job gets killed... What could be going wrong and where should I look for the problem? Amandeep Khurana Computer Science Graduate
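
For context, the 600-second kill matches the default mapred.task.timeout. Two possible causes (assumptions, not confirmed in the thread): a reduce loop that never advances the values iterator (which would also explain 100% CPU on 15 input lines), or long per-value work done without reporting progress. A sketch of keeping a slow but correct reducer alive:

    // Inside a standard old-API reducer (org.apache.hadoop.mapred.Reducer):
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            Text v = values.next();      // make sure the iterator actually advances
            // ... potentially slow work per value ...
            reporter.progress();         // tells the framework the task is still alive
        }
    }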

Typical hardware configurations

2009-03-27 Thread Amandeep Khurana
on the master. This didn't work very well due to the RAM being a little low. I got some config details from the Powered By page on the Hadoop wiki, but nothing like that for HBase. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

RDF store over HDFS/HBase

2009-03-23 Thread Amandeep Khurana
Has anyone explored using HDFS/HBase as the underlying storage for an RDF store? Most solutions (all single-node) that I have found so far scale only to a couple of billion rows in the triple store. Wondering how Hadoop could be leveraged here... Amandeep Amandeep Khurana Computer

Re: RDF store over HDFS/HBase

2009-03-23 Thread Amandeep Khurana
. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Mon, Mar 23, 2009 at 4:17 PM, Ryan Rawson ryano...@gmail.com wrote: I would expect HBase would scale well - the semantics of the data being stored shouldn't matter, just the size. I think

Re: How to apply a patch to my hadoop?

2009-03-17 Thread Amandeep Khurana
cd $HADOOP_HOME; patch -p0 < /path/to/the/file.patch; ant clean jar Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Tue, Mar 17, 2009 at 4:48 PM, Steve Gao steve@yahoo.com wrote: I want to apply this patch https://issues.apache.org/jira/browse

Re: hadoop migration

2009-03-16 Thread Amandeep Khurana
of the tools around it is more of a data processing system than a backend datastore for a website. The output of the processing that Hadoop does is typically taken into a MySQL cluster which feeds a website. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: hadoop migration

2009-03-16 Thread Amandeep Khurana
AFAIK, Google uses BigTable for pretty much most of their backend stuff. The thing to note here is that BigTable is much more mature than Hbase. You can try it out and see how it works out for you. Do share your results on the mailing list... Amandeep Khurana Computer Science Graduate Student

Re: Error while putting data onto hdfs

2009-03-11 Thread Amandeep Khurana
My dfs.datanode.socket.write.timeout is set to 0. This had to be done to get Hbase to work. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi rang...@yahoo-inc.comwrote: Did you change
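
For reference, a sketch of the hadoop-site.xml entry being described; only the property name and value come from the message, the XML wrapper is just the usual format:

    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value>
    </property>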

Re: mapred.input.file returns null

2009-03-06 Thread Amandeep Khurana
How are you using it? Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Mar 6, 2009 at 11:18 PM, Richa Khandelwal richa...@gmail.com wrote: Hi All, I am trying to retrieve the names of files for each record that I am processing. Using

Re: mapred.input.file returns null

2009-03-06 Thread Amandeep Khurana
Khandelwal richa...@gmail.com wrote: Here's a snippet of my code: private static String inputFile; public void configure(JobConf job) { inputFile = job.get("map.input.file"); System.out.println("File " + inputFile); } On Fri, Mar 6, 2009 at 11:19 PM, Amandeep Khurana ama

Re: Importing data from mysql into hadoop

2009-03-04 Thread Amandeep Khurana
Put it into the $HADOOP_HOME/lib folder. To be on the safer side, I generally include it in the job jar. Don't forget to put Class.forName(driverClassName); in your job code. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 4, 2009
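
A minimal sketch of that registration step; the concrete MySQL driver class name below is an assumption for illustration:

    // Somewhere in the job class, before the JDBC connection is used:
    static {
        try {
            Class.forName("com.mysql.jdbc.Driver");  // loads and registers the driver
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("JDBC driver jar not on the classpath", e);
        }
    }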

Re: Connection problem during data import into hbase

2009-02-21 Thread Amandeep Khurana
I have 1 master + 2 slaves. Am using 0.19.0 for both Hadoop and HBase. I didn't change any config from the default except the hbase.rootdir and the hbase.master. I have gone through the FAQs but couldn't find anything. What exactly are you pointing to? Amandeep Khurana Computer Science Graduate

Re: Connection problem during data import into hbase

2009-02-21 Thread Amandeep Khurana
dfs.datanode.socket.write.timeout property with value 0. I also defined the property in the job config. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sat, Feb 21, 2009 at 1:23 AM, Amandeep Khurana ama...@gmail.com wrote: I have 1 master + 2 slaves. Am using 0.19.0

Re: Connection problem during data import into hbase

2009-02-21 Thread Amandeep Khurana
) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160) Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sat, Feb 21, 2009 at 3:17 AM, Amandeep Khurana ama...@gmail.com wrote: I changed the config and restarted the cluster

Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass: Retrying connect to server: /171.69.102.52:60020. Already tried 0 time(s). What could be going wrong? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
) attempt_200902201300_0019_m_06_0: at org.apache.hadoop.mapred.Child.main(Child.java:155) Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Feb 20, 2009 at 9:43 PM, Amandeep Khurana ama...@gmail.com wrote: I am trying to import data from a flat file into Hbase

Re: Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
/HBase.rb:444:in `count' from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count' from (hbase):3:in `binding' Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana ama...@gmail.com wrote: Here's what

Re: HADOOP-2536 supports Oracle too?

2009-02-18 Thread Amandeep Khurana
It should either be in the jar or in the lib folder in the Hadoop installation. If none of them work, check the jar that you are including. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 18, 2009 at 12:08 AM, sandhiya sandhiy...@gmail.com

Re: How do you remove a machine from the cluster? Slaves file not working...

2009-02-17 Thread Amandeep Khurana
You have to decommission the node. Look at http://wiki.apache.org/hadoop/FAQ#17 Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Tue, Feb 17, 2009 at 2:14 PM, S D sd.codewarr...@gmail.com wrote: I have a Hadoop 0.19.0 cluster of 3 machines
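
The steps behind that FAQ entry are roughly these (a sketch of the standard exclude-file mechanism; the file path and host name are hypothetical):

    # hadoop-site.xml on the namenode must point dfs.hosts.exclude at an excludes file
    echo "slave3.example.com" >> /path/to/excludes   # the node to decommission
    bin/hadoop dfsadmin -refreshNodes                # namenode starts replicating its blocks elsewhere

Once the web UI reports the node as decommissioned, its DataNode can be stopped safely.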

Re: HDFS architecture based on GFS?

2009-02-16 Thread Amandeep Khurana
Ok. Thanks.. Another question now. Do the datanodes have any way of linking a particular block of data to a global file identifier? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 9:37 PM, Matei Zaharia ma

Re: Can never restart HDFS after a day or two

2009-02-16 Thread Amandeep Khurana
Where are your namenode and datanode storing the data? By default, it goes into the /tmp directory. You might want to move that out of there. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Mon, Feb 16, 2009 at 8:11 PM, Mark Kerzner markkerz

HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
Hi Is the HDFS architecture completely based on the Google File System? If it isn't, what are the differences between the two? Secondly, is the coupling between Hadoop and HDFS the same as that between Google's version of MapReduce and GFS? Amandeep Amandeep Khurana Computer Science

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
but there is nothing that concrete for Hadoop that I have been able to find. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia ma...@cloudera.com wrote: Hi Amandeep, Hadoop is definitely inspired

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
Thanks Matei. If the basic architecture is similar to the Google stuff, I can safely just work on the project using the information from the papers. I am aware of the 4487 jira and the current status of the permissions mechanism. I had a look at them earlier. Cheers Amandeep Amandeep Khurana

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
A quick question here. How does a typical hadoop job work at the system level? What are the various interactions and how does the data flow? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 3:20 PM, Amandeep Khurana ama

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
This is good information! Thanks a ton. I'll take all this into account. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 4:47 PM, Matei Zaharia ma...@cloudera.com wrote: Typically the data flow is like this: 1) Client

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
know beforehand that multiple files would be accessed. Right? I am slightly confused why you have mentioned this case separately... Can you elaborate on it a little bit? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 4:47

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
Another question that I have here - When the jobs run arbitrary code and access data from the HDFS, do they go to the namenode to get the block information? Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 6:00 PM, Amandeep Khurana

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
Ok. Got it. Now, when my job needs to access another file, does it go to the Namenode to get the block ids? How does the java process know where the files are and how to access them? Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
Alright.. Got it. Now, do the task trackers talk to the namenode and the data node directly or do they go through the job tracker for it? So, if my code is such that I need to access more files from the hdfs, would the job tracker get involved or not? Amandeep Khurana Computer Science

Re: HDFS architecture based on GFS?

2009-02-15 Thread Amandeep Khurana
it access the data? Would it ask the parent java process (in the tasktracker) to get the data or would it go and do stuff on its own? Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 15, 2009 at 8:23 PM, Matei Zaharia ma...@cloudera.com wrote

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
Have only one instance of the reduce task. This will run once your map tasks are completed. You can set this in your job conf by using conf.setNumReduceTasks(1). Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz 2009/2/13 Kris Jirapinyo kris.jirapi...@biz360

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
What you can probably do is have the combine function do some reducing before the single reducer starts off. That might help. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz 2009/2/13 Kris Jirapinyo kris.jirapi...@biz360.com I can't afford to have only
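
A sketch of wiring that up in the old API, assuming the reduce logic is associative enough to double as the combiner (the class names are hypothetical):

    JobConf conf = new JobConf(MyJob.class);
    conf.setCombinerClass(MyReducer.class);  // runs map-side, shrinking what the lone reducer sees
    conf.setNumReduceTasks(1);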

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
lowering the memory allocated to the JVMs as well so that 4 tasks can run. I don't know if you want to do that or not. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz 2009/2/13 Kris Jirapinyo kris.jirapi...@biz360.com Thanks for the recommendation, haven't

Re: Backing up HDFS?

2009-02-09 Thread Amandeep Khurana
Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)... Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Mon, Feb 9, 2009 at 4:17

Re: Re: Re: Re: Regarding Hadoop multi cluster set-up

2009-02-07 Thread Amandeep Khurana
it work. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sat, Feb 7, 2009 at 9:06 AM, jason hadoop jason.had...@gmail.com wrote: On your master machine, use the netstat command to determine what ports and addresses the namenode process is listening

Hadoop job using multiple input files

2009-02-06 Thread Amandeep Khurana
this? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Hadoop job using multiple input files

2009-02-06 Thread Amandeep Khurana
as (name, number) and the other outputting the value as (number, address) into the reducer? Not clear what I'll be doing with the map.input.file here... Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Feb 6, 2009 at 1:55 AM, Jeff Hammerbacher

Re: Re: Re: Regarding Hadoop multi cluster set-up

2009-02-06 Thread Amandeep Khurana
I had to change the master on my running cluster and ended up with the same problem. Were you able to fix it at your end? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar shefal

Re: Hadoop job using multiple input files

2009-02-06 Thread Amandeep Khurana
Ok. Got it. Now, how would my reducer know whether the name is coming first or the address? Is it going to be in the same order in the iterator as the files are read (alphabetically) in the mapper? Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri

Heap size error

2009-02-06 Thread Amandeep Khurana
) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430) at org.apache.hadoop.mapred.Child.main(Child.java:155) Any inputs? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Hadoop job using multiple input files

2009-02-06 Thread Amandeep Khurana
is not consistent. How can I get this to be consistent? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Feb 6, 2009 at 2:58 PM, Amandeep Khurana ama...@gmail.com wrote: Ok. Got it. Now, how would my reducer know whether the name is coming first

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Amandeep Khurana
Ok. I'm not sure if I got it correct. Are you saying I should test the statement that Hadoop creates directly against the database? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 4, 2009 at 7:13 AM, Enis Soztutar enis@gmail.com

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Amandeep Khurana
Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 4, 2009 at 10:26 AM, Amandeep Khurana ama...@gmail.com wrote: Ok. I'm not sure if I got it correct. Are you saying, I should test the statement that hadoop creates directly with the database? Amandeep

Re: Bad connection to FS.

2009-02-04 Thread Amandeep Khurana
I faced the same issue a few days back. Formatting the namenode made it work for me. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 4, 2009 at 3:06 PM, Mithila Nagendra mnage...@asu.edu wrote: Hey all When I try to copy
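
For reference, the command being referred to (run from $HADOOP_HOME; note that it wipes any existing HDFS metadata, so it is only appropriate on a fresh or disposable filesystem):

    bin/hadoop namenode -format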

Unable to pull data from DB in Hadoop job

2009-02-03 Thread Amandeep Khurana
pointers on this? Thanks Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

How to use DBInputFormat?

2009-02-03 Thread Amandeep Khurana
it (an email sent earlier). Can anyone give me some inputs on this please? Thanks -Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

HADOOP-2536 supports Oracle too?

2009-02-03 Thread Amandeep Khurana
Does the patch HADOOP-2536 support connecting to Oracle databases as well? Or is it just limited to MySQL? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: How to use DBInputFormat?

2009-02-03 Thread Amandeep Khurana
Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Tue, Feb 3, 2009 at 6:51 PM, Kevin Peterson kpeter...@biz360.com wrote: On Tue, Feb 3, 2009 at 5:49 PM, Amandeep Khurana ama...@gmail.com wrote: In the setInput(...) function in DBInputFormat
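
A sketch of the 0.19-era API being discussed (org.apache.hadoop.mapred.lib.db); the driver class, connection URL, table and field names below are assumptions for illustration:

    JobConf conf = new JobConf(DbImportJob.class);   // hypothetical job class
    DBConfiguration.configureDB(conf, "oracle.jdbc.driver.OracleDriver",
            "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");
    // MyRecord implements DBWritable and maps the selected columns
    DBInputFormat.setInput(conf, MyRecord.class,
            "EMPLOYEES", null /* conditions */, "NAME" /* orderBy */,
            "NAME", "PHONE");
    conf.setInputFormat(DBInputFormat.class);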

Re: How to use DBInputFormat?

2009-02-03 Thread Amandeep Khurana
The same query is working if I write a simple JDBC client and query the database. So, I'm probably doing something wrong in the connection settings. But the error looks to be on the query side more than the connection side. Amandeep Amandeep Khurana Computer Science Graduate Student University

Re: Setting up cluster

2009-02-01 Thread Amandeep Khurana
Oh ok.. I am not very familiar with the intricate details yet. But I got what you are saying. I'll look into these things and try and figure out where the mismatch is happening. Thanks! Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 1, 2009

Setting up version 0.19.0

2009-02-01 Thread Amandeep Khurana
) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718) ... 11 more Does anyone have any idea about this? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Setting up version 0.19.0

2009-02-01 Thread Amandeep Khurana
Aah ok. Got it to work. Thanks. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sun, Feb 1, 2009 at 10:56 PM, lohit lohit.vijayar...@yahoo.com wrote: from 0.19, package structure was changed. DistributedFileSystem is no longer

Setting up cluster

2009-01-30 Thread Amandeep Khurana
as the pseudo-distributed setup on the master node and added the slaves to the list of slaves in the conf directory. Thereafter, I ran the start-dfs.sh and start-mapred.sh scripts. Am I missing something? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Setting up cluster

2009-01-30 Thread Amandeep Khurana
(Client.java:789) at org.apache.hadoop.ipc.Client.call(Client.java:704) ... 12 more What do I need to do for this? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Fri, Jan 30, 2009 at 2:49 PM, Amandeep Khurana ama...@gmail.com wrote

How to add nodes to existing cluster?

2009-01-30 Thread Amandeep Khurana
I am trying to add nodes to an existing working cluster. Do I need to bring the entire cluster down, or would shutting down and restarting the namenode after adding the new machines to the slaves file be enough? Amandeep

Re: How to add nodes to existing cluster?

2009-01-30 Thread Amandeep Khurana
Thanks Lohit On Fri, Jan 30, 2009 at 7:13 PM, lohit lohit.vijayar...@yahoo.com wrote: Just starting DataNode and TaskTracker would add it to cluster. http://wiki.apache.org/hadoop/FAQ#25 Lohit - Original Message From: Amandeep Khurana ama...@gmail.com To: core-user
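
A sketch of what that answer amounts to on the newly added machine (after its conf directory matches the rest of the cluster):

    bin/hadoop-daemon.sh start datanode
    bin/hadoop-daemon.sh start tasktracker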