Re: anyone know how to unsubscribe from this mailing list

2012-05-17 Thread Andrew Newman
I've been trying to for two years, as it's an account that forwards on but that I don't have control over. If you find out, can you let me know? On 16/05/2012, at 11:05 PM, Yue Guan gua...@husky.neu.edu wrote: It seems the instructions on the web pages don't work. Thank you for your help.

Speculative execution side effects of files created directly in HDFS

2012-05-17 Thread Abhay Ratnaparkhi
I have multiple reducers running simultaneously, and each reducer is supposed to output its data to a different file. I'm creating a file on HDFS using fs.create() in each reducer. Will speculative execution of tasks affect the output, since I'm not using any of the provided OutputFormats? ~Abhay

Re: Speculative execution side effects of files created directly in HDFS

2012-05-17 Thread Harsh J
Yes, speculative execution will affect your tasks; please read the FAQ to understand the use of OutputCommitters: http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F On Thu, May 17, 2012 at 2:02 PM, Abhay Ratnaparkhi
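For reference, a minimal sketch of the pattern the FAQ describes, using the old (org.apache.hadoop.mapred) API and assuming the job's output path is set and the default FileOutputCommitter is in use. Class and file names are illustrative, not from the thread: the file is created under the task attempt's work directory instead of a fixed HDFS path, so only the attempt that commits gets promoted into the job output.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SideFileReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private FSDataOutputStream out;

      @Override
      public void configure(JobConf conf) {
        try {
          // One file per reduce task, created under the task attempt's work
          // directory rather than at a fixed HDFS path; only the attempt that
          // commits is promoted, so a speculative duplicate attempt cannot
          // clobber or double-write the file.
          Path workDir = FileOutputFormat.getWorkOutputPath(conf);
          Path sideFile = new Path(workDir,
              "reducer-" + conf.getInt("mapred.task.partition", 0) + ".out");
          FileSystem fs = sideFile.getFileSystem(conf);
          out = fs.create(sideFile, false);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      @Override
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        while (values.hasNext()) {
          out.writeBytes(key + "\t" + values.next() + "\n");
        }
      }

      @Override
      public void close() throws IOException {
        out.close();
      }
    }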

custom FileInputFormat class

2012-05-17 Thread John Hancock
All, Can anyone on the list point me in the right direction as to how to write my own FileInputFormat class? Perhaps this is not even the way I should go, but my goal is to write a MapReduce job that gets its input from a binary file of integers and longs. -John

Re: custom FileInputFormat class

2012-05-17 Thread Harsh J
Hello John, I covered two resources you can use to read up on these custom extensions previously at http://search-hadoop.com/m/98TH8MPsTK. Hope this helps you get started. Let us know if you have specific issues/questions once you do :) On Thu, May 17, 2012 at 3:40 PM, John Hancock

RE: custom FileInputFormat class

2012-05-17 Thread Devaraj k
Hi John, You can extend FileInputFormat (or implement InputFormat) and then you need to implement the methods below. 1. InputSplit[] getSplits(JobConf job, int numSplits): for splitting the input files logically for the job. If FileInputFormat.getSplits(JobConf job, int numSplits) suits for
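A skeletal example of the route described above, hedged for John's use case: it assumes each binary record is a 4-byte int followed by an 8-byte long (the actual layout isn't stated in the thread), inherits getSplits() from FileInputFormat, and sidesteps split-boundary handling by making files non-splittable.

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    /** Reads fixed-width binary records: a 4-byte int key followed by an 8-byte long value. */
    public class IntLongInputFormat extends FileInputFormat<IntWritable, LongWritable> {

      private static final int RECORD_BYTES = 4 + 8;

      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        // Keep each file in one split so records never straddle a split boundary.
        return false;
      }

      @Override
      public RecordReader<IntWritable, LongWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        final FileSplit fileSplit = (FileSplit) split;
        final FileSystem fs = fileSplit.getPath().getFileSystem(job);
        final FSDataInputStream in = fs.open(fileSplit.getPath());
        final long end = fileSplit.getStart() + fileSplit.getLength();

        return new RecordReader<IntLongInputFormat.returnTypePlaceholder, LongWritable>() {
          public boolean next(IntWritable key, LongWritable value) throws IOException {
            if (in.getPos() + RECORD_BYTES > end) {
              return false;
            }
            key.set(in.readInt());
            value.set(in.readLong());
            return true;
          }
          public IntWritable createKey() { return new IntWritable(); }
          public LongWritable createValue() { return new LongWritable(); }
          public long getPos() throws IOException { return in.getPos(); }
          public float getProgress() throws IOException {
            return Math.min(1.0f, in.getPos() / (float) Math.max(end, 1));
          }
          public void close() throws IOException { in.close(); }
        };
      }
    }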

problem setting up multi-user cluster using locally-mounted shared filesystem

2012-05-17 Thread Luca Pireddu
Hello all, we're trying to set up a multi-user MapReduce cluster that doesn't use HDFS. The idea is to use a central, shared JobTracker to which we add or remove task trackers as needed---a sort of in-house elastic MapReduce. Following the cluster setup documentation

problem setting up multi-user cluster using locally-mounted shared filesystem

2012-05-17 Thread Luca Pireddu
Sorry, I accidentally hit the send button on the last email. Hello all, we're trying to set up a multi-user MapReduce cluster that doesn't use HDFS. The idea is to use a central, shared JobTracker to which we add or remove task trackers as needed---a sort of in-house elastic MapReduce.

Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Ravi Joshi
Hi, I recently downloaded and successfully installed hadoop-1.0.1 on my Ubuntu 10.04 LTS. I have hadoop-1.0.1.tar.gz downloaded and now I want to design a MapReduce application. As suggested by some blogs, first we should install the Eclipse plugin for Hadoop, which is located inside

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Ravi Joshi
Hi Jagat, Thank you so much for answering the question. Can you please tell me the names and locations of all the jar files that must be added to the project? I am using hadoop-1.0.1 in Eclipse Indigo on Ubuntu 10.04 LTS. Thank you. -Ravi Joshi --- On Thu, 17/5/12, Jagat jagatsi...@gmail.com

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Ravi Joshi
Hi, I added hadoop-core-1.0.1.jar to the project classpath. I am testing WordCount (http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0), but when I try to run my WordCount.java in Eclipse, it shows the following error: Exception in thread main
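As the follow-up below suggests, hadoop-core-1.0.1.jar alone is usually not enough; the dependency jars shipped under lib/ of the tarball also have to be on the project's build path. An equivalent command-line sketch (paths are illustrative; in Eclipse the same jars would be added under the project's Build Path):

    export HADOOP_HOME=/home/hduser/hadoop-1.0.1           # wherever the tarball was unpacked
    export CP="$HADOOP_HOME/hadoop-core-1.0.1.jar:$HADOOP_HOME/lib/*"

    javac -classpath "$CP" -d wordcount_classes WordCount.java
    java  -classpath "$CP:wordcount_classes" WordCount /absolute/input/path /absolute/output/path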

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Ravi Joshi
Now I have added all the jar files that came with the hadoop-1.0.1.tar.gz package, but some new errors are showing. This time I am following WordCount v2 (http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3a+WordCount+v2.0). Following is the error: 12/05/17 20:31:43 WARN

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Jagat
Can you check why it's saying the input path does not exist: file:/home/hduser/Desktop/Eclipse_Workspace/K-Means Algorithm/~/Desktop/input/doc

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Harsh J
Ravi, the ~ in ~/Desktop/input/doc isn't resolvable by the code AFAIK. A shell usually resolves that, and you seem to be running it from Eclipse (which therefore won't resolve it). So provide an absolute path as the input argument instead. On Thu, May 17, 2012 at 8:37 PM, Ravi Joshi
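A small illustrative helper (not from the thread) that does the expansion in the driver, for anyone who wants to keep tilde-style arguments when launching from Eclipse:

    import org.apache.hadoop.fs.Path;

    public class PathArgs {
      // Expand a leading "~" to the current user's home directory, since only a
      // shell performs tilde expansion and Eclipse passes arguments verbatim.
      static Path expandHome(String arg) {
        if (arg.startsWith("~")) {
          arg = System.getProperty("user.home") + arg.substring(1);
        }
        return new Path(arg);
      }

      public static void main(String[] args) {
        // Prints something like /home/hduser/Desktop/input/doc when run as hduser.
        System.out.println(expandHome("~/Desktop/input/doc"));
      }
    }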

Re: Unable to work with Hadoop 1.0.1 using eclipse-indigo

2012-05-17 Thread Ravi Joshi
Hi Jagat, I managed everything and the program is working now. Initially I was passing ~/Desktop/input/doc ~/Desktop/output as arguments, which was giving an error (don't know why!!). After that I changed it a little to ./Input/doc ./output (and I moved the input and output directories inside the project root

Re: problem setting up multi-user cluster using locally-mounted shared filesystem

2012-05-17 Thread Ranjith
The error message below points to a permissions-related issue. Did you try changing the permissions as indicated in it? ERROR security.UserGroupInformation: PriviledgedActionException as:sbsuser cause:java.io.IOException: The ownership/permissions on the
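A generic shell sketch of how to inspect and adjust the flagged directory (substitute the path actually named in the error message; the exact owner and mode Hadoop expects are printed in the full message, which is truncated above):

    ls -ld /path/from/error/message              # inspect current owner and mode
    chown -R sbsuser /path/from/error/message    # make the submitting user the owner
    chmod -R 700 /path/from/error/message        # adjust the mode to what the message asks for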

Hadoop-on-demand and torque

2012-05-17 Thread Merto Mertek
If I understand it right, HOD is meant mainly for merging existing HPC clusters with Hadoop and for testing purposes. I cannot find what the role of Torque is here (just initial node allocation?) or what the default scheduler of HOD is. Probably the scheduler from the hadoop

is hadoop suitable for us?

2012-05-17 Thread Pierre Antoine Du Bois De Naurois
Hello, We have about 50 VMs and we want to distribute processing across them. However, these VMs share a huge data storage system, and thus their virtual HDDs are all located on the same computer. Would Hadoop be useful for such a configuration? Could we use Hadoop without HDFS, so that we can

Re: is hadoop suitable for us?

2012-05-17 Thread Mathias Herberts
Hadoop does not perform well with shared storage and VMs. The question should first be about what you're trying to achieve, not about your infrastructure. On May 17, 2012 10:39 PM, Pierre Antoine Du Bois De Naurois pad...@gmail.com wrote: Hello, We have about 50 VMs and we want to

dfs.replication factor for MR jobs

2012-05-17 Thread Aishwarya Venkataraman
Hello, I have a 4-node cluster: one namenode and 3 datanodes. I want to explicitly set the dfs.replication factor to 1 in order to run some experiments. I tried setting this via the hdfs-site.xml file and via the command line as well (hadoop dfs -setrep -R -w 1 /). But I have a feeling that

Re: dfs.replication factor for MR jobs

2012-05-17 Thread Abhishek Pratap Singh
Hi Aishwarya, The temporary output of the mapper is used by the reducer, and the number of reduce tasks is based on the output keys of the mapper; it has nothing to do with the replication factor. It is writing to three nodes because at least three keys have been generated by the mapper and reducers were assigned on three

Re: dfs.replication factor for MR jobs

2012-05-17 Thread Aishwarya Venkataraman
Apologies, this works now if I set dfs.replication=1 when I launch the job, i.e. hadoop jar foo.jar com.foo -D dfs.replication=1 input output On Thu, May 17, 2012 at 2:06 PM, Aishwarya Venkataraman avenk...@cs.ucsd.edu wrote: Hello, I have a 4-node cluster. One namenode and 3 other
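Worth noting (a hedged aside, not from the thread): the -D generic option only takes effect when the driver parses generic options, e.g. by going through ToolRunner, and the same setting can also be forced on the job's Configuration in code. A sketch, with the class name Foo purely illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Foo extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        Configuration conf = getConf();     // already carries -D dfs.replication=1 if passed
        conf.set("dfs.replication", "1");   // or force it in code; applies to files this job writes
        Job job = new Job(conf, "replication-1 job");
        job.setJarByClass(Foo.class);
        job.setNumReduceTasks(0);           // map-only, as mentioned later in this thread
        // ... set mapper class and input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Foo(), args));
      }
    }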

Re: is hadoop suitable for us?

2012-05-17 Thread Abhishek Pratap Singh
Hi, For your question of whether Hadoop can be used without HDFS, the answer is yes: Hadoop can be used with other kinds of distributed file systems. But I'm not able to understand the problem statement clearly enough to advise. Are you processing text files and saving them in a distributed database??
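For the "without HDFS" part, a configuration sketch using Hadoop 1.x property names (the shared mount point is illustrative): pointing fs.default.name at the local file system lets MapReduce read and write a locally-mounted shared volume directly, provided every node sees it at the same path.

    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>file:///</value>
      </property>
    </configuration>

Job input and output paths then look like file:///shared/corpus/...; data locality is lost, so every read goes over the shared storage, which is the latency trade-off raised later in this thread.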

Re: dfs.replication factor for MR jobs

2012-05-17 Thread Aishwarya Venkataraman
The MR job that I'm running has zero reducers (sorry, I should have mentioned this earlier). It's a map-only job. Thanks, On Thu, May 17, 2012 at 2:31 PM, Abhishek Pratap Singh manu.i...@gmail.com wrote: Hi Aishwarya, Temporary output of mapper is used for reducer. And number of Reduce jobs

Re: is hadoop suitable for us?

2012-05-17 Thread Pierre Antoine Du Bois De Naurois
We have a large number of text files that we want to process and index (plus apply other algorithms). The problem is that our configuration is shared-everything while Hadoop assumes a shared-nothing configuration. We have 50 VMs, not actual servers, and these share a huge central storage system. So using

RE: is hadoop suitable for us?

2012-05-17 Thread Sagar Shukla
Hi PA, Thanks for the detailed explanation of your environment. Based on some of my experiences with Hadoop so far, the following is my recommendation: if you plan to process huge documents regularly and generate an index of the metadata, then Hadoop is the way to go. I am not sure about

Re: is hadoop suitable for us?

2012-05-17 Thread Michael Segel
The short answer is yes. The longer answer is that you will have to account for the latencies. There is more, but you get the idea... Sent from my iPhone On May 17, 2012, at 5:33 PM, Pierre Antoine Du Bois De Naurois pad...@gmail.com wrote: We have large amount of text files that we want to

Re: is hadoop suitable for us?

2012-05-17 Thread Pierre Antoine Du Bois De Naurois
Thanks Sagar, Mathias and Michael for your replies. It seems we will have to go with Hadoop even if I/O is slow due to our configuration. I will try to post an update on how it works out for our case. Best, PA 2012/5/17 Michael Segel michael_se...@hotmail.com The short answer is yes. The longer

suppressing INFO messages to console

2012-05-17 Thread Jonathan Bishop
Hi, Does anyone know an easy way to suppress the INFO messages from the various logging classes? I assume you need to edit conf/log4j.properties, but everything I tried seems not to work. For instance, to suppress messages like... 12/05/17 17:24:47 INFO mapred.JobClient: Map output records=0 I
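One likely reason edits to conf/log4j.properties appear to be ignored (a hedged note): on 1.x the bin/hadoop script passes -Dhadoop.root.logger=INFO,console, which overrides the root-level default set inside the file. Two sketches that usually work:

    # Either export this before running client commands, to raise the root level...
    export HADOOP_ROOT_LOGGER=WARN,console

    # ...or add a per-logger line to conf/log4j.properties to silence just the
    # chatty class while leaving everything else at INFO:
    log4j.logger.org.apache.hadoop.mapred.JobClient=WARN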

RE: is hadoop suitable for us?

2012-05-17 Thread Sagar Shukla
Hi PA, In my environment we had SAN storage and I/O was pretty good, so if you have a similar environment I don't see any performance issues. Just out of curiosity - what amount of data are you looking to process? Regards, Sagar -Original Message- From: Pierre

Re: Why this problem is not solved yet?

2012-05-17 Thread Ravi Prakash
Ravishankar, If you run $ jps, do you see a TaskTracker process running? Can you please post the tasktracker logs as well? On Thu, May 17, 2012 at 8:49 PM, Ravishankar Nair ravishankar.n...@gmail.com wrote: Dear experts, Today is my tenth day working on installing Hadoop on my Windows

Re: is hadoop suitable for us?

2012-05-17 Thread Pierre Antoine DuBoDeNa
Did you use HDFS too, or did you store everything on the SAN directly? I don't have a number in GB/TB (it might be about 2 TB, so not really that huge), but there are more than 100 million documents to be processed. On a single machine we can currently process about 200,000 docs/day (several parsing, indexing,