I've been trying to for two years - as it's an account that forwards on but I
don't have control over. If you find out, can you let me know?
On 16/05/2012, at 11:05 PM, Yue Guan gua...@husky.neu.edu wrote:
It seems the instructions on the web pages don't work. Thank you
for your help.
I have multiple reducers running simultaneously. Each reducer is supposed
to output data to a different file.
I'm creating a file on HDFS using the fs.create() call in each reducer.
Will speculative execution of tasks affect the output, since I'm not using any
of the provided OutputFormats?
~Abhay
Yes, speculative execution will affect your tasks; please read the FAQ
to understand the use of OutputCommitters:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
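A minimal sketch (not from the reply above; the class and file-name argument are illustrative) of writing a per-reducer side file under the task attempt's work directory, so the OutputCommitter promotes only the winning attempt's output even when speculative execution runs duplicate attempts:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SideFileHelper {
  // Create the per-reducer file inside the attempt's temporary work path rather than
  // a fixed HDFS path, so duplicate speculative attempts cannot clobber each other.
  public static FSDataOutputStream createSideFile(JobConf conf, String name)
      throws IOException {
    Path workDir = FileOutputFormat.getWorkOutputPath(conf);
    Path sideFile = new Path(workDir, name);
    return sideFile.getFileSystem(conf).create(sideFile);
  }
}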
On Thu, May 17, 2012 at 2:02 PM, Abhay Ratnaparkhi
All,
Can anyone on the list point me in the right direction as to how to write
my own FileInputFormat class?
Perhaps this is not even the way I should go, but my goal is to write a
MapReduce job that gets its input from a binary file of integers and longs.
-John
Hello John,
I previously covered two resources you can use to read up on these custom
extensions at http://search-hadoop.com/m/98TH8MPsTK. Hope
this helps you get started. Let us know if you have specific
issues/questions once you do :)
On Thu, May 17, 2012 at 3:40 PM, John Hancock
Hi John,
You can extend FileInputFormat (or implement InputFormat), and then you need to
implement the methods below.
1. InputSplit[] getSplits(JobConf job, int numSplits): splits the
input files logically for the job. If FileInputFormat.getSplits(JobConf job,
int numSplits) suits for
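A rough, hedged sketch (old mapred API; the 12-byte record layout of one int followed by one long, and the class names, are assumptions for illustration) of such a custom FileInputFormat; it sidesteps custom getSplits logic by making files non-splittable:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class IntLongInputFormat extends FileInputFormat<IntWritable, LongWritable> {

  // Keep each file in a single split so a record is never cut in half.
  @Override
  protected boolean isSplitable(FileSystem fs, Path filename) {
    return false;
  }

  @Override
  public RecordReader<IntWritable, LongWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new IntLongRecordReader((FileSplit) split, job);
  }

  static class IntLongRecordReader implements RecordReader<IntWritable, LongWritable> {
    private final FSDataInputStream in;
    private final long start;
    private final long end;

    IntLongRecordReader(FileSplit split, JobConf job) throws IOException {
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(job);
      in = fs.open(file);
      start = split.getStart();
      end = start + split.getLength();
      in.seek(start);
    }

    public boolean next(IntWritable key, LongWritable value) throws IOException {
      // Each record is 12 bytes: 4 for the int key, 8 for the long value.
      if (in.getPos() + 12 > end) {
        return false;
      }
      key.set(in.readInt());
      value.set(in.readLong());
      return true;
    }

    public IntWritable createKey() { return new IntWritable(); }
    public LongWritable createValue() { return new LongWritable(); }
    public long getPos() throws IOException { return in.getPos(); }
    public void close() throws IOException { in.close(); }
    public float getProgress() throws IOException {
      return end == start ? 0.0f : (in.getPos() - start) / (float) (end - start);
    }
  }
}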
Hello all,
we're trying to set up a multi-user MapReduce cluster that doesn't use
HDFS. The idea is to use a central, shared JobTracker to which we add
or remove task trackers as needed---a sort of in-house elastic MapReduce.
Following the cluster setup documentation
Sorry, I accidentally hit the send button on the last email.
Hello all,
we're trying to set up a multi-user MapReduce cluster that doesn't use
HDFS. The idea is to use a central, shared JobTracker to which we add
or remove task trackers as needed---a sort of in-house elastic MapReduce.
Hi, I recently downloaded and successfully installed hadoop-1.0.1 on my
Ubuntu 10.04 LTS machine. I have hadoop-1.0.1.tar.gz downloaded and now I want
to design a MapReduce application. As suggested by some blogs, first we
should install the Eclipse plugin for Hadoop, which is located inside
Hi Jagat, Thank you so much for answering the question. Can you please tell me
the names and locations of all the jar files that must be added to the project? I
am using hadoop-1.0.1 in Eclipse Indigo on Ubuntu 10.04 LTS.
Thank you.
-Ravi Joshi
--- On Thu, 17/5/12, Jagat jagatsi...@gmail.com
Hi, I added hadoop-core-1.0.1.jar to the project classpath. I am testing WordCount
(http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0)
but when I try to run my WordCount.java in Eclipse, it shows the following
errors:
Exception in thread main
Now I have added all the jar files that came with the hadoop-1.0.1.tar.gz package.
But some new errors are showing. This time I am following WordCount v2
(http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3a+WordCount+v2.0).
Following is the error.
12/05/17 20:31:43 WARN
Can you check why it's saying
input path does not exist:
file:/home/hduser/Desktop/Eclipse_Workspace/K-Means
Algorithm/~/Desktop/input/doc
Ravi,
~ in ~/Desktop/input/doc isn't resolvable by the code AFAIK. A shell
usually resolves that, and you seem to be running it from Eclipse
(which, hence, wouldn't resolve it). So provide absolute paths
as the input arguments instead.
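If you do want to accept a leading ~ in the arguments, one hedged option (the class and helper are illustrative, not from the thread) is to expand it in the driver yourself; passing absolute paths remains the simpler fix:

import org.apache.hadoop.fs.Path;

public class PathArgs {
  // Replace a leading "~" with the user's home directory, e.g.
  // "~/Desktop/input/doc" -> "/home/hduser/Desktop/input/doc".
  static Path expandHome(String arg) {
    return new Path(arg.replaceFirst("^~", System.getProperty("user.home")));
  }
}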
On Thu, May 17, 2012 at 8:37 PM, Ravi Joshi
Hi Jagat, I managed everything; the program is working now. In the arguments,
initially I was writing ~/Desktop/input/doc ~/Desktop/output, which was giving an
error (don't know why!!). After that I changed it a little to ./Input/doc ./output
(and I moved the input and output directories inside the project root
The error messages below point to a permissions-related issue. Did you
try changing the permissions as listed in the error message below?
ERROR security.UserGroupInformation: PriviledgedActionException as:sbsuser
cause:java.io.IOException: The ownership/permissions on the
If I understand it right, HOD is mentioned mainly for merging existing HPC
clusters with Hadoop and for testing purposes.
I cannot find what the role of Torque is here (just initial node
allocation?) and what the default scheduler of HOD is. Probably the
scheduler from the hadoop
Hello,
We have about 50 VMs and we want to distribute processing across them.
However, these VMs share a huge data storage system and thus their virtual
HDDs are all located on the same computer. Would Hadoop be useful for such a
configuration? Could we use Hadoop without HDFS, so that we can
Hadoop does not perform well with shared storage and VMs.
The question should first be about what you're trying to achieve,
not about your infrastructure.
On May 17, 2012 10:39 PM, Pierre Antoine Du Bois De Naurois
pad...@gmail.com wrote:
Hello,
We have about 50 VMs and we want to
Hello,
I have a 4-node cluster: one namenode and 3 datanodes. I want to
explicitly set the dfs.replication factor to 1 in order to run some
experiments. I tried setting this via the hdfs-site.xml file and via
the command line as well (hadoop dfs -setrep -R -w 1 /). But I have a
feeling that
Hi Aishwarya,
The temporary output of the mapper is used by the reducer, and the number of reduce
tasks is based on the output keys of the mapper. It has nothing to do with the
replication factor. It is writing to three nodes because at least three
keys have been generated by the mapper and assigned to reducers on three
Apologies, this works now if I set dfs.replication=1 when I launch
the job, i.e.
hadoop jar foo.jar com.foo -D dfs.replication=1 input output
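For reference, a hedged sketch of the programmatic equivalent (the class name and map-only wiring are illustrative, not the actual foo.jar code): -D only takes effect because ToolRunner/GenericOptionsParser feeds generic options into the job's Configuration, and the same property can be set directly on the JobConf:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReplicationOneJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already contains anything passed as -D on the command line,
    // because ToolRunner feeds the args through GenericOptionsParser.
    JobConf conf = new JobConf(getConf(), ReplicationOneJob.class);
    conf.set("dfs.replication", "1");          // programmatic equivalent of -D dfs.replication=1
    conf.setMapperClass(IdentityMapper.class); // map-only pass-through for illustration
    conf.setNumReduceTasks(0);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new ReplicationOneJob(), args));
  }
}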
On Thu, May 17, 2012 at 2:06 PM, Aishwarya Venkataraman
avenk...@cs.ucsd.edu wrote:
Hello,
I have a 4-node cluster. One namenode and 3 other
Hi,
For your question of whether Hadoop can be used without HDFS, the answer is yes.
Hadoop can be used with any kind of distributed file system.
But I am not able to understand the problem statement clearly enough to offer my
point of view.
Are you processing text files and saving them to a distributed database?
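As a hedged illustration of running MapReduce without HDFS (the property is normally set in core-site.xml; the mount path in the comment is an assumption), the default filesystem can simply be pointed at a shared POSIX mount:

import org.apache.hadoop.conf.Configuration;

public class SharedFsConf {
  // Build a Configuration that runs MapReduce over a shared, non-HDFS filesystem.
  public static Configuration create() {
    Configuration conf = new Configuration();
    // Same effect as setting fs.default.name in core-site.xml (Hadoop 1.x property name).
    conf.set("fs.default.name", "file:///");
    // Job input/output paths then resolve against the shared mount, e.g. /mnt/shared/input.
    return conf;
  }
}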
The MR job that I'm running has zero reducers (sorry, I should have
mentioned this earlier). It's a mapper-only job.
Thanks,
On Thu, May 17, 2012 at 2:31 PM, Abhishek Pratap Singh
manu.i...@gmail.com wrote:
Hi Aishwarya,
Temporary output of mapper is used for reducer. And number of Reduce jobs
We have a large amount of text files that we want to process and index (plus
apply other algorithms).
The problem is that our configuration is share-everything, while Hadoop has
a share-nothing configuration.
We have 50 VMs, not actual servers, and these share a huge central
storage. So using
Hi PA,
Thanks for the detailed explanation of your environment.
Based on some of my experiences with Hadoop so far, the following is my
recommendation:
If you plan to process huge documents regularly and generate the index of the
metadata, then Hadoop is the way to go. I am not sure about
The short answer is yes.
The longer answer is that you will have to account for the latencies.
There is more, but you get the idea...
Sent from my iPhone
On May 17, 2012, at 5:33 PM, Pierre Antoine Du Bois De Naurois
pad...@gmail.com wrote:
We have large amount of text files that we want to
Thanks Sagar, Mathias and Michael for your replies.
It seems we will have to go with Hadoop even if I/O will be slow due to our
configuration.
I will try to post an update on how it worked for our case.
Best,
PA
2012/5/17 Michael Segel michael_se...@hotmail.com
The short answer is yes.
The longer
Hi,
Does anyone know an easy way to suppress the INFO messages from the various
logging classes? I assume you need to edit conf/log4j.properties, but
everything I tried seems not to work.
For instance, to suppress messages like...
12/05/17 17:24:47 INFO mapred.JobClient: Map output records=0
I
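One hedged way to do this (Hadoop 1.x bundles log4j 1.2; the logger names are taken from the message above, the class is illustrative) is to raise the level for the offending loggers, either with an equivalent line in conf/log4j.properties or programmatically:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietLogs {
  public static void quiet() {
    // Raise the threshold so "INFO mapred.JobClient: ..." counter lines are no longer printed.
    Logger.getLogger("org.apache.hadoop.mapred.JobClient").setLevel(Level.WARN);
    // Or, more broadly, quiet everything under org.apache.hadoop:
    Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN);
  }
}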
Hi PA,
In my environment we had SAN storage and I/O was pretty good. So if you
have a similar environment then I don't see any performance issues.
Just out of curiosity - what amount of data are you looking to process?
Regards,
Sagar
-Original Message-
From: Pierre
Ravishankar,
If you run $ jps, do you see a TaskTracker process running? Can you please
post the tasktracker logs as well?
On Thu, May 17, 2012 at 8:49 PM, Ravishankar Nair
ravishankar.n...@gmail.com wrote:
Dear experts,
Today is my tenth day working on installing Hadoop on my Windows
Did you use HDFS too, or were you storing everything on the SAN directly?
I don't have the number of GB/TB (it might be about 2TB, so not really that
huge) but there are more than 100 million documents to be processed. On a
single machine we can currently process about 200,000 docs/day (several
parsing, indexing,