Hello,
I'm writing a program that performs Lucene searches across
about 12 index directories, all of them stored in HDFS. It is done
like this:
1. We get about 12 index directories through Lucene's indexing
functionality, each of which is about 100 MB in size,
2. We store these 12 index directories
There is no specific procedure for configuring virtual machine slaves.
Make sure the following things are done:
1. Every machine's (both VMs and physical machines) public key is distributed
to every "~/.ssh/authorized_keys" file.
2. The conf/hadoop-site.xml file is similar for all the machi
Either all files are deleted or none at all, depending on how quickly you
press Ctrl-C. The delete command is not executed in your terminal;
instead the rmr command is sent to the Hadoop namenode and is executed
there.
On 09-3-12 10:48 AM, "bzheng" wrote:
>
> I did a ctrl-c immediately aft
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
Loht
- Original Message
From: Amandeep Khurana
To: core-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 9:46:09 PM
Subject: Re: How to read output files over HDFS
2 ways that I can think of:
1. Write another MR job witho
Hi,
I have a question about how to decide the block size.
As I understand it, the block size is related to the namenode's heap size (how
many blocks can be handled),
the total storage capacity of the cluster, the file sizes (depending on the
application, e.g. a 1 TB log file), the number of replicas,
and the performance of mapr
2 ways that I can think of:
1. Write another MR job without a reducer. The mapper can be made to do
whatever logic you want to do.
OR
2. Take an instance of DistributedFileSystem class in your java code and use
it to read the file from HDFS.
Amandeep Khurana
Computer Science Graduate Student
Un
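For reference, a minimal sketch of option 2 above, assuming the 0.19-era FileSystem API and the output0/output1/output2 layout described in the question below; the class name ReadHdfsOutput and the hard-coded directory count are illustrative only:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example: open each part file under output0..output2 on the
// configured HDFS and print it line by line.
public class ReadHdfsOutput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);      // the configured DistributedFileSystem
    for (int i = 0; i < 3; i++) {              // output0, output1, output2
      for (FileStatus status : fs.listStatus(new Path("output" + i))) {
        if (status.isDir()) continue;          // skip _logs and subdirectories
        BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(status.getPath())));
        String line;
        while ((line = in.readLine()) != null) {
          System.out.println(line);            // process each output line here
        }
        in.close();
      }
    }
    fs.close();
  }
}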
Hi,
I am running multiple MapReduce jobs which generate their output in directories
named output0, output1, output2, etc. Once these jobs complete I want to
read the output stored in these files (line by line) using Java code,
automatically.
Kindly tell me how I can do this.
I do not want
When you issue -rmr on a directory, the namenode gets the directory name and starts
deleting files recursively. It adds the blocks belonging to the files to the invalidate
list. The namenode then deletes those blocks lazily. So yes, it will issue commands
to the datanodes to delete those blocks; just give it some time.
I did a ctrl-c immediately after issuing a hadoop dfs -rmr command. The rmr
target is no longer visible from the dfs -ls command. The number of files
deleted is huge and I don't think it can possibly delete them all between
the time the command is issued and ctrl-c. Does this mean it leaves beh
FYI: We temporarily lost a couple of blocks in 0.18.3 due to
https://issues.apache.org/jira/browse/HADOOP-5465
Fix should be coming soon (to 0.19 as well).
Koji
-Original Message-
From: Nathan Marz [mailto:nat...@rapleaf.com]
Sent: Wednesday, March 11, 2009 3:37 PM
To: core-user@hadoo
Are there any known data loss problems remaining in Hadoop 0.19.1?
Thanks,
Nathan Marz
I found the solution here :
http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
J-D
On Fri, Mar 6, 2009 at 6:08 PM, Jean-Daniel Cryans wrote:
> I know this one may be weird, but I'll give it a try. Thanks to anyone
> reading this through.
>
> Setup : hadoop-0
Hey there, we're trying to decide where to host our next public
training session, so I'd like to simply ask - where is it needed? Use
this form or just drop me a note:
http://spreadsheets.google.com/viewform?formkey=cHZfNzNoLUlkU0dJY0VhUVUwVlpnUUE6MA
We'll do this over two days, with one day being
Shixing,
Discussion on
https://issues.apache.org/jira/browse/HADOOP-5059
may be related.
Koji
-Original Message-
From: shixing [mailto:paradise...@gmail.com]
Sent: Wednesday, March 11, 2009 1:31 AM
To: core-user@hadoop.apache.org
Subject: streaming error when submit the job:Cannot r
Here are the exact numbers:
# of (k,v) pairs = 1.2 million; this is the same.
# of unique k = 1000; k is an integer.
# of unique v = 1 million; v is a very big string.
For a given k, the cumulative size of all v associated with it is about 30 MB.
(That is, each v is about 25~30 KB.)
# of Mappers = 30
# of Reducers = 10
(v,k)
On Tue, 2009-03-10 at 19:44 -0700, Gyanit wrote:
> I have a large number of key,value pairs. I don't actually care if data goes in
> the value or the key. Let me be more exact.
> The number of (k,v) pairs after the combiner is about 1 mil. I have approx 1 KB of data for
> each pair. I can put it in keys or values.
> I have experi
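A hypothetical sketch of the (v,k) swap suggested above, using the old mapred API; the SwapMapper name and the Text types are assumptions, not from the thread:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emit the large value as the key and the small key as the value,
// so the heavy strings are what get sorted and partitioned.
public class SwapMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  public void map(Text k, Text v, OutputCollector<Text, Text> out,
                  Reporter reporter) throws IOException {
    out.collect(v, k);   // (k,v) becomes (v,k)
  }
}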
Raghu Angadi wrote:
Amandeep Khurana wrote:
My dfs.datanode.socket.write.timeout is set to 0. This had to be done
to get HBase to work.
ah.. I see, we should fix that. Not sure how others haven't seen it till
now. Affects only those with write.timeout set to 0 on the clients.
filed : https
Haha, good to know I might be a guinea pig!
-Original Message-
From: Kris Jirapinyo [mailto:kris.jirapi...@biz360.com]
Sent: Wednesday, March 11, 2009 15:59
To: core-user@hadoop.apache.org
Subject: Re: Persistent HDFS On EC2
That was also the starting point for my experiment (Tom White's
Well if the smaller keys are producing fewer unique values, there should be
some more significant differences.
I had assumed that your test produced the same number of unique values.
I'm still not sure why there would be that significant of a difference as long
as the total number of unique val
That was also the starting point for my experiment (Tom White's article).
Note that the most painful part about this setup is probably writing and
testing the scripts that will enable this to happen (and also customizing
your EC2 images). It would be interesting to see someone else try it.
On We
I noticed one more thing. Lighter keys tend to produce a smaller number of unique
keys.
For example, there may be 10 million (key,value) pairs, but with lighter keys the
unique keys might number just 1000.
In the other case, with heavier keys, the unique keys might be 5 million.
I think this might have something to do with it.
Botto
That is a fascinating question. I would also love to know the reason behind
this.
If I were to guess, I would have thought that smaller keys and heavier values
would slightly outperform, rather than significantly underperform (assuming
the total pair count at each phase is the same). Perhaps th
Amandeep Khurana wrote:
What happens if you set it to 0? How is it a workaround?
HBase needs it in pre-0.19.0 (related story:
http://www.nabble.com/Datanode-Xceivers-td21372227.html). It should not
matter if you move to 0.19.0 or newer.
And how would it
matter if I change it to a large valu
Konstantin Shvachko wrote:
The port was not specified at all in the original configuration.
Since 0.18, the port is optional. If no port is specified, then 8020 is
used. 8020 is the default port for namenodes.
https://issues.apache.org/jira/browse/HADOOP-3317
Doug
Mayuran,
It takes very long, for a lot of iterations, if we have to go through each
debugging step one at a time. Maybe a jira is a good place.
- Run fsck with the blocks option.
- Check if those ids match the ids in the file names found by 'find'.
- Check which directories these files are in.. and v
Tom White wrote a great blog post about some options here:
http://www.lexemetech.com/2008/08/elastic-hadoop-clusters-with-amazons.html
plus an Amazon article:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112
Regards,
- Adam
Kris Jirapinyo wrote:
Why w
What happens if you set it to 0? How is it a workaround? And how would it
matter if I change it to a large value?
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Wed, Mar 11, 2009 at 12:00 PM, Raghu Angadi wrote:
> Amandeep Khurana wrote:
>
>> My dfs.
Amandeep Khurana wrote:
My dfs.datanode.socket.write.timeout is set to 0. This had to be done to get
HBase to work.
ah.. I see, we should fix that. Not sure how others haven't seen it till
now. Affects only those with write.timeout set to 0 on the clients.
Since setting it to 0 itself is a w
This is not about the default port.
The port was not specified at all in the original configuration.
--Konstantin
Doug Cutting wrote:
Konstantin Shvachko wrote:
Clarifying: port # is missing in your configuration, should be
fs.default.name
hdfs://hvcwydev0601:8020
where 8020 is your por
Mayuran Yogarajah wrote:
Raghu Angadi wrote:
The block files usually don't disappear easily. Check on the datanode whether
you find any files starting with "blk". Also check the datanode log to see
what happened there... maybe you started on a different directory or
something like that.
Raghu.
I have the following class definition:
public class Ase2DbMapRed extends MapReduceBase implements
TableMap, Tool {
I am also implementing the close() method inherited from MapReduceBase.
Is it possible to know (and how?) within the "public void close()..."
method whether this parti
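The question is cut off above, but as a hedged sketch of where close() fits in the old mapred API: MapReduceBase provides configure(JobConf) and a no-argument close(), so anything you want to know inside close() has to be captured earlier, e.g. the task attempt id from the JobConf (the property name mapred.task.id is assumed from the 0.19-era API):

import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Sketch only: remember per-task information in configure() so it is
// available when the framework calls close() at the end of the task.
public abstract class CloseAwareBase extends MapReduceBase {
  private String taskId;

  @Override
  public void configure(JobConf job) {
    taskId = job.get("mapred.task.id");   // this task attempt's id
  }

  @Override
  public void close() throws IOException {
    System.err.println("close() called in task " + taskId);
  }
}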
My dfs.datanode.socket.write.timeout is set to 0. This had to be done to get
HBase to work.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi wrote:
>
> Did you change dfs.datanode.socket.write.timeout to 5 se
Doug Cutting wrote:
Konstantin Shvachko wrote:
Clarifying: port # is missing in your configuration, should be
fs.default.name
hdfs://hvcwydev0601:8020
where 8020 is your port number.
That's the work-around, but it's a bug. One should not need to specify
the default port number (8020).
Konstantin Shvachko wrote:
Clarifying: port # is missing in your configuration, should be
fs.default.name
hdfs://hvcwydev0601:8020
where 8020 is your port number.
That's the work-around, but it's a bug. One should not need to specify
the default port number (8020). Please file an issu
Clarifying: port # is missing in your configuration, should be
fs.default.name
hdfs://hvcwydev0601:8020
where 8020 is your port number.
--Konstantin
Hairong Kuang wrote:
Please try using the port number 8020.
Hairong
On 3/11/09 9:42 AM, "Stuart White" wrote:
I've been running hado
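To make the work-around above concrete, a hedged hadoop-site.xml snippet (the hostname is taken from the thread; everything else is assumed):

<property>
  <name>fs.default.name</name>
  <value>hdfs://hvcwydev0601:8020</value>
</property>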
Did you change dfs.datanode.socket.write.timeout to 5 seconds? The
exception message says so. It is extremely small.
The default is 8 minutes and is intentionally pretty high. Its purpose
is mainly to catch extremely unresponsive datanodes and other network
issues.
Raghu.
Amandeep Khurana
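For reference, a hedged example of setting the timeout discussed above in hadoop-site.xml; the value is in milliseconds, so the 8-minute default corresponds to 480000, and 0 disables it (the HBase work-around mentioned in the thread):

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>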
Hi!
I have a question about fine-tuning Hadoop performance on 8-core machines.
I have 2 machines I am testing. One is an 8-core Xeon and the other an 8-core
Opteron, 16 GB RAM each. They both run mapreduce and dfs nodes. Currently
I've set up each of them to run 32 map and 8 reduce tasks.
Also, HADOOP
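The per-machine map/reduce task counts mentioned above are normally controlled per tasktracker; a hedged snippet with the values from the message, assuming the 0.19-era property names:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>32</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>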
Please try using the port number 8020.
Hairong
On 3/11/09 9:42 AM, "Stuart White" wrote:
> I've been running hadoop-0.19.0 for several weeks successfully.
>
> Today, for the first time, I tried to run the balancer, and I'm receiving:
>
> java.lang.RuntimeException: Not a host:port pair: hvcwy
I've been running hadoop-0.19.0 for several weeks successfully.
Today, for the first time, I tried to run the balancer, and I'm receiving:
java.lang.RuntimeException: Not a host:port pair: hvcwydev0601
In my hadoop-site.xml, I have this:
fs.default.name
hdfs://hvcwydev0601/
What do I nee
Sandy,
Correct me if I'm wrong, but if you have only two cores and you are running
your jobs in pseudo-distributed mode, what is the point of having more than
2 mappers/reducers? Any number larger than 2 would make the mapper/reducer
threads serialize. That serialization would certainly be an over
Finally remembered: we had Saxon 6.5.5 on the classpath, and the Jetty
error was
09/03/11 08:23:20 WARN xml.XmlParser: EXCEPTION
javax.xml.parsers.ParserConfigurationException: AElfred parser is
non-validating
On Wed, Mar 11, 2009 at 8:01 AM, jason hadoop wrote:
> I am having trouble reproducing
I am having trouble reproducing this one. It happened in a very specific
environment that pulled in an alternate sax parser.
The bottom line is that jetty expects a parser with particular capabilities
and if it doesn't get one, odd things happen.
In a day or so I will have hopefully worked out th
Why would you lose the locality of storage-per-machine if one EBS volume is
mounted to each machine instance? When that machine goes down, you can just
restart the instance and re-mount the exact same volume. I've tried this
idea before successfully on a 10 node cluster on EC2, and didn't see any
I am estimating that all of the data I will need to run the job will be
~2 terabytes. Is that too large a data set to be copying from S3 every
startup?
-Original Message-
From: Steve Loughran [mailto:ste...@apache.org]
Sent: Wednesday, March 11, 2009 9:39
To: core-user@hadoop.apache.org
Malcolm Matalka wrote:
If this is not the correct place to ask Hadoop + EC2 questions please
let me know.
I am trying to get a handle on how to use Hadoop on EC2 before
committing any money to it. My question is, how do I maintain a
persistent HDFS between restarts of instances. Most of th
If this is not the correct place to ask Hadoop + EC2 questions please
let me know.
I am trying to get a handle on how to use Hadoop on EC2 before
committing any money to it. My question is, how do I maintain a
persistent HDFS between restarts of instances. Most of the tutorials I
have found i
In Hadoop cluster management, I'm trying to replace the physical machine
slaves with virtual machine slaves. Is there any change in procedure?
Also, the pseudo-distributed setup was successful on all virtual machines.
--
View this message in context:
http://www.nabble.com/using-virtual-slave-mac
jason hadoop wrote:
The other goofy thing is that the XML parser that is commonly first on the
classpath validates XML in a way that is opposite to what Jetty wants.
What does ant -diagnostics say? It will list the XML parser at work.
This line in the preamble before the ClusterMapReduceTes
09/03/11 15:43:55 ERROR streaming.StreamJob: Error Launching job :
java.io.IOException: Cannot run program "chmod": java.io.IOException:
error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
Till 0.18.x, files are not added to the client-side classpath. Use 0.19,
and run the following command to use a custom input format:
bin/hadoop jar contrib/streaming/hadoop-0.19.0-streaming.jar -mapper
mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input
test.data -output test-output -fi