Hello guys,
I have a problem using DistCp to transfer a large file from S3 to an HDFS
cluster. Whenever I try to make the copy, I only see processing work and
memory usage on one of the nodes, not on all of them. I don't know if this
is the proper behaviour or if it is a configuration issue.
Hi,
my guess is that you are running hadoop distcp on one of the datanodes... In
that case, that node will get the first replica of each block, so it ends up
with a replica of all the blocks, but you should still see copies appearing
on the other nodes as well.
Kai
On 04.09.2012, at 22:07, Soulghost wrote:
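One thing worth keeping in mind (not from the thread; the bucket and paths
below are made up): DistCp assigns whole files to map tasks, so the number of
maps, set with -m, can never exceed the number of source files, and a single
large file is copied by a single map.

    hadoop distcp -m 20 s3n://my-bucket/input/ hdfs:///user/me/input/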
Hello,
You could try this jar, which I found a link to on one of the Amazon pages:
s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
s3distcp.jar copies via MapReduce to S3 and back.
If your cluster has N reducers available, you can:
hadoop jar s3distcp.jar
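For example, a minimal invocation looks roughly like this (the bucket and
paths are placeholders, not from the thread):

    hadoop jar s3distcp.jar --src s3n://my-bucket/logs/ --dest hdfs:///data/logs/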
Hello
I just want to know: why doesn't HBase provide encryption?
Tnx
Hi:
I wish to use Hadoop streaming to run a program which requires specific
PATH and CLASSPATH variables. I have set these two variables in both
/etc/profile and ~/.bashrc on all slaves (and restarted these slaves).
However, when I run the hadoop streaming job, the program generates error
Hi Andy,
Please set the environment variables in .bashrc (open it with gedit). You
can set the JAVA_HOME environment variable there, and in the configuration
files (hadoop-site.xml, hadoop-core.xml and hadoop-default.xml) you can
specify which Hadoop version and settings to use. Then save and close
.bashrc.
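For example, the exports in ~/.bashrc usually look something like this (the
install paths are placeholders; adjust them to your machines):

    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
    export CLASSPATH=$CLASSPATH:/path/to/your/program/lib/*

After editing, run source ~/.bashrc (or log in again) on each slave.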
Hi Users,
Hadoop distributes all the data into HDFS, and inside MapReduce the tasks
work on it together. Which block goes to which datanode and how it is
processed is all maintained by the framework; each task has its own JVM on
each datanode, and those JVMs can handle a huge amount of data between them.
Hi Andy,
If you are referring to HADOOP_CLASSPATH, that is an environment variable on
your cluster, or it can be set via the config XML. But if you need your own
environment variables for streaming, you may use -cmdenv PATH=... on your
streaming command. Or, if you have specific jars for the streaming process,
use -libjars.
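Roughly like this, for example (the jar version, tool paths and script names
below are placeholders):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.3.jar \
      -libjars /path/to/extra.jar \
      -cmdenv PATH=/opt/mytool/bin:/usr/bin:/bin \
      -cmdenv CLASSPATH=/opt/mytool/lib \
      -input /user/andy/input \
      -output /user/andy/output \
      -mapper my_mapper.sh \
      -reducer my_reducer.sh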
Hi Abhay,
The NameNode has the addresses of all the datanodes. MapReduce does all the
data processing: first the data set is put into the HDFS filesystem, and then
you run the hadoop jar. The map tasks handle the input files, whose output is
then shuffled, sorted and grouped together. Once the map tasks are completed,
then
Hi Rekha,
What I meant is that he first has to install Java, then set JAVA_HOME to its
directory (check it with pwd, the present working directory), and then set
the classpath for the Hadoop installation.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 4, 2012 at 1:36 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:
Hi Pat,
Please specify the correct input file location.
Thanks Regards,
Ramesh.Narasingu
On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel p...@occamsmachete.com wrote:
I am using Hadoop with Mahout in a local-filesystem (non-HDFS) config for
debugging purposes inside IntelliJ IDEA. When I run one
Hi Dexter,
I think what you want is a clustering of points based on the Euclidean
distance, or density-based clustering
(http://en.wikipedia.org/wiki/Cluster_analysis). I bet some of these are
already implemented quite well in Mahout: AFAIK that is the data-mining
framework based on Hadoop.
Hi
When I start my cluster (with start-dfs.sh), secondary namenodes are
created on all the machines in conf/slaves. I set conf/masters to a
single different machine (along with dfs.http.address pointing to the
nameserver), but it seems to be ignored. Any hint at what I'm doing wrong?
thanks
giovanni
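For reference, with stock Apache 1.x the conf/masters file is only used by
start-dfs.sh to decide where to launch the secondary namenode, while
conf/slaves lists the datanode machines. A minimal sketch (host names are
placeholders):

    $ cat conf/masters      # host(s) where the secondary namenode should start
    snn-host
    $ cat conf/slaves       # hosts where datanodes are started
    node01
    node02
    node03
    $ bin/stop-dfs.sh && bin/start-dfs.sh   # restart so the change takes effect

Make sure the same conf directory (HADOOP_CONF_DIR) is picked up on the
machine where you run the start scripts.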
Hi Andy,
Please try once more; otherwise you can download a fresh Hadoop
1.0.2, which is stable. You can set all the environment variables in .bashrc
(with vi or gedit): set the JAVA_HOME and HADOOP_HOME environment variables
and also update PATH.
Thanks Regards,
Ramesh.Narasingu
On Tue,
also getting one more error:
org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot
communicate with client version 4
On Tue, Sep 4, 2012 at 2:44 PM, Visioner Sadak visioner.sa...@gmail.com wrote:
Thanks Shobha, I tried adding the conf folder to Tomcat's classpath but am
still getting the same error.
Hi
You are running Tomcat on a Windows machine and trying to connect to a remote
Hadoop cluster from there. Your core-site.xml has:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

But it is localhost here. (I assume you are not running Hadoop on this Windows
environment for some testing.)
Which distro?
Saw this happen, way back when with a Cloudera release.
Check your config files too...
Sent from a remote device. Please excuse any typos...
Mike Segel
On Sep 4, 2012, at 3:22 AM, surfer sur...@crs4.it wrote:
Hi
When I start my cluster (with start-dfs.sh), secondary
Thanks Bejoy. Actually my Hadoop is also on Windows (I have installed it in
pseudo-distributed mode for testing); it's not a remote cluster.
On Tue, Sep 4, 2012 at 3:38 PM, Bejoy KS bejoy.had...@gmail.com wrote:
Hi
You are running Tomcat on a Windows machine and trying to connect to a
Hi,
Following is a description of what I am trying to do and the steps I followed.
GOAL:
a) Install Hadoop 1.0.3
b) Hadoop in a standalone (or local) mode
c) OS: Windows 7
STEPS FOLLOWED:
1. I followed instructions from:
On 09/04/2012 12:58 PM, Michel Segel wrote:
Which distro?
Saw this happen, way back when with a Cloudera release.
Check your config files too...
Sent from a remote device. Please excuse any typos...
Mike Segel
thanks for your answer
the config files are these:
Can you please show contents of masters and slaves config files?
On 09/04/2012 09:15 AM, surfer wrote:
On 09/04/2012 12:58 PM, Michel Segel wrote:
Which distro?
Saw this happen, way back when with a Cloudera release.
Check your config files too...
Sent from a remote device. Please
I ran into an issue yesterday where one of the blocks on HDFS seems to
have gone away. I would appreciate any help that you can provide.
I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running
hadoop version 0.20.205 and hive version 0.8.1.
I have a hive table that is written out
The other thing you have to look at is the underlying start and stop scripts,
to see what is being passed on to them.
I thought there was a parameter that would override the defaults for where you
specify the slaves and masters files, but I could be wrong.
Since this is raw Apache, I don't
You didn't do anything wrong, this is just a bug in the Pi application. The
application _should_ be able to divide two numbers and not require an exact
decimal result. Everything you need to know is in the first line of the error
message. Try it with 100 maps and 10 billion samples per map,
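For example, with the stock examples jar (the exact jar file name depends on
your version):

    hadoop jar hadoop-examples.jar pi 100 10000000000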
You blew out the stack?
Or rather your number was too 'big'/'long'?
On Sep 4, 2012, at 11:10 AM, Gaurav Dasgupta gdsay...@gmail.com wrote:
Hi All,
I am running the Pi Estimator from hadoop-examples.jar in my 11 node CDH3u4
cluster.
Initially I ran the job for 10 maps and 100
I've been running up against the good old-fashioned "replicated to 0 nodes"
gremlin quite a bit recently. My system (a set of processes interacting with
Hadoop, and of course Hadoop itself) runs for a while (a day or so) and then I
get plagued with these errors. This is a very simple system, a
Next time, try reading and writing to S3 directly from your hive job.
Not sure why the block was bad... What did the AWS folks have to say?
-Mike
On Sep 4, 2012, at 11:30 AM, Max Hansmire hansm...@gmail.com wrote:
I ran into an issue yesterday where one of the blocks on HDFS seems to
have
If the datanode is definitely not running out of space, and the overall system
has basically been working leading up to the replicated to 0 nodes error
(which proves the configuration and permissions are all basically correct),
then what other explanations are there for why hdfs would suddenly
- A datanode typically needs to keep about 5 free blocks' worth (of the HDFS
block size) of space available.
- Disk space is also used by MapReduce jobs to store temporary shuffle spills.
This is what dfs.datanode.du.reserved is used to configure; the
configuration goes in hdfs-site.xml. If you have not
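For reference, the property looks roughly like this in hdfs-site.xml (the
value here is just an example, about 10 GB per volume; tune it for your disks
and restart the datanodes):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>
    </property>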
Observe:
~/ $ hd fs -put test /test
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create
file/test. Name node is in safe mode.
~/ $ hadoop dfsadmin -safemode leave
Safe mode is OFF
~/ $ hadoop dfsadmin -safemode get
Safe mode is ON
~/ $ hadoop dfsadmin -safemode leave
Especially since I am reading from the file using a MapReduce job in the next
step, I am not sure that it makes sense in terms of performance to put the
file on S3. I have not tested it, but my suspicion is that local disk reads on
HDFS would outperform reading and writing the file to S3.
Can you look in the namenode logs and post the last few lines?
On 9/4/12 10:07 AM, Keith Wiley kwi...@keithwiley.com wrote:
Observe:
~/ $ hd fs -put test /test
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
create file/test. Name node is in safe mode.
~/ $ hadoop dfsadmin
Hi
When I start my cluster with start-dfs.sh, the secondary namenodes are
created on the slave machines. I set conf/masters to a different single
machine (along with setting dfs.http.address to the
nameserver:50070) but it is apparently ignored.
hadoop version: 1.0.3
1 machine with JT
On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote:
When these errors are thrown, please send the namenode web UI information. It
has storage related information in the cluster summary. That will help debug.
Sure thing. Thanks. Here's what I currently see. It looks like the problem
isn't
Keith,
Assuming that you were seeing the problem when you captured the namenode
webUI info, it is not related to what I suspect. This might be a good
question for CDH forums given this is not an Apache release.
Regards,
Suresh
On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley kwi...@keithwiley.com
Hi Udayini
By default Hadoop works well on Linux and Linux-based OSes. Since you are on
Windows, you need to install and configure SSH using Cygwin before you start
the Hadoop daemons.
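A rough sketch of the usual passwordless-SSH setup inside Cygwin (standard
OpenSSH commands; adjust paths and options to taste):

    ssh-keygen -t rsa -P ""
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    ssh localhost    # should now log in without asking for a password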
On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala udayini_pendy...@yahoo.com
wrote:
Hi,
Following is a
On Mon, Sep 3, 2012 at 5:09 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Is there a reason why YARN's directory paths do not default to being
relative to hadoop.tmp.dir?
For example, yarn.nodemanager.local-dirs defaults to /tmp/nm-local-dir.
Could it be ${hadoop.tmp.dir}/nm-local-dir instead?
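In the meantime it can of course be overridden explicitly in yarn-site.xml,
something like this (just a sketch; Hadoop's Configuration expands ${...}
references in property values):

    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>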
Hi Keith,
See http://search-hadoop.com/m/z9oYUIhhUg and the method isGoodTarget
under
http://search-hadoop.com/c/Hadoop:/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java||isGoodTarget
On Tue, Sep 4, 2012 at 10:24 PM,
Hi Young,
Note that the SequenceFile.Writer#sync method != HDFS sync(); it's just
a method that writes a sync marker (a set of bytes representing an end
point for one or more records, kind of like a newline in text files but
not after every record).
I don't think sync() would affect much. Although,
Keith,
The NameNode has a resource-checker thread in it by design, to help
prevent cases of on-disk metadata corruption in the event of filled-up
dfs.namenode.name.dir disks, etc. By default, an NN will lock itself
up if the free disk space (among its configured metadata mounts)
reaches a value of 100
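So before forcing safe mode off, it is worth checking free space on the
metadata mount; roughly like this (the path is a placeholder for your
dfs.namenode.name.dir):

    df -h /path/to/dfs/name          # free space on the NN metadata mount
    hadoop dfsadmin -safemode get    # confirm the current state
    hadoop dfsadmin -safemode leave  # should only stick once space is freed up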
Hi,
The path
/tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/snip
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
Though I agree with others that it would probably be easier to get Hadoop
up and running on Unix-based systems, I couldn't help noticing that this path:
\tmp \hadoop-upendyal\mapred\staging\upendyal-1075683580\.staging
seems to have a space in the first component, i.e. '\tmp ' and not '\tmp'. Is
that