Hadoop 1.0.3 will give you a lot of problems with Windows and Cygwin, because
of the complexities of Cygwin configuration paths, so better to downgrade to a lower
version for development and testing purposes on Windows (I downgraded to
0.22.0), and you can use 1.0.3 in production with Linux servers... I will
be a
Hello,
I just want to know why HBase doesn't provide encryption?
Thanks
Though I agree with others that it would probably be easier to get Hadoop
up and running on Unix-based systems, I couldn't help noticing that this path:
\tmp \hadoop-upendyal\mapred\staging\upendyal-1075683580\.staging
seems to have a space in the first component, i.e. '\tmp ' and not '\tmp'. Is
that
Hi,
The path
/tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/
is a location used by the TaskTracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a MapReduce job. (
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tu
Good to know. The bottom line is I was really short-roping everything on
resources. I just need to jack the machine up some.
Thanks.
On Sep 4, 2012, at 19:41 , Harsh J wrote:
> Keith,
>
> The NameNode has a resource-checker thread in it by design to help
> prevent cases of on-disk metadata c
Keith,
The NameNode has a resource-checker thread in it by design to help
prevent cases of on-disk metadata corruption in the event of filled-up
dfs.namenode.name.dir disks, etc. By default, an NN will lock itself
up if the free disk space (among its configured metadata mounts)
reaches a value < 100
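A hedged sketch of the knob involved, for hdfs-site.xml (the property name is
taken from hdfs-default.xml of recent Apache releases; verify it exists in your
version before relying on it):

<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- Minimum free space, in bytes, required on each metadata volume
       before the NN locks itself into safe mode; 100 MB shown here. -->
  <value>104857600</value>
</property>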
Hi Young,
Note that the SequenceFile.Writer#sync method != HDFS sync(); it's just
a method that writes a sync marker (a set of bytes representing an end
point for one or more records, kind of like a newline in text files but
not for every record).
I don't think sync() would affect much. Although, if
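For illustration, a minimal sketch against the Hadoop 1.x API showing where the
sync marker gets written (the path and key/value types are made up for the
example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SyncMarkerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/sync-demo.seq"); // hypothetical output path

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, IntWritable.class);
    for (int i = 0; i < 1000; i++) {
      writer.append(new Text("key-" + i), new IntWritable(i));
      if (i % 100 == 0) {
        // Writes a sync marker into the file so readers can re-align at
        // split boundaries; it does NOT flush/persist data like HDFS sync().
        writer.sync();
      }
    }
    writer.close();
  }
}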
Hi Keith,
See http://search-hadoop.com/m/z9oYUIhhUg and the method isGoodTarget
under
http://search-hadoop.com/c/Hadoop:/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java||isGoodTarget
On Tue, Sep 4, 2012 at 10:24 PM, Kei
> . I don't seem to be able to add you as a CC, so feel free to add
> yourself.
Added.
Thanks,
+Vinod
On 09/04/2012 02:35 PM, Udayini Pendyala wrote:
Hi Bejoy,
Thanks for your response. I first started to install on Ubuntu Linux
and ran into a bunch of problems. So, I wanted to back off a bit and
try something simple first. Hence, my attempt to install on my Windows
7 Laptop.
Well, if you
On Mon, Sep 3, 2012 at 5:09 AM, Hemanth Yamijala wrote:
> Is there a reason why YARN's directory paths are not defaulting to be
> relative to hadoop.tmp.dir?
>
> For example, yarn.nodemanager.local-dirs defaults to /tmp/nm-local-dir.
> Could it be ${hadoop.tmp.dir}/nm-local-dir instead? Similarly for
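A hedged sketch of that override in yarn-site.xml, assuming you want the node
manager's local dirs placed under hadoop.tmp.dir (Configuration expands the
variable at load time):

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>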
Hi Bejoy,
Thanks for your response. I first started to install on Ubuntu Linux and ran
into a bunch of problems. So, I wanted to back off a bit and try something
simple first. Hence, my attempt to install on my Windows 7 Laptop.
I am doing the "standalone" mode - as per the documentation (link
Hi Udayini,
By default Hadoop works well on Linux and Linux-based OSes. Since you are on
Windows, you need to install and configure SSH using Cygwin before you start
the Hadoop daemons.
On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala wrote:
> Hi,
>
>
> Following is a description of what I am trying t
I had moved the data directory to the larger disk but left the namenode
directory on the smaller disk figuring it didn't need much room. Moving that
to the larger disk seems to have improved the situation...although I'm still
surprised the NN needed so much room.
Problem is solved for now.
T
Max,
Yes, you will get better performance if your data is on HDFS (local/ephemeral)
versus S3.
I'm not sure why you couldn't see the bad block.
Next time this happens, try running a hadoop fsck from the namenode.
The reason why I was suggesting that you run against S3 is that while slower,
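Something along these lines, run on the namenode, will report files with
missing or corrupt blocks (the path here is just an example):

hadoop fsck / -files -blocks -locations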
Keith,
Assuming that you were seeing the problem when you captured the namenode
webUI info, it is not related to what I suspect. This might be a good
question for CDH forums given this is not an Apache release.
Regards,
Suresh
On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley wrote:
> On Sep 4, 201
On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote:
> When these errors are thrown, please send the namenode web UI information. It
> has storage related information in the cluster summary. That will help debug.
Sure thing. Thanks. Here's what I currently see. It looks like the problem
isn't t
Hi
When I start my cluster with start-dfs.sh, the secondary namenodes are
created on the slave machines. I set conf/masters to a different single
machine (along with the assignment of dfs.http.address to the
nameserver:50070) but it is apparently ignored.
hadoop version: 1.0.3
1 machine with JT
Hi all,
I ran an MR program, WordCount:
The input file is a sequence file compressed with the Snappy block type.
The InputFormat is SequenceFileInputFormat.
To check whether the SequenceFile.Writer.sync() method would affect an MR
program,
in one case the writer.sync() method was called. The sync() method did not
Can you look in the namenode logs and post the last few lines?
On 9/4/12 10:07 AM, "Keith Wiley" wrote:
>Observe:
>
>~/ $ hd fs -put test /test
>put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
>create file/test. Name node is in safe mode.
>~/ $ hadoop dfsadmin -safemode leave
>Safe
The job is creating several output and intermediate files, all under the
location Users/pat/Projects/big-data/b/ssvd/. Several output directories and
files are created correctly, and the file
Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-0 is created and exists at the
time of the error. We see
Especially since I am reading from the file using a MapReduce
job in the next step, I am not sure that it makes sense in terms of
performance to put the file on S3. I have not tested, but my suspicion
is that the local disk reads on HDFS would outperform reading and
writing the file to S3.
Th
Observe:
~/ $ hd fs -put test /test
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create
file/test. Name node is in safe mode.
~/ $ hadoop dfsadmin -safemode leave
Safe mode is OFF
~/ $ hadoop dfsadmin -safemode get
Safe mode is ON
~/ $ hadoop dfsadmin -safemode leave
Safe
- A datanode is typically kept free with up to 5 free blocks (HDFS block
size) of space.
- Disk space is also used by MapReduce jobs to store temporary shuffle
spills. This is what "dfs.datanode.du.reserved" is used to configure. The
configuration is available in hdfs-site.xml. If you have not conf
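For reference, a hedged hdfs-site.xml sketch of that setting (value in bytes;
10 GB shown here as an arbitrary example):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Space per datanode volume reserved for non-DFS use
       (e.g. MapReduce shuffle spills), in bytes. -->
  <value>10737418240</value>
</property>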
If the datanode is definitely not running out of space, and the overall system
has basically been working leading up to the "replicated to 0 nodes" error
(which proves the configuration and permissions are all basically correct),
then what other explanations are there for why hdfs would suddenly
Next time, try reading and writing to S3 directly from your hive job.
Not sure why the block was bad... What did the AWS folks have to say?
-Mike
On Sep 4, 2012, at 11:30 AM, Max Hansmire wrote:
> I ran into an issue yesterday where one of the blocks on HDFS seems to
> have gone away. I would
I've been running up against the good old fashioned "replicated to 0 nodes"
gremlin quite a bit recently. My system (a set of processes interacting with
hadoop, and of course hadoop itself) runs for a while (a day or so) and then I
get plagued with these errors. This is a very simple system, a
You blew out the stack?
Or rather your number was too 'big'/'long'?
On Sep 4, 2012, at 11:10 AM, Gaurav Dasgupta wrote:
> Hi All,
>
> I am running the Pi Estimator from hadoop-examples.jar in my 11 node CDH3u4
> cluster.
>
> Initially I ran the job for 10 maps and 100 samples per map
You didn't do anything wrong; this is just a bug in the Pi application. The
application _should_ be able to divide two numbers and not require an exact
decimal result. Everything you need to know is in the first line of the error
message. Try it with 100 maps and 10 billion samples per map, w
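To illustrate the kind of failure (not the exact Pi estimator code, just a
hedged sketch of BigDecimal division with and without a rounding mode; the
numbers are placeholders):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class DivideSketch {
  public static void main(String[] args) {
    BigDecimal inside = BigDecimal.valueOf(7); // placeholder numbers
    BigDecimal total = BigDecimal.valueOf(3);

    // inside.divide(total) would throw:
    // java.lang.ArithmeticException: Non-terminating decimal expansion;
    // no exact representable decimal result.

    // Supplying a scale and rounding mode avoids requiring an exact result:
    System.out.println(inside.divide(total, 20, RoundingMode.HALF_UP));
  }
}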
The other question you have to look at is the underlying start and stop
scripts, to see what is being passed on to them.
I thought there was a parameter that would override the defaults where you
specified the slaves and masters files, but I could be wrong.
Since this is raw Apache, I don't think
I ran into an issue yesterday where one of the blocks on HDFS seems to
have gone away. I would appreciate any help that you can provide.
I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running
hadoop version 0.20.205 and hive version 0.8.1.
I have a hive table that is written out i
Can you please show contents of masters and slaves config files?
On 09/04/2012 09:15 AM, surfer wrote:
> On 09/04/2012 12:58 PM, Michel Segel wrote:
>> Which distro?
>>
>> Saw this happen, way back when with a Cloudera release.
>>
>> Check your config files too...
>>
>>
>> Sent from a remote dev
Hi guys,
I am trying to implement a block matrix-vector multiplication algorithm
with Hadoop according to the schematics from
http://i.stanford.edu/~ullman/mmds/ch5.pdf page 162. My matrix is going to
be sparse and the vector dense, which is exactly what is required in
PageRank as well. The vector
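For what it's worth, here is a hedged sketch of the non-blocked variant of
that algorithm, assuming the dense vector fits in memory and is shipped to
each task (e.g. via the DistributedCache) as a local file of "index value"
lines, while the sparse matrix arrives as "row col value" text lines; all
file names are made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixVectorMultiply {

  public static class MVMapper
      extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {
    private final Map<Long, Double> vector = new HashMap<Long, Double>();

    @Override
    protected void setup(Context context) throws IOException {
      // Hypothetical: the dense vector was shipped to every task via the
      // DistributedCache under this local file name, one "index value" per line.
      BufferedReader in = new BufferedReader(new FileReader("vector.txt"));
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.trim().split("\\s+");
        vector.put(Long.parseLong(parts[0]), Double.parseDouble(parts[1]));
      }
      in.close();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // One sparse matrix entry per input line: "row col m_ij"
      String[] parts = value.toString().trim().split("\\s+");
      long row = Long.parseLong(parts[0]);
      long col = Long.parseLong(parts[1]);
      double mij = Double.parseDouble(parts[2]);
      // Emit the partial product m_ij * v_j keyed by the result row i.
      context.write(new LongWritable(row),
                    new DoubleWritable(mij * vector.get(col)));
    }
  }

  public static class SumReducer
      extends Reducer<LongWritable, DoubleWritable, LongWritable, DoubleWritable> {
    @Override
    protected void reduce(LongWritable row, Iterable<DoubleWritable> products,
        Context context) throws IOException, InterruptedException {
      double sum = 0.0;
      for (DoubleWritable p : products) {
        sum += p.get();
      }
      context.write(row, new DoubleWritable(sum)); // component x_i of the result
    }
  }
}

The blocked/striped version described in the chapter partitions both the
matrix and the vector so that each mapper only needs one vector stripe in
memory rather than the whole vector.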
On 09/04/2012 12:58 PM, Michel Segel wrote:
> Which distro?
>
> Saw this happen, way back when with a Cloudera release.
>
> Check your config files too...
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
Thanks for your answer.
The config files are these: https://gist.git
Hi,
Following is a description of what I am trying to do and the steps I followed.
GOAL:
a). Install Hadoop 1.0.3
b). Hadoop in a standalone (or local) mode
c). OS: Windows 7
STEPS FOLLOWED:
1. I followed instructions from:
http://www.oreillynet.com/pub/a/other-programming/exce
On 3 September 2012 15:19, Abhay Ratnaparkhi wrote:
> Hello,
>
> How can one get to know the nodes on which reduce tasks will run?
>
> One of my jobs is running and it's completing all the map tasks.
> My map tasks write lots of intermediate data. The intermediate directory
> is getting full on all
Thanks Bejoy, actually my Hadoop is also on Windows (I have installed it in
pseudo-distributed mode for testing); it's not a remote cluster.
On Tue, Sep 4, 2012 at 3:38 PM, Bejoy KS wrote:
> **
> Hi
>
> You are running Tomcat on a Windows machine and trying to connect to a
> remote hadoop cluste
Which distro?
Saw this happen, way back when with a Cloudera release.
Check your config files too...
Sent from a remote device. Please excuse any typos...
Mike Segel
On Sep 4, 2012, at 3:22 AM, surfer wrote:
> Hi
>
> When I start my cluster (with start-dfs.sh), secondary namenodes are
> c
Hi
You are running Tomcat on a Windows machine and trying to connect to a remote
Hadoop cluster from there. Your core-site has
fs.default.name
hdfs://localhost:9000
but it is localhost here. (I assume you are not running Hadoop on this Windows
environment for some testing.)
You need to have t
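As a hedged sketch, core-site.xml on the Tomcat side would need something like
the following, where namenode.example.com stands in for your actual remote
namenode host:

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>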
Also getting one more error:
org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot
communicate with client version 4
On Tue, Sep 4, 2012 at 2:44 PM, Visioner Sadak wrote:
> Thanks shobha tried adding conf folder to tomcats classpath still getting
> same error
>
>
> Call to loca
Thanks Shobha, tried adding the conf folder to Tomcat's classpath, still getting the
same error:
Call to localhost/127.0.0.1:9000 failed on local exception:
java.io.IOException: An established connection was aborted by the software
in your host machine
On Tue, Sep 4, 2012 at 11:18 AM, Mahadevappa, Shobha <
Hi Andy,
Please try once more; otherwise you can download Hadoop
1.0.2 again, which is stable. You can set all environment variables in .bashrc
(edit it with vi or gedit): set the JAVA_HOME and HADOOP_HOME environment
variables and also set the PATH.
Thanks & Regards,
Ramesh.Narasingu
On Tue,
Hi:
Thank you, Narasingu and Rekha, for your help.
So there is no way for Hadoop Streaming to read an environment variable
from the OS? I guess that I'll have to use the "-cmdenv" option to specify
the PATH and CLASSPATH variables.
Again, appreciate your help.
Andy
On 4 September 2012 18:13, N
Hi
When I start my cluster (with start-dfs.sh), secondary namenodes are
created on all the machines in conf/slaves. I set conf/masters to a
single different machine (along with dfs.http.address pointing to the
nameserver) but it seems to be ignored. Any hint of what I'm doing wrong?
thanks
giovanni
Hi Dexter,
I think what you want is a clustering of points based on the Euclidean
distance, or density-based clustering (
http://en.wikipedia.org/wiki/Cluster_analysis ). I bet there are some
implemented quite well in Mahout already: AFAIK this is the data-mining
framework based on Hadoop.
B
Hi Pat,
Please specify the correct input file location.
Thanks & Regards,
Ramesh.Narasingu
On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel wrote:
> Using hadoop with mahout in a local filesystem/non-hdfs config for
> debugging purposes inside Intellij IDEA. When I run one particular part of
>
Hi Rekha,
What I said means: first he has to install Java, then set
JAVA_HOME to its directory (pwd, the present working directory), and then set
the classpath for the Hadoop installation.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 4, 2012 at 1:36 PM, Joshi, Rekha wrote:
> Hi Andy,
>
> If y
Hi Abhay,
The NameNode has the addresses of all the data nodes. MapReduce does
all the data processing. First the data set is put into the HDFS filesystem,
and then the Hadoop jar file is run. Map tasks handle the input files, whose
output is shuffled, sorted and grouped together. Once the map tasks are completed, then tak
Hi Andy,
If you are referring to HADOOP_CLASSPATH, that is an env variable on your cluster,
or effected via config XML. But if you need your own environment variables for
streaming, you may use -cmdenv PATH= on your streaming command. Or if you have
specific jars for the streaming process, -libjars on
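For example, a hedged sketch of a streaming invocation passing custom
environment variables (the jar path and the variable values are placeholders
for your install):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.3.jar \
  -cmdenv PATH=/opt/myapp/bin:/usr/bin \
  -cmdenv CLASSPATH=/opt/myapp/lib/myapp.jar \
  -input /data/in -output /data/out \
  -mapper mymapper.sh -reducer myreducer.sh \
  -file mymapper.sh -file myreducer.sh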
Hi Users,
Hadoop distributes all the data into HDFS so that MapReduce
tasks can work together. Which block goes to which data node, and how it all
works, is maintained by Hadoop; each task has its own JVM on each data
node. The JVM can handle a huge amount of data to process across all the dat
Hi Andy,
Please specify the environment variables in .bashrc (edit it with gedit). You
can specify the JAVA_HOME environment variable, and in the configuration files
(hadoop-site.xml, hadoop-core.xml and hadoop-default.xml) you can
specify which Hadoop version to use, and then close
.bashrc
Hi:
I wish to use Hadoop streaming to run a program which requires specific
PATH and CLASSPATH variables. I have set these two variables in both
"/etc/profile" and "~/.bashrc" on all slaves (and restarted these slaves).
However, when I run the hadoop streaming job, the program generates error
mess