Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Visioner Sadak
Hadoop 1.0.3 will give you a lot of problems with Windows and Cygwin because of the complexities of Cygwin configuration paths, so it is better to downgrade to a lower version for development and testing on Windows (I downgraded to 0.22.0) and use 1.0.3 in production on Linux servers... I will be a

why hbase doesn't provide Encryption

2012-09-04 Thread Farrokh Shahriari
Hello, I just want to know why HBase doesn't provide encryption? Thanks

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Hemanth Yamijala
Though I agree with others that it would probably be easier to get Hadoop up and running on Unix-based systems, I couldn't help noticing that this path: \tmp \hadoop-upendyal\mapred\staging\upendyal-1075683580\.staging seems to have a space in the first component, i.e. '\tmp ' and not '\tmp'. Is that

Re: Error using hadoop in non-distributed mode

2012-09-04 Thread Hemanth Yamijala
Hi, The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/ is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a map reduce job. ( http://hadoop.apache.org/common/docs/r1.0.3/mapred_tu
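
For anyone new to the DistributedCache, here is a minimal sketch of how a file ends up under an archive path like the one above (the file name and the class are made up for illustration; this uses the org.apache.hadoop.filecache API from the 1.x line):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class CacheSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Driver side: register an HDFS file; each TaskTracker later copies it
        // into its mapred.local.dir archive area (the kind of path quoted above).
        DistributedCache.addCacheFile(new URI("/user/pat/lookup.dat"), conf);

        // Task side (normally in Mapper.configure): resolve the local copies.
        Path[] localCopies = DistributedCache.getLocalCacheFiles(conf);
        if (localCopies != null) {
          for (Path p : localCopies) {
            System.out.println("cached locally at: " + p);
          }
        }
      }
    }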

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
Good to know. The bottom line is I was really short-roping everything on resources. I just need to jack the machine up some. Thanks. On Sep 4, 2012, at 19:41 , Harsh J wrote: > Keith, > > The NameNode has a resource-checker thread in it by design to help > prevent cases of on-disk metadata c

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Harsh J
Keith, The NameNode has a resource-checker thread in it by design to help prevent cases of on-disk metadata corruption in the event of filled-up dfs.namenode.name.dir disks, etc. By default, an NN will lock itself up if the free disk space (among its configured metadata mounts) reaches a value < 100
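
As a rough sketch of the knob involved (assuming a release that includes the HDFS-1594 resource checker; the property name and 100 MB default are taken from the Apache docs and may differ in other distributions), the threshold is dfs.namenode.resource.du.reserved, which normally lives in hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class NnReservedSpaceSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The NN enters safe mode when free space on any configured metadata
        // volume drops below this many bytes (default is 100 MB).
        conf.setLong("dfs.namenode.resource.du.reserved", 100L * 1024 * 1024);
        System.out.println(conf.getLong("dfs.namenode.resource.du.reserved", 0));
      }
    }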

Re: questions about SequenceFile

2012-09-04 Thread Harsh J
Hi Young, Note that the SequenceFile.Writer#sync method != HDFS sync(), it's just a method that writes a sync marker (a set of bytes representing an end point for one or more records, kinda like a newline in text files but not for every record). I don't think sync() would affect much. Although, if
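
A small illustration of the point (the path, types and record count are placeholders; this assumes a 1.x-era SequenceFile API and a working native Snappy install, matching the block-compressed file described in the question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class SeqFileSyncSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/tmp/sync-demo.seq");
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, out, Text.class, IntWritable.class,
            SequenceFile.CompressionType.BLOCK, new SnappyCodec());
        for (int i = 0; i < 1000; i++) {
          writer.append(new Text("word" + i), new IntWritable(i));
          if (i > 0 && i % 100 == 0) {
            // Writes a sync marker into the output stream so readers can
            // re-align mid-file; it is not an HDFS flush/sync of the data.
            writer.sync();
          }
        }
        writer.close();
      }
    }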

Re: could only be replicated to 0: TL;DR

2012-09-04 Thread Harsh J
Hi Keith, See http://search-hadoop.com/m/z9oYUIhhUg and the method isGoodTarget under http://search-hadoop.com/c/Hadoop:/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java||isGoodTarget On Tue, Sep 4, 2012 at 10:24 PM, Kei

Re: Yarn defaults for local directories

2012-09-04 Thread Vinod Kumar Vavilapalli
> . I don't seem to be able to add you as a CC, so feel free to add > yourself. Added. Thanks, +Vinod

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Marcos Ortiz
On 09/04/2012 02:35 PM, Udayini Pendyala wrote: Hi Bejoy, Thanks for your response. I first started to install on Ubuntu Linux and ran into a bunch of problems. So, I wanted to back off a bit and try something simple first. Hence, my attempt to install on my Windows 7 Laptop. Well, if you

Re: Yarn defaults for local directories

2012-09-04 Thread Andy Isaacson
On Mon, Sep 3, 2012 at 5:09 AM, Hemanth Yamijala wrote: > Is there a reason why Yarn's directory paths are not defaulting to be > relative to hadoop.tmp.dir. > > For e.g. yarn.nodemanager.local-dirs defaults to /tmp/nm-local-dir. > Could it be ${hadoop.tmp.dir}/nm-local-dir instead ? Similarly for
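
For reference, a minimal sketch of the override being proposed (the value is the suggestion from the thread rather than the shipped default, and in practice it would go in yarn-site.xml instead of client code):

    import org.apache.hadoop.conf.Configuration;

    public class YarnLocalDirsSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Make the NodeManager's local dirs follow hadoop.tmp.dir instead of /tmp.
        conf.set("yarn.nodemanager.local-dirs", "${hadoop.tmp.dir}/nm-local-dir");
        // Configuration expands ${...} variables on get(), so this prints the
        // resolved path under the current hadoop.tmp.dir.
        System.out.println(conf.get("yarn.nodemanager.local-dirs"));
      }
    }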

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Udayini Pendyala
Hi Bejoy, Thanks for your response. I first started to install on Ubuntu Linux and ran into a bunch of problems. So, I wanted to back off a bit and try something simple first. Hence, my attempt to install on my Windows 7 Laptop. I am doing the "standalone" mode - as per the documentation (link

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Bejoy Ks
Hi Udayini, By default Hadoop works well on Linux and Linux-based OSes. Since you are on Windows you need to install and configure SSH using Cygwin before you start the Hadoop daemons. On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala wrote: > Hi, > > > Following is a description of what I am trying t

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
I had moved the data directory to the larger disk but left the namenode directory on the smaller disk figuring it didn't need much room. Moving that to the larger disk seems to have improved the situation...although I'm still surprised the NN needed so much room. Problem is solved for now. T

Re: Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Michael Segel
Max, Yes, you will get better performance if your data is on HDFS (local/ephemeral) versus S3. I'm not sure why you couldn't see the bad block. Next time this happens, try running hadoop fsck from the name node. The reason why I was suggesting that you run against S3 is that while slower,

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas
Keith, Assuming that you were seeing the problem when you captured the namenode webUI info, it is not related to what I suspect. This might be a good question for CDH forums given this is not an Apache release. Regards, Suresh On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley wrote: > On Sep 4, 201

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote: > When these errors are thrown, please send the namenode web UI information. It > has storage related information in the cluster summary. That will help debug. Sure thing. Thanks. Here's what I currently see. It looks like the problem isn't t

secondary namenode

2012-09-04 Thread derrick thomas
Hi When I start my cluster with start-dfs.sh the secondary namenodes are created on the slave machines. I set conf/masters to a different single machine (along with the assignment of dfs.http.address to the nameserver:50070) but it is apparently ignored. hadoop version: 1.0.3 1 machine with JT

questions about SequenceFile

2012-09-04 Thread Young-Geun Park
Hi All, I run an MR program, WordCount: the input file is a sequence file compressed with the snappy block type, and the InputFormat is SequenceFileInputFormat. To check whether the SequenceFile.Writer.sync() method would affect an MR program, in one case the writer.sync() method was called. The sync() method did not

Re: Can't get out of safemode

2012-09-04 Thread Serge Blazhiyevskyy
Can you look in the name node logs and post the last few lines? On 9/4/12 10:07 AM, "Keith Wiley" wrote: >Observe: > >~/ $ hd fs -put test /test >put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot >create file/test. Name node is in safe mode. >~/ $ hadoop dfsadmin -safemode leave >Safe

Re: Error using hadoop in non-distributed mode

2012-09-04 Thread Pat Ferrel
The job creates several output and intermediate files, all under the location Users/pat/Projects/big-data/b/ssvd/. Several output directories and files are created correctly, and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-0 is created and exists at the time of the error. We see

Re: Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Max Hansmire
Especially since I am reading from the file using a Map-Reduce job in the next step, I am not sure that it makes sense in terms of performance to put the file on S3. I have not tested, but my suspicion is that the local disk reads on HDFS would outperform reading and writing the file to S3. Th

Can't get out of safemode

2012-09-04 Thread Keith Wiley
Observe: ~/ $ hd fs -put test /test put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/test. Name node is in safe mode. ~/ $ hadoop dfsadmin -safemode leave Safe mode is OFF ~/ $ hadoop dfsadmin -safemode get Safe mode is ON ~/ $ hadoop dfsadmin -safemode leave Safe

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas
- A datanode is typically kept free with up to 5 free blocks (HDFS block size) of space. - Disk space is used by mapreduce jobs to store temporary shuffle spills also. This is what "dfs.datanode.du.reserved" is used to configure. The configuration is available in hdfs-site.xml. If you have not conf
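
A minimal sketch of the reservation being described (the 1 GB figure is only an example; in practice this is set in hdfs-site.xml on the datanodes rather than in client code):

    import org.apache.hadoop.conf.Configuration;

    public class DuReservedSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Reserve space per volume for non-HDFS use, e.g. MapReduce shuffle spills.
        conf.setLong("dfs.datanode.du.reserved", 1024L * 1024 * 1024);
        System.out.println(conf.getLong("dfs.datanode.du.reserved", 0));
      }
    }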

Re: could only be replicated to 0: TL;DR

2012-09-04 Thread Keith Wiley
If the datanode is definitely not running out of space, and the overall system has basically been working leading up to the "replicated to 0 nodes" error (which proves the configuration and permissions are all basically correct), then what other explanations are there for why hdfs would suddenly

Re: Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Michael Segel
Next time, try reading and writing to S3 directly from your hive job. Not sure why the block was bad... What did the AWS folks have to say? -Mike On Sep 4, 2012, at 11:30 AM, Max Hansmire wrote: > I ran into an issue yesterday where one of the blocks on HDFS seems to > have gone away. I would

could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
I've been running up against the good old fashioned "replicated to 0 nodes" gremlin quite a bit recently. My system (a set of processes interacting with hadoop, and of course hadoop itself) runs for a while (a day or so) and then I get plagued with these errors. This is a very simple system, a

Re: Pi Estimator failing to print output after finishing the job sucessfully

2012-09-04 Thread Michael Segel
You blew out the stack? Or rather your number was too 'big'/'long'? On Sep 4, 2012, at 11:10 AM, Gaurav Dasgupta wrote: > Hi All, > > I am running the Pi Estimator from hadoop-examples.jar in my 11 node CDH3u4 > cluster. > > Initially I ran the job for 10 maps and 100 samples per map

RE: Pi Estimator failing to print output after finishing the job sucessfully

2012-09-04 Thread Jeffrey Buell
You didn't do anything wrong; this is just a bug in the Pi application. The application _should_ be able to divide two numbers and not require an exact decimal result. Everything you need to know is in the first line of the error message. Try it with 100 maps and 10 billion samples per map, w
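
For context, the symptom described is consistent with a BigDecimal division that demands an exact quotient (this is an assumption about the Pi example's internals, not a quote of its code); dividing with a MathContext succeeds where the exact division throws:

    import java.math.BigDecimal;
    import java.math.MathContext;

    public class ExactDivideSketch {
      public static void main(String[] args) {
        BigDecimal inside = new BigDecimal(1);
        BigDecimal total = new BigDecimal(3);
        // Rounded division always succeeds.
        System.out.println(inside.divide(total, MathContext.DECIMAL128));
        // 1/3 has no terminating decimal expansion, so exact division
        // throws java.lang.ArithmeticException.
        System.out.println(inside.divide(total));
      }
    }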

Re: SNN

2012-09-04 Thread Michael Segel
The other question you have to look at is the underlying start and stop scripts, to see what is being passed on to them. I thought there was a parameter that would override the defaults where you specified the slaves and masters files, but I could be wrong. Since this is raw Apache, I don't think

Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Max Hansmire
I ran into an issue yesterday where one of the blocks on HDFS seems to have gone away. I would appreciate any help that you can provide. I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running hadoop version 0.20.205 and hive version 0.8.1. I have a hive table that is written out i

Re: SNN

2012-09-04 Thread Terry Healy
Can you please show contents of masters and slaves config files? On 09/04/2012 09:15 AM, surfer wrote: > On 09/04/2012 12:58 PM, Michel Segel wrote: >> Which distro? >> >> Saw this happen, way back when with a Cloudera release. >> >> Check your config files too... >> >> >> Sent from a remote dev

Hadoop CompositeInputFormat block matrix-vector multiplication

2012-09-04 Thread Sigurd Spieckermann
Hi guys, I am trying to implement a block matrix-vector multiplication algorithm with Hadoop according to the schematics from http://i.stanford.edu/~ullman/mmds/ch5.pdf page 162. My matrix is going to be sparse and the vector dense, which is exactly what is required in PageRank as well. The vector
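
For anyone sketching the join side of this, a hedged example of composing two pre-partitioned inputs with the old mapred join API (the paths and the 'inner' op are placeholders; both inputs must be partitioned and sorted identically for the composite reader to work):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    public class BlockMatVecJoinSketch {
      public static void main(String[] args) {
        JobConf job = new JobConf(BlockMatVecJoinSketch.class);
        job.setInputFormat(CompositeInputFormat.class);
        // Join matrix blocks with the matching vector blocks, keyed by block id.
        job.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", SequenceFileInputFormat.class,
            new Path("/data/matrix-blocks"), new Path("/data/vector-blocks")));
        // ... set mapper/reducer and output format, then JobClient.runJob(job) ...
      }
    }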

Re: SNN

2012-09-04 Thread surfer
On 09/04/2012 12:58 PM, Michel Segel wrote: > Which distro? > > Saw this happen, way back when with a Cloudera release. > > Check your config files too... > > > Sent from a remote device. Please excuse any typos... > > Mike Segel thanks for your answer the config files are these: https://gist.git

Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Udayini Pendyala
Hi, Following is a description of what I am trying to do and the steps I followed. GOAL: a). Install Hadoop 1.0.3 b). Hadoop in a standalone (or local) mode c). OS: Windows 7 STEPS FOLLOWED: 1. I followed instructions from: http://www.oreillynet.com/pub/a/other-programming/exce

Re: knowing the nodes on which reduce tasks will run

2012-09-04 Thread Steve Loughran
On 3 September 2012 15:19, Abhay Ratnaparkhi wrote: > Hello, > > How can one get to know the nodes on which reduce tasks will run? > > One of my job is running and it's completing all the map tasks. > My map tasks write lots of intermediate data. The intermediate directory > is getting full on all

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Visioner Sadak
Thanks Bejoy, actually my Hadoop is also on Windows (I have installed it in pseudo-distributed mode for testing), it's not a remote cluster. On Tue, Sep 4, 2012 at 3:38 PM, Bejoy KS wrote: > Hi > > You are running Tomcat on a windows machine and trying to connect to a > remote hadoop cluste

Re: SNN

2012-09-04 Thread Michel Segel
Which distro? Saw this happen, way back when with a Cloudera release. Check your config files too... Sent from a remote device. Please excuse any typos... Mike Segel On Sep 4, 2012, at 3:22 AM, surfer wrote: > Hi > > When I start my cluster (with start-dfs.sh), secondary namenodes are > c

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Bejoy KS
Hi, You are running Tomcat on a Windows machine and trying to connect to a remote Hadoop cluster from there. Your core-site has fs.default.name set to hdfs://localhost:9000, but it is localhost here. (I assume you are not running Hadoop on this Windows environment for some testing.) You need to have t
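
A minimal client-side sketch of what is being described (the host name is a placeholder; the port must match the NameNode's fs.default.name, and the client jars must match the cluster's Hadoop version to avoid the IPC-version mismatch also reported in this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteHdfsClientSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's NameNode instead of localhost.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("root exists? " + fs.exists(new Path("/")));
      }
    }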

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Visioner Sadak
Also getting one more error: org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot communicate with client version 4. On Tue, Sep 4, 2012 at 2:44 PM, Visioner Sadak wrote: > Thanks shobha tried adding conf folder to tomcats classpath still getting > same error > > > Call to loca

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Visioner Sadak
Thanks Shobha, tried adding the conf folder to Tomcat's classpath, still getting the same error: Call to localhost/127.0.0.1:9000 failed on local exception: java.io.IOException: An established connection was aborted by the software in your host machine. On Tue, Sep 4, 2012 at 11:18 AM, Mahadevappa, Shobha <

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Andy, Please try that; otherwise you can download Hadoop 1.0.2, which is stable. You can set all the environment variables in .bashrc (with vi or gedit): set the JAVA_HOME and HADOOP_HOME environment variables and also set the PATH. Thanks & Regards, Ramesh.Narasingu On Tue,

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Andy Xue
Hi: Thank you, Narasingu and Rekha, for your help. So there is no way for Hadoop streaming to read an environment variable from the OS? Guess that I'll have to use the "-cmdenv" option to specify the PATH and CLASSPATH variables. Again, appreciate your help. Andy On 4 September 2012 18:13, N

SNN

2012-09-04 Thread surfer
Hi When I start my cluster (with start-dfs.sh), secondary namenodes are created on all the machines in conf/slaves. I set conf/masters to a single different machine (along with dfs.http.address pointing to the nameserver) but it seems to be ignored. Any hint as to what I'm doing wrong? thanks giovanni

Re: best way to join?

2012-09-04 Thread Björn-Elmar Macek
Hi Dexter, I think what you want is a clustering of points based on the Euclidean distance, or density-based clustering ( http://en.wikipedia.org/wiki/Cluster_analysis ). I bet there are some implemented quite well in Mahout already: AFAIK this is the data-mining framework based on Hadoop. B

Re: Error using hadoop in non-distributed mode

2012-09-04 Thread Narasingu Ramesh
Hi Pat, Please specify the correct input file location. Thanks & Regards, Ramesh.Narasingu On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel wrote: > Using hadoop with mahout in a local filesystem/non-hdfs config for > debugging purposes inside Intellij IDEA. When I run one particular part of >

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Rekha, What I meant is that he should first install Java, set JAVA_HOME to the Java installation directory, and then set the classpath for the Hadoop installation. Thanks & Regards, Ramesh.Narasingu On Tue, Sep 4, 2012 at 1:36 PM, Joshi, Rekha wrote: > Hi Andy, > > If y

Re: knowing the nodes on which reduce tasks will run

2012-09-04 Thread Narasingu Ramesh
Hi Abhay, The NameNode has the addresses of all the data nodes. MapReduce does all the data processing. First the data set is put into the HDFS filesystem and then the hadoop jar is run. Map tasks handle the input files, which are then shuffled, sorted and grouped together. The map task is completed and then tak

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Joshi, Rekha
Hi Andy, If you are referring to HADOOP_CLASSPATH, that is an env variable on your cluster, or set via the config XML. But if you need your own environment variables for streaming you may use -cmdenv PATH= on your streaming command. Or if you have specific jars for the streaming process, -libjars on

Re: how to execute different tasks on data nodes(simultaneously in hadoop).

2012-09-04 Thread Narasingu Ramesh
Hi Users, Hadoop distributes all the data into HDFS, and the MapReduce tasks can work on it together. Which piece goes to which data node and how it works is all maintained by the framework; each task has its own JVM on each data node. The JVM can handle a huge amount of data to process across all the dat

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Andy, Please specify the environment variables in .bashrc (e.g. with gedit). You can specify the JAVA_HOME environment variable there, and in the configuration files (hadoop-site.xml, hadoop-core.xml and hadoop-default.xml) you can specify which version of Hadoop you use, and then close .bashrc

Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Andy Xue
Hi: I wish to use Hadoop streaming to run a program which requires specific PATH and CLASSPATH variables. I have set these two variables in both "/etc/profile" and "~/.bashrc" on all slaves (and restarted these slaves). However, when I run the hadoop streaming job, the program generates error mess