How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hello guys, in case any of you are working on HBase: I just wrote a program by reading some tutorials, but nowhere is it mentioned how to run code on HBase. If any of you has done some coding on HBase, can you please tell me how to run it? I am able to compile my code by adding

AW: How to compile HBase code ?

2011-05-24 Thread Kleegrewe, Christian
How do you execute the client (command line)? Do you use the java or the hadoop command? It seems that there is an error in your classpath when running the client job. The classpath used when compiling the classes that implement the client is different from the classpath used when your client is executed

Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
I am simply using the HBase API, not doing any MapReduce work on it. Following is the code I have written, simply creating a file on HBase: import java.io.IOException; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HColumnDescriptor; import

AW: How to compile HBase code ?

2011-05-24 Thread Kleegrewe, Christian
Are you sure that the directory where your ExampleClient.class is located is part of the MYCLASSPATH? regards Christian ---8<--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 München, Deutschland

Re: How to compile HBase code ?

2011-05-24 Thread Harsh J
Praveenesh, HBase has its own user mailing lists where such queries ought to go. I am moving the discussion to u...@hbase.apache.org and bcc-ing common-user@ here. Also added you to cc. Regarding your first error, going forward you can use the useful `hbase classpath` to generate a

Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hey Harsh, actually I mailed the HBase mailing list too, but since I wanted to get this done as soon as possible I mailed this group as well. Anyway, I will take care of this in future, although I got more responses on this mailing list :-) Anyway, the problem is solved. What I

Re: How to compile HBase code ?

2011-05-24 Thread Harsh J
Praveenesh, On Tue, May 24, 2011 at 4:31 PM, praveenesh kumar praveen...@gmail.com wrote: Hey Harsh, Actually I mailed to HBase mailing list also.. but since I wanted to get this thing done as soon as possible so I mailed in this group also.. anyways I will take care of this in future ,

Re: How to compile HBase code ?

2011-05-24 Thread praveenesh kumar
Hey Harsh, I tried that; it's not working. I am using HBase 0.20.6, and there is no command like bin/hbase classpath: hadoop@ub6:/usr/local/hadoop/hbase$ hbase Usage: hbase command, where command is one of: shell (run the HBase shell), master (run an HBase HMaster node)

Re: How to compile HBase code ?

2011-05-24 Thread Harsh J
Praveenesh, ah yes, it would not work on the older 0.20.x releases; the command exists in the current HBase release. On Tue, May 24, 2011 at 5:11 PM, praveenesh kumar praveen...@gmail.com wrote: Hey harsh, I tried that.. its not working. I am using hbase 0.20.6. there is no command like

Simple change to WordCount either times out or runs 18+ hrs with little progress

2011-05-24 Thread Maryanne.DellaSalla
I am attempting to familiarize myself with Hadoop and MapReduce in order to process system log files. I tried to start small with a simple MapReduce program similar to the word count example provided. I wanted, for each line read in, to grab the 5th word as my output

Re: Simple change to WordCount either times out or runs 18+ hrs with little progress

2011-05-24 Thread Ted Dunning
itr.nextToken() is inside the if. On Tue, May 24, 2011 at 7:29 AM, maryanne.dellasa...@gdc4s.com wrote: while (itr.hasMoreTokens()) { if(count == 5) { word.set(itr.nextToken()); output.collect(word, one); } count++; }
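Ted's point explains the 18+ hour runs: because nextToken() is called only when count == 5, the iterator never advances on any other pass, so hasMoreTokens() stays true forever and the loop spins. A minimal standalone sketch of the corrected loop (class and method names are illustrative, count here starts at 1 so the literal 5th word is taken, and the mapper's surrounding Hadoop types are omitted):

```java
import java.util.StringTokenizer;

public class FifthWord {
    // Return the 5th whitespace-delimited token of a line, or null if the line is shorter.
    static String fifthWord(String line) {
        StringTokenizer itr = new StringTokenizer(line);
        int count = 1;
        while (itr.hasMoreTokens()) {
            // Advance the iterator on EVERY pass; the buggy version only called
            // nextToken() when count == 5, so hasMoreTokens() stayed true forever.
            String token = itr.nextToken();
            if (count == 5) {
                return token;
            }
            count++;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fifthWord("May 24 10:31:02 host sshd[1234]: session opened"));
    }
}
```

In the real mapper, the returned token would go into word.set(...) followed by output.collect(word, one).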

question about BlockLocation setHosts

2011-05-24 Thread George Kousiouris
Hi all, I had a question regarding the setHosts method of the BlockLocation class in Hadoop HDFS. Does this cause the block in question to be moved to the specified host? Furthermore, where does the getHosts method of BlockLocation get the host names from? Thanks, George --

RE: Simple change to WordCount either times out or runs 18+ hrs with little progress

2011-05-24 Thread Maryanne.DellaSalla
Ahh, well that's embarrassing, and explains the situation where it runs for many hours. I am still baffled as to why the split-on-delimiter version times out, though. String line = value.toString(); String[] splitLine = line.split(","); if (splitLine.length >= 5) {
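For comparison, here is a minimal standalone sketch of the split-on-delimiter approach with a length guard (names are illustrative; in the real mapper the field would be collected as output rather than returned):

```java
public class FifthField {
    // Return the 5th comma-separated field, or null when the line has fewer fields.
    static String fifthField(String line) {
        String[] splitLine = line.split(",");
        if (splitLine.length >= 5) {
            return splitLine[4]; // zero-based index 4 is the 5th field
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fifthField("host,user,date,level,message,extra"));
    }
}
```

Unlike the tokenizer loop, there is no iterator to advance here, so a timeout would have to come from somewhere else (e.g. input size or task configuration) rather than from this snippet itself.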

Re: tips and tools to optimize cluster

2011-05-24 Thread Chris Smith
Worth a look at OpenTSDB ( http://opentsdb.net/ ) as it doesn't lose precision on the historical data. It also has some neat tricks around the collection and display of data. Another useful tool is 'collectl' ( http://collectl.sourceforge.net/ ), a lightweight Perl script that both

Re: get name of file in mapper output directory

2011-05-24 Thread Mark question
Thanks both for the comments. Even though I finally managed to get the output file name of the current mapper, I couldn't use it, because mappers apparently write to a _temporary file while in progress. So in Mapper.close, the file it wrote to, e.g. part-0, does not exist yet. There

Processing xml files

2011-05-24 Thread Mohit Anchlia
I just started learning Hadoop and got done with the wordcount MapReduce example. I also briefly looked at Hadoop streaming. Some questions: 1) What should be my first step now? Are there more examples somewhere that I can try out? 2) The second question is around practical usability using XML files. Our

EC2 cloudera cc1.4xlarge

2011-05-24 Thread Aleksandr Elbakyan
Hello, I want to use a cc1.4xlarge cluster for some data processing; to spin up clusters I am using the Cloudera scripts. hadoop-ec2-init-remote.sh has default configuration up to c1.xlarge but no configuration for cc1.4xlarge. Can someone give the formula for how these values are calculated based on

Re: Processing xml files

2011-05-24 Thread Aleksandr Elbakyan
Hello, we have the same type of data; we currently convert it to a tab-delimited file and use it as input for streaming. Regards, Aleksandr --- On Tue, 5/24/11, Mohit Anchlia mohitanch...@gmail.com wrote: From: Mohit Anchlia mohitanch...@gmail.com Subject: Processing xml files To:

Re: Processing xml files

2011-05-24 Thread Mohit Anchlia
On Tue, May 24, 2011 at 4:25 PM, Aleksandr Elbakyan ramal...@yahoo.com wrote: Hello, we have the same type of data, we currently convert it to tab delimited file and use it as input for streaming Can you please give more info? Do you append multiple XML files' data as lines into one file?

Re: Sorting ...

2011-05-24 Thread Mark question
Thanks Luca, but what other way is there to sort a directory of sequence files? I don't plan to write a sorting algorithm in mappers/reducers, but hope to use SequenceFile.Sorter instead. Any ideas? Mark On Mon, May 23, 2011 at 12:33 AM, Luca Pireddu pire...@crs4.it wrote: On May 22, 2011
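For reference, SequenceFile.Sorter can merge-sort a set of sequence files without writing a MapReduce job. A hedged sketch, assuming Text keys and IntWritable values (use whatever classes your files actually hold) and illustrative paths; this requires the Hadoop jars on the classpath and a running HDFS, so it is not runnable standalone:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SortSeqFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Key/value classes are assumptions -- match them to your files' headers.
        SequenceFile.Sorter sorter =
            new SequenceFile.Sorter(fs, Text.class, IntWritable.class, conf);
        // Illustrative input/output paths.
        Path[] inputs = { new Path("in/part-0"), new Path("in/part-1") };
        sorter.sort(inputs, new Path("out/sorted"), false); // false: keep the inputs
    }
}
```

Note this sorts on a single machine, which is fine for modest data but does not scale the way a MapReduce sort does.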

Re: tips and tools to optimize cluster

2011-05-24 Thread Tom Melendez
Thanks Chris, these are quite helpful. Thanks, Tom On Tue, May 24, 2011 at 11:13 AM, Chris Smith csmi...@gmail.com wrote: Worth a look at OpenTSDB ( http://opentsdb.net/ ) as it doesn't lose precision on the historical data. It also has some neat tracks around the collection and display of

Re: Processing xml files

2011-05-24 Thread Aleksandr Elbakyan
Can you please give more info? We currently have an off-Hadoop process which uses a Java XML parser to convert it to a flat file. We have files from a couple of KB to 10s of GB. Do you append multiple XML files' data as lines into one file? Or some other way? If so, how big do you let the files get? We
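The off-Hadoop conversion step described above can be sketched with the JDK's built-in DOM parser. The <record> element and its child tags are hypothetical, and a real converter for multi-GB files would stream records with a pull parser rather than build a DOM per document:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlToTsv {
    // Flatten the child elements of one record into a tab-delimited line,
    // suitable as input to Hadoop streaming. Returns null on malformed XML.
    static String toTsv(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    if (sb.length() > 0) sb.append('\t');
                    sb.append(n.getTextContent());
                }
            }
            return sb.toString();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toTsv("<record><id>1</id><msg>hello</msg></record>"));
    }
}
```

Each record then becomes one line, so the downstream streaming job can split on tabs without any XML awareness.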

Re: EC2 cloudera cc1.4xlarge

2011-05-24 Thread Aleksandr Elbakyan
I looked into different clusters and configurations from Cloudera and came up with these numbers; let me know what you think... Machine: 23 GB of memory, 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture), 1690 GB of instance storage, 64-bit platform, I/O Performance:

Checkpoint vs Backup Node

2011-05-24 Thread sulabh choudhury
As far as my understanding goes, I feel that the Backup node is much more efficient than the Checkpoint node, as it also has a current (up-to-date) copy of the file system. I do not understand what the use case would be (in a production environment) in which someone would prefer the Checkpoint node over

Re: Processing xml files

2011-05-24 Thread Mohit Anchlia
Thanks some more questions :) On Tue, May 24, 2011 at 4:54 PM, Aleksandr Elbakyan ramal...@yahoo.com wrote: Can you please give more info? We currently have off hadoop process which uses java xml parser to convert it to flat file. We have files from couple kb to 10of GB. Do you convert it

Re: Checkpoint vs Backup Node

2011-05-24 Thread Todd Lipcon
Hi Sulabh, neither of these nodes has been productionized -- so I don't think anyone will have a good answer for you about what works in production. They are only available in 0.21 and haven't had any substantial QA. One of the potential issues with the BN is that it can delay the logging of

Re: EC2 cloudera cc1.4xlarge

2011-05-24 Thread Konstantin Boudnik
Try the Cloudera-specific lists with your questions. -- Take care, Konstantin (Cos) Boudnik 2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622 Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be

Re: Processing xml files

2011-05-24 Thread Aleksandr Elbakyan
Hello, we currently have a complicated process with more than 20 jobs piped to each other. We are using a shell script to control the flow; I saw some other company using Spring Batch. We use Pig, streaming and Hive. Note one thing: if you are using EC2 for your jobs, all local files

Cannot lock storage, directory is already locked

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage

I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Joey Echeverria
Try moving the configuration to hdfs-site.xml. One word of warning: if you use /tmp to store your HDFS data, you risk data loss. On many operating systems, files and directories in /tmp are automatically deleted. -Joey On Tue, May 24, 2011 at 10:22 PM, Mark question markq2...@gmail.com
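Joey's advice can be acted on with a fragment along these lines in hdfs-site.xml. The property names are the 0.20-era ones, the /var/hadoop paths are illustrative, and on an NFS-backed setup each datanode should point dfs.data.dir at its own local (non-shared, non-/tmp) directory so the storage locks don't collide:

```xml
<!-- hdfs-site.xml: keep HDFS data out of /tmp; paths are illustrative -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>
```

After changing these paths on a fresh cluster, the namenode must be reformatted (or the old storage directories copied over), otherwise namespace-ID mismatches will keep datanodes from joining.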

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
Well, you're right ... moving it to hdfs-site.xml had an effect at least. But now I get the namespace-incompatible error: WARN org.apache.hadoop.hdfs.server.common.Util: Path /tmp/hadoop-mark/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration.

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Mapred Learn
Do you have the right permissions on the new dirs? Try stopping and starting the cluster... -JJ On May 24, 2011, at 9:13 PM, Mark question markq2...@gmail.com wrote: Well, you're right ... moving it to hdfs-site.xml had an effect at least. But now I'm in the NameSpace incompatable error: WARN

LeaseExpirationException and 'leaseholder failing to recreate file': Could anything be done run-time?

2011-05-24 Thread Lokendra Singh
Hi all, I am running a process to extract feature vectors from images and write them as SequenceFiles on HDFS. My dataset of images is very large (~46K images). The writing process worked fine for half of the run, but all of a sudden the following problem occurred: