Re: working with SAS

2012-02-06 Thread alo alt
Hi, Hadoop runs on a Linux box (mostly) and can run in a standalone installation for testing only. If you decide to use Hadoop with Hive or HBase you have to face a lot more tasks: - installation (Whirr and Amazon EC2, for example) - write your own MapReduce job or use Hive / HBase - se

Re: working with SAS

2012-02-06 Thread Prashant Sharma
+ you will not necessarily need vertical systems to speed things up (it totally depends on your query). Give a thought to commodity hardware (much cheaper), which Hadoop is well suited for; *I hope* your infrastructure can be cheaper in terms of price-to-performance ratio. Having said that,

Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread Xiaobin She
hi all, I'm testing hadoop and hive, and I want to use them in log analysis. Here I have a question: can I write/append logs to a compressed file which is located in hdfs? Our system generates lots of log files every day, I can't compress these logs every hour and then put them into hdfs. But w

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread Xiaobin She
sorry, this sentence is wrong: I can't compress these logs every hour and then put them into hdfs. It should be: I can compress these logs every hour and then put them into hdfs. 2012/2/6 Xiaobin She > > hi all, > > I'm testing hadoop and hive, and I want to use them in log analysis. > > H

The Common Account for Hadoop

2012-02-06 Thread Bing Li
Dear all, I am just starting to learn Hadoop. According to the book Hadoop in Action, a common account must be created on each server (masters/slaves). Moreover, I need to create a public/private RSA key pair as follows: ssh-keygen -t rsa Then, id_rsa and id_rsa.pub are put under $HOME/.s
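A minimal sketch of that passwordless-SSH setup, assuming the common account is called hadoop and the slave hostname is slave1 (both names are placeholders, not from the thread):

    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa          # key pair with an empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # allow the master to ssh to itself
    ssh-copy-id hadoop@slave1                         # append the public key on each slave
    ssh hadoop@slave1                                 # should now log in without a password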

Re: The Common Account for Hadoop

2012-02-06 Thread alo alt
Check the permissions of .ssh/authorized_keys on the hosts; it has to be readable and writable only by the user (including the directory). Be sure you copied the right key, without line breaks or fragments. If you have a lot of boxes you could use BCFG2: http://docs.bcfg2.org/ - Alex -- Alexander Lorenz
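For reference, the permissions Alex describes would look like this on each host (assuming the usual layout under the common account's home directory):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys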

Re: The Common Account for Hadoop

2012-02-06 Thread Bing Li
Hi, Alex, Thanks so much for your help! I noticed that I didn't put the RSA key in the account's home directory. Best regards, Bing On Mon, Feb 6, 2012 at 6:19 PM, alo alt wrote: > Check the permissions of .ssh/authorized_keys on the hosts; it has to be > readable and writable only by the user (includ

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread bejoy . hadoop
Hi, if you have enough log files to fill at least one block in an hour, you can go ahead as follows: - run a scheduled job every hour that compresses the log files for that hour and stores them in hdfs (you can use LZO or even Snappy to compress) - if your hive does more frequent analysis on thi
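A rough sketch of such an hourly step, assuming gzip and an illustrative /logs path in HDFS (file names, paths and the scheduling are placeholders; Bejoy's LZO/Snappy suggestion would slot in the same way):

    HOUR=$(date -d '1 hour ago' +%Y%m%d%H)               # GNU date assumed
    gzip /var/log/myapp/app-$HOUR.log                    # compress the finished hour
    hadoop fs -mkdir /logs/myapp/$HOUR
    hadoop fs -put /var/log/myapp/app-$HOUR.log.gz /logs/myapp/$HOUR/

Note that gzip output is not splittable, so very large hourly files are better served by indexed LZO or by keeping each file around one block in size.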

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread Xiaobin She
hi Bejoy, thank you for your reply. actually I have set up a test cluster which has one namenode/jobtracker and two datanode/tasktracker nodes, and I have run a test on this cluster. I fetch the log file of one of our modules from the log collector machines by rsync, and then I use the hive command li

Re: working with SAS

2012-02-06 Thread Michel Segel
Both responses assume replacing SAS with a Hadoop cluster. I would agree that going to EC2 might make sense in terms of a PoC before investing in a physical cluster, but we need to know more about the underlying problem. First, can the problem be broken down into something that can be accomplished

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread David Sinclair
Hi, You may want to have a look at the Flume project from Cloudera. I use it for writing data into HDFS. https://ccp.cloudera.com/display/SUPPORT/Downloads dave 2012/2/6 Xiaobin She > hi Bejoy , > > thank you for your reply. > > actually I have set up a test cluster which has one namenode/jo

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread bejoy . hadoop
Hi, I agree with David on that point; you can achieve step 1 of my previous response with Flume, i.e. load the real-time inflow of data in compressed format into hdfs. You can specify a time interval or data size in the Flume collector that determines when to flush data on to hdfs. Regards Bejoy K

HDFS Files Seem to be Stored in the Wrong Location?

2012-02-06 Thread Eli Finkelshteyn
Hi, I have a pseudo-distributed Hadoop cluster setup, and I'm currently hoping to put about 100 gigs of files on it to play around with. I got a unix box at work no one else is using for this, and running a df -h, I get: Filesystem Size Used Avail Use% Mounted on /dev/sda1

Re: HDFS Files Seem to be Stored in the Wrong Location?

2012-02-06 Thread Harsh J
You need your dfs.data.dir configured to the bigger disks for data. That config targets the datanodes. The one you've overridden is for the namenode's metadata, and hence the default dfs.data.dir config is writing to /tmp on your root disk (which is a bad thing, as it gets wiped after a reboot). On Mon,
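For illustration, the property Harsh means lives in conf/hdfs-site.xml; something along these lines (the mount points are placeholders for the larger disks) moves block storage off /tmp, followed by a datanode restart:

      <property>
        <name>dfs.data.dir</name>
        <value>/data/1/dfs/data,/data/2/dfs/data</value>
      </property>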

Tom White's book, 2nd ed. Which API?

2012-02-06 Thread Keith Wiley
I have the first edition of Tom White's O'Reilly Hadoop book and I was curious about the second edition. I realize it adds new sections on some of the wrapper tools, like Hive, but as far as the core Hadoop documentation is concerned, I'm wondering if there is much difference. In particular, I

Re: HDFS Files Seem to be Stored in the Wrong Location?

2012-02-06 Thread Eli Finkelshteyn
Ah, crud. Typo on my part. Don't know how I didn't notice that. Thanks! On 2/6/12 11:30 AM, Harsh J wrote: You need your dfs.data.dir configured to the bigger disks for data. That config targets the datanodes. The one you've overridden is for the namenode's metadata, and hence the default dfs.da

Re: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread zep
On Monday, February 06, 2012 11:36:10 AM, Keith Wiley wrote: > I have the first edition of Tom White's O'Reilly Hadoop book and I was > curious about the second edition. I realize it adds new sections on some of > the wrapper tools, like Hive, but as far as the core Hadoop documentation is > co

Re: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread W.P. McNeill
The second edition of Tom White's *Hadoop: The Definitive Guide* uses the old API for its examples, though it does contain a brief two-page overview of the new API. The first edition is all old API.

Re: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread Richard Nadeau
If you're looking to buy the 2nd edition you might want to wait; the third edition is in the works now. Regards, Rick On Feb 6, 2012 10:24 AM, "W.P. McNeill" wrote: > The second edition of Tom White's *Hadoop: The Definitive Guide* > uses the ol

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread Vitthal "Suhas" Gogate
I assume you have seen the following information on the Hadoop wiki, http://wiki.apache.org/hadoop/GangliaMetrics So do you use GangliaContext31 in hadoop-metrics2.properties? We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine (I remember gmetad sometimes goes down due to a buffer overflow pr

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread mete
Hello, I also face this issue when using GangliaContext31 and hadoop-1.0.0, and ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as soon as I restart gmetad. Regards Mete On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate < gog...@hortonworks.com> wrote: > I assume yo

Re: How to Set the Value of hadoop.tmp.dir?

2012-02-06 Thread bejoy . hadoop
Hi Bing What is your value for dfs.name.dir and dfs.data.dir? I believe they are still pointing to /tmp. Better to change them to another location, as /tmp gets wiped on every reboot. --Original Message-- From: Bing Li To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apac
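As a sketch of what Bejoy suggests (the path is a placeholder): dfs.name.dir and dfs.data.dir default to subdirectories of hadoop.tmp.dir, so either set those two explicitly in hdfs-site.xml or move hadoop.tmp.dir itself in core-site.xml, e.g.:

      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/1/hadoop-tmp</value>
      </property>

Moving dfs.name.dir on an existing cluster also means copying the old name directory (or reformatting) before the namenode is restarted.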

Re: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread Russell Jurney
Or get O'Reilly Safari, which would get you both? On Feb 6, 2012, at 9:34 AM, Richard Nadeau wrote: > If you're looking to buy the 2nd edition you might want to wait, the third > edition is in the works now. > > Regards, > Rick > On Feb 6, 2012 10:24 AM, "W.P. McNeill" wrote: > >> The second

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread Merto Mertek
Yes, I am encountering the same problems, and like Mete said, a few seconds after restarting a segmentation fault appears.. here is my conf.. And here is some info from /var/log/messages (ubuntu server 10.10): kernel: [424447.140641] gmetad[26115] general protection ip:

RE: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread GOEKE, MATTHEW (AG/1000)
I haven't gotten a chance to look at the rough cut of the 3rd edition out on Safari right now, but what are the main differences between it and the 2nd edition? -Original Message- From: Russell Jurney [mailto:russell.jur...@gmail.com] Sent: Monday, February 06, 2012 3:35 PM To: common-user@hadoop.apa

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread Varun Kapoor
Hey Merto, I've been digging into this problem since Sunday, and believe I may have root-caused it. I'm using ganglia-3.2.0, rrdtool-1.4.5 and http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/ (which I believe should be running essentially the same relevant code as 0.20.205). Wh

Re: Tom White's book, 2nd ed. Which API?

2012-02-06 Thread Keith Wiley
Thanks everyone. I knew about the upcoming third edition. I'm not sure I want to wait until May to learn the "new" API (pretty old actually). I'd like to find a resource that goes through the new API. I realize Tom White's examples are offered with the new API online; I was just hoping for s

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread Merto Mertek
I have tried to run it but it keeps crashing.. - When you start gmetad and Hadoop is not emitting metrics, everything > is peachy. > Right, running just ganglia without running hadoop jobs seems stable for at least a day.. > - When you start Hadoop (and it thus starts emitting metrics),

Re: Hadoop does not start on Windows XP

2012-02-06 Thread Jay
Hi Ron, Thank you. I deleted the Hadoop directory from my Windows folder. Then I untarred/unzipped it in Cygwin at the directory d:\hadoop (for example, here is a path: D:\Hadoop\hadoop-1.0.0\bin\ ) Now I could start Hadoop: $ bin/hadoop start-all.sh The above worked. But a similar problem persists: $ bin/hadoop fs -

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread Xiaobin She
hi Bejoy and David, thank you for your help. So I can't directly write or append logs into a compressed file in hdfs, right? Can I compress a file which is already in hdfs and has not been compressed? If I can, how can I do that? Thanks! 2012/2/6 > Hi > I agree with David on

Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread mete
Same as Merto's situation here, it always overflows a short time after the restart. Without the hadoop metrics enabled everything is smooth. Regards Mete On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek wrote: > I have tried to run it but it keeps crashing.. > > - When you start gmetad and Hadoo

The Mapper does not run from JobControl

2012-02-06 Thread prajor
Using Hadoop version 0.20. I am creating a chain of jobs, job1 and job2 (their mappers are in x.jar; there is no reducer), with a dependency, and submitting them to the Hadoop cluster using JobControl. Note I have called setJarByClass, and getJar gives the correct jar file when checked before submission. Submis

Hadoop Active Directory Integration

2012-02-06 Thread Benyi Wang
Hi, I have questions about Hadoop Active Directory integration: 1. When using Active Directory, do we still need to create a Linux account for each user on each Linux node? 2. What if I enable queue ACLs and use the fair scheduler? Will task trackers send all ACL checks to Active dir

Re: Can I write to a compressed file which is located in HDFS?

2012-02-06 Thread bejoy . hadoop
Hi AFAIK it is not possible to append to a compressed file. If you have files in a dir in hdfs and you need to compress them (like the files for an hour), you can use MapReduce to do that by setting mapred.output.compress = true and mapred.output.compression.codec='theCodecYouPre
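One hedged way to do that is an identity map-only job via the streaming jar, with compressed output; the jar location, codec and directories below are illustrative only and vary by release, so verify the result on a small sample first:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.output.compress=true \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
        -D mapred.reduce.tasks=0 \
        -mapper /bin/cat \
        -input /logs/2012-02-06/hour-01 \
        -output /logs/2012-02-06/hour-01-compressed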