Pydoop 0.5 released

2012-02-21 Thread Luca Pireddu
Hello everyone, we're happy to announce that we have just released Pydoop 0.5.0 (http://pydoop.sourceforge.net). The main changes with respect to the previous version are: * Pydoop now works with Hadoop 1.0.0. * Support for multiple Hadoop versions with the same Pydoop installation * Easy

Tasktracker fails

2012-02-21 Thread Adarsh Sharma
Dear all, today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start my cluster, all daemons come up except the tasktracker; I don't know why the tasktracker fails with the following error logs. The cluster is on a private network. My /etc/hosts file contains all IP hostname

Re: Did DFSClient cache the file data into a temporary local file

2012-02-21 Thread Harsh J
Seven, Yes, that strategy changed long ago, but the doc on it was only recently updated: https://issues.apache.org/jira/browse/HDFS-1454 (and some more improvements followed later IIRC) 2012/2/21 seven garfee garfee.se...@gmail.com: hi,all As this Page(

Re: Pydoop 0.5 released

2012-02-21 Thread Alexander Lorenz
awesome, guys! -Alex sent via my mobile device On Feb 20, 2012, at 11:59 PM, Luca Pireddu pire...@crs4.it wrote: Hello everyone, we're happy to announce that we have just released Pydoop 0.5.0 (http://pydoop.sourceforge.net). The main changes with respect to the previous version are:

access hbase table from hadoop mapreduce

2012-02-21 Thread amsal
Hi, I want to access an HBase table from Hadoop MapReduce. I am using Windows XP and Cygwin, with hadoop-0.20.2 and hbase-0.92.0. The Hadoop cluster is working fine; I am able to run the MapReduce wordcount successfully on 3 PCs. HBase is also working; I can create a table from the shell. I have tried

Re: Problem in installation

2012-02-21 Thread Harsh J
Dheeraj, In most homogeneous cluster environments, people do keep the configs synced. However, that isn't necessary. It is alright to have different *-site.xml contents on each slave, tailored for its provided resources. For instance, if you have 3 slaves with 3 disks and 1 slave with 2, you can
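For instance (a hedged sketch: the mount points are hypothetical, and mapred.local.dir is the 0.20/1.0-era property name), the 3-disk slaves' mapred-site.xml could carry:

<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>

while the 2-disk slave lists only two directories; the same idea applies to dfs.data.dir in hdfs-site.xml.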

Re: access hbase table from hadoop mapreduce

2012-02-21 Thread Clint Heath
It sounds to me like you just need to include your HBase jars in your compiler's classpath, like so: javac -classpath $HADOOP_HOME Example.java, where $HADOOP_HOME includes all your base Hadoop jars as well as your HBase jars. Then you would want to put the resulting Example.class file into
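For the job itself, a minimal sketch of a map-only job that scans an HBase table, assuming HBase 0.92's TableMapReduceUtil; the table name, output path, and class names are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadHBaseTable {

  // Emits one line per row: the row key and the number of cells in it.
  public static class RowMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toString(row.get(), row.getOffset(), row.getLength())),
                    new Text(Integer.toString(columns.size())));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml
    Job job = new Job(conf, "read-hbase-table");
    job.setJarByClass(ReadHBaseTable.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache

    // "mytable" is a placeholder table name.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, RowMapper.class,
        Text.class, Text.class, job);

    job.setNumReduceTasks(0);  // map-only
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/mytable-dump"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compile it with the Hadoop and HBase jars on the classpath as described above, and make sure the HBase (and ZooKeeper) jars are also available to the tasks at runtime.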

Re: Number of Under-Replicated Blocks ?

2012-02-21 Thread Chris Curtin
Have you had any NameNode failures lately? I had them every couple of days and found that there were files being left in HDFS under /log/hadoop/tmp/mapred/staging/... when communication with the NameNode was lost. Not sure why they never got replicated correctly (maybe because they are in /log?) I

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Joey Echeverria
I'd recommend making a SequenceFile[1] to store each XML file as a value. -Joey [1] http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/SequenceFile.html On Tue, Feb 21, 2012 at 12:15 PM, Mohit Anchlia mohitanch...@gmail.comwrote: We have small xml files. Currently I am
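A rough sketch of the write side, assuming Hadoop 1.0's SequenceFile.createWriter; the input directory and output path are hypothetical, and each small XML file is stored whole as one Text value keyed by its file name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackXmlFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path in = new Path("/user/mohit/xml-small");        // hypothetical input directory
    Path out = new Path("/user/mohit/xml-packed.seq");  // hypothetical output file

    // Key = original file name, value = whole XML document as Text.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
    try {
      for (FileStatus stat : fs.listStatus(in)) {
        byte[] buf = new byte[(int) stat.getLen()];
        FSDataInputStream stream = fs.open(stat.getPath());
        try {
          stream.readFully(buf);
        } finally {
          stream.close();
        }
        writer.append(new Text(stat.getPath().getName()),
                      new Text(new String(buf, "UTF-8")));
      }
    } finally {
      writer.close();
    }
  }
}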

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bejoy Ks
Mohit, rather than just appending the content into a normal text file, you can create a sequence file with the individual smaller files' content as values. Regards Bejoy.K.S On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia mohitanch...@gmail.com wrote: We have small xml files.

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks bejoy.had...@gmail.com wrote: Mohit Rather than just appending the content into a normal text file or so, you can create a sequence file with the individual smaller file content as values. Thanks. I was planning to use pig's

Dynamic changing of slaves

2012-02-21 Thread theta
Hi, I am working on a project which requires a setup as follows: one master with four slaves. However, when a map-only program is run, the master dynamically selects the slave to run the map. For example, when the program is run for the first time, slave 2 is selected to run the map and reduce

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bill Graham
You might want to check out File Crusher: http://www.jointhegrid.com/hadoop_filecrush/index.jsp I've never used it, but it sounds like it could be helpful. On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Mohit AFAIK XMLLoader in pig won't be suited for

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
I am trying to find examples that demonstrate using sequence files, including writing to one and then running mapred on it, but am unable to find any. Could you please point me to some examples of sequence files? On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Mohit

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Arko Provo Mukherjee
Hi, let's say all the smaller files are in the same directory. Then you can do: BufferedWriter output = new BufferedWriter(new OutputStreamWriter(fs.create(output_path, true))); // Output path FileStatus[] output_files = fs.listStatus(new Path(input_path)); // Input directory for ( int
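A hedged sketch of where that loop could go, reading each small file line by line and appending it to a single HDFS output file (paths are hypothetical):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatSmallFiles {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path inputDir = new Path("/user/mohit/xml-small");       // hypothetical input directory
    Path outputFile = new Path("/user/mohit/combined.xml");  // hypothetical output file

    BufferedWriter output = new BufferedWriter(
        new OutputStreamWriter(fs.create(outputFile, true)));
    try {
      for (FileStatus stat : fs.listStatus(inputDir)) {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(stat.getPath())));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            output.write(line);
            output.newLine();
          }
        } finally {
          reader.close();
        }
      }
    } finally {
      output.close();
    }
  }
}

A SequenceFile, as suggested earlier in the thread, keeps the file boundaries; plain concatenation like this does not.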

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Thanks. How does MapReduce work on a sequence file? Is there an example I can look at? On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee arkoprovomukher...@gmail.com wrote: Hi, let's say all the smaller files are in the same directory. Then you can do: BufferedWriter output = new

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Arko Provo Mukherjee
Hi, I think the following link will help: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Cheers Arko On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia mohitanch...@gmail.comwrote: Sorry may be it's something obvious but I was wondering when map or reduce gets called what

WAN-based Hadoop high availability (HA)?

2012-02-21 Thread Saqib Jang -- Margalla Communications
Hello, I'm a market analyst researching the Hadoop space and had a quick question. I was wondering what type of requirements there may be for WAN-based high availability for Hadoop configurations, e.g. for disaster recovery, and what type of solutions may be available for such

Re: Writing to SequenceFile fails

2012-02-21 Thread Mohit Anchlia
I am past this error. It looks like I needed to use the CDH libraries, so I changed my Maven repo. Now I am stuck at org.apache.hadoop.security.AccessControlException since I am not writing as the user that owns the file. Looking online for solutions On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia

Re: WAN-based Hadoop high availability (HA)?

2012-02-21 Thread Jamack, Peter
For high availability? The issue is the NameNode; going forward there is a federated NameNode environment, but I haven't used it and am not sure if it's an active-active NameNode environment or just a sharded environment. DR/BR is always an issue when you have petabytes of data across

Re: Dynamic changing of slaves

2012-02-21 Thread Merto Mertek
I think that job configuration does not allow such a setup; however, maybe I missed something. I would probably tackle this problem from the scheduler source. The default one is JobQueueTaskScheduler, which maintains a FIFO-based queue. When a tasktracker (your slave) tells the jobtracker that
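If you did end up writing a custom scheduler, the jobtracker is pointed at it via mapred-site.xml; the class name below is a placeholder:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.example.MySelectiveScheduler</value>
</property>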

Re: Dynamic changing of slaves

2012-02-21 Thread Jamack, Peter
Yeah, I'm not sure how you can actually do it, as I haven't done it before, but from a logical perspective you'd probably have to make a lot of configuration changes and maybe even write some complicated M/R code and coordination/rules-engine logic, and change how the heartbeat scheduler operates to

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Need some more help. I wrote a sequence file using the below code, but now when I run a mapreduce job I get java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text, even though I didn't use LongWritable when I originally wrote to the sequence

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
It looks like in the mapper the values are coming in as binary instead of Text. Is this expected from a sequence file? I initially wrote the SequenceFile with Text values. On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Need some more help. I wrote sequence file using below code but

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Edward Capriolo
On Tue, Feb 21, 2012 at 7:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote: It looks like in mapper values are coming as binary instead of Text. Is this expected from sequence file? I initially wrote SequenceFile with Text values. On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia

Re: Did DFSClient cache the file data into a temporary local file

2012-02-21 Thread seven garfee
thanks a lot. 2012/2/21 Harsh J ha...@cloudera.com Seven, Yes that strategy has changed since long ago, but the doc on it was only recently updated: https://issues.apache.org/jira/browse/HDFS-1454 (and some more improvements followed later IIRC) 2012/2/21 seven garfee

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Finally figured it out. I needed to use SequenceFileAsTextInputFormat. It is just the lack of examples that makes it difficult when you start. On Tue, Feb 21, 2012 at 4:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote: It looks like in mapper values are coming as binary instead of Text. Is this
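For reference, a minimal sketch using the old (org.apache.hadoop.mapred) API with SequenceFileAsTextInputFormat, which hands both key and value to the mapper as Text; paths and class names are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileAsTextInputFormat;

public class ReadPackedXml {

  // Keys/values arrive as Text: the file name stored as the key, the XML body as the value.
  public static class XmlMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text fileName, Text xml,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // Emit the file name and the size of its XML payload, just to show the plumbing.
      out.collect(fileName, new Text(Integer.toString(xml.getLength())));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(new Configuration(), ReadPackedXml.class);
    job.setJobName("read-packed-xml");

    job.setInputFormat(SequenceFileAsTextInputFormat.class);
    FileInputFormat.setInputPaths(job, new Path("/user/mohit/xml-packed.seq"));
    FileOutputFormat.setOutputPath(job, new Path("/user/mohit/xml-packed-out"));

    job.setMapperClass(XmlMapper.class);
    job.setNumReduceTasks(0);  // map-only
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    JobClient.runJob(job);
  }
}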

Re: Writing to SequenceFile fails

2012-02-21 Thread Harsh J
1. It is important to ensure your clients are on the same major version jars as your server. 2. You are probably looking for hadoop fs -chown and hadoop fs -chmod tools to modify permissions. On Wed, Feb 22, 2012 at 3:15 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am past this error. Looks

Changing into Replication factor

2012-02-21 Thread hadoop hive
Hi folks, right now I have a replication factor of 2, but I want to make it three for some tables. How can I do that for specific tables, so that whenever data is loaded into those tables it is automatically replicated onto three nodes? Or do I need to change replication for all the tables? And
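One hedged option: HDFS replication is set per file/path rather than per table, so you can raise it on the files under the directory that backs each table (the path below is a placeholder); files written later still follow the writing client's dfs.replication unless that is raised as well. The shell equivalent is hadoop fs -setrep (with -R for recursive).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path tableDir = new Path("/user/hive/warehouse/mytable"); // hypothetical table location

    // setReplication applies to files, so walk the existing files under the table directory.
    for (FileStatus stat : fs.listStatus(tableDir)) {
      if (!stat.isDir()) {
        fs.setReplication(stat.getPath(), (short) 3);
      }
    }
  }
}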