Re: Client access

2010-09-07 Thread Alex Baranau
You might find this useful as well: "What are the ways of importing data to HDFS from remote locations? I need this process to be well-managed and automated." Here are just some of the options. First, you should look at the available HDFS shell commands. For large inter/intra-cluster copying, distcp
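
(If the import has to be driven from code rather than the shell, here is a minimal sketch using the FileSystem API; the namenode URI, paths, and class name below are made up for illustration:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name would normally come from core-site.xml;
        // it is hardcoded here only to keep the example self-contained.
        conf.set("fs.default.name", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);
        // Copy a local file into HDFS (the equivalent of 'hadoop fs -put').
        fs.copyFromLocalFile(new Path("/local/data/input.log"),
                             new Path("/user/import/input.log"));
        fs.close();
      }
    }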

Re: Research projects with Hadoop

2010-09-07 Thread Alex Baranau
Hi Luan, That's not a new question on these mailing lists, so I'd suggest starting by digging into the links at http://search-hadoop.com/?q=research+project+ideaspage. Hadoop-related projects are relatively young and full of ideas; good luck finding your spot! Alex Baranau Sematext ::

Re: Research projects with Hadoop

2010-09-07 Thread Alex Baranau
Sorry, it looks like the link I provided got corrupted; the original was: http://search-hadoop.com/?q=research+project+ideas Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase Hadoop ecosystem search :: http://search-hadoop.com/ On Tue, Sep 7, 2010 at

Namenode Datanode

2010-09-07 Thread Mark
In a small cluster (4 machines) with not many jobs, can a Namenode be configured as a Datanode so it can assist in our MR tasks? If so, how is this configured? Is it simply marking it as such in the slaves file? Thanks

Re: Namenode Datanode

2010-09-07 Thread abhishek sharma
On Tue, Sep 7, 2010 at 12:09 PM, Mark static.void@gmail.com wrote: In a small cluster (4 machines) with not many jobs, can a Namenode be configured as a Datanode so it can assist in our MR tasks? If so, how is this configured? Is it simply marking it as such in the slaves file?
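
(For reference: yes, listing the master's own hostname in conf/slaves is the usual approach on a small cluster, so the start scripts also launch worker daemons on that node. A sketch with example hostnames:)

    # conf/slaves -- one worker hostname per line (example hostnames)
    master
    slave1
    slave2
    slave3

With master listed, start-all.sh starts a DataNode and TaskTracker on the namenode host as well; just keep an eye on memory contention between the NameNode and the MR daemons.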

Re: Namenode Datanode

2010-09-07 Thread Mark
Thanks On 9/7/10 9:16 AM, abhishek sharma wrote: On Tue, Sep 7, 2010 at 12:09 PM, Mark static.void@gmail.com wrote: In a small cluster (4 machines) with not many jobs, can a Namenode be configured as a Datanode so it can assist in our MR tasks? If so, how is this configured? Is it just

Re: Research projects with Hadoop

2010-09-07 Thread Alan Gates
Luan, Pig keeps a list at http://wiki.apache.org/pig/PigJournal of all the Pig projects we know of. Many of these are more project-based, but some could be turned into actual research. If you do choose one of these, please let us know (over on pig-...@hadoop.apache.org) so we can mark

Re: how to revert from a new version to an older one (CDH3)?

2010-09-07 Thread Eli Collins
Forgot to mention: the reason most people choose a particular build is the compatibility issue with TTs (e.g. provisioning a new machine with a new Hadoop means it won't work with the existing Hadoop builds). To address this we included HADOOP-5203 (TT's version build is too restrictive) in

Re: Re: namenode consumes quite a lot of memory with only several hundreds of files in it

2010-09-07 Thread shangan
How can I change the configuration in order to trigger GC earlier, not only when it is close to the memory maximum? 2010-09-08 shangan From: Steve Loughran Sent: 2010-09-06 18:16:51 To: common-user Cc: Subject: Re: namenode consumes quite a lot of memory with only several hundreds of files in it

Re: Re: namenode consumes quite a lot of memory with only several hundreds of files in it

2010-09-07 Thread Edward Capriolo
The fact that the memory is high is not necessarily a bad thing. Faster garbage collection implies more CPU usage. I had some success following the tuning advice here, to make my memory usage less spiky: http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html Again, fewer spikes != better
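
(On the earlier question of triggering GC before the heap is nearly full: with the CMS collector you can lower the occupancy threshold at which a concurrent collection starts. A sketch for conf/hadoop-env.sh; the heap size and percentage are illustrative, not recommendations:)

    # conf/hadoop-env.sh (example values)
    export HADOOP_NAMENODE_OPTS="-Xmx1g \
      -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=60 \
      -XX:+UseCMSInitiatingOccupancyOnly"

CMSInitiatingOccupancyFraction=60 starts a concurrent collection once the old generation is 60% full, instead of waiting until it is nearly exhausted; UseCMSInitiatingOccupancyOnly makes the JVM honor that threshold consistently.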

RE: the question of hadoop

2010-09-07 Thread 褚 鵬兵
hi stevel: Thanks for your reply. I have not tried to debug InfiniBand, although I only know of it. Now my Hadoop cluster is made up of HDFS + MapReduce, Hive, and a Derby server. I want to put HBase into the cluster. How can I do that? Can you help me? Thanks. pengbing chu Date: Mon, 6 Sep 2010 11:14:10 +0100

change HDFS block size

2010-09-07 Thread Gang Luo
Hi all, I need to change the block size (from 128MB to 64MB) and have to shut down the cluster first. I was wondering what will happen to the current files on HDFS (with 128MB block size). Are they still there and usable? If so, what is the block size of those legacy files? Thanks, -Gang

Re: change HDFS block size

2010-09-07 Thread Jeff Zhang
Those legacy files won't change block size (the NameNode keeps the mapping between blocks and files); only newly added files will get the new block size of 64MB. On Tue, Sep 7, 2010 at 7:27 PM, Gang Luo lgpub...@yahoo.com.cn wrote: Hi all, I need to change the block size (from 128MB to 64MB) and have to
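
(The block size is fixed per file at write time; cluster-wide it comes from dfs.block.size in hdfs-site.xml, but a client can also choose it per file through the FileSystem API. A sketch with a made-up path and example values:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long blockSize = 64L * 1024 * 1024;   // 64MB for this new file
        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(new Path("/user/demo/new-file"),
            true, 4096, (short) 3, blockSize);
        out.writeBytes("written with a 64MB block size\n");
        out.close();
        fs.close();
      }
    }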

Sort with customized input/output !!

2010-09-07 Thread Matthew John
Hey, I'm pretty new to Hadoop. I need to sort a metafile (TBs) and thought of using the Hadoop Sort (in examples) for it. My input metafile looks like this -- a binary stream (only 1's and 0's). It basically contains records of 40 bytes. Every record goes like this: long a; key -- 8 bytes. The rest
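
(One plausible reading of that record layout as a custom Writable -- an 8-byte long key followed by a 32-byte payload. The class name is made up and the layout is assumed from the description above:)

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // One 40-byte record: an 8-byte long key plus 32 bytes of payload.
    public class MetaRecord implements WritableComparable<MetaRecord> {
      private long key;
      private final byte[] payload = new byte[32];

      public void readFields(DataInput in) throws IOException {
        key = in.readLong();      // first 8 bytes: the sort key
        in.readFully(payload);    // remaining 32 bytes
      }

      public void write(DataOutput out) throws IOException {
        out.writeLong(key);
        out.write(payload);
      }

      public int compareTo(MetaRecord other) {
        return key < other.key ? -1 : (key == other.key ? 0 : 1);
      }
    }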

How to rebuild Hadoop ??

2010-09-07 Thread Matthew John
Hi all, I wrote some new Writable files corresponding to my data input. I added them to /src/org//io/ where all the Writables reside. Similarly, I also wrote input/output format files and a record reader and added them to src/mapred/./mapred/ where all the related files reside. I want

Re: randomly pick rows from data files

2010-09-07 Thread Lance Norskog
Copy FileInputFormat and make your own subclass of LineRecordReader. I did this same thing to make a nice CSV input reader. Yours will drop every Nth line. This would be a very handy tool if you could pull N unique randomly chosen sample sets with no correlation, giving a value from 1 to N.
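
(A sketch of that idea against the new mapreduce API: a LineRecordReader subclass that silently drops every Nth line. The class name and interval are made up, and it would be returned from a FileInputFormat subclass's createRecordReader:)

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // Drops every Nth line from its split and passes the rest through.
    public class SamplingLineRecordReader extends LineRecordReader {
      private static final int N = 10;  // drop every 10th line (example value)
      private long lineNo = 0;

      @Override
      public boolean nextKeyValue() throws IOException {
        while (super.nextKeyValue()) {
          lineNo++;
          if (lineNo % N != 0) {
            return true;   // keep this line
          }
          // fall through: silently drop every Nth line
        }
        return false;      // split exhausted
      }
    }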

Re: Sort with customized input/output !!

2010-09-07 Thread Ted Yu
Please get the Hadoop source code and read the comment at the beginning of SequenceFile.java: "Essentially there are 3 different formats for SequenceFiles" ... On Tue, Sep 7, 2010 at 8:13 PM, Matthew John tmatthewjohn1...@gmail.com wrote: Hey, I'm pretty new to Hadoop. I need to sort a
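
(Converting the raw 40-byte records into a SequenceFile is the usual first step so the sort example can consume them. A sketch that assumes the hypothetical MetaRecord type from above; the path and key value are made up:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class ToSequenceFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 0.20-era API: createWriter(fs, conf, path, keyClass, valueClass)
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/user/demo/records.seq"),
            LongWritable.class, MetaRecord.class);
        // ... loop over the 40-byte records in the metafile and append
        // each one here; a single placeholder record is shown:
        writer.append(new LongWritable(42L), new MetaRecord());
        writer.close();
      }
    }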

Re: How to rebuild Hadoop ??

2010-09-07 Thread Jeff Zhang
Matthew, you should put your code in the examples source folder and rebuild the examples, then use the newly generated hadoop-*version*-examples.jar in the build folder. PS: each MapReduce job needs a jar which contains the classes the job needs. On Tue, Sep 7, 2010 at 8:14 PM, Matthew John

Re: How to rebuild Hadoop ??

2010-09-07 Thread Matthew John
Thanks a lot, Jeff! The problem is that every time I build (using ant) there is a build folder created, but there is no examples jar created inside it. I wanted to add some files to the io package and the mapred package, so I suppose I should put the files appropriately (inside the io and mapred folders

Re: How to rebuild Hadoop ??

2010-09-07 Thread Jeff Zhang
Do you run ant examples? On Tue, Sep 7, 2010 at 10:29 PM, Matthew John tmatthewjohn1...@gmail.com wrote: Thanks a lot, Jeff! The problem is that every time I build (using ant) there is a build folder created, but there is no examples jar created inside it. I wanted to add some files into
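
(For reference, a minimal sketch of the build-and-run sequence on the 0.20 branch; the version number in the jar name is an example:)

    # from the top of the Hadoop source tree
    ant examples
    # the jar lands in build/, e.g. build/hadoop-0.20.2-examples.jar
    bin/hadoop jar build/hadoop-0.20.2-examples.jar sort <input> <output>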