You might find this useful as well:
What are the ways of importing data to HDFS from remote locations? I need
this process to be well-managed and automated.
Here are just some of the options. First, you should look at the available HDFS
shell commands. For large inter/intra-cluster copying, distcp
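If you want to drive the ingest from Java as part of an automated pipeline, the
FileSystem API is the programmatic route. A minimal, untested sketch -- the
NameNode URI and the paths are made-up placeholders, adjust for your cluster:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: push a locally staged file into HDFS with the FileSystem API.
public class HdfsIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        Path localStaging = new Path("/data/staging/export.log");  // hypothetical local file
        Path hdfsTarget   = new Path("/user/ingest/export.log");   // hypothetical HDFS target

        // Copy the local file into HDFS; false = keep the source, true = overwrite target.
        fs.copyFromLocalFile(false, true, localStaging, hdfsTarget);
        fs.close();
    }
}

You could then schedule a wrapper around something like this (cron or a workflow
engine) to get the "well-managed and automated" part.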
Hi Luan,
That's not a new question on these mailing lists, so I'd suggest starting by
digging into the links at
http://search-hadoop.com/?q=research+project+ideaspage. Hadoop-related
projects are relatively young and full of ideas, good
luck with finding your spot!
Alex Baranau
Sematext ::
Sorry, looks like the link I provided got corrupted, the original was:
http://search-hadoop.com/?q=research+project+ideas
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/
On Tue, Sep 7, 2010 at
In a small cluster (4 machines) with not many jobs, can a NameNode be
configured as a DataNode so it can assist in our MR tasks?
If so, how is this configured? Is it simply a matter of marking it as such in
the slaves file? Thanks
On Tue, Sep 7, 2010 at 12:09 PM, Mark static.void@gmail.com wrote:
In a small cluster (4 machines) with not many jobs, can a NameNode be
configured as a DataNode so it can assist in our MR tasks?
If so, how is this configured? Is it simply a matter of marking it as such in the
slaves file?
Thanks
On 9/7/10 9:16 AM, abhishek sharma wrote:
On Tue, Sep 7, 2010 at 12:09 PM, Mark static.void@gmail.com wrote:
In a small cluster (4 machines) with not many jobs, can a NameNode be
configured as a DataNode so it can assist in our MR tasks?
If so, how is this configured? Is it just
Luan,
Pig keeps a list at http://wiki.apache.org/pig/PigJournal of all the
Pig projects we know of. Many of these are more project-based, but
some could be turned into actual research. If you do choose one of
these, please let us know (over on pig-...@hadoop.apache.org) so we
can mark
Forgot to mention: the reason most people choose a particular build is
the compatibility issue with TTs (e.g. provisioning a new
machine with a new Hadoop build means it won't work with the existing Hadoop
builds). To address this we included HADOOP-5203 (TT's version build
is too restrictive) in
How can I change the configuration in order to trigger GC earlier, rather than only
when memory usage is close to the maximum?
2010-09-08
shangan
From: Steve Loughran
Sent: 2010-09-06 18:16:51
To: common-user
Cc:
Subject: Re: namenode consumes quite a lot of memory with only several hundreds of
files in it
The fact that the memory is high is not necessarily a bad thing.
Faster garbage collection implies more CPU usage.
I had some success following the tuning advice here, to make my memory
usage less spiky:
http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
Again, fewer spikes != better
Hi stevel:
Thanks for your reply. I have not tried to debug InfiniBand, although I only know
of it.
Now, my Hadoop cluster is made up of HDFS + MapReduce, Hive, and a Derby server. I want
to put HBase into the cluster. How can I do it? Can you help me?
Thanks. pengbing chu
Date: Mon, 6 Sep 2010 11:14:10 +0100
Hi all,
I need to change the block size (from 128m to 64m) and have to shut down the
cluster first. I was wondering what will happen to the current files on HDFS
(with 128M block size). Are they still there and usable? If so, what is the
block size of those legacy files?
Thanks,
-Gang
Those legacy files won't change block size (the NameNode keeps the mapping
between blocks and files);
only newly added files will get the new block size of 64m
On Tue, Sep 7, 2010 at 7:27 PM, Gang Luo lgpub...@yahoo.com.cn wrote:
Hi all,
I need to change the block size (from 128m to 64m) and have to
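You can check this yourself: block size is recorded per file, so asking for a
file's status shows the size it was written with. A rough, untested sketch (the
paths below are made-up placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: block size is a per-file attribute, so files written before the
// config change keep the block size they were created with.
public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        FileStatus legacy = fs.getFileStatus(new Path("/data/old-file"));  // hypothetical path
        System.out.println("legacy block size: " + legacy.getBlockSize()); // still 128m

        FileStatus fresh = fs.getFileStatus(new Path("/data/new-file"));   // written after the change
        System.out.println("new block size: " + fresh.getBlockSize());     // 64m
    }
}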
Hey,
I'm pretty new to Hadoop.
I need to sort a metafile (TBs in size) and thought of using the Hadoop Sort (in
examples) for it.
My input metafile looks like this -- a binary stream (only 1's and 0's). It
basically contains records of 40 bytes.
Every record goes like this:
long a; key -- 8 bytes. The rest
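For fixed-width binary records like that, a custom Writable is straightforward.
A rough sketch, assuming the first 8 bytes are the long key and the remaining 32
bytes are an opaque payload (the original message is cut off before describing
them), so the class name and layout are only illustrative:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Sketch of a Writable for a fixed 40-byte record: an 8-byte long key followed
// by (assumed) 32 opaque payload bytes. Sorting on the long key is then what a
// Sort-style job would do.
public class FixedRecordWritable implements WritableComparable<FixedRecordWritable> {
    private long key;                            // first 8 bytes
    private final byte[] payload = new byte[32]; // remaining 32 bytes (assumed opaque)

    public void readFields(DataInput in) throws IOException {
        key = in.readLong();
        in.readFully(payload);
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(key);
        out.write(payload);
    }

    public int compareTo(FixedRecordWritable other) {
        return key < other.key ? -1 : (key == other.key ? 0 : 1);
    }
}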
Hi all,
I wrote some new Writables corresponding to my data input. I added
them to /src/org//io/ where all the Writables reside. Similarly, I
also wrote input/output format classes and a record reader and added them to
src/mapred/./mapred/ where all related files reside.
I want
Copy FileInputFormat and make your own subclass of LineRecordReader. I
did this same thing to make a nice CSV input reader. Yours will drop
every Nth line.
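For what it's worth, here is a sketch of that "drop every Nth line" idea against
the new mapreduce API; the value of N and the class name are only illustrative
(you would normally read N from the job configuration), and you would still need
your own input format whose createRecordReader returns this reader:

import java.io.IOException;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Sketch: a LineRecordReader subclass that silently skips every Nth record.
public class SkipNthLineRecordReader extends LineRecordReader {
    private static final int N = 10;  // illustrative value
    private long seen = 0;

    @Override
    public boolean nextKeyValue() throws IOException {
        while (super.nextKeyValue()) {
            seen++;
            if (seen % N != 0) {
                return true;   // keep this line
            }
            // seen % N == 0: drop the line and keep reading
        }
        return false;          // no more input
    }
}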
This would be a very handy tool if you could pull N unique, randomly
chosen sample
sets with no correlation, each given a value from 1 to N.
Please get the Hadoop source code and read the comment at the beginning of
SequenceFile.java:
* Essentially there are 3 different formats for
SequenceFiles
...
On Tue, Sep 7, 2010 at 8:13 PM, Matthew John tmatthewjohn1...@gmail.com wrote:
Hey,
I'm pretty new to Hadoop.
I need to sort a
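The three formats that comment refers to map onto the CompressionType you pass
when creating a writer: NONE (uncompressed), RECORD (each record compressed) and
BLOCK (blocks of records compressed together). A minimal, untested sketch -- the
output path and key/value types are just placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: write a SequenceFile, choosing one of the three formats via CompressionType.
public class SequenceFileFormats {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/tmp/demo.seq");   // hypothetical output path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, LongWritable.class, Text.class,
                SequenceFile.CompressionType.BLOCK);  // or NONE / RECORD
        writer.append(new LongWritable(1), new Text("first record"));
        writer.close();
    }
}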
Matthew,
You should put your code in the examples source folder and rebuild the
examples. Then use the newly generated hadoop-*version*-examples.jar in the
build folder.
PS: each MapReduce job needs a jar which contains the classes the job needs
On Tue, Sep 7, 2010 at 8:14 PM, Matthew John
Thanks a lot, Jeff!
The problem is that every time I build (using ant) a build folder is
created, but there is no examples jar created inside it. I wanted to add
some files into the io package and the mapred package. So I suppose I should put the
files appropriately (inside the io and mapred folder
Do you run ant example?
On Tue, Sep 7, 2010 at 10:29 PM, Matthew John
tmatthewjohn1...@gmail.com wrote:
Thanks a lot, Jeff!
The problem is that every time I build (using ant) a build folder is
created, but there is no examples jar created inside it. I wanted to add
some files into