Hi folks,
How do we configure a remote client for HDFS? For a cluster with a few
nodes, could we make one node the remote client without it being one of the
data nodes/task nodes?
What should we change in hadoop-site.xml to configure this? Has anyone tried
this before?
Thanks.
Richard
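For what it's worth, a node can act as a pure HDFS client just by having a
hadoop-site.xml that points at the namenode, without appearing in the slaves
file. A minimal sketch, assuming the namenode listens on
namenode.example.com:9000 (hostname and port are placeholders):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000/</value>
      </property>
    </configuration>

With that in place, hadoop dfs -ls / from that node should talk to the
cluster; the node runs no datanode or tasktracker as long as you never start
those daemons on it.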
Kevin wrote:
Hi,
This is about DFS only, not MapReduce. It may sound like a
strange need, but sometimes I want to read a block from a specific
data node that holds a replica. Figuring out which datanodes have the
block is easy. But is there an easy way to specify which datanode I
want
Hi,
I have installed Hadoop on 20 nodes (data storage) and one master (namenode)
to which I want to add data. I have learned that this is possible through a
Java API or via the Hadoop shell. However, I would like to mount the HDFS
using FUSE, and I discovered that there's a contrib/fuse-dfs within
Hi Sebastian,
The problem is that hdfs.so is supposed to be in build/libhdfs but for some
reason isn't.
Have you tried doing an ant compile-libhdfs -Dlibhdfs=1?
And then checked whether hdfs.so is in build/libhdfs?
Thanks, pete
Sorry - I see the problem now. It should be:
ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
compile-contrib depends on compile-libhdfs, which also requires the
-Dlibhdfs=1 property to be set.
pete
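Putting pete's two mails together, the build sequence would look something
like this (run from the Hadoop source root; the exact output location may
vary by version):

    cd $HADOOP_HOME
    ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
    # then verify that the native library was produced:
    ls build/libhdfs/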
On Tue, Aug 5, 2008 at 11:20 AM, Theocharis Ian Athanasakis
[EMAIL PROTECTED] wrote:
What's the proposed design pattern for a reducer that needs two
sets of inputs?
Are there any source code examples?
Thanks :)
If I understand your question, I think one answer is to move the
problem out
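One common way to "move the problem out" of the reducer is a reduce-side
join: tag each record in the mapper with which input set it came from, so a
single reducer sees both sets under the same key. A rough sketch with the old
mapred API; the tab-separated format and the "setA" path test are
assumptions, not anything from the original posts:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Tags each record with the input set it came from, so the reducer
    // can tell the two sets apart under a shared key.
    public class TaggingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private String tag;

        public void configure(JobConf job) {
            // map.input.file holds the path of the file backing this split
            String file = job.get("map.input.file", "");
            tag = file.contains("setA") ? "A" : "B"; // "setA" is hypothetical
        }

        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            String[] kv = line.toString().split("\t", 2);
            if (kv.length == 2) {
                out.collect(new Text(kv[0]), new Text(tag + ":" + kv[1]));
            }
        }
    }

The reducer can then split each value on the leading tag and join the two
groups however it likes.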
Konstantin Shvachko wrote:
IMHO we either need to correct it or remove it.
+1
Doug
Thank you for the suggestion. I looked at DFSClient. It appears that
the chooseDataNode method decides which data node to connect to. Currently
it chooses the first non-dead data node returned by the namenode, which
has sorted the nodes by proximity to the client. However,
chooseDataNode is private, so
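For the "figuring out which datanodes have the block is easy" half, the
public API looks roughly like this (the class name is illustrative;
getFileBlockLocations is the real call in recent releases):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlockHosts {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                // print the hosts holding a replica of each block
                System.out.println("block " + i + " on: "
                    + java.util.Arrays.toString(blocks[i].getHosts()));
            }
        }
    }

Which replica actually gets read, though, is still decided inside DFSClient,
as noted above.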
Seeing as there is no search function on the archives, I'm relegated
to asking a possibly redundant question or four:
I have, as a sample setup:
idx1-tracker JobTracker
idx2-namenode NameNode
idx3-slave DataTracker
...
idx20-slave DataTracker
Q1: Can I put the same
Yes, the namenode is in charge of deciding proximity by using
DNSToSwitchMapping. On the other hand, I am exploring the possibility
of letting the client decide which data node to connect to, since
sometimes the network hierarchy is so complex or dynamic that we had better
leave it to the client to find
I need this because I do not want to trust the namenode's ordering. For
applications where network congestion is rare, we should let the
client decide which data node to load from.
If this is the case, then providing a method to re-order the datanode list
shouldn't be hard. Maybe open a JIRA
Hi James,
You can put the same hadoop-site.xml on all machines. Yes, you do want a
secondary NN - a single NN is a SPOF. Browse the archives from a few days back
to find an email from Paul about DRBD (disk replication) to avoid this SPOF.
Otis
--
Sematext -- http://sematext.com/ -- Lucene -
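For James's sample layout, the per-host role assignment lives in two
plain-text files in conf/, not in hadoop-site.xml itself. A sketch using his
idx names (the secondary namenode placement on idx1-tracker is an
assumption):

    # conf/slaves -- one worker host per line
    idx3-slave
    ...
    idx20-slave

    # conf/masters -- where start-dfs.sh launches the secondary namenode
    idx1-tracker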
Thus spake Otis Gospodnetic:
Okay, thank you! Good to
Thanks. I've looked briefly at HBase and thought that it was designed
for very large datasets only. But now I've got the feeling that it's
also suitable for distributed, scalable persistence of small datasets
under heavy request loads. Is that the case?
Leon Mergen wrote:
Hello,
On Tue, Aug 5,
Thanks. Hopefully you'll keep us informed via this thread.
Kylie McCormick wrote:
Hello:
I am actually working on this myself in my project Multisearch. The Map()
function uses clients to connect to services and collect responses, and the
Reduce() function merges them together. I'm working on
Thank you for the idea of submitting a request. However, I guess I could
not wait until it is served. The worst case is that I would probably
hack my copy of Hadoop and rebuild it.
-Kevin
Hey all,
Often I find it would be convenient to run conventional Unix
commands on HDFS, such as using the following to delete the contents
of my HDFS:
hadoop dfs -rm *
or moving files from one folder to another:
hadoop dfs -mv /path/one/* path/two/
Does anyone know of a way to do
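One thing that already works: if you quote the glob so your local shell
doesn't expand it against the local filesystem, FsShell expands the wildcard
against HDFS itself (paths here are illustrative):

    hadoop dfs -rm '/path/one/*'
    hadoop dfs -mv '/path/one/*' /path/two/

That covers rm and mv at least; fancier Unix pipelines still need something
like the FUSE mount discussed earlier in the digest.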
Thus spake James Graham (Greywolf):
Now I have something interesting going on. Given the following configuration
file, what am I doing wrong? When I type start-dfs.sh on the namenode,
as instructed in the docs, I end up with, effectively, "Address already
in use"; shutting down NameNode.
I do
1. How do you enable compression of the data on disk?
2. How do you enable compression on connections from HDFS clients?
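On (1): HDFS itself does not compress blocks transparently, so the usual
approach is to compress the data as you write it, e.g. through a
CompressionCodec. A minimal sketch (the path and codec choice are arbitrary):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class CompressedWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // wrap the HDFS output stream in a gzip codec
            CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, conf);
            OutputStream out =
                codec.createOutputStream(fs.create(new Path("/tmp/data.gz")));
            out.write("hello hdfs\n".getBytes("UTF-8"));
            out.close();
        }
    }

On (2) I don't believe the client-datanode wire protocol compresses data;
compressing the stored files as above shrinks the transfer as a side effect.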
Hi,
I guess this thread is old, but I eventually need to raise the
question again as I am more into DFS now. Would a line be broken
between adjacent blocks in DFS? Can lines be preserved at the block level?
-Kevin
On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas [EMAIL PROTECTED] wrote:
Hi,
I am planning to use the distributed Lucene index from hadoop.contrib.index
for indexing. Has anyone used it or tested it? Any issues or comments?
I see that the design described is different from HDFS (the namenode is
stateless, stores no information regarding blocks for files, etc.). Does
anyone
If I use one node for reduce, Hadoop can sort the result.
If I use 30 nodes for reduce, the result is part-00000 ~ part-00029.
How can I make all 30 parts sorted globally, so that all keys in part-00001
are greater than those in part-00000?
Thanks a lot
Xing
I suppose you mean sorting the result globally across files. AFAIK,
this is not currently supported unless you have only one reducer. It
is said that version 0.19 will introduce such a capability.
-Kevin
You may want to write a partitioner that partitions the output from the
mappers in a way that fits your definition of sorted data (e.g. all keys in
part-00001 are greater than those in part-00000). Once you've done that,
simply concatenating the reduce outputs from 0 to N will give you a sorted
result file.
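A toy illustration of such a partitioner with the old mapred API; the
first-letter bucketing is a made-up split function, and in practice you
would pick split points by sampling your keys:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Routes keys so every key in partition i sorts before every key in
    // partition i+1; concatenating part-00000..part-000NN is then
    // globally sorted.
    public class RangePartitioner implements Partitioner<Text, Text> {
        public void configure(JobConf job) {}

        public int getPartition(Text key, Text value, int numPartitions) {
            String s = key.toString();
            char first = (s.length() == 0)
                ? 'a' : Character.toLowerCase(s.charAt(0));
            int bucket = (first - 'a') * numPartitions / 26;
            // clamp keys outside a-z into the first/last partition
            return Math.max(0, Math.min(numPartitions - 1, bucket));
        }
    }

Set it on the job with job.setPartitionerClass(RangePartitioner.class) on
the JobConf.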
I guess a quick way to find an answer to your question is to look at the
size of the block files stored on the datanodes.
If they are all the same (e.g. 64 MB), then you could say lines are NOT
preserved at the block level, as DFS simply cuts the original file into
exact 64 MB pieces.
They are almost all the
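Concretely, on a datanode the block files sit under dfs.data.dir; assuming
it is set to /hadoop/dfs/data, something like:

    ls -l /hadoop/dfs/data/current/blk_* | head

shows blocks of exactly the configured block size for every block but the
last, i.e. the cut pays no attention to line boundaries.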
Yes, I have looked at the block files and it matches what you said. I
am just wondering if there is some property or flag that would turn
this feature on, if it exists.
-Kevin