how to configure a remote client? Has anyone tried this before?

2008-08-06 Thread Richard Zhang
Hi folks: How do we configure a remote client to the HDFS? For a cluster with a few nodes, could we make one node a remote client rather than one of the data nodes/task nodes? What should we change in hadoop-site.xml to configure this? Has anyone tried this before? Thanks. Richard
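
A minimal sketch of the client side, with placeholder hostnames and ports: install the same Hadoop release on the client box, leave it out of the cluster's slaves file, and point its hadoop-site.xml at the namenode (and at the jobtracker, if jobs will be submitted from there):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker.example.com:9001</value>
      </property>
    </configuration>

With that in place, bin/hadoop dfs commands on the client talk to the remote cluster without the node running any daemons.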

Re: DFS. How to read from a specific datanode

2008-08-06 Thread Samuel Guo
Kevin wrote: Hi, This is about DFS only, not MapReduce. It may sound like a strange need, but sometimes I want to read a block from a specific data node which holds a replica. Figuring out which datanodes have the block is easy. But is there an easy way to specify which datanode I want
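
A sketch of the "figuring out which datanodes have the block" part, using the FileSystem API of roughly this era (the exact getFileBlockLocations signature varies between 0.17 and 0.18; the class name here is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlockHosts {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up hadoop-site.xml
        FileSystem fs = FileSystem.get(conf);
        FileStatus stat = fs.getFileStatus(new Path(args[0]));
        // one BlockLocation per block, listing the datanodes holding replicas
        BlockLocation[] blocks =
            fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation b : blocks) {
          System.out.println("offset " + b.getOffset() + ": "
              + java.util.Arrays.toString(b.getHosts()));
        }
      }
    }

Choosing which of those replicas actually serves the read is the hard part, as the rest of the thread discusses.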

Re: how to configure a remote client? Has anyone tried this before?

2008-08-06 Thread Samuel Guo
Richard Zhang wrote: Hi folks: How do we configure a remote client to the HDFS? For a cluster with a few nodes, could we make one node a remote client rather than one of the data nodes/task nodes? What should we change in hadoop-site.xml to configure this? Has anyone tried this before? Thanks. Richard

fuse-dfs

2008-08-06 Thread Sebastian Vieira
Hi, I have installed Hadoop on 20 nodes (data storage) and one master (namenode) to which I want to add data. I have learned that this is possible through a Java API or via the Hadoop shell. However, I would like to mount the HDFS using FUSE and I discovered that there's a contrib/fuse-dfs within

Re: fuse-dfs

2008-08-06 Thread Pete Wyckoff
Hi Sebastian, The problem is that hdfs.so is supposed to be in build/libhdfs but for some reason isn't. Have you tried doing an ant compile-libhdfs -Dlibhdfs=1? And then checked whether hdfs.so is in build/libhdfs? Thanks, pete On 8/6/08 5:04 AM, Sebastian Vieira [EMAIL PROTECTED] wrote: Hi,

Re: fuse-dfs

2008-08-06 Thread Pete Wyckoff
Sorry - I see the problem now. It should be: ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1 (compile-contrib depends on compile-libhdfs, which also requires the -Dlibhdfs=1 property to be set). pete On 8/6/08 5:04 AM, Sebastian Vieira [EMAIL PROTECTED] wrote: Hi, I have installed Hadoop on 20
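
Putting Pete's commands together, a sketch of the whole build-and-mount sequence (the wrapper script location and mount syntax are assumptions that vary by version; namenode.example.com is a placeholder):

    ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
    # then mount HDFS, pointing fuse-dfs at the namenode:
    ./build/contrib/fuse-dfs/fuse_dfs_wrapper.sh \
        dfs://namenode.example.com:9000 /mnt/hdfs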

Re: Reducer with two sets of inputs

2008-08-06 Thread James Moore
On Tue, Aug 5, 2008 at 11:20 AM, Theocharis Ian Athanasakis [EMAIL PROTECTED] wrote: What's the proposed design pattern for a reducer that needs two sets of inputs? Are there any source code examples? Thanks :) If I understand your question, I think one answer is to move the problem out
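
James's answer is cut off above, so for reference, one common pattern for a two-input reduce (not necessarily the one James goes on to describe) is a reduce-side join: mappers tag each value with its input source, and the reducer separates the two sets by tag. A sketch against the old mapred API; class and tag names are hypothetical:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Mappers over input A emit values prefixed "A\t"; mappers over input B
    // emit "B\t". The reducer then sees both sets for each key.
    public class TaggedJoinReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        List<String> a = new ArrayList<String>();
        List<String> b = new ArrayList<String>();
        while (values.hasNext()) {
          String v = values.next().toString();
          if (v.startsWith("A\t")) a.add(v.substring(2));
          else b.add(v.substring(2));
        }
        // pair every A record with every B record sharing this key
        for (String va : a)
          for (String vb : b)
            out.collect(key, new Text(va + "\t" + vb));
      }
    }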

Re: Confusing NameNodeFailover page in Hadoop Wiki

2008-08-06 Thread Doug Cutting
Konstantin Shvachko wrote: IMHO we either need to correct it or remove it. +1 Doug

Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Thank you for the suggestion. I looked at DFSClient. It appears that the chooseDataNode method decides which data node to connect to. Currently it chooses the first non-dead data node returned by the namenode, which has sorted the nodes by proximity to the client. However, chooseDataNode is private, so

Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)
Seeing as there is no search function on the archives, I'm relegated to asking a possibly redundant question or four. I have, as a sample setup:

idx1-tracker    JobTracker
idx2-namenode   NameNode
idx3-slave      DataTracker
...
idx20-slave     DataTracker

Q1: Can I put the same

Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Yes, the namenode is in charge of deciding the proximity by using DNSToSwitchMapping. On the other hand, I am exploring the possibility of letting the client decide which data node to connect to, since sometimes the network hierarchy is so complex or dynamic that we had better leave it to the client to find

Re: DFS. How to read from a specific datanode

2008-08-06 Thread lohit
I need this because I do not want to trust the namenode's ordering. For applications where network congestion is rare, we should let the client decide which data node to load from. If this is the case, then providing a method to re-order the datanode list shouldn't be hard. Maybe open a JIRA

Re: Configuration: I need help.

2008-08-06 Thread Otis Gospodnetic
Hi James, You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF. Otis -- Sematext -- http://sematext.com/ -- Lucene -

Re: Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)
Thus spake Otis Gospodnetic: Hi James, You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF. Okay, thank you! Good to

Re: Hadoop also applicable in a web app environment?

2008-08-06 Thread Mork0075
Thanks. I've looked briefly at HBase and thought that it was designed for very large datasets only. But now I've got the feeling that it's also suitable for distributed, scalable persistence of small datasets under heavy request load. Is that the case? Leon Mergen wrote: Hello, On Tue, Aug 5,

Re: Hadoop also applicable in a web app environment?

2008-08-06 Thread Mork0075
Thanks. Hopefully you'll keep us informed via this thread. Kylie McCormick wrote: Hello: I am actually working on this myself on my project Multisearch. The Map() function uses clients to connect to services and collect responses, and the Reduce() function merges them together. I'm working on

Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Thank you for the idea of submitting a request. However, I guess I can't wait until it is served. The worst case is that I would probably hack my copy of Hadoop and rebuild it. -Kevin On Wed, Aug 6, 2008 at 11:31 AM, lohit [EMAIL PROTECTED] wrote: I need this because I do not want to trust

Re: Configuration: I need help.

2008-08-06 Thread Allen Wittenauer
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF.

hdfs question

2008-08-06 Thread Mori Bellamy
Hey all, often I find it would be convenient to run conventional Unix commands on HDFS, such as using the following to delete the contents of my HDFS:

hadoop dfs -rm *

or moving files from one folder to another:

hadoop dfs -mv /path/one/* path/two/

Does anyone know of a way to do
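
One general gotcha worth noting here (a side note, not from the thread; the path is hypothetical): the hadoop dfs shell expands glob patterns itself, so quote the pattern to keep the local shell from expanding * against the local working directory first:

    hadoop dfs -rm '/user/mori/*'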

Re: Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)
Thus spake James Graham (Greywolf): Now I have something interesting going on. Given the following configuration file, what am I doing wrong? When I type start-dfs.sh on the namenode, as instructed in the docs, I end up with, effectively, "Address already in use; shutting down NameNode." I do

compression questions

2008-08-06 Thread eleith
1. How do you enable compression of the data on disk?
2. How do you enable compression on connections from HDFS clients?
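
Neither of these is a single switch in this era of Hadoop: HDFS does not transparently compress blocks on disk or on the wire, so compression is normally applied at the file or job level instead. A sketch of the usual job-level knobs, assuming the 0.18-era property names:

    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class CompressionConf {
      public static JobConf enableCompression(JobConf conf) {
        // compress the final job output written to HDFS
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec",
                      GzipCodec.class, CompressionCodec.class);
        // compress intermediate map output (cuts shuffle traffic)
        conf.setBoolean("mapred.compress.map.output", true);
        return conf;
      }
    }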

Re: Are lines broken in dfs and/or in InputSplit

2008-08-06 Thread Kevin
Hi, I guess this thread is old, but I eventually need to raise the question again as I am more into DFS now. Would a line be broken between adjacent blocks in DFS? Can lines be preserved at the block level? -Kevin On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas [EMAIL PROTECTED] wrote:

Distributed Lucene - from hadoop contrib

2008-08-06 Thread Deepika Khera
Hi, I am planning to use the distributed Lucene code from hadoop.contrib.index for indexing. Has anyone used or tested it? Any issues or comments? I see that the design described is different from HDFS (the Namenode is stateless, stores no information regarding blocks for files, etc.). Does anyone

How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Xing
If I use one node for reduce, Hadoop can sort the result. If I use 30 nodes for reduce, the result is part-00000 ~ part-00029. How can I make all 30 parts sorted globally, so that all the keys in part-00001 are greater than those in part-00000? Thanks a lot. Xing

Re: How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Kevin
I suppose you mean sorting the result globally across files. AFAIK, this is not currently supported unless you have only one reducer. It is said that version 0.19 will introduce such a capability. -Kevin On Wed, Aug 6, 2008 at 6:01 PM, Xing [EMAIL PROTECTED] wrote: If I use one node for

Re: How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Taeho Kang
You may want to write a partitioner that partitions the output from the mappers in a way that fits your definition of sorted data (e.g. all keys in part-00001 are greater than those in part-00000). Once you've done that, simply merging all the reduce outputs from 0 to N will give you a sorted result file.
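
A sketch of such a partitioner against the old mapred API (the integer key type and the fixed key range are hypothetical; real cut-points should come from sampling the data, which is what the 0.19 support Kevin mentions is said to automate):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Routes keys to reducers by range, so every key in partition i+1 is
    // greater than every key in partition i; merging the part files in
    // order then yields a globally sorted result.
    public class RangePartitioner implements Partitioner<IntWritable, Text> {
      private static final int MAX_KEY = 30000; // hypothetical upper bound

      public void configure(JobConf job) {}

      public int getPartition(IntWritable key, Text value, int numPartitions) {
        int p = (int) ((long) key.get() * numPartitions / MAX_KEY);
        return Math.min(numPartitions - 1, Math.max(0, p)); // clamp strays
      }
    }

Wire it in with conf.setPartitionerClass(RangePartitioner.class).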

Re: Are lines broken in dfs and/or in InputSplit

2008-08-06 Thread Taeho Kang
I guess a quick way to find an answer to your question is to look at the sizes of the block files stored on the datanodes. If they are all the same size (e.g. 64MB), then you could say lines are NOT preserved at the block level, as DFS simply cuts the original file into exact 64MB pieces. They are almost all the
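
The check Taeho describes, as a shell sketch (the data directory comes from dfs.data.dir and its layout varies by version, so the path is an assumption):

    # on a datanode: raw block files are plain files named blk_<id>
    ls -l /var/hadoop/dfs/data/current/blk_*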

Re: Are lines broken in dfs and/or in InputSplit

2008-08-06 Thread Kevin
Yes, I have looked at the block files and it matches what you said. I am just wondering if there is some property or flag that would turn this feature on, if it exists. -Kevin On Wed, Aug 6, 2008 at 8:01 PM, Taeho Kang [EMAIL PROTECTED] wrote: I guess a quick way to find an answer to your