[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread eric baldeschwieler (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376639 ] eric baldeschwieler commented on HADOOP-171: highReplicationHint is not the right description, IMO. One goal is to optimize distribution. Call it distributionH

[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376622 ] Doug Cutting commented on HADOOP-171: > We can implement short highReplicationHint() That would work, but it doesn't seem like the simplest API. But if folks feel stro

[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376616 ] Konstantin Shvachko commented on HADOOP-171: We can implement short highReplicationHint(), which would first ask the namenode what it thinks an appropriate rep

[jira] Resolved: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread eric baldeschwieler (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=all ] eric baldeschwieler resolved HADOOP-171: Resolution: Duplicate

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Runping Qi (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376614 ] Runping Qi commented on HADOOP-170: It will be more effective if, after assigning a map task to a specific tasktracker node, the jobtracker can request the namenode to repli

[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376605 ] Doug Cutting commented on HADOOP-171: One alternative to 'fs.copyFromLocalFile(localJobJar, remoteJobJar, Short.MAX_VALUE)' might be: fs.copyFromLocalFile(localJobJar, r

[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376604 ] Doug Cutting commented on HADOOP-171: So my concrete question is: how should I change JobClient? It currently uses fs.copyFromLocalFile() to copy the job.jar and fs.crea

[jira] Created: (HADOOP-171) need standard API to set dfs replication = high

2006-04-26 Thread Doug Cutting (JIRA)
need standard API to set dfs replication = high Key: HADOOP-171 URL: http://issues.apache.org/jira/browse/HADOOP-171 Project: Hadoop Type: New Feature Components: dfs Versions: 0.2 Reporter: Doug Cutting

[jira] Resolved: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ] Doug Cutting resolved HADOOP-170: Fix Version: 0.2 Resolution: Fixed I just committed this. Thanks, Konstantin!

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376598 ] Konstantin Shvachko commented on HADOOP-170: Replication can be set to the maximum if the number of nodes is available. That is, the block will be replicated on almost al

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376596 ] Doug Cutting commented on HADOOP-170: > +1 on asking the job tracker for the size of the cluster and replicating > based on that size. I think you mean "namenode", not "

[jira] Updated: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ] Konstantin Shvachko updated HADOOP-170: Attachment: setReplication.patch Javadoc is added for the new public methods.

[jira] Updated: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ] Konstantin Shvachko updated HADOOP-170: Attachment: (was: setReplication.patch)

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Yoram Arnon (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376593 ] Yoram Arnon commented on HADOOP-170: The optimal replication factor for distributing a file in two hops is sqrt(cluster size). The worst-case delivery time is 2*sqrt(n) time
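
One way to recover those numbers (editorial sketch; n is the cluster size, r the replication factor, and time is measured in single-transfer units): seed r replicas serially in hop one, then let each replica serve roughly n/r nodes in parallel in hop two.

    t(r) = r + n/r
    dt/dr = 1 - n/r^2 = 0  =>  r = sqrt(n)
    t(sqrt(n)) = sqrt(n) + n/sqrt(n) = 2*sqrt(n)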

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376587 ] Doug Cutting commented on HADOOP-170: Can you please add proper javadoc to the new public methods in FileSystem.java? Thanks. Also, as mentioned above, an easy way to i

[jira] Resolved: (HADOOP-169) a single failure from locateMapOutputs kills the entire job

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-169?page=all ] Doug Cutting resolved HADOOP-169: Resolution: Fixed I just committed this patch. Thanks, Owen.

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376581 ] Doug Cutting commented on HADOOP-170: > This probably has to be called smartReplication(). What's the matter with Integer.MAX_VALUE? This is one of the most important

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Benjamin Reed (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376576 ] Benjamin Reed commented on HADOOP-170: It's really the JobTracker, not the fs, that knows how high to set the replication count, since the JobTracker will know the number of

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376574 ] Konstantin Shvachko commented on HADOOP-170: This probably has to be called smartReplication(). Right now, even if the data blocks are placed "smart" by the name

Re: C API for Hadoop DFS

2006-04-26 Thread Ben Reed
Good idea. To be fully future-proof, this could even become: tOffset* getBlockStarts(dfsFs fs, char* file); That would permit variable-sized blocks, which could happen, e.g., if we someday support appending to files. Is that overkill? I like it a lot! (I'm really hoping for record aligned
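
For illustration, a hedged C sketch of how a caller might consume the proposed getBlockStarts(); the excerpts here do not say how the returned array is terminated or who frees it, so the -1 sentinel and the free() below are assumptions, not part of the draft.

    #include <stdio.h>
    #include <stdlib.h>
    #include "dfs.h"   /* the draft header from this thread */

    void printBlockStarts(dfsFs fs, char* file) {
        tOffset* starts = getBlockStarts(fs, file);
        /* assumption: the array ends with a -1 sentinel */
        for (int i = 0; starts[i] != -1; i++)
            printf("block %d starts at offset %lld\n", i, (long long) starts[i]);
        free(starts);  /* assumption: the caller owns the array */
    }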

[jira] Resolved: (HADOOP-168) JobSubmissionProtocol and InterTrackerProtocol don't include "throws IOException" on all methods

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-168?page=all ] Doug Cutting resolved HADOOP-168: Resolution: Fixed I just committed this. Thanks, Owen!

[jira] Commented: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376558 ] Doug Cutting commented on HADOOP-170: We should start setting a high replication count in MapReduce for submitted job files, which are read by every node. But how should

[jira] Updated: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ] Konstantin Shvachko updated HADOOP-170: Attachment: setReplication.patch

[jira] Created: (HADOOP-170) setReplication and related bug fixes

2006-04-26 Thread Konstantin Shvachko (JIRA)
setReplication and related bug fixes Key: HADOOP-170 URL: http://issues.apache.org/jira/browse/HADOOP-170 Project: Hadoop Type: Improvement Components: fs, dfs Versions: 0.1.1 Reporter: Konstantin Shvachko Assign

Re: C API for Hadoop DFS

2006-04-26 Thread Doug Cutting
Konstantin Shvachko wrote: If we want a copy function I'd propose a slightly generalized version: int dfsCopy(dfsFs srcFs, char* src, dfsFs dstFs, char* dst); That way we can copy from/to local using the same function. +1 Doug
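
A minimal sketch of what the generalized call buys: local-to-DFS and DFS-to-DFS copies go through the same function. The handle names and paths below are hypothetical, purely for illustration; nothing in the excerpts here specifies how filesystem handles are obtained.

    #include "dfs.h"   /* the draft header from this thread */

    /* stageJobJar copies a job jar from a local filesystem handle to a
       remote DFS handle with the same dfsCopy call used for DFS-to-DFS. */
    int stageJobJar(dfsFs localFs, dfsFs remoteFs) {
        return dfsCopy(localFs, "/tmp/job.jar", remoteFs, "/user/job.jar");
    }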

Re: C API for Hadoop DFS

2006-04-26 Thread Konstantin Shvachko
These are utility methods that could be implemented by user code, i.e., not core methods. That's fine. But perhaps we should add another: int dfsCopy(dfsFs fs, char* src, char* dst); Otherwise lots of applications will end up writing this themselves. If we want a copy function I'd pro

[jira] Resolved: (HADOOP-166) IPC is unable to invoke methods that use interfaces as parameter

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-166?page=all ] Doug Cutting resolved HADOOP-166: Fix Version: 0.2 Resolution: Fixed Assign To: Doug Cutting I just committed a fix for this. Thanks for your help, Stefan!

[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

2006-04-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376520 ] Doug Cutting commented on HADOOP-167: Can we stop the extra reads caused by addFinalResource() and 'new JobConf(Configuration)' by re-using the hash table instead of re-r

Re: C API for Hadoop DFS

2006-04-26 Thread Eric Baldeschwieler
I'd vote against supporting variable block sizes this way. Let's keep it simpler until we have more use cases. Fair enough, we can go forward with the copy APIs. I appear to be outvoted. ;-) On Apr 26, 2006, at 10:33 AM, Doug Cutting wrote: Eric Baldeschwieler wrote: Instead, I think

RE: C API for Hadoop DFS

2006-04-26 Thread Devaraj Das
Hi Doug, thanks for the positive feedback! I agree with you and Eric on the getBlockSize/getHosts/dfsCopy suggestions. Will revise the spec accordingly. Thanks, Devaraj.

Re: C API for Hadoop DFS

2006-04-26 Thread Doug Cutting
Eric Baldeschwieler wrote: Instead, I think we need the following two functions: tOffset getBlockSize(dfsFs fs); char** getHosts(dfsFs fs, char* file, tOffset pos); Think your suggestion is good. An addition... I'd rather not assume that block size is global. Why not require a file name

[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

2006-04-26 Thread Owen O'Malley (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376517 ] Owen O'Malley commented on HADOOP-167: Your example is exactly the same as: jobConf = new JobConf(); jobConf.addFinalResource(getHadoopAliasConfFile()); just without re

[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

2006-04-26 Thread Michel Tourn (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376516 ] Michel Tourn commented on HADOOP-167: Avoiding the multiple config-loading messages is a good thing. This could also be controlled with a verbosity / logging level settin

Re: C API for Hadoop DFS

2006-04-26 Thread Eric Baldeschwieler
a couple of thoughts: Instead, I think we need the following two functions: tOffset getBlockSize(dfsFs fs); char** getHosts(dfsFs fs, char* file, tOffset pos); Think your suggestion is good. An addition... I'd rather not assume that block size is global. Why not require a file name in
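
Putting Eric's two functions together with the per-file amendment discussed above, the signatures would look roughly like this (editorial sketch of the direction of the thread, not the actual revised draft):

    tOffset getBlockSize(dfsFs fs, char* file);           /* block size per file, not global */
    char**  getHosts(dfsFs fs, char* file, tOffset pos);  /* hosts serving the block at pos */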

Re: C API for Hadoop DFS

2006-04-26 Thread Doug Cutting
Devaraj Das wrote: Attached is a draft of the C API specification that some of us (in Yahoo) have been thinking about. The specification is closely tied to the API exported by Hadoop's FileSystem class. Will really appreciate any comments, etc. on the specification. Overall, this looks great!

Re: C API for Hadoop DFS

2006-04-26 Thread Leen Toelen
Hi, to make deployment easier it would also be handy to prepare a binary compiled to native code with gcj. This way nodes don't actually need Java installed for Hadoop to work. Regards, Leen

[jira] Updated: (HADOOP-166) IPC is unable to invoke methods that use interfaces as parameter

2006-04-26 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/HADOOP-166?page=all ] Stefan Groschupf updated HADOOP-166: Attachment: RPC_interface_supportV2.patch Makes sense. Here is an update as you suggested. I renamed declaredClass to instanceClass in the readObject

C API for Hadoop DFS

2006-04-26 Thread Devaraj Das
Hi All, Attached is a draft of the C API specification that some of us (in Yahoo) have been thinking about. The specification is closely tied to the API exported by Hadoop's FileSystem class. Will really appreciate any comments, etc. on the specification. Thanks, Devaraj. #ifndef DFS_H #define DFS_H
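
The attachment itself is not preserved in this archive. As a rough reconstruction, a minimal skeleton built only from the prototypes quoted in this thread might look like the following; the typedefs are assumptions, and the actual draft certainly contained more (open/read/write/close, error handling, and so on).

    #ifndef DFS_H
    #define DFS_H

    typedef void* dfsFs;        /* opaque filesystem handle (assumed) */
    typedef long long tOffset;  /* file offset type (assumed) */

    tOffset  getBlockSize(dfsFs fs, char* file);
    char**   getHosts(dfsFs fs, char* file, tOffset pos);
    tOffset* getBlockStarts(dfsFs fs, char* file);
    int      dfsCopy(dfsFs srcFs, char* src, dfsFs dstFs, char* dst);

    #endif /* DFS_H */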