[
http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376639 ]
eric baldeschwieler commented on HADOOP-171:
highReplicationHint is not the right description IMO. One goal is to optimize
on distribution. Call it distributionHint ...
[
http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376622 ]
Doug Cutting commented on HADOOP-171:
-
> We can implement short highReplicationHint()
That would work, but it doesn't seem like the simplest API. But if folks feel
strongly ...
[
http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376616 ]
Konstantin Shvachko commented on HADOOP-171:
We can implement
short highReplicationHint()
which would ask the namenode, before anything else, what it thinks an appropriate
replication ...
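A minimal Java sketch of that idea, assuming a hypothetical highReplicationHint() call on the namenode protocol (the interface and names below are placeholders for the proposal, not committed API):

    interface NameNodeHints {
        // What the namenode considers "high" replication for files
        // that every node will read, e.g. derived from cluster size.
        short highReplicationHint() throws java.io.IOException;
    }

    // Caller side: ask first, then use the hint when writing the file, e.g.
    //   short rep = namenode.highReplicationHint();
    //   fs.copyFromLocalFile(localJobJar, remoteJobJar, rep);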
[ http://issues.apache.org/jira/browse/HADOOP-171?page=all ]
eric baldeschwieler resolved HADOOP-171:
Resolution: Duplicate
> need standard API to set dfs replication = high
> ---
>
> Key:
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376614 ]
Runping Qi commented on HADOOP-170:
---
It would be more effective if, after assigning a map task to a specific
node, the jobtracker could request the namenode to replicate ...
[
http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376605 ]
Doug Cutting commented on HADOOP-171:
-
One alternative to 'fs.copyFromLocalFile(localJobJar, remoteJobJar,
Short.MAX_VALUE)' might be:
fs.copyFromLocalFile(localJobJar, remoteJobJar, ...
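The snippet is cut off; one plausible reading, given the setReplication() method that HADOOP-170 adds to FileSystem, is to copy at the default replication and raise it afterwards. A sketch under that assumption (paths and names illustrative):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HighReplicationCopy {
        // Copy the job jar, then mark it highly replicated; the namenode
        // is expected to cap the requested value at cluster size.
        static void copyJobJar(FileSystem fs, Path localJobJar, Path remoteJobJar)
                throws IOException {
            fs.copyFromLocalFile(localJobJar, remoteJobJar);
            fs.setReplication(remoteJobJar, Short.MAX_VALUE);
        }
    }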
[
http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376604 ]
Doug Cutting commented on HADOOP-171:
-
So my concrete question is: how should I change JobClient? It currently uses
fs.copyFromLocalFile() to copy the job.jar and fs.create ...
need standard API to set dfs replication = high
---
Key: HADOOP-171
URL: http://issues.apache.org/jira/browse/HADOOP-171
Project: Hadoop
Type: New Feature
Components: dfs
Versions: 0.2
Reporter: Doug Cutting
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ]
Doug Cutting resolved HADOOP-170:
-
Fix Version: 0.2
Resolution: Fixed
I just committed this. Thanks, Konstantin!
> setReplication and related bug fixes
> ---
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376598 ]
Konstantin Shvachko commented on HADOOP-170:
- Replication can be set to the maximum if the number of nodes is available.
That is, the block will be replicated on almost all ...
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376596 ]
Doug Cutting commented on HADOOP-170:
-
> +1 on asking the job tracker for the size of the cluster and replicating
> based on that size.
I think you mean "namenode", not "jobtracker".
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ]
Konstantin Shvachko updated HADOOP-170:
---
Attachment: setReplication.patch
Java documentation is added for new public methods.
> setReplication and related bug fixes
> --
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ]
Konstantin Shvachko updated HADOOP-170:
---
Attachment: (was: setReplication.patch)
> setReplication and related bug fixes
>
>
> Key: HADOOP
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376593 ]
Yoram Arnon commented on HADOOP-170:
the optimal replication factor for distributing a file in 2 hops is
sqrt(cluster size). The worst case delivery time is 2*sqrt(n) time
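One way to recover that claim in Java terms: seeding r replicas takes about r time units, after which n readers split across r sources take about n/r, so the total is r + n/r, minimized at r = sqrt(n) for a worst case of 2*sqrt(n). A toy sketch of the hint computation (names illustrative):

    public class TwoHopReplication {
        // Replication that minimizes r + n/r for an n-node cluster.
        static short replicationFor(int clusterSize) {
            return (short) Math.max(1, Math.ceil(Math.sqrt(clusterSize)));
        }

        public static void main(String[] args) {
            System.out.println(replicationFor(900)); // 30 for a 900-node cluster
        }
    }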
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376587 ]
Doug Cutting commented on HADOOP-170:
-
Can you please add proper javadoc to the new public methods in FileSystem.java?
Thanks.
Also, as mentioned above, an easy way to implement ...
[ http://issues.apache.org/jira/browse/HADOOP-169?page=all ]
Doug Cutting resolved HADOOP-169:
-
Resolution: Fixed
I just committed this patch. Thanks, Owen.
> a single failure from locateMapOutputs kills the entire job
> --
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376581 ]
Doug Cutting commented on HADOOP-170:
-
> This probably has to be called smartReplication().
What's the matter with Integer.MAX_VALUE?
This is one of the most important ...
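The contract implied by the Integer.MAX_VALUE suggestion is that the namenode clamps whatever the client asks for. A sketch of that server-side clamp (illustrative names, not the actual namenode code):

    public class ReplicationClamp {
        // Effective replication: never more than the live datanodes,
        // never less than one.
        static short effectiveReplication(int requested, int liveDatanodes) {
            return (short) Math.max(1, Math.min(requested, liveDatanodes));
        }
    }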
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376576 ]
Benjamin Reed commented on HADOOP-170:
--
It's really the JobTracker, not the fs, that knows how high to set the replication
count, since the JobTracker will know the number of nodes ...
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376574 ]
Konstantin Shvachko commented on HADOOP-170:
This probably has to be called smartReplication().
Right now, even if the data blocks are placed "smart" by the namenode ...
Good idea. To be fully future-proof, this could even become:
tOffset* getBlockStarts(dfsFs fs, char* file);
That would permit variable-sized blocks, which could happen, e.g.,
if we someday support appending to files. Is that overkill?
I like it a lot! (I'm really hoping for record-aligned ...
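A hypothetical Java counterpart of the proposed C signature tOffset* getBlockStarts(dfsFs fs, char* file), just to illustrate how per-block start offsets subsume a single global block size; this is illustrative, not committed API:

    interface BlockLayout {
        // Start offset of each block of the file, in order. With
        // fixed-size blocks this is {0, blockSize, 2*blockSize, ...};
        // variable-sized blocks need no further API change.
        long[] getBlockStarts(String file) throws java.io.IOException;
    }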
[ http://issues.apache.org/jira/browse/HADOOP-168?page=all ]
Doug Cutting resolved HADOOP-168:
-
Resolution: Fixed
I just committed this. Thanks, Owen!
> JobSubmissionProtocol and InterTrackerProtocol don't include "throws
> IOException" on all methods ...
[
http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376558 ]
Doug Cutting commented on HADOOP-170:
-
We should start setting a high replication count in MapReduce for submitted job
files, which are read by every node. But how should ...
[ http://issues.apache.org/jira/browse/HADOOP-170?page=all ]
Konstantin Shvachko updated HADOOP-170:
---
Attachment: setReplication.patch
> setReplication and related bug fixes
>
>
> Key: HADOOP-170
>
setReplication and related bug fixes
Key: HADOOP-170
URL: http://issues.apache.org/jira/browse/HADOOP-170
Project: Hadoop
Type: Improvement
Components: fs, dfs
Versions: 0.1.1
Reporter: Konstantin Shvachko
Assign
Konstantin Shvachko wrote:
If we want a copy function I'd propose a slightly generalized version
int dfsCopy(dfsFs srcFs, char* src, dfsFs dstFs, char* dst);
That way we can copy from/to local using the same function.
+1
Doug
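A Java rendering of that generalized copy, with explicit source and destination filesystems so local<->DFS copies share one code path; a sketch only, mirroring FileUtil-style helpers rather than quoting committed code:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyUtil {
        // The generalized form: srcFs and dstFs may differ.
        static void copy(FileSystem srcFs, Path src,
                         FileSystem dstFs, Path dst) throws IOException {
            InputStream in = srcFs.open(src);
            try {
                OutputStream out = dstFs.create(dst);
                try {
                    byte[] buf = new byte[4096];
                    for (int n = in.read(buf); n > 0; n = in.read(buf)) {
                        out.write(buf, 0, n);
                    }
                } finally {
                    out.close();
                }
            } finally {
                in.close();
            }
        }

        // The single-filesystem convenience discussed below.
        static void copy(FileSystem fs, Path src, Path dst) throws IOException {
            copy(fs, src, fs, dst);
        }
    }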
These are utility methods that could be implemented by user code,
i.e., not core methods. That's fine. But perhaps we should add another:
int dfsCopy(dfsFs fs, char* src, char* dst);
Otherwise lots of applications will end up writing this themselves.
If we want a copy function I'd propose ...
[ http://issues.apache.org/jira/browse/HADOOP-166?page=all ]
Doug Cutting resolved HADOOP-166:
-
Fix Version: 0.2
Resolution: Fixed
Assign To: Doug Cutting
I just committed a fix for this. Thanks for your help, Stefan!
> IPC is unable to
[
http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376520 ]
Doug Cutting commented on HADOOP-167:
-
Can we stop the extra reads caused by addFinalResource() and 'new
JobConf(Configuration)' by re-using the hash table instead of re-reading ...
I'd vote against supporting variable block sizes this way. Let's
keep it simpler until we have more use cases. Fair enough.
We can go forward with the copy APIs. I appear to be outvoted. ;-)
On Apr 26, 2006, at 10:33 AM, Doug Cutting wrote:
Eric Baldeschwieler wrote:
Instead, I think
Hi Doug, thanks for the positive feedback!
I agree with you and Eric on the getBlockSize/getHosts/dfsCopy suggestions.
Will revise the spec accordingly.
Thanks,
Devaraj.
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 26, 2006 10:26 PM
To: hadoop-dev
Eric Baldeschwieler wrote:
Instead, I think we need the following two functions:
tOffset getBlockSize(dfsFs fs);
char** getHosts(dfsFs fs, char* file, tOffset pos);
** I think your suggestion is good. An addition... I'd rather not assume
that block size is global. Why not require a file name ...
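Java shapes for the two proposed calls, folding in the amendment that block size be per-file rather than global; these are illustrative interfaces only, not Hadoop's actual FileSystem methods:

    interface BlockPlacement {
        // Block size of the given file (per-file, not a global constant).
        long getBlockSize(String file) throws java.io.IOException;

        // Hostnames holding the block that covers offset pos of file.
        String[] getHosts(String file, long pos) throws java.io.IOException;
    }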
[
http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376517 ]
Owen O'Malley commented on HADOOP-167:
--
Your example is exactly the same as:
jobConf = new JobConf();
jobConf.addFinalResource(getHadoopAliasConfFile());
just without re-reading ...
[
http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376516 ]
Michel Tourn commented on HADOOP-167:
-
Avoiding the multiple config-loading messages is a good thing.
This could also be controlled with a verbosity / logging level setting ...
a couple of thoughts:
Instead, I think we need the following two functions:
tOffset getBlockSize(dfsFs fs);
char** getHosts(dfsFs fs, char* file, tOffset pos);
** I think your suggestion is good. An addition... I'd rather not
assume that block size is global. Why not require a file name in ...
Devaraj Das wrote:
Attached is a draft of the C API specification that some of us (in Yahoo)
have been thinking about. The specification is closely tied to the API
exported by Hadoop's FileSystem class.
Will really appreciate any comments, etc. on the specification.
Overall, this looks great!
Hi,
to make deployment easier, it would also be handy to prepare a binary
compiled to native code with gcj. This way nodes don't actually need
java installed for hadoop to work.
Regards,
Leen
On 4/26/06, Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> Hi All,
> Attached is a draft of the C API specification ...
[ http://issues.apache.org/jira/browse/HADOOP-166?page=all ]
Stefan Groschupf updated HADOOP-166:
Attachment: RPC_interface_supportV2.patch
Makes sense.
Here is an update, as you suggested.
I renamed declaredClass to instanceClass in the readObject ...
Hi All,
Attached is a draft of the C API specification that some of us (in Yahoo)
have been thinking about. The specification is closely tied to the API
exported by Hadoop's FileSystem class.
Will really appreciate any comments, etc. on the specification.
Thanks,
Devaraj.
#ifndef DFS_H
#define DFS_H