Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-30 Thread Colin McCabe
The "git" way of doing things would be to rebase the feature branch on master (trunk) and then commit the patch stack. Squashing the entire feature into a 10 MB megapatch is the "svn" way of doing things. The svn workflow evolved because merging feature branches back to trunk was really painful

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Colin McCabe
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote: > > > On Aug 28, 2017, at 12:41 PM, Jason Lowe wrote: > > > > I think this gets back to the "if it's worth committing" part. > > This brings us back to my original question: > > "Doesn't this place an undue

Re: inotify

2016-07-05 Thread Colin McCabe
I think it makes sense to have an AddBlockEvent. It seems like we could provide something like the block ID, block pool ID, and genstamp, as well as the inode ID and path of the file which the block was added to. Clearly, we cannot provide the length, since we don't know how many bytes the

Re: HDFS Block compression

2016-07-05 Thread Colin McCabe
We have discussed this in the past. I think the single biggest issue is that HDFS doesn't understand the schema of the data which is stored in it. So it may not be aware of what compression scheme would be most appropriate for the application and data. While it is true that HDFS doesn't allow

Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
able to find a thread like that. > Thanks in advance. Hmm, perhaps I was thinking of the release vote process. Can anyone confirm? It would be nice if this information could appear on the bylaws page... best, Colin > > Thanks > Anu > > > On 6/13/16, 11:51 AM, "

Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote: > > On 10 Jun 2016, at 20:37, Anu Engineer wrote: > > > > I actively work on two branches (Diskbalancer and ozone) and I agree with > > most of what Sangjin said. > > There is an overhead in working with branches,

Re: Compile proto

2016-05-10 Thread Colin McCabe
Hi Kun Ren, You have to add your new proto file to the relevant pom.xml file. best, Colin On Fri, May 6, 2016, at 13:04, Kun Ren wrote: > Hi Genius, > > I added a new proto into the > HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto, > > however,every time when I run the

Re: Another thought on client-side support of HDFS federation

2016-05-02 Thread Colin McCabe
Hi Tianyi HE, Thanks for sharing this! This reminds me of the httpfs daemon. This daemon basically sits in front of an HDFS cluster and accepts requests, which it serves by forwarding them to the underlying HDFS instance. There is some documentation about it here:

Re: 2.7.3 release plan

2016-04-04 Thread Colin McCabe
I agree that HDFS-8578 should be a prerequisite for backporting HDFS-8791. I think we are overestimating the number of people affected by HDFS-8791, and underestimating the disruption that would be caused by a layout version upgrade in a dot release. As Andrew, Sean, and others in the thread

Re: Revive HADOOP-2705?

2015-12-18 Thread Colin McCabe
ava_tip_how_read_files_quickly > > One of the conclusions: > > "Minimize I/O operations by reading an array at a time, not a byte at > a time. An 8Kbyte array is a good size." > > > On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe <cmcc...@alumni.cmu.edu>
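The quoted advice (read an array at a time, with 8 KB as a reasonable array size) can be sketched roughly as follows. This is only an illustration of the general idea, not code from HADOOP-2705; the class name ChunkedReadSketch and the method countBytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: not part of Hadoop.
class ChunkedReadSketch {
    // 8 KB, the array size suggested in the quoted article.
    static final int BUFFER_SIZE = 8192;

    /** Counts the bytes in a stream by reading one array at a time. */
    static long countBytes(InputStream in) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of stream.
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[20000];
        System.out.println(countBytes(new ByteArrayInputStream(data)));
    }
}
```

Each call to read(byte[]) can move up to 8192 bytes, so the loop makes far fewer read calls than a byte-at-a-time loop over the same stream. Whether a larger buffer helps any further is exactly what the benchmarks requested in this thread would have to show.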

Re: Revive HADOOP-2705?

2015-12-15 Thread Colin McCabe
Hi David, Do you have benchmarks to justify changing this configuration? best, Colin On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . wrote: > Hello! > > A while back, Java 1.6, the size of the internal file-reading > buffers were bumped-up to 8192 bytes. > >

Re: DISCUSS: is the order in FS.listStatus() required to be sorted?

2015-06-16 Thread Colin McCabe
On Tue, Jun 16, 2015 at 3:02 AM, Steve Loughran ste...@hortonworks.com wrote: On 15 Jun 2015, at 21:22, Colin P. McCabe cmcc...@apache.org wrote: One possibility is that we could randomize the order of returned results in HDFS (at least within a given batch of results returned from the NN).
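As a rough illustration of the idea floated here, randomizing the order within one batch of listing results could look like the sketch below. It is not HDFS code: the class ShuffledListingSketch and the method shuffleBatch are hypothetical, and plain strings stand in for whatever entries the NN would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: not part of HDFS.
class ShuffledListingSketch {
    /** Returns a copy of one batch of entries in random order; the input is left untouched. */
    static List<String> shuffleBatch(List<String> batch, Random rng) {
        List<String> copy = new ArrayList<>(batch);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("a", "b", "c", "d");
        System.out.println(shuffleBatch(batch, new Random()));
    }
}
```

The point of such a shuffle would be that callers which silently depend on sorted output fail early in testing, rather than only when they meet a FileSystem that never sorted its results.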

Re: fsck output compatibility question with regard to HDFS-7281

2015-05-05 Thread Colin McCabe
How about just having a --json option for the fsck command? That's what we did in Ceph for some command line tools. It would make the output easier to consume and easier to provide compatibility for. Colin On Apr 28, 2015 12:32 PM, Allen Wittenauer a...@altiscale.com wrote: A lot of the

Re: HDFS audit log

2015-05-05 Thread Colin McCabe
I think HDFS INotify is a better choice if you need: * guaranteed backwards compatibility * rapid and unambiguous parsing (via protobuf) * clear Java API for retrieving the data (I.e. not rsync on a text file) * ability to resume reading at a given point if the consumer process fails We are using

Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user), it's simply that the code refuses to do it.

Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
to see a directory. That might even let us enable some of these tests that are skipped on Windows, because Windows allows access for the owner even after permissions have been stripped. +1. JIRA? Colin Chris Nauroth Hortonworks http://hortonworks.com/ On 3/11/15, 2:10 PM, Colin

Re: 2.7 status

2015-02-17 Thread Colin McCabe
+1 for starting thinking about releasing 2.7 soon. Re: building Windows binaries. Do we release binaries for all the Linux and UNIX architectures? I thought we didn't. It seems a little inconsistent to release binaries just for Windows, but not for those other architectures and OSes. I wonder

Re: NFSv3 Filesystem Connector

2015-01-14 Thread Colin McCabe
Why not just use LocalFileSystem with an NFS mount (or several)? I read through the README but I didn't see that question answered anywhere. best, Colin On Tue, Jan 13, 2015 at 1:35 PM, Gokul Soundararajan gokulsoun...@gmail.com wrote: Hi, We (Jingxin Feng, Xing Lin, and I) have been

Re: Symbolic links disablement

2014-12-31 Thread Colin McCabe
As far as I know, nobody is working on this at the moment. There are a lot of issues that would need to be worked through before we could enable symlinks in production. We never quite agreed on the semantics of how symlinks should work... for example, some people advocated that listing a

Re: Thinking ahead to hadoop-2.7

2014-12-08 Thread Colin McCabe
On Fri, Dec 5, 2014 at 11:15 AM, Karthik Kambatla ka...@cloudera.com wrote: It would be nice to cut the branch for the next feature release (not just Java 7) in the first week of January, so we can get the RC out by the end of the month? Yesterday, this came up in an offline discussion on

Re: Switching to Java 7

2014-12-08 Thread Colin McCabe
On Mon, Dec 8, 2014 at 7:46 AM, Steve Loughran ste...@hortonworks.com wrote: On 8 December 2014 at 14:58, Ted Yu yuzhih...@gmail.com wrote: Looks like there was still OutOfMemoryError :

Re: Guava

2014-11-10 Thread Colin McCabe
I'm usually an advocate for getting rid of unnecessary dependencies (cough, jetty, cough), but a lot of the things in Guava are really useful. Immutable collections, BiMap, Multisets, Arrays#asList, the stuff for writing hashCode() and equals(), String#Joiner, the list goes on. We particularly

Re: Why do reads take as long as replicated writes?

2014-11-10 Thread Colin McCabe
I strongly suggest benchmarking a modern version of Hadoop rather than Hadoop 1.x. The native CRC stuff from HDFS-3528 greatly reduces CPU consumption on the read path. I wrote about some other read path optimizations in Hadoop 2.x here:

builds failing on H9 with cannot access java.lang.Runnable

2014-10-03 Thread Colin McCabe
It looks like builds are failing on the H9 host with cannot access java.lang.Runnable Example from https://builds.apache.org/job/PreCommit-HDFS-Build/8313/artifact/patchprocess/trunkJavacWarnings.txt : [INFO] [INFO] BUILD

Re: builds failing on H9 with cannot access java.lang.Runnable

2014-10-03 Thread Colin McCabe
...@hortonworks.com wrote: all the slaves are getting re-booted give it some more time -giri On Fri, Oct 3, 2014 at 1:13 PM, Ted Yu yuzhih...@gmail.com wrote: Adding builds@ On Fri, Oct 3, 2014 at 1:07 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: It looks like builds are failing on the H9 host

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
, Colin On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: This seems like a really aggressive timeframe for a merge. We still haven't implemented: * Checksum skipping on read and write from lazy persisted replicas. * Allowing mmaped reads from the lazy persisted data

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
at the same time. Let's continue this discussion on HDFS-6919 and HDFS-6988 and see if we can come up with a solution that works for everyone. best, Colin Regards, Arpit On Wed, Sep 24, 2014 at 2:19 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Wed, Sep 24, 2014 at 11:12 AM, Suresh

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-23 Thread Colin McCabe
This seems like a really aggressive timeframe for a merge. We still haven't implemented: * Checksum skipping on read and write from lazy persisted replicas. * Allowing mmaped reads from the lazy persisted data. * Any eviction strategy other than LRU. * Integration with cache pool limits (how do

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B vinayakum...@apache.org wrote: bq. I don't know about the merits of this, but I do know that native filesystems implement this by not raising the EOF exception on the seek() but only on the read ... some of the non-HDFS filesystems Hadoop support

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Fri, Sep 19, 2014 at 9:41 AM, Vinayakumar B vinayakum...@apache.org wrote: Thanks Colin for the detailed explanation. On Fri, Sep 19, 2014 at 9:38 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B vinayakum...@apache.org wrote: bq. I don't

Re: Updates on migration to git

2014-08-27 Thread Colin McCabe
Thanks for making this happen, Karthik and Daniel. Great job. best, Colin On Tue, Aug 26, 2014 at 5:59 PM, Karthik Kambatla ka...@cloudera.com wrote: Yes, we have requested for force-push disabled on trunk and branch-* branches. I didn't test it though :P, it is not writable yet. On Tue,

Re: HDFS-6902 FileWriter should be closed in finally block in BlockReceiver#receiveBlock()

2014-08-25 Thread Colin McCabe
Let's discuss this on the JIRA. I think Tsuyoshi OZAWA's solution is good. Colin On Thu, Aug 21, 2014 at 7:08 AM, Ted Yu yuzhih...@gmail.com wrote: bq. else there is a memory leak Moving call of close() would prevent the leak. bq. but then this code snippet could be java and can be messy

Re: [DISCUSS] Switch to log4j 2

2014-08-18 Thread Colin McCabe
On Fri, Aug 15, 2014 at 8:50 AM, Aaron T. Myers a...@cloudera.com wrote: Not necessarily opposed to switching logging frameworks, but I believe we can actually support async logging with today's logging system if we wanted to, e.g. as was done for the HDFS audit logger in this JIRA:

Re: [VOTE] Migration from subversion to git for version control

2014-08-11 Thread Colin McCabe
+1. best, Colin On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote: I have put together this proposal based on recent discussion on this topic. Please vote on the proposal. The vote runs for 7 days. 1. Migrate from subversion to git for version control. 2.

Re: [DISCUSS] Assume Private-Unstable for classes that are not annotated

2014-07-25 Thread Colin McCabe
+1. Colin On Tue, Jul 22, 2014 at 2:54 PM, Karthik Kambatla ka...@cloudera.com wrote: Hi devs As you might have noticed, we have several classes and methods in them that are not annotated at all. This is seldom intentional. Avoiding incompatible changes to all these classes can be

Re: Finding file size during block placement

2014-07-25 Thread Colin McCabe
On Wed, Jul 23, 2014 at 8:15 AM, Arjun baksh...@mail.uc.edu wrote: Hi, I want to write a block placement policy that takes the size of the file being placed into account. Something like what is done in CoHadoop or BEEMR paper. I have the following questions: Hadoop uses a stream metaphor.

Re: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-20 Thread Colin McCabe
Great job, guys. +1. I don't think we need to finish libhdfs support before we merge (unless you want to). Colin On Wed, May 14, 2014 at 5:47 AM, Gangumalla, Uma uma.ganguma...@intel.com wrote: Hello HDFS Devs, I would like to call for a vote to merge the HDFS Extended Attributes

Re: In-Memory Reference FS implementations

2014-03-06 Thread Colin McCabe
NetFlix's Apache-licensed S3mper system provides consistency for an S3-backed store. http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html It would be nice to see this or something like it integrated with Hadoop. I fear that a lot of applications are not ready for eventual

Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-11 Thread Colin McCabe
Looks good. +1, also non-binding. I downloaded the source tarball, checked md5, built, ran some unit tests, ran an HDFS cluster. cheers, Colin On Tue, Feb 11, 2014 at 6:53 PM, Andrew Wang andrew.w...@cloudera.com wrote: Thanks for putting this together Arun. +1 non-binding Downloaded

Re: Next releases

2013-12-06 Thread Colin McCabe
If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For

Re: Deprecate BackupNode

2013-12-05 Thread Colin McCabe
+1 Colin On Dec 4, 2013 3:07 PM, Suresh Srinivas sur...@hortonworks.com wrote: It is almost a year since a jira proposed deprecating backup node - https://issues.apache.org/jira/browse/HDFS-4114. Maintaining it adds unnecessary work. As an example, when I added support for retry cache there were

Re: Next releases

2013-11-14 Thread Colin McCabe
On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy a...@hortonworks.com wrote: On Nov 12, 2013, at 1:54 PM, Todd Lipcon t...@cloudera.com wrote: On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: To be honest, I'm not aware of anything in 2.2.1 that shouldn't

Re: Next releases

2013-11-11 Thread Colin McCabe
HADOOP-10020 is a JIRA that disables symlinks temporarily. They will be disabled in 2.2.1 as well, if the plan is to have only minor fixes in that branch. To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of

Re: HDFS single datanode cluster issues

2013-11-07 Thread Colin McCabe
First of all, HDFS isn't really the right choice for single-node environments. I would recommend using LocalFileSystem in this case. If you're evaluating HDFS and only have one computer, it will really be better to run several VMs to see how it works, rather than running just one Datanode. You

Re: Replacing the JSP web UIs to HTML 5 applications

2013-11-01 Thread Colin McCabe
the current JSP based web ui. thx On Mon, Oct 28, 2013 at 11:16 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote: This is a really interesting project, Haohui. I think it will make our web UI much nicer. I have a few concerns about

Re: libhdfs portability

2013-10-28 Thread Colin McCabe
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe kyle.slet...@urbanrobotics.net wrote: I have written a WebHDFSClient and I do not believe that reusing connections is enough to noticeably speed up transfers in my case. I did some tests and on average it took roughly 14 minutes to transfer a 3.6 GB

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-28 Thread Colin McCabe
With 3 +1s, the vote passes. Thanks, all. best, Colin On Fri, Oct 25, 2013 at 4:01 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Fri, Oct 25, 2013 at 10:07 AM, Suresh Srinivas sur...@hortonworks.com wrote: I posted a comment in the other thread about feature branch merges. My

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-17 Thread Colin McCabe
+1. Thanks, guys. best, Colin On Thu, Oct 17, 2013 at 3:01 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hello all, I'd like to call a vote to merge the HDFS-4949 branch (in-memory caching) to trunk. Colin McCabe and I have been hard at work the last 3.5 months implementing this feature

Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759 Probably need to take a look at this test to see why it's not managing its threads correctly. Colin On Tue, Oct 15, 2013 at 8:37 AM, Jenkins dev-kitc...@cloudera.com wrote: I offer a cookie, to whoever fixes me. See

Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
Sorry for the noise. I posted to the wrong list. best, Colin On Wed, Oct 16, 2013 at 9:13 AM, Colin McCabe cmcc...@cloudera.com wrote: This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759 Probably need to take a look at this test to see why it's not managing its threads

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
On Tue, Oct 1, 2013 at 8:59 PM, Arun C Murthy a...@hortonworks.com wrote: Yes, sorry if it wasn't clear. As others seem to agree, I think we'll be better getting a protocol/api stable GA done and then iterating on bugs etc. I'm not super worried about HADOOP-9984 since symlinks just made it

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
I don't think HADOOP-9972 is a must-do for the next Apache release, whatever version number it ends up having. It's just adding a new API, not changing any existing ones, and it can be done entirely in generic code. (The globber doesn't involve FileSystem or AFS subclasses). My understanding is

Re: symlink support in Hadoop 2 GA

2013-09-19 Thread Colin McCabe
What we're trying to get to here is a consensus on whether FileSystem#listStatus and FileSystem#globStatus should return symlinks __as_symlinks__. If 2.1-beta goes out with these semantics, I think we are not going to be able to change them later. That is what will happen in the do nothing

Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
The issue is not modifying existing APIs. The issue is that code has been written that makes assumptions that are incompatible with the existence of things that are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false,

Re: hdfs native build failing in trunk

2013-09-16 Thread Colin McCabe
The relevant line is: [exec] gcc: vfork: Resource temporarily unavailable Looks like the build slave was overloaded and could not create new processes? Colin On Mon, Sep 16, 2013 at 4:43 AM, Alejandro Abdelnur t...@cloudera.com wrote: It seems a commit of native code in YARN has triggered a

Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-22 Thread Colin McCabe
On Wed, Aug 21, 2013 at 3:49 PM, Stack st...@duboce.net wrote: On Wed, Aug 21, 2013 at 1:25 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: St.Ack wrote: + Once I figured where the logs were, found that JAVA_HOME was not being exported (don't need this in hadoop-2.0.5 for instance). Adding

Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-21 Thread Colin McCabe
St.Ack wrote: + Once I figured where the logs were, found that JAVA_HOME was not being exported (don't need this in hadoop-2.0.5 for instance). Adding an exported JAVA_HOME to my running shell which don't seem right but it took care of it (I gave up pretty quick on messing w/

Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
If I've got the right idea about this at all? From the man page for wipe(1); Journaling filesystems (such as Ext3 or ReiserFS) are now being used by default by most Linux distributions. No secure deletion program that does filesystem-level calls can sanitize files on such filesystems, because

Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
Just to clarify, ext4 has the option to turn off journalling. ext3 does not. Not sure about reiser. Colin On Tue, Aug 20, 2013 at 12:42 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: If I've got the right idea about this at all? From the man page for wipe(1); Journaling filesystems

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Colin McCabe
There is work underway to decouple the block layer and the namespace layer of HDFS from each other. Once this is done, block behaviors like the one you describe will be easy to implement. It's a use case very similar to the hierarchical storage management (HSM) use case that we've discussed

Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Colin McCabe
think he wanted to do it incrementally. best, Colin McCabe On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan sreejith.c...@gmail.com wrote: Hey, I was originally researching options to work on ACCUMULO-1197. Basically, it was a bid to pass trace functionality through the DFSClient. I

Re: data loss after cluster wide power loss

2013-07-08 Thread Colin McCabe
...@hortonworks.com wrote: On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone

Re: data loss after cluster wide power loss

2013-07-03 Thread Colin McCabe
On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are

Re: dfs.datanode.socket.reuse.keepalive

2013-06-17 Thread Colin McCabe
threads) is likely to be more contended. -Todd On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, HDFS-941 added dfs.datanode.socket.reuse.keepalive. This allows DataXceiver worker threads in the DataNode to linger for a second or two after finishing

Re: Why is FileSystem.createNonRecursive deprecated?

2013-06-12 Thread Colin McCabe
This seems inconsistent. If the method is deprecated just because it's in org.apache.hadoop.fs.FileSystem, shouldn't all FileSystem methods be marked as deprecated? On the other hand, a user opening up FileSystem.java would probably not realize that it is deprecated. The JavaDoc for the class

Re: [jira] [Created] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-15 Thread Colin McCabe
Hi Shouvanik, Why not try asking the Talend community? Also, this question belongs on the user list. thanks, Colin On Wed, May 15, 2013 at 4:20 AM, Shouvanik Haldar shouvanik.hal...@gmail.com wrote: Hi, I am facing a problem. I am using Talend for scheduling and running a job. But, I

Re: Is Hadoop SequenceFile binary safe?

2013-05-02 Thread Colin McCabe
It seems like we could just set up an escape sequence and make it actually binary-safe, rather than just probabilistically. The escape sequence would only be inserted when there would otherwise be confusion between data and a sync marker. best, Colin On Thu, May 2, 2013 at 3:26 AM, Hs
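One way to picture the escape-sequence idea is a small byte-stuffing scheme, sketched below. This is not the SequenceFile format and not a proposal from the thread: the class SyncEscapeSketch, the escape byte 0x1B, and the two escape codes are all assumptions chosen for illustration. The sketch also assumes the sync marker itself never contains the escape byte, which a real design would have to guarantee or handle.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch: not the SequenceFile on-disk format.
class SyncEscapeSketch {
    static final byte ESC = 0x1B;          // assumed escape byte
    static final byte ESC_LITERAL = 0x00;  // ESC, 0x00 decodes to a single ESC byte
    static final byte ESC_MARKER = 0x01;   // ESC, 0x01 decodes to the whole marker

    /** Encodes data so that the marker never appears in the output. */
    static byte[] encode(byte[] data, byte[] marker) {
        checkMarker(marker);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            if (startsWith(data, i, marker)) {
                out.write(ESC);
                out.write(ESC_MARKER);
                i += marker.length;
            } else if (data[i] == ESC) {
                out.write(ESC);
                out.write(ESC_LITERAL);
                i++;
            } else {
                out.write(data[i]);
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Reverses encode(). */
    static byte[] decode(byte[] encoded, byte[] marker) {
        checkMarker(marker);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length) {
            if (encoded[i] != ESC) {
                out.write(encoded[i]);
                i++;
                continue;
            }
            if (i + 1 >= encoded.length) {
                throw new IllegalArgumentException("dangling escape byte");
            }
            byte code = encoded[i + 1];
            if (code == ESC_LITERAL) {
                out.write(ESC);
            } else if (code == ESC_MARKER) {
                out.write(marker, 0, marker.length);
            } else {
                throw new IllegalArgumentException("unknown escape code");
            }
            i += 2;
        }
        return out.toByteArray();
    }

    /** True if needle occurs in haystack starting exactly at offset. */
    static boolean startsWith(byte[] haystack, int offset, byte[] needle) {
        if (offset + needle.length > haystack.length) {
            return false;
        }
        for (int j = 0; j < needle.length; j++) {
            if (haystack[offset + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    /** True if needle occurs anywhere in haystack. */
    static boolean contains(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            if (startsWith(haystack, i, needle)) {
                return true;
            }
        }
        return false;
    }

    private static void checkMarker(byte[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        for (byte b : marker) {
            if (b == ESC) {
                throw new IllegalArgumentException("marker must not contain the escape byte");
            }
        }
    }

    public static void main(String[] args) {
        byte[] marker = {1, 2, 3};
        byte[] data = {9, 1, 2, 3, 0x1B, 1, 2, 7};
        byte[] encoded = encode(data, marker);
        System.out.println(contains(encoded, marker));
        System.out.println(Arrays.equals(decode(encoded, marker), data));
    }
}
```

Data that contains neither the marker nor the escape byte passes through unchanged, so for a long random marker the overhead comes almost entirely from escaping stray escape bytes. That cost is the price of making decoding unambiguous rather than merely probable.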

Re: VOTE: HDFS-347 merge

2013-04-12 Thread Colin McCabe
-347 win the votes finally. Does there need some additional configuration to enable these features? On Fri, Apr 12, 2013 at 2:05 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote: The merge vote is now closed. With three +1s, it passes. thanks, Colin On Wed, Apr 10, 2013 at 10:00

Re: VOTE: HDFS-347 merge

2013-04-11 Thread Colin McCabe
. It is as functional as the old version and way easier to set up/configure. -Todd On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, I think it's time to merge the HDFS-347 branch back to trunk. It's been under review and testing for several months, and provides

Re: testHDFSConf.xml

2013-04-10 Thread Colin McCabe
On Wed, Apr 10, 2013 at 10:16 AM, Jay Vyas jayunit...@gmail.com wrote: Hello HDFS brethren ! I've noticed that the testHDFSConf.xml has a lot of references to supergroup. https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml

Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
From: Suresh Srinivas sur...@hortonworks.com To: hdfs-dev@hadoop.apache.org hdfs-dev@hadoop.apache.org Sent: Wednesday, March 6, 2013 5:09 AM Subject: Re: VOTE: HDFS-347 merge Thanks Colin. Will check it out as soon as I can. On Tue, Mar 5, 2013 at 12:24 PM, Colin McCabe cmcc

Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
of the code in the branch, and we have people now running this code in production scenarios. It is as functional as the old version and way easier to set up/configure. -Todd On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, I think it's time to merge the HDFS

Re: VOTE: HDFS-347 merge

2013-04-02 Thread Colin McCabe
On Mon, Apr 1, 2013 at 6:58 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Mon, Apr 1, 2013 at 5:04 PM, Suresh Srinivas sur...@hortonworks.com wrote: Colin, For the record, the last email in the previous thread ended with the following comment from Nicholas: It is great to hear

VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
Hi all, I think it's time to merge the HDFS-347 branch back to trunk. It's been under review and testing for several months, and provides both a performance advantage, and the ability to use short-circuit local reads without compromising system security. Previously, we tried to merge this and

Re: VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
, or renaming function X to Y, then I think we can easily do it after the merge. thanks, Colin I did not see any response (unless I missed it). Can you please address it? Regards, Suresh On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, I think it's

Re: Heartbeat interval and timeout: why 3 secs and 10 min?

2013-03-13 Thread Colin McCabe
My understanding is that the 10 minute timeout helps to avoid replication storms, especially during startup. You might be interested in HDFS-3703, which adds a stale state which datanodes are placed into after 30 seconds of missing heartbeats. (This is an optional feature controlled by
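The two thresholds mentioned here (stale after 30 seconds without a heartbeat, dead after the 10 minute timeout) can be pictured with the simplified sketch below. It is not NameNode code: the class HeartbeatStateSketch and its constants are hypothetical, and real clusters derive these intervals from configuration rather than fixed values.

```java
// Hypothetical sketch: not part of HDFS.
class HeartbeatStateSketch {
    enum NodeState { LIVE, STALE, DEAD }

    // Assumed fixed thresholds, taken from the figures quoted in this thread.
    static final long STALE_AFTER_MS = 30_000L;
    static final long DEAD_AFTER_MS = 10 * 60_000L;

    /** Classifies a datanode by the time elapsed since its last heartbeat. */
    static NodeState classify(long msSinceLastHeartbeat) {
        if (msSinceLastHeartbeat >= DEAD_AFTER_MS) {
            return NodeState.DEAD;
        }
        if (msSinceLastHeartbeat >= STALE_AFTER_MS) {
            return NodeState.STALE;
        }
        return NodeState.LIVE;
    }

    public static void main(String[] args) {
        System.out.println(classify(3_000L));
        System.out.println(classify(45_000L));
        System.out.println(classify(600_000L));
    }
}
```

The gap between the two thresholds is the design point: a stale node can be avoided for new reads and writes almost immediately, while the expensive step of re-replicating its blocks waits for the much longer timeout.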

Re: VOTE: HDFS-347 merge

2013-03-05 Thread Colin McCabe
On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas sur...@hortonworks.com wrote: Suresh, if you're willing to support and maintain HDFS-2246, do you have cycles to propose a patch to the HDFS-347 branch reintegrating HDFS-2246 with the simplifications you outlined? In your review, did you find

Re: VOTE: HDFS-347 merge

2013-02-27 Thread Colin McCabe
Here is a compromise proposal, which hopefully will satisfy both sides: We keep the old block reader and have a configuration option that enables it. So in addition to dfs.client.use.legacy.blockreader, which we already have, we would have dfs.client.use.legacy.blockreader.local. Does that make

Re: VOTE: HDFS-347 merge

2013-02-25 Thread Colin McCabe
On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze szets...@yahoo.com wrote: I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. BTW, is Unix Domain Socket supported by all Unix-like systems?

Re: VOTE: HDFS-347 merge

2013-02-22 Thread Colin McCabe
On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas cdoug...@apache.org wrote: On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers a...@cloudera.com wrote: Given that the only substantive concerns with HDFS-347 seem to be about Windows support for local reads, for now we only merge this branch to

VOTE: HDFS-347 merge

2013-02-17 Thread Colin McCabe
on a number of clusters. This initial VOTE is to merge only into trunk. Just as we have done with our other recent merges, we will consider merging into branch-2 after the code has been in trunk for few weeks. Please cast your vote by EOD Sunday 2/24. best, Colin McCabe [1] https

Re: MiniDFSCluster

2012-09-05 Thread Colin McCabe
Hi Vlad, I think you might be on to something. File a JIRA? It should be a simple improvement, I think. cheers, Colin On Wed, Sep 5, 2012 at 10:42 AM, Vladimir Rozov v.ro...@comcast.net wrote: There are few methods on MiniDFSCluster class that are declared as static (getBlockFile,

validating user IDs

2012-06-11 Thread Colin McCabe
Hi all, I recently pulled the latest source, and ran a full build. The command line was this: mvn compile -Pnative I was confronted with this: [INFO] Requested user cmccabe has id 500, which is below the minimum allowed 1000 [INFO] FAIL: test-container-executor [INFO]

Re: validating user IDs

2012-06-11 Thread Colin McCabe
to the current OS limit? Even if this means detecting the OS version and assuming its default limit. thx On Mon, Jun 11, 2012 at 3:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, I recently pulled the latest source, and ran a full build. The command line was this: mvn compile -Pnative I

ioreply

2012-04-28 Thread Colin McCabe
Here is an interesting idea: recording traces of the filesystem operations applications do, and allowing these traces to be replayed later. ioreplay is mainly intended for replaying of recorded (using strace) IO traces, which is useful for standalone benchmarking. It provides many features to

Re: [DISCUSS] Remove append?

2012-03-26 Thread Colin McCabe
On Fri, Mar 23, 2012 at 7:44 PM, Scott Carey sc...@richrelevance.com wrote: On 3/22/12 10:25 AM, Eli Collins e...@cloudera.com wrote: On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko shv.had...@gmail.com wrote: Eli, I went over the entire discussion on the topic, and did not get it. Is

Re: [DISCUSS] Remove append?

2012-03-26 Thread Colin McCabe
On Thu, Mar 22, 2012 at 5:49 PM, Eli Collins e...@cloudera.com wrote: On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze szets...@yahoo.com wrote: @Eli, Removing a feature would simplify the design and code. I think this is a generally true statement but not specific to Append. The question is

Re: [DISCUSS] Remove append?

2012-03-26 Thread Colin McCabe
On Mon, Mar 26, 2012 at 1:55 PM, Tsz Wo Sze szets...@yahoo.com wrote: Just one comment: If we do decide to keep append in, we should get it to be actually stable and usable. In my opinion, this should definitely happen before adding any new operations. @Colin, append is currently stable and,