Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Guys, I'm not the one who said 'HDFS' unless I had a brain bubble in my original message. I asked for a distribution mechanism for code+mappable data. I appreciate the arrival of some suggestions. Ted is correct that I know quite a bit about mmap; I had a lot to do with the code in ObjectStore

Re: Memory mapped resources

2011-04-13 Thread M. C. Srivas
Sorry, I don't mean to say you don't know mmap or didn't do cool things in the past. But you can see why anyone would have interpreted the original post, given the title of the posting and the following wording, to mean "can I mmap files that are in HDFS".

Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Point taken.

Re: Memory mapped resources

2011-04-13 Thread Lance Norskog
There are systems for plumbing file-system calls out to user processes; FUSE does this on Linux, and there is a FUSE package for Hadoop. However, pretending a remote resource is local holds a place of honor in the system-design antipattern hall of fame.

RE: Memory mapped resources

2011-04-12 Thread Kevin.Leach
Ted Dunning wrote: Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk, but if you want to mmap the entire file, it isn't really useful.

Re: Memory mapped resources

2011-04-12 Thread Michael Flester
We have some very large files that we access via memory mapping in Java. Someone's asked us about how to make this conveniently deployable in Hadoop. If we tell them to put the files into HDFS, can we obtain a File for the underlying file on any given node? We sometimes find it convenient to

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk, but if you want to mmap the entire file, it isn't really useful.

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Well, no. You could mmap all the blocks that are local to the node your program is on. The others you will have to read more conventionally. If all blocks were guaranteed local, this would work, but I don't think that guarantee is possible on a non-trivial cluster.
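
For illustration, a minimal sketch of the locality check this implies, using the standard FileSystem#getFileBlockLocations API; the class name, the path argument, and the hostname comparison are illustrative assumptions:

    import java.net.InetAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocalityCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(args[0]);          // HDFS file to inspect
            FileStatus status = fs.getFileStatus(file);
            String thisHost = InetAddress.getLocalHost().getHostName();
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                boolean local = false;
                for (String host : block.getHosts()) {
                    if (host.equals(thisHost)) {
                        local = true;
                        break;
                    }
                }
                // Only blocks with a replica on this host could be mmap'ed
                // locally; the rest have to be read over the network.
                System.out.println("offset=" + block.getOffset()
                    + " len=" + block.getLength()
                    + (local ? " LOCAL" : " remote"));
            }
        }
    }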

Re: Memory mapped resources

2011-04-12 Thread Jason Rutherglen
"The others you will have to read more conventionally" -- true. I think there are emergent use cases that demand data locality, e.g., an optimized HBase system, search, and mmap'ing. "If all blocks are guaranteed local, this would work. I don't think that guarantee is possible on a non-trivial cluster."

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Blocks live where they land when first created. They can be moved due to node failure or rebalancing, but it is typically pretty expensive to do this. It certainly is slower than just reading the file. If you really, really want mmap to work, then you need to set up some native code that builds

Re: Memory mapped resources

2011-04-12 Thread Benson Margulies
Here's the OP again. I want to make it clear that my question here has to do with the problem of distributing 'the program' around the cluster, not 'the data'. In the case at hand, the issue is a system that has a large data resource that it needs to do its work. Every instance of the code needs the

Re: Memory mapped resources

2011-04-12 Thread Jason Rutherglen
To get around the chunks-or-blocks problem, I've been implementing a system that simply sets a max block size that is too large for a file to reach. In this way there will only be one block per HDFS file, and so mmap'ing and other single-file ops become trivial.
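
A minimal sketch of that approach, assuming the per-file block-size overload of FileSystem#create; the class name and the 16 GB figure are illustrative, and the chosen block size just has to exceed the file's final length:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SingleBlockWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(args[0]);
            // Pick a block size larger than the file will ever grow,
            // so HDFS stores the whole file as a single block.
            long blockSize = 16L * 1024 * 1024 * 1024;   // 16 GB, illustrative
            FSDataOutputStream out = fs.create(
                file,
                true,                                    // overwrite
                conf.getInt("io.file.buffer.size", 4096),
                fs.getDefaultReplication(),
                blockSize);
            out.writeBytes("file contents go here");
            out.close();
        }
    }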

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Actually, it doesn't become trivial. It just becomes total fail or total win instead of almost always being a partial win. And it doesn't meet Benson's need.

Re: Memory mapped resources

2011-04-12 Thread Luke Lu
You can use the distributed cache for memory-mapped files (they're local to the node the tasks run on): http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
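
A minimal sketch of that pattern with the 0.20-era DistributedCache API: register the resource at job-submission time, then mmap the node-local copy inside the task. The HDFS path and class name are illustrative assumptions:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.URI;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class CachedResourceMapping {
        // At job-submission time: ask the framework to copy an HDFS
        // file onto the local disk of every node that runs a task.
        public static void register(Configuration conf) throws Exception {
            DistributedCache.addCacheFile(new URI("/data/resource.bin"), conf);
        }

        // Inside a task: the file is now node-local, so it can be
        // memory mapped like any ordinary local file.
        public static MappedByteBuffer map(Configuration conf) throws IOException {
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            File local = new File(localFiles[0].toString());
            FileChannel channel = new FileInputStream(local).getChannel();
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }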

Re: Memory mapped resources

2011-04-12 Thread M. C. Srivas
I am not sure if you realize it, but HDFS is not VM-integrated. What you are asking for is support *inside* the Linux kernel for HDFS file systems. I don't see that happening for the next few years, and probably never. (HDFS is all Java today, and Java certainly is not going to go inside the

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Benson is actually a pretty sophisticated guy who knows a lot about mmap. I engaged with him on this yesterday, since I know him from Apache.

Memory mapped resources

2011-04-11 Thread Benson Margulies
We have some very large files that we access via memory mapping in Java. Someone's asked us about how to make this conveniently deployable in Hadoop. If we tell them to put the files into HDFS, can we obtain a File for the underlying file on any given node?
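
For reference, a minimal sketch of the technique in question, memory mapping a local file with Java's standard FileChannel#map; the class name and path handling are illustrative:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class LocalMmap {
        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            FileChannel channel = raf.getChannel();
            // Map the whole file read-only; the OS pages it in on demand,
            // which is what makes this attractive for very large resources.
            // Note: a single MappedByteBuffer is limited to Integer.MAX_VALUE
            // bytes, so files beyond ~2 GB need to be mapped in pieces.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println("first byte: " + buf.get(0));
            raf.close();
        }
    }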

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
Yes, you can; however, it will require customization of HDFS. Take a look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch is for the -append branch, which is mainly for HBase.

Re: Memory mapped resources

2011-04-11 Thread Edward Capriolo
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Yes you can however it will require customization of HDFS. Take a look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Also, it only provides access to a local chunk of a file, which isn't very useful.

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
What do you mean by 'local chunk'? I think it's providing access to the underlying file block?

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk, but if you want to mmap the entire file, it isn't really useful.