Guys, I'm not the one who said 'HDFS' unless I had a brain bubble in
my original message. I asked for a distribution mechanism for
code+mappable data. I appreciate the arrival of some suggestions.
Ted is correct that I know quite a bit about mmap; I had a lot to do with the code in ObjectStore.
Sorry, I don't mean to say you don't know mmap or didn't do cool things in the past.
But you will see why anyone would have interpreted the original post, given the title of the posting and the wording that followed, to mean "can I mmap files that are in HDFS?"
Point taken.
There are systems for file-system plumbing out to user processes; FUSE does this on Linux, and there is a package for Hadoop. However, pretending a remote resource is local holds a place of honor in the system-design antipattern hall of fame.
Yes. But only one such block. That is what I meant by chunk.
That is fine if you want that chunk, but if you want to mmap the entire file, it isn't real useful.
We have some very large files that we access via memory mapping in
Java. Someone's asked us about how to make this conveniently
deployable in Hadoop. If we tell them to put the files into hdfs, can
we obtain a File for the underlying file on any given node?
We sometimes find it convenient to
12:09 AM
To: Jason Rutherglen
Cc: common-user@hadoop.apache.org; Edward Capriolo
Subject: Re: Memory mapped resources
Well, no.
You could mmap all the blocks that are local to the node your program is on.
The others you will have to read more conventionally. If all blocks are
guaranteed local, this would work. I don't think that guarantee is possible
on a non-trivial cluster.
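If the local replicas really were reachable as ordinary files, mapping them would be plain JDK work: one `FileChannel.map` call per block-sized region. A minimal sketch of that idea, assuming a hypothetical local path (HDFS does not expose block-file paths through its public API, and the fixed region size here merely mimics how HDFS splits a file into blocks):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class BlockMapper {
    // Map a file as a series of fixed-size regions, the way HDFS splits a
    // file into blocks. Regions that were not local would have to be read
    // "more conventionally" instead of mapped.
    public static List<MappedByteBuffer> mapInRegions(Path file, long regionSize)
            throws IOException {
        List<MappedByteBuffer> regions = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long off = 0; off < size; off += regionSize) {
                long len = Math.min(regionSize, size - off);
                regions.add(ch.map(FileChannel.MapMode.READ_ONLY, off, len));
            }
        }
        return regions;
    }
}
```

The mappings stay valid after the channel is closed; unmapping is left to the garbage collector, which is one of the usual caveats of mmap in Java.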
True. I think there are emergent use cases that demand data locality,
eg, an optimized HBase system, search, and MMap'ing.
Blocks live where they land when first created. They can be moved due to
node failure or rebalancing, but it is typically pretty expensive to do
this. It certainly is slower than just reading the file.
If you really, really want mmap to work, then you need to set up some native
code that builds
Here's the OP again.
I want to make it clear that my question here has to do with the
problem of distributing 'the program' around the cluster, not 'the
data'. In the case at hand, the issue is a system that has a large data
resource that it needs to do its work. Every instance of the code
needs the
To get around the chunks or blocks problem, I've been implementing a
system that simply sets a max block size that is too large for a file
to reach. In this way there will only be one block per HDFS file, and
so MMap'ing or other single-file ops become trivial.
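For reference, the knob involved is the HDFS block size, which in configs of that era is the `dfs.block.size` property (in bytes); it can be set cluster-wide in hdfs-site.xml or per file at create time. A sketch of the cluster-wide form, with an arbitrarily chosen 64 GB cap:

```xml
<!-- hdfs-site.xml: make the default block larger than any file will grow,
     so every file occupies exactly one block (64 GB here is arbitrary) -->
<property>
  <name>dfs.block.size</name>
  <value>68719476736</value>
</property>
```

The same value can also be passed per file via the blockSize argument of FileSystem.create, which avoids inflating the default for every other file on the cluster.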
Actually, it doesn't become trivial. It just becomes total fail or total
win instead of almost always being partial win. It doesn't meet Benson's
need.
You can use distributed cache for memory mapped files (they're local
to the node the tasks run on.)
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
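The point of the distributed cache is that the file ends up at a plain local path next to the task, after which mapping it is ordinary JDK work. A sketch of that last step, with the cache's localization simulated by a local copy (the paths here are hypothetical; a real job would obtain the localized path from the cache API):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class CachedResource {
    // Copy stands in for the framework's localization step: once the
    // resource is a plain local file, it can be mmap'ed like any other.
    public static MappedByteBuffer localizeAndMap(Path source, Path localDir)
            throws IOException {
        Path local = localDir.resolve(source.getFileName());
        Files.copy(source, local, StandardCopyOption.REPLACE_EXISTING);
        try (FileChannel ch = FileChannel.open(local, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```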
I am not sure if you realize it, but HDFS is not VM-integrated. What you are
asking for is support *inside* the Linux kernel for HDFS file systems. I
don't see that happening in the next few years, and probably never at all.
(HDFS is all Java today, and Java certainly is not going to go inside the
Benson is actually a pretty sophisticated guy who knows a lot about mmap.
I engaged with him yesterday on this since I know him from Apache.
Yes you can, however it will require customization of HDFS. Take a
look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch.
I have been altering it for use with HBASE-3529. Note that the patch
noted is for the -append branch, which is mainly for HBase.
Also, it only provides access to a local chunk of a file, which isn't very useful.
What do you mean by local chunk? I think it's providing access to the
underlying file block?