Paul Sheer wrote:
> Hadoop backend for PostgreSQL
Resurrecting an old thread, it seems some guys at Yale implemented
something very similar to what this thread was discussing.
http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html
> >
> >It's an open source stack t
With a distributed data store, the data would become a logical
object: adding or removing machines would not affect the data.
This is an ideal that would remove a tremendous maintenance
burden from many sites; well, at least the ones I have worked
at, as far as I can see.
Two things:
1)
Tom Lane wrote:
It's interesting to speculate about where we could draw an abstraction
boundary that would be more useful. I don't think the MySQL guys got it
right either...
The supposed smgr abstraction of PostgreSQL, which tells more or less
how to get a byte to the disk, is quite far away from that.
why not just stream it in via set-returning functions and make sure
that we can mark a set-returning function as "STREAMABLE" or so (to
prevent joins, whatever).
it is the easiest way to get it right, and it helps in many other cases.
i think that the storage manager is definitely the wrong place.
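To make the streaming idea concrete, here is a minimal sketch of a
value-per-call set-returning function using the existing C API from
funcapi.h. The proposed "STREAMABLE" marker does not exist today; the
function name and the counter standing in for an external row source are
illustrative only.

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(stream_ints);

    /* Returns the integers 0 .. n-1, one per call, without materializing
     * the whole result set; a real version would pull rows from the
     * external store instead of counting. */
    Datum
    stream_ints(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            funcctx = SRF_FIRSTCALL_INIT();
            /* one-time setup: e.g. open the connection to the external store */
            funcctx->max_calls = PG_GETARG_INT32(0);
        }

        funcctx = SRF_PERCALL_SETUP();

        if (funcctx->call_cntr < funcctx->max_calls)
            SRF_RETURN_NEXT(funcctx, Int32GetDatum((int32) funcctx->call_cntr));

        SRF_RETURN_DONE(funcctx);
    }

It would be declared with something like CREATE FUNCTION stream_ints(integer)
RETURNS SETOF integer AS 'MODULE_PATHNAME' LANGUAGE C STRICT. Value-per-call
mode already hands back one row per invocation, which is the "read once, no
rescans" behaviour a STREAMABLE marker would have to promise the planner.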
> As far as I can tell, the PG storage manager API is at the wrong level
> of abstraction for pretty much everything. These days, everything we do
> is atop the Unix filesystem API, and anything that smgr might have been
>
Is there a complete list of filesystem API calls somewhere that I can
> I believe there is more than that which would need to be done
> nowadays. I seem to recall that the storage manager
> abstraction has slowly been dedicated/optimized for md over the past 6
> years or so. It may even be easier/preferred
> to write a hadoop-specific access method depending
"Jonah H. Harris" writes:
> I believe there is more than that which would need to be done nowadays. I
> seem to recall that the storage manager abstraction has slowly been
> dedicated/optimized for md over the past 6 years or so.
As far as I can tell, the PG storage manager API is at the wrong level
of abstraction for pretty much everything.
On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas wrote:
> In theory, I think you could make postgres work on any type of
> underlying storage you like by writing a second smgr implementation
> that would exist alongside md.c. The fly in the ointment is that
> you'd need a more sophisticated implementation
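For orientation, the sketch below shows the shape of the change being
discussed. The real dispatch table is the f_smgr array in
src/backend/storage/smgr/smgr.c; the struct here is a simplified,
self-contained stand-in (the real one has more callbacks and
version-dependent signatures), and every hdfs_* function is hypothetical.

    #include <stdio.h>

    /* Simplified stand-ins for the server's types (see storage/smgr.h). */
    typedef unsigned int BlockNumber;
    typedef struct SMgrRelationData { const char *path; } SMgrRelationData, *SMgrRelation;

    /* Paraphrase of the f_smgr callback table; the real struct also has
     * create/unlink/extend/prefetch/sync members. */
    typedef struct f_smgr
    {
        void        (*smgr_read) (SMgrRelation reln, BlockNumber blocknum, char *buffer);
        void        (*smgr_write) (SMgrRelation reln, BlockNumber blocknum, const char *buffer);
        BlockNumber (*smgr_nblocks) (SMgrRelation reln);
    } f_smgr;

    /* Hypothetical HDFS-backed callbacks; a real version would call into
     * libhdfs here instead of printing. */
    static void
    hdfs_read(SMgrRelation reln, BlockNumber blocknum, char *buffer)
    {
        (void) buffer;
        printf("would fetch block %u of %s from HDFS\n", blocknum, reln->path);
    }

    static void
    hdfs_write(SMgrRelation reln, BlockNumber blocknum, const char *buffer)
    {
        (void) buffer;
        printf("would push block %u of %s to HDFS\n", blocknum, reln->path);
    }

    static BlockNumber
    hdfs_nblocks(SMgrRelation reln)
    {
        (void) reln;
        return 0;               /* would derive this from the HDFS file size */
    }

    /* In smgr.c the only live row is md.c, the "magnetic disk" manager; a
     * Hadoop manager would be a second row, selected per relation. */
    static const f_smgr smgrsw[] = {
        /* { mdread, mdwrite, mdnblocks },   md.c entry, elided */
        { hdfs_read, hdfs_write, hdfs_nblocks },
    };

    int
    main(void)
    {
        SMgrRelationData rel = { "/pgdata/base/16384/16385" };
        char buf[8192];

        smgrsw[0].smgr_read(&rel, 0, buf);
        return 0;
    }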
Hi,
Paul Sheer wrote:
> This is not a problem: Performance is a secondary consideration (at least
> as far as the problem I was referring to).
Well, if you don't mind your database running... ehm... creeping several
orders of magnitude slower, you might also be interested in
Single-System Image Clustering.
Paul Sheer wrote:
I have also found it's no use having RAID or ZFS. Each of these ties
the data to an OS installation. If the OS needs to be reinstalled, all
the data has to be manually moved in a way that is, well... dangerous.
How about network storage, fiber attached? If you move the db you
On Mon, Feb 23, 2009 at 9:08 AM, Paul Sheer wrote:
>> It would only be possible to have the actual PostgreSQL backends
>> running on a single node anyway, because they use shared memory to
>
> This is not a problem: Performance is a secondary consideration (at least
> as far as the problem I was referring to).
> It would only be possible to have the actual PostgreSQL backends
> running on a single node anyway, because they use shared memory to
This is not a problem: Performance is a secondary consideration (at least
as far as the problem I was referring to).
The primary usefulness is to have the data be
On Mon, Feb 23, 2009 at 3:56 PM, pi song wrote:
> I think the point that you can access more system cache is right, but that
> doesn't mean it will be more efficient than accessing from your local disk.
> Take Hadoop, for example: your request for file content will have to go to
> the NameNode (file ch
On Sun, Feb 22, 2009 at 5:18 PM, pi song wrote:
> One more problem is that data placement on HDFS is inherent, meaning you
> have no explicit control. Thus, you cannot place two sets of data which are
> likely to be joined together on the same node = uncontrollable latency
> during query processing.
One more problem is that data placement on HDFS is inherent, meaning you
have no explicit control. Thus, you cannot place two sets of data which are
likely to be joined together on the same node = uncontrollable latency
during query processing.
Pi Song
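The placement point can be seen directly from libhdfs, the C client
library shipped with Hadoop: hdfsGetHosts() reports which DataNodes hold
each block of a file, but there is no corresponding call to choose them.
A minimal sketch, with the NameNode address and file path as placeholders:

    #include <stdio.h>
    #include "hdfs.h"

    int
    main(void)
    {
        hdfsFS fs = hdfsConnect("namenode.example.com", 9000);
        if (fs == NULL)
            return 1;

        /* Hostnames for the blocks covering the first 128 MB of the file;
         * both levels of the returned array are NULL-terminated. */
        char ***hosts = hdfsGetHosts(fs, "/pgdata/table_a", 0, 128 * 1024 * 1024);
        if (hosts != NULL)
        {
            for (int b = 0; hosts[b] != NULL; b++)
                for (int h = 0; hosts[b][h] != NULL; h++)
                    printf("block %d has a replica on %s\n", b, hosts[b][h]);
            hdfsFreeHosts(hosts);
        }

        hdfsDisconnect(fs);
        return 0;
    }

So a placement-aware planner could at best observe where HDFS happened to
put the data, not arrange for two joinable tables to land on the same node.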
On Mon, Feb 23, 2009 at 7:47 AM, Robert Haas wrote:
On Sat, Feb 21, 2009 at 9:37 PM, pi song wrote:
> 1) The Hadoop file system is heavily optimized for mostly-read operations.
> 2) As of a few months ago, hdfs doesn't support file appending.
> There might be a bit of impedance to make them go together.
> However, I think it should be a very good initiative to
hi ...
i think the easiest way to do this is to simply add a mechanism to
functions which allows a function to "stream" data through.
it would basically mean losing join support, as you cannot "read data
again" in a way which is good enough for joining with the
function providing
1) The Hadoop file system is heavily optimized for mostly-read operations.
2) As of a few months ago, hdfs doesn't support file appending.
There might be a bit of impedance to make them go together.
However, I think it should be a very good initiative to come up with ideas to
be able to run postgres on distributed storage.
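For reference, this is roughly what the read path looks like from C via
libhdfs, which is the layer any smgr- or access-method-level integration
would sit on; the connection details and file path are placeholders. Note
the file is opened read-only: with no append support at the time, a heap
file could not be extended in place.

    #include <stdio.h>
    #include <fcntl.h>
    #include "hdfs.h"

    int
    main(void)
    {
        char buf[8192];             /* one PostgreSQL-sized block */

        hdfsFS fs = hdfsConnect("namenode.example.com", 9000);
        if (fs == NULL)
            return 1;

        /* bufferSize, replication and blocksize of 0 mean "use the defaults" */
        hdfsFile f = hdfsOpenFile(fs, "/pgdata/table_a", O_RDONLY, 0, 0, 0);
        if (f == NULL)
        {
            hdfsDisconnect(fs);
            return 1;
        }

        tSize n = hdfsRead(fs, f, buf, sizeof(buf));
        printf("read %d bytes\n", (int) n);

        hdfsCloseFile(fs, f);
        hdfsDisconnect(fs);
        return 0;
    }

For the random block reads a storage manager needs, libhdfs also provides
hdfsPread(), which takes an explicit file offset.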
Hadoop backend for PostgreSQL
A problem that my client has, and one that I come across often,
is that a database always seems to be tied to a particular
physical machine, a machine that has to be upgraded,
replaced, or otherwise maintained.
Even if the database is replicated,