Re: [HACKERS] Fixed length data types issue

2006-09-22 Thread Bruno Wolff III
On Mon, Sep 11, 2006 at 19:05:12 -0400, Gregory Stark [EMAIL PROTECTED] wrote: I'm not sure how gmp and the others represent their data but my first guess is that there's no particular reason the base of the mantissa and exponent have to be the same as the base the exponent is interpreted

Re: [HACKERS] Fixed length data types issue

2006-09-18 Thread Bruno Wolff III
On Fri, Sep 08, 2006 at 15:08:18 -0400, Andrew Dunstan [EMAIL PROTECTED] wrote: From time to time the idea of a logical vs physical mapping for columns has been mentioned. Among other benefits, that might allow us to do some rearrangement of physical ordering to reduce space wasted on

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Gregory Stark wrote: It's limited but I wouldn't say it's very limiting. In the cases where it doesn't apply there's no way out anyways. A UTF8 field will need a length header in some form. Actually, you can determine the length of a UTF-8 encoded character by looking at the most significant

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: Gregory Stark wrote: It's limited but I wouldn't say it's very limiting. In the cases where it doesn't apply there's no way out anyways. A UTF8 field will need a length header in some form. Actually, you can determine the

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: Actually, you can determine the length of a UTF-8 encoded character by looking at the most significant bits of the first byte. So we could store a UTF-8 encoded CHAR(1) field without any additional
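To make the point about the first byte concrete, here is a minimal sketch in C of how a UTF-8 character's length can be read from its leading bits, which is what would let a single-character field be stepped over without a stored length header. The names utf8_char_len and skip_utf8_char are hypothetical, chosen for illustration only; they are not PostgreSQL functions and this is not code from the thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte length of a UTF-8 encoded character, derived
 * only from its first byte. Returns 0 if the byte cannot start a character.
 */
static size_t
utf8_char_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: plain ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;                   /* continuation byte or invalid lead byte */
}

/*
 * Hypothetical helper: step over a headerless one-character field and
 * return a pointer to whatever follows it.
 */
static const unsigned char *
skip_utf8_char(const unsigned char *p)
{
    size_t len = utf8_char_len(p[0]);

    assert(len != 0);           /* caller must point at a lead byte */
    return p + len;
}
```

The sketch only inspects the lead byte; it does not validate the continuation bytes, which stored data would be assumed to have passed on input.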

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 11:43:52AM +0100, Heikki Linnakangas wrote: My gut feeling is that it wouldn't be that bad compared to what we have now or the new proposed varlena scheme, but before someone actually tries it and shows some numbers, this is just hand-waving. Well, that depends on

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: I don't think making a special typlen value just for a type that can store a single UTF-8 character is smart. I just can't see enough use to make it worth it. Assuming that we can set encoding per-column one day, I agree. If you have a CHAR(1) field, you're

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Mario Weilguni
Martijn van Oosterhout wrote: I don't think making a special typlen value just for a type that can store a single UTF-8 character is smart. I just can't see enough use to make it worth

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Martijn van Oosterhout
On Fri, Sep 15, 2006 at 01:38:54PM +0200, Mario Weilguni wrote: What about the char type? Isn't it designed for that? Or will this type disappear in future releases? char is used in the system catalogs, I don't think it's going to go any time soon. There it's used as a (surprise) single byte

Re: [HACKERS] Fixed length data types issue

2006-09-15 Thread Gregory Stark
Martijn van Oosterhout kleptog@svana.org writes: I don't think making a special typlen value just for a type that can store a single UTF-8 character is smart. I just can't see enough use to make it worth it. Well there are lots of data types that can probably tell how long they are based on

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Markus Schaber
Hi, Jim, Jim Nasby wrote: I'd love to have the ability to control toasting thresholds manually. This could result in a lot of speed improvements in cases where a varlena field isn't frequently accessed and will be fairly large, yet not large enough to normally trigger toasting. An address

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Bruce Momjian
Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Gregory Stark wrote: Well char doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fixing char semantics to match CHAR(1) or adding a separate type instead. What

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Mark Dilger
My apologies if you are seeing this twice. I posted it last night, but it still does not appear to have made it to the group. Mark Dilger wrote: Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Bruce Momjian
Bruce Momjian wrote: Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Gregory Stark wrote: Well char doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fixing char semantics to match CHAR(1) or

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: One very nifty trick would be to fix char to act as CHAR(), and map CHAR(1) automatically to char. Sorry, probably a stupid idea considering multi-byte encodings. I suppose it could be an optimization for single-byte encodings, but that seems very

Re: [HACKERS] Fixed length data types issue

2006-09-14 Thread mark
On Thu, Sep 14, 2006 at 10:21:30PM +0100, Gregory Stark wrote: One very nifty trick would be to fix char to act as CHAR(), and map CHAR(1) automatically to char. Sorry, probably a stupid idea considering multi-byte encodings. I suppose it could be an optimization for single-byte

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Jim Nasby
On Sep 11, 2006, at 1:57 PM, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every if in that inner loop is expensive.

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Mark Dilger wrote: Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 = smallint, int3 =

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Tom Lane
Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Arturo Perez
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Jim Nasby) wrote: I'd love to have the ability to control toasting thresholds manually. ... Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference.

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Tom Lane
Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7,

Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Mark Dilger
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple

Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 14:25 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Is this an 8.2 thing? You are joking, no? Confirming, using an open question, and a smile. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes: Gregory Stark wrote: Well char doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fixing char semantics to match CHAR(1) or adding a separate type instead. What semantics? The main bit

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary for

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Markus Schaber
Hi, Tom, Tom Lane wrote: The only way we could pack stuff without alignment is to go over to the idea that memory and disk representations are different --- where in this case the conversion might just be a memcpy to a known-aligned location. The performance costs of that seem pretty

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 = smallint, int3 =

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Also Heikki points out here that it would be nice to allow for the case for a 0-byte header. I don't think there's enough code space for that; at least not compared to its use case. Well it's irrelevant if we add a special data type to handle CHAR(1). But

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Alvaro Herrera
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Also Heikki points out here that it would be nice to allow for the case for a 0-byte header. I don't think there's enough code space for that; at least not compared to its use case. Well it's irrelevant if we add a special

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Martijn van Oosterhout
On Mon, Sep 11, 2006 at 03:13:36PM +0100, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Also Heikki points out here that it would be nice to allow for the case for a 0-byte header. I don't think there's enough code space for that; at least not compared to its use case.

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes: Well it's irrelevant if we add a special data type to handle CHAR(1). In that case you should probably be using char ... Well char doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fixing char

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Simon Riggs
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote: After further thought I have an alternate proposal (snip) * If high order bit of datum's first byte is 0, then it's an uncompressed datum in what's essentially the same as our current in-memory format except that the 4-byte length word must

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I'm imagining that it would give you the same old uncompressed in-memory representation as it does now, ie, 4-byte length word and uncompressed data. Sure, but how would you know? Sometimes you would get a pointer to

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Alvaro Herrera
Gregory Stark wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Well it's irrelevant if we add a special data type to handle CHAR(1). In that case you should probably be using char ... Well char doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: In any case it seems a bit backwards to me. Wouldn't it be better to preserve bits in the case of short length words where they're precious rather than long ones? If we make 0xxx the 1-byte case it means ... Well, I don't find that real persuasive:

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: In any case it seems a bit backwards to me. Wouldn't it be better to preserve bits in the case of short length words where they're precious rather than long ones? If we make 0xxx the 1-byte case it means ...

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Is this an 8.2 thing? You are joking, no? If not, is Numeric508 applied? No, that got rejected as being too much of a restriction of the dynamic range, eg John's comment here: http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php I think a

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread mark
On Mon, Sep 11, 2006 at 01:15:43PM -0400, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: In any case it seems a bit backwards to me. Wouldn't it be better to preserve bits in the case of short length words where they're precious rather than long ones? If we make 0xxx the 1-byte

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: No, that got rejected as being too much of a restriction of the dynamic range, eg John's comment here: http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php That logic seems questionable. John makes two points: a) crypto applications are within

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: No, that got rejected as being too much of a restriction of the dynamic range, eg John's comment here: http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php That logic seems questionable. John makes two

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: That's utterly irrelevant. The point is that there are standard applications today in which people need that much precision; therefore, the argument that 10^508 is far more than anyone could want is on exceedingly shaky ground. My point is those

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: At first I meant that as a reductio ad absurdum argument, but, uh, come to think of it why *do* we have our own arbitrary precision library? Is there any particular reason we can't use one of the existing binary implementations? Going over to binary

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: At first I meant that as a reductio ad absurdum argument, but, uh, come to think of it why *do* we have our own arbitrary precision library? Is there any particular reason we can't use one of the existing binary

Re: [HACKERS] Fixed length data types issue

2006-09-11 Thread mark
On Mon, Sep 11, 2006 at 07:05:12PM -0400, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: At first I meant that as a reductio ad absurdum argument, but, uh, come to think of it why *do* we have our own arbitrary precision library? Is

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Martijn van Oosterhout
On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about alignment, and will fail an attempt to do a nonaligned fetch. Intel CPUs are detectable at compile time, right? Do

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Martijn van Oosterhout wrote: On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about alignment, and will fail an attempt to do a nonaligned fetch. Intel CPUs are detectable

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = text,

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Added to TODO: * Consider ways of storing rows more compactly on disk o Store disk pages with no alignment/padding? o Reorder physical storage order to reduce padding? o Support a smaller header for short variable-length
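As a rough illustration of the padding and reordering items in this TODO entry, the sketch below computes how many bytes a row of fixed-width columns occupies when each column is aligned to its own boundary, so two physical orderings can be compared. The Col struct and the functions align_up and row_size are hypothetical names invented for this example; the sketch ignores tuple headers, null bitmaps, variable-length columns and any trailing padding, so it is not how PostgreSQL actually lays out a tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a fixed-width column. */
typedef struct
{
    size_t len;                 /* width in bytes */
    size_t align;               /* required alignment, a power of two */
} Col;

/* Round off up to the next multiple of align. */
static size_t
align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Bytes used by the columns in the given physical order, padding included. */
static size_t
row_size(const Col *cols, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++)
    {
        off = align_up(off, cols[i].align);     /* padding before column */
        off += cols[i].len;
    }
    return off;
}
```

Under these assumptions, columns of widths 1, 4, 1, 8 (each aligned to its width) take 24 bytes in that order, but only 14 bytes when stored widest first as 8, 4, 1, 1.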

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 = smallint, int3 = integer, ascii1 = text,

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: * Consider ways of storing rows more compactly on disk o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that already today: see

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: * Consider ways of storing rows more compactly on disk o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could
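The observation that an on-page datum never needs more than two bytes of length can be sketched as follows. This is a hypothetical encoding made up for illustration, not the bit layout proposed in the thread: lengths below 128 take one byte with the high bit clear, and lengths up to 32767 take two bytes with the high bit of the first byte set. The names encode_len and decode_len are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder: write len into buf using 1 or 2 bytes and return
 * the number of header bytes written.
 */
static size_t
encode_len(unsigned char *buf, uint16_t len)
{
    assert(len <= 0x7FFF);      /* 15 bits is the most this format holds */
    if (len < 0x80)
    {
        buf[0] = (unsigned char) len;           /* 0xxxxxxx */
        return 1;
    }
    buf[0] = (unsigned char) (0x80 | (len >> 8));   /* 1xxxxxxx */
    buf[1] = (unsigned char) (len & 0xFF);
    return 2;
}

/*
 * Hypothetical decoder: read the length back and return the number of
 * header bytes consumed. Only the first byte decides which case applies.
 */
static size_t
decode_len(const unsigned char *buf, uint16_t *len)
{
    if ((buf[0] & 0x80) == 0)
    {
        *len = buf[0];
        return 1;
    }
    *len = (uint16_t) (((buf[0] & 0x7F) << 8) | buf[1]);
    return 2;
}
```

The decoder needs a single test on the first byte, which matches the concern raised elsewhere in the thread that every extra branch in the tuple-decoding loop is expensive.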

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... That is

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary for the uncompressed and/or detoasted

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote: After further thought I have an alternate proposal that does that, but it's got its own disadvantage: it requires storing uncompressed 4-byte length words in big-endian byte order everywhere. This might be a showstopper (does anyone know the cost of ntohl() on modern Intel
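For readers wondering what storing the length word big-endian involves, here is a small sketch, assuming a POSIX system that provides htonl() and ntohl() in arpa/inet.h. The helper names store_len_be, load_len_be and first_byte_high_bit are hypothetical and not PostgreSQL code, and the rule that the top bit stays free as a flag is an assumption of this example. The idea shown is that with the most significant byte stored first, the high-order bits of the length are always in the datum's first byte regardless of the host CPU, so a flag test can look at that one byte.

```c
#include <arpa/inet.h>          /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 4-byte length word in big-endian order. */
static void
store_len_be(unsigned char *buf, uint32_t len)
{
    uint32_t be;

    assert(len < 0x80000000u);  /* assumption: top bit is kept free as a flag */
    be = htonl(len);
    memcpy(buf, &be, sizeof be);    /* memcpy avoids any alignment demand */
}

/* Hypothetical helper: read the length word back into host byte order. */
static uint32_t
load_len_be(const unsigned char *buf)
{
    uint32_t be;

    memcpy(&be, buf, sizeof be);
    return ntohl(be);
}

/* The flag test touches only the first byte of the datum. */
static int
first_byte_high_bit(const unsigned char *buf)
{
    return (buf[0] & 0x80) != 0;
}
```

On a big-endian host htonl() and ntohl() leave the value unchanged; on a little-endian host they swap the bytes, and that swap is the cost the question about ntohl() is asking about.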

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Kevin Brown
Tom Lane wrote: (does anyone know the cost of ntohl() on modern Intel CPUs?) I wrote a simple test program to determine this: #include <arpa/inet.h> int main (int argc, char *argv[]) { unsigned long i; uint32_t a; a = 0; for (i = 0 ; i < 40L ; ++i) {

Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Jeremy Drake
On Sun, 10 Sep 2006, Kevin Brown wrote: Tom Lane wrote: (does anyone know the cost of ntohl() on modern Intel CPUs?) I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit mode, another one with the same processor running in 32-bit mode, and a third running a Pentium 4 1.5 GHz

Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: The performance costs of that seem pretty daunting, however, especially when you reflect that simply stepping over a varlena field would require memcpy'ing its length word to someplace. I think if you give up on disk and in-memory representations being the

Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: The performance costs of that seem pretty daunting, however, especially when you reflect that simply stepping over a varlena field would require memcpy'ing its length word to someplace. I think if you give up

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Gregory Stark wrote: This is most obviously the case for data warehouses that are doing lots of sequential scans of tables that don't fit in cache. In a data warehouse, you won't have many caching effects anyway. But it's largely true for OLTP applications too. The more compact the data the

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Gregory Stark wrote: I think we have to find a way to remove the varlena length header entirely for fixed length data types since it's going to be the same for every single record in the table. But that won't help in the example you posted upthread, because char(N) is not fixed-length. --

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: But I think this is a dead-end route. What you're looking at is the number 1 repeated for *every* record in the table. And what you're proposing amounts to noticing that the number 4 fits in a byte and doesn't need a whole word

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Peter Eisentraut [EMAIL PROTECTED] writes: Gregory Stark wrote: I think we have to find a way to remove the varlena length header entirely for fixed length data types since it's going to be the same for every single record in the table. But that won't help in the example you posted

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Peter Eisentraut [EMAIL PROTECTED] writes: Gregory Stark wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. Sure it is because any sane database--certainly any sane database using char(N)--is in C locale anyways. This matter is

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew - Supernews
On 2006-09-08, Gregory Stark [EMAIL PROTECTED] wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. Sure it is because any sane database--certainly any sane database using char(N)--is in C locale anyways. You're confusing locale and charset.

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes: Peter Eisentraut [EMAIL PROTECTED] writes: Gregory Stark wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. Sure it is because any sane database--certainly any sane database using

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Gregory Stark wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. Sure it is because any sane database--certainly any sane database using char(N)--is in C locale anyways. This matter is completely independent of the choice of locale and

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Gregory Stark wrote: But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. -- Heikki Linnakangas

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan
Heikki Linnakangas wrote: Gregory Stark wrote: But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 05:54:01AM -0400, Andrew Dunstan wrote: The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. ISTM we should revisit this when we get

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just attach the charset to the same place. The only difference is that

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 11:58:59AM +0100, Heikki Linnakangas wrote: Martijn van Oosterhout wrote: I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the rest of Postgres itself and that doesn't even include documentation. Even if you exclude the data and regression tests you're still talking about depending

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut
Heikki Linnakangas wrote: have a default set per-database, per-table or per-column, but it's not a property of the actual value of a field. I think that the phrase collation of a string doesn't make sense. The real problem is that the established method dividing up the locale categories

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas
Peter Eisentraut wrote: The real problem is that the established method dividing up the locale categories ignores both the technological and the linguistic reality. In reality, all properties like lc_collate, lc_ctype, and lc_numeric are dependent on the property language of the text. I don't

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 02:14:58PM +0200, Peter Eisentraut wrote: So mathematically, you are right, the collation is a property of the operation, not of the operands. But semantically, the operands do carry the information of what collation order they would like to be compared under, and

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote: Gregory Stark wrote: I think we have to find a way to remove the varlena length header entirely for fixed length data types since it's going to be the same for every single record in the table. But that won't help in the

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 08:50:57AM +0200, Peter Eisentraut wrote: Gregory Stark wrote: But it's largely true for OLTP applications too. The more compact the data the more tuples fit on a page and the greater the chance you have the page you need in cache. But a linear amount of more RAM is

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. It can be fixed-length, or at least, have an upper bound. If marked up to contain only ascii characters, it doesn't, at least in

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: Uhm, an ICU source tree is over 40 *megabytes*. I don't understand this argument. No-one asked what size the LDAP libraries were when we added support for them. No-one cares that

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark
Martijn van Oosterhout kleptog@svana.org writes: I'm still missing the argument of why you can't just make a 16-byte type. Around half the datatypes in postgresql are fixed-length and have no header. I'm completely confused about why people are hung up about bytea(16) not being fixed length

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: The reason this is a relevant consideration: we are talking about changes that would remove existing functionality for people who don't have that library. Huh? If you don't select ICU at compile time you get no difference from what we

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: what's more, the docs suggest that it doesn't support anything wider than UTF16. Well, that's not true, which part of the docs were you looking at? AFAICT, most of the useful operations

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 12:19:19PM -0400, Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: what's more, the docs suggest that it doesn't support anything wider than UTF16. Well, that's not true, which part of the

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: AFAICT, most of the useful operations work on UChar, which is uint16: http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac Oh, you're confusing UCS-2 with UTF-16, Ah, you're right, I did misunderstand
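The UCS-2 versus UTF-16 distinction can be shown with a short sketch: a 16-bit code unit is enough for any code point up to U+FFFF, and UTF-16 covers the rest by combining two units into a surrogate pair, which UCS-2 cannot do. The function utf16_encode below is a hypothetical helper written for this illustration; it is not part of ICU's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16 and return
 * the number of 16-bit units written (1 or 2).
 */
static size_t
utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* valid scalar values only: within range and not a surrogate itself */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000)
    {
        out[0] = (uint16_t) cp;     /* one unit, identical to UCS-2 */
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 | (cp >> 10));      /* high surrogate */
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));    /* low surrogate */
    return 2;
}
```

For example, U+20AC fits in the single unit 0x20AC, while U+1F600 becomes the pair 0xD83D 0xDE00.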

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: AFAICT, most of the useful operations work on UChar, which is uint16: http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac Oh, you're confusing

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: Ah, you're right, I did misunderstand that. However, it's still apparently the case that ICU works mostly with UTF16 and handles other encodings only via conversion to UTF16. That's a pretty serious mismatch with our needs --- we'll

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: I think I've been involved in a discussion like this in the past. Was it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding means that UTF-8 applications are at a disadvantage when using the library. UTF-16 is considered more efficient to work with for

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: But I think this is a dead-end route. What you're looking at is the number 1 repeated for *every* record in the table. And what you're proposing amounts to noticing that the number 4 fits in a

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan
Bruce Momjian wrote: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. From time to time the idea of a logical vs physical mapping for columns has been mentioned. Among other benefits, that might allow
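
As a rough illustration of why physical ordering matters when values are padded to their alignment, the sketch below uses Python's `ctypes` to lay out two C-style structs holding the same three fields in different orders. This is an assumption-laden analogy: it shows the platform's C struct layout, not PostgreSQL's on-disk tuple format, and the struct and field names are hypothetical.

```python
import ctypes


# Same three fields, declared in two different physical orders.
class Interleaved(ctypes.Structure):
    # 1-byte field, then a 4-byte field that needs alignment, then 1 byte.
    _fields_ = [
        ("flag_a", ctypes.c_char),
        ("value", ctypes.c_int32),
        ("flag_b", ctypes.c_char),
    ]


class Reordered(ctypes.Structure):
    # Widest field first, so the two 1-byte fields pack together after it.
    _fields_ = [
        ("value", ctypes.c_int32),
        ("flag_a", ctypes.c_char),
        ("flag_b", ctypes.c_char),
    ]


payload_bytes = 4 + 1 + 1  # bytes of actual data in either layout
interleaved_size = ctypes.sizeof(Interleaved)
reordered_size = ctypes.sizeof(Reordered)

print("payload:", payload_bytes)
print("interleaved:", interleaved_size, "reordered:", reordered_size)
```

The exact sizes depend on the platform ABI, but the reordered layout never needs more padding than the interleaved one, which is the kind of saving a logical-to-physical column mapping could exploit.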

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Martijn van Oosterhout wrote: On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: But that won't help in the example you posted upthread, because char(N) is not fixed-length. It can be fixed-length, or at least, have an upper bound. If

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: [EMAIL PROTECTED] wrote: I think I've been involved in a discussion like this in the past. Was it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding means that UTF-8 applications are at a disadvantage when using

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: [EMAIL PROTECTED] wrote: I think I've been involved in a discussion like this in the past. Was it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding means that UTF-8 applications are

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark
On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: [EMAIL PROTECTED] wrote: The authors of the library in question? Java? Anybody whose primary alphabet isn't LATIN1 based? :-) Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than UTF-8. That covers a lot of
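
The space comparison quoted here can be checked with a short sketch. The sample string is a hypothetical example chosen to include accented letters and the euro sign, all of which exist in Latin-9 (ISO 8859-15); the byte counts are simply what Python's standard codecs produce for it.

```python
# Illustrative sketch only: byte cost of one sample string in three encodings.
sample = "d\u00e9j\u00e0 vu \u20ac"  # "déjà vu €", 9 characters

latin9_bytes = len(sample.encode("iso8859-15"))  # one byte per character
utf8_bytes = len(sample.encode("utf-8"))         # 1 to 3 bytes per character here
utf16_bytes = len(sample.encode("utf-16-le"))    # two bytes per BMP character

print("characters:", len(sample))
print("Latin-9:", latin9_bytes, "UTF-8:", utf8_bytes, "UTF-16:", utf16_bytes)
```

For text that fits in a single-byte encoding, that encoding is the most compact; UTF-8 pays extra only for the non-ASCII characters, and UTF-16 pays two bytes for every character.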

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: But Martijn already clarified that ICU does not actually force you to switch everything to UTF-16, so this is not an issue anyway. If my memory is correct, it does this by converting it to UTF-16
