Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Albe Laurenz
Tom Lane wrote: for chr() under UTF8, it seems to be generally agreed that the argument should represent the codepoint and the function should return the correspondingly encoded character. If so, possibly the argument should be a bigint to accommodate the full range of possible code points.

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Jeff Davis [EMAIL PROTECTED] writes: I think the concern is when they use only one slash, like: E'\377\000\377'::bytea which, as I mentioned before, is not correct anyway. Wait, why would this be wrong? How would you enter the three byte bytea of

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread db
I think the concern is when they use only one slash, like: E'\377\000\377'::bytea which, as I mentioned before, is not correct anyway. Wait, why would this be wrong? How would you enter the three-byte bytea consisting of those three bytes described above? Either as E'\\377\\000\\377'

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke [EMAIL PROTECTED] wrote: Kenneth Marshall wrote: Along with the hypothetical performance wins, the hash index space efficiency would be improved by a similar factor. Obviously, all of these ideas would need to be tested in various

[HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would like to include, i.e. the opposite of a stop word list. There are clear indications that indexing too many words is a

Re: [HACKERS] A Silly Idea for Vertically-Oriented Databases

2007-09-10 Thread Simon Riggs
On Fri, 2007-09-07 at 13:52 -0700, Avery Payne wrote: So I've been seeing/hearing all of the hoopla over vertical databases (column stores), and how they'll not only slice bread but also make toast, etc. I've done some quick searches for past articles on C-Store, Vertica, Column Store,

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Heikki Linnakangas
Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would like to include, i.e. the opposite of a stop word list. There are clear indications that indexing

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would like to include, i.e. the opposite of a stop word list. There are clear

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Teodor Sigaev
Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, $1). That means that it will call ts_lexize on every dictionary, which will try to load every dictionary. And loading danish_stem dictionary fails in latin2 encoding, because of the problem with the stopword file. Attached

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: In the short run it might be best to do it in scan.l after all. I have not come up with a way of doing that and handling the bytea case. AFAICS we have no realistic choice other than to

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 12:58 +0100, Heikki Linnakangas wrote: Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would like to include, i.e. the opposite

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 12:58 +0100, Heikki Linnakangas wrote: Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words

Re: [HACKERS] ispell dictionary broken in CVS HEAD ?

2007-09-10 Thread Teodor Sigaev
Is anyone working on providing basic regression tests for the different dictionary types? Seems like the main stumbling block is providing I'll do some tests for dictionaries, but with a synthetic dictionary. The original ispell files are rather big, so I'll make a rather simple and small one.

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Teodor Sigaev
How does that allow me to limit the number of words to a known list? If all dictionaries return NULL for a token, then this token will not be indexed at all.
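
The rule described in this reply (a token that no dictionary recognizes is not indexed) can be sketched as follows. This is a toy Python model under stated assumptions, not PostgreSQL's implementation: `include_list_dict`, `index_tokens`, and the sample words are hypothetical names chosen only for illustration, and `None` stands in for a NULL dictionary result.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical model: a "dictionary" maps a token to a list of lexemes,
# or returns None when it does not recognize the token.
Dictionary = Callable[[str], Optional[List[str]]]


def include_list_dict(words: Iterable[str]) -> Dictionary:
    """Build a dictionary that recognizes only the words on an include list."""
    known = {w.lower() for w in words}

    def lexize(token: str) -> Optional[List[str]]:
        t = token.lower()
        return [t] if t in known else None

    return lexize


def index_tokens(tokens: Iterable[str], dictionaries: List[Dictionary]) -> List[str]:
    """Return the lexemes to index; tokens no dictionary recognizes are dropped."""
    lexemes: List[str] = []
    for token in tokens:
        for lexize in dictionaries:
            result = lexize(token)
            if result is not None:
                lexemes.extend(result)
                break  # the first dictionary that recognizes the token wins
    return lexemes


indexed = index_tokens(
    ["Postgres", "is", "a", "database"],
    [include_list_dict(["postgres", "database"])],
)
print(indexed)
```

With a mapping that contains only the include-list dictionary, every word outside the list falls through all dictionaries and is left out of the index, which is the behaviour the question asks for.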

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text search. One of the more common thoughts is to have a list of words that you would like to include,

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Teodor Sigaev
There are clear indications that indexing too many words is a problem for both GIN and GIST. If people already know what they'll be looking GIN doesn't depend strongly on the number of words. It has log(N) behaviour in the number of words because it uses a B-Tree over words.

Re: [HACKERS] ispell dictionary broken in CVS HEAD ?

2007-09-10 Thread Heikki Linnakangas
Tom Lane wrote: I don't know enough about ispell to understand what its config files look like. (There's a problem of missing documentation here, too...) Yeah :(. The file format that ispell accepts is kind of ad hoc. It accepts hunspell and ispell and myspell variants, but only a subset of

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: It seems possible to write your own functions to support various possibilities with text

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 16:48 +0400, Teodor Sigaev wrote: There are clear indications that indexing too many words is a problem for both GIN and GIST. If people already know what they'll be looking GIN is great, sorry if that sounded negative. GIN doesn't depend strongly on number of words.

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Albe Laurenz wrote: I'd like to repeat my suggestion for chr() and ascii(). Instead of the code point, I'd prefer the actual encoding of the character as argument to chr() and return value of ascii(). [snip] Of course, if it is generally perceived that the code point is more useful

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Perhaps we're talking at cross purposes. The problem with doing encoding validation in scan.l is that it lacks context. Null bytes are only the tip of the bytea iceberg, since any arbitrary sequence of bytes can be valid for a bytea. If you think

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 16:48 +0400, Teodor Sigaev wrote: There are clear indications that indexing too many words is a problem for both GIN and GIST. If people already know what they'll be looking GIN is great, sorry if that sounded negative. GIN

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote: On Mon, 10 Sep 2007, Simon Riggs wrote: It seems possible to write your own functions to support

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: The reason we are prepared to make an exception for Unicode is precisely because the code point maps to an encoding pattern independently of architecture, ISTM. Right --- there is a well-defined standard for the numerical value of each character in

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Those should be checked already --- if not, the right fix is still to fix it there, not in per-datatype code. I think we are OK though, eg see need_transcoding logic in copy.c. Well, a little experimentation shows that we currently

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes: On Mon, 10 Sep 2007, Simon Riggs wrote: Can we include that functionality now? This could be realized very easily using dict_strict, which returns only known words, and a mapping that contains only this dictionary. So, feel free to write it and submit. ...

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Perhaps we're talking at cross purposes. The problem with doing encoding validation in scan.l is that it lacks context. Null bytes are only the tip of the bytea iceberg, since any arbitrary sequence of bytes can be valid

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Oleg Bartunov
On Mon, 10 Sep 2007, Tom Lane wrote: Oleg Bartunov [EMAIL PROTECTED] writes: On Mon, 10 Sep 2007, Simon Riggs wrote: Can we include that functionality now? This could be realized very easily using dict_strict, which returns only known words, and a mapping that contains only this dictionary. So,

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
More random thoughts: - Hash-Indices are best for unique keys, but every table needs a new hash key, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Mark Mielke
Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, $1). That means that it will call ts_lexize on every dictionary, which will try to load every dictionary. And loading danish_stem dictionary fails in latin2 encoding, because of the

Re: [HACKERS] A Silly Idea for Vertically-Oriented Databases

2007-09-10 Thread Mark Mielke
Simon Riggs wrote: ISTM we would be able to do this fairly well if we implemented Index-only columns. i.e. columns that don't exist in the heap, only in an index. Taken to the extreme, all columns could be removed from the heap and placed in an index(es). Only the visibility information would

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Heikki Linnakangas
Tom Lane wrote: Teodor Sigaev [EMAIL PROTECTED] writes: Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, $1). That means that it will call ts_lexize on every dictionary, which will try to load every dictionary. And loading danish_stem dictionary fails in latin2

Re: [HACKERS] A Silly Idea for Vertically-Oriented Databases

2007-09-10 Thread Alvaro Herrera
Mark Mielke wrote: Simon Riggs wrote: ISTM we would be able to do this fairly well if we implemented Index-only columns. i.e. columns that don't exist in the heap, only in an index. Taken to the extreme, all columns could be removed from the heap and placed in an index(es). Only the

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Martijn van Oosterhout
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote: I think that if the case of 1 entry per hash becomes common enough to be significant, and the key is stored in the hash, that a btree will perform equal or better, and there is no point in pursuing such a hash index model. This

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
Andrew Dunstan [EMAIL PROTECTED] writes: The reason we are prepared to make an exception for Unicode is precisely because the code point maps to an encoding pattern independently of architecture, ISTM. Right --- there is a well-defined standard for the numerical value of each

Re: [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 10:21 -0400, Tom Lane wrote: Oleg Bartunov [EMAIL PROTECTED] writes: On Mon, 10 Sep 2007, Simon Riggs wrote: Can we include that functionality now? This could be realized very easily using dict_strict, which returns only known words, and a mapping that contains only this

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. regards, tom lane

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Martijn van Oosterhout
On Tue, Sep 11, 2007 at 12:30:51AM +0900, Tatsuo Ishii wrote: Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the UCS-4

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the UCS-4

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. I think it should, yes. cheers andrew

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Uh, how will that help? AFAICS it still has to call ts_lexize with every dictionary. No, ts_lexize is no longer in the seq scan filter, but in the sort key that's calculated only for those rows that match the filter 'mapcfg=? AND

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Teodor Sigaev
I think Teodor's solution is wrong as it stands, because if the subquery finds matches for mapcfg and maptokentype, but none of those rows produce a non-null ts_lexize result, it will instead emit one row with a null result, which is not what should happen. But concatenation with NULL will have

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? I think it should, yes. OK. Looking back, there was also some mention of changing chr's argument to bigint, but I'd counsel against doing

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: I think Teodor's solution is wrong as it stands, because if the subquery finds matches for mapcfg and maptokentype, but none of those rows produce a non-null ts_lexize result, it will instead emit one row with a null result, which is not what should

[HACKERS] Ts_rank internals

2007-09-10 Thread Heikki Linnakangas
Hi, I tried to understand how ts_rank works, but I failed. What does Cover function do? How does it work? What is the DocRepresentation data structure like? I can see the definition of the struct, and the get_docrep function to convert to that format, but by reading those I can't figure out what

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tom Lane wrote: OK. Looking back, there was also some mention of changing chr's argument to bigint, but I'd counsel against doing that. We should not need it since we only support 4-byte UTF8, hence code points only up to 21 bits (and indeed even 6-byte UTF8 can only have 31-bit code points,
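
The bit counts quoted here can be checked with a little arithmetic. The following Python sketch is only an illustration (the helper name `utf8_payload_bits` is made up for this note): it counts the code point bits that a UTF-8 style sequence of a given length can carry, and confirms that the largest Unicode code point, U+10FFFF, needs 21 bits and four bytes.

```python
def utf8_payload_bits(length: int) -> int:
    """Code point bits carried by a UTF-8 style sequence of `length` bytes."""
    if length == 1:
        return 7  # single byte: 0xxxxxxx
    # The lead byte spends `length` one-bits plus a zero bit on the length
    # marker, leaving 7 - length payload bits; each continuation byte
    # (10xxxxxx) carries 6 more.
    return (7 - length) + 6 * (length - 1)


MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode

for n in range(1, 7):
    print(n, "bytes ->", utf8_payload_bits(n), "bits")

print(MAX_CODE_POINT.bit_length())               # bits needed for U+10FFFF
print(len(chr(MAX_CODE_POINT).encode("utf-8")))  # bytes in its UTF-8 encoding
```

Since even the historical 6-byte form tops out at 31 bits, any such value fits in a signed 32-bit integer, which is consistent with the advice against widening chr's argument to bigint.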

Re: [HACKERS] Maybe some more low-hanging fruit in the latestCompletedXid patch.

2007-09-10 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes: Currently, we do not assume that either the childXids array or the xid cache in the proc array is sorted in ascending xid order. I believe that we could simplify the code, further reduce the locking requirements, and enable a transaction to

Re: [HACKERS] Are we done with sync-commit-defaults-to-off patch?

2007-09-10 Thread Andrew Dunstan
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Maybe we should lower the autovac naptime too, just to make it do some more stuff (and to see if it breaks something else just because of being running). Well, Andrew has committed the pg_regress extension to allow buildfarm

Re: [HACKERS] Maybe some more low-hanging fruit in the latestCompletedXid patch.

2007-09-10 Thread Florian G. Pflug
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Currently, we do not assume that either the childXids array or the xid cache in the proc array is sorted in ascending xid order. I believe that we could simplify the code, further reduce the locking requirements, and enable a

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Martijn van Oosterhout
On Mon, Sep 10, 2007 at 11:48:29AM -0400, Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. It is pointed out on Wikipedia that Java sometimes uses the byte pair C0 80
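
For readers unfamiliar with the C0 80 convention: it is an overlong two-byte form of the NUL character, which a strict UTF-8 decoder rejects. The Python sketch below illustrates this under stated assumptions; the helper names are hypothetical, and the decoder handles only the NUL convention, not the other ways Java's modified UTF-8 differs from standard UTF-8.

```python
def strict_utf8_accepts(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_modified_utf8_nul(data: bytes) -> str:
    """Decode bytes in which NUL may be written as the overlong pair C0 80.

    Simplified sketch: 0xC0 is never a legal byte in standard UTF-8, so in
    otherwise well-formed input it can only occur as part of this pair.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


print(strict_utf8_accepts(b"\xc0\x80"))              # overlong form of NUL
print("\x00".encode("utf-8"))                        # standard encoding of NUL
print(repr(decode_modified_utf8_nul(b"a\xc0\x80b")))
```

The overlong form keeps a literal zero byte out of the encoded string, which is why it comes up in a discussion of embedded-null problems.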

Re: [HACKERS] GucContext of log_autovacuum

2007-09-10 Thread Bruce Momjian
FYI, this has been committed by Tom. --- ITAGAKI Takahiro wrote: The GucContext of log_autovacuum is PGC_BACKEND in the CVS HEAD, but should it be PGC_SIGHUP? We cannot modify the variable on-the-fly because the

Re: [HACKERS] ispell dictionary broken in CVS HEAD ?

2007-09-10 Thread Teodor Sigaev
Is anyone working on providing basic regression tests for the different dictionary types? Seems like the main stumbling block is providing I make some small tests (http://www.sigaev.ru/misc/ispell_samples.tgz). So, what is better practice to builtin it? Make it installable with regular

Re: [HACKERS] ispell dictionary broken in CVS HEAD ?

2007-09-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: Is anyone working on providing basic regression tests for the different dictionary types? Seems like the main stumbling block is providing I make some small tests (http://www.sigaev.ru/misc/ispell_samples.tgz). So, what is better practice to builtin

Re: [HACKERS] A Silly Idea for Vertically-Oriented Databases

2007-09-10 Thread Avery Payne
ISTM we would be able to do this fairly well if we implemented Index-only columns. i.e. columns that don't exist in the heap, only in an index. Taken to the extreme, all columns could be removed from the heap and placed in an index(es). Only the visibility information would remain on the

Re: [HACKERS] [ADMIN] reindexdb hangs

2007-09-10 Thread Alvaro Herrera
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I'm not sure I follow. Are you suggesting adding a new function, similar to pg_class_ownercheck, which additionally checks for temp-ness? No, I was just suggesting adding the check for temp-ness in cluster() and cluster_rel() where

Re: [HACKERS] A Silly Idea for Vertically-Oriented Databases

2007-09-10 Thread Alvaro Herrera
Avery Payne wrote: I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-) I like it. :-) I just wish I would have read this before applying for a project name at pgfoundry, the current proposal is given

Re: [HACKERS] [ADMIN] reindexdb hangs

2007-09-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: I am unsure if I should backpatch to 8.1: the code in cluster.c has changed, and while it is relatively easy to modify the patch, this is a rare bug and nobody has reported it in CLUSTER (not many people cluster temp tables, it seems). Should I patch

[HACKERS] txn in pg_stat_activity

2007-09-10 Thread Tom Lane
I have just noticed that a column txn_start has appeared in pg_stat_activity since 8.2. It's a good idea, but who chose the name? We do not use that abbreviation for transaction anywhere else in Postgres, certainly not in any user-exposed places. I'm inclined to rename it to xact_start, which is

Re: [HACKERS] txn in pg_stat_activity

2007-09-10 Thread Neil Conway
On Mon, 2007-09-10 at 21:04 -0400, Tom Lane wrote: I have just noticed that a column txn_start has appeared in pg_stat_activity since 8.2. It's a good idea, but who chose the name? Me. I'm inclined to rename it to xact_start, which is an abbreviation that we *do* use in the code, and in

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? Because there's a standard specifying the Unicode code points *as numbers*. The mapping from those numbers to UTF8 strings (and other

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tatsuo Ishii wrote: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? I mean, since 0xC2A9 = 49833, select chr(49833) should work fine no? No. The number corresponding to a given byte pattern depends on the endianness of
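
The numbers in this exchange can be reproduced directly. The Python sketch below (variable names are illustrative only) encodes U+00A9 as UTF-8 and then reads the resulting two bytes as an integer in each byte order; only the big-endian reading gives the 49833 mentioned above, whereas the code point itself is a single number with no byte order attached.

```python
code_point = 0xA9  # U+00A9 COPYRIGHT SIGN

# The UTF-8 encoding is a byte sequence: C2 A9.
encoded = chr(code_point).encode("utf-8")
print(encoded.hex())

# Turning that byte sequence into one integer requires choosing a byte order.
as_big_endian = int.from_bytes(encoded, "big")
as_little_endian = int.from_bytes(encoded, "little")
print(as_big_endian, as_little_endian)

# The code point needs no such choice: it is already just a number.
print(code_point)
```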

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? Because there's a standard specifying the Unicode code points *as numbers*. The mapping from those numbers to UTF8 strings (and other

Re: [HACKERS] txn in pg_stat_activity

2007-09-10 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: I personally find xact to be a less intuitive abbreviation of transaction than txn, but for the sake of consistency, I agree it is probably better to use xact_start. Barring other objections, I'll go make this happen. regards, tom

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Jeff Davis
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on bytea instead (providing a

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Andrew Dunstan
Jeff Davis wrote: On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken.

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: I'm not sure we are going to be able to catch every path by which invalid data can get into the database in one release. I suspect we might need two or three goes at this. (I'm just wondering if the routines that return cstrings are a possible

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on bytea

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on

[HACKERS] What is happening on buildfarm member dugong?

2007-09-10 Thread Tom Lane
dugong has been failing contribcheck repeatably for the last day or so, with a very interesting symptom: CREATE DATABASE is failing with ERROR: could not fsync segment 0 of relation 1663/40960/41403: No such file or directory ERROR: checkpoint request failed HINT: Consult recent messages in

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Jeff Davis
On Tue, 2007-09-11 at 11:53 +0900, Tatsuo Ishii wrote: Isn't the collation a locale issue, not an encoding issue? Is there a ja_JP.UTF-8 that defines the proper order? I don't think it helps. The point is, he needs different language's collation, while PostgreSQL allows only one

[HACKERS] Testing 8.3 LDC vs. 8.2.4 with aggressive BGW

2007-09-10 Thread Greg Smith
Renaming the old thread to more appropriately address the topic: On Wed, 5 Sep 2007, Kevin Grittner wrote: Then I would test the new background writer with synchronous commits under the 8.3 beta, using various settings. The 0.5, 0.7 and 0.9 settings you recommended for a test are how far from

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Jeff Davis
On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR:

Re: [HACKERS] Ts_rank internals

2007-09-10 Thread Teodor Sigaev
I tried to understand how ts_rank works, but I failed. What does Cover function do? How does it work? What is the DocRepresentation data structure like? I can see the definition of the struct, and the get_docrep function to convert to that format, but by reading those I can't figure out what the

Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread Tatsuo Ishii
On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: