>> Try the sequence below. Then, try to dump and then reload the database.
>> When you try to reload it, you will get an error:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xbd
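The failing sequence is of this general shape (a minimal sketch with a hypothetical table, assuming a UTF8 database and the pre-8.3 convert() that returns text):

    CREATE TABLE t (s text);
    -- stores EUC_JP bytes in a column the database believes is UTF8
    INSERT INTO t VALUES (convert('データ', 'UTF8', 'EUC_JP'));
    -- pg_dump then writes those bytes out labeled as UTF8, and the reload
    -- fails with: ERROR: invalid byte sequence for encoding "UTF8"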
>
> I know this could be a problem (like chr() with invalid byte pattern).
And that's enough of a problem…
On Tue, 2007-09-11 at 14:50 +0900, Tatsuo Ishii wrote:
>
> > On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote:
> > > Please show me concrete examples how I could introduce a vulnerability
> > > using this kind of convert() usage.
> >
> > Try the sequence below. Then, try to dump and then reload the database.
> > When you try to reload it, you will get an error:
> > ERROR: invalid byte sequence for encoding "UTF8": 0xbd
On Tue, Sep 11, 2007 at 11:27:50AM +0900, Tatsuo Ishii wrote:
> SELECT * FROM japanese_table ORDER BY convert(japanese_text using
> utf8_to_euc_jp);
>
> Without using convert(), he will get random order of data. This is
> because Kanji characters are in random order in UTF-8, while Kanji
> characters…
> On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote:
> > Please show me concrete examples how I could introduce a vulnerability
> > using this kind of convert() usage.
>
> Try the sequence below. Then, try to dump and then reload the database.
> When you try to reload it, you will get an error:
> ERROR: invalid byte sequence for encoding "UTF8": 0xbd
I tried to understand how ts_rank works, but I failed. What does Cover
function do? How does it work? What is the DocRepresentation data
structure like? I can see the definition of the struct, and the
get_docrep function to convert to that format, but by reading those I
can't figure out what the…
On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote:
> Please show me concrete examples how I could introduce a vulnerability
> using this kind of convert() usage.
Try the sequence below. Then, try to dump and then reload the database.
When you try to reload it, you will get an error:
ERROR: invalid byte sequence for encoding "UTF8": 0xbd
Renaming the old thread to more appropriately address the topic:
On Wed, 5 Sep 2007, Kevin Grittner wrote:
Then I would test the new background writer with synchronous commits under
the 8.3 beta, using various settings. The 0.5, 0.7 and 0.9 settings you
recommended for a test are how far from…
On Tue, 2007-09-11 at 11:53 +0900, Tatsuo Ishii wrote:
> > Isn't the collation a locale issue, not an encoding issue? Is there a
> > ja_JP.UTF-8 that defines the proper order?
>
> I don't think it helps. The point is, he needs different language's
> collation, while PostgreSQL allows only one collation…
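At the time of this thread the collation really is a single cluster-wide choice, fixed when the cluster is initialized, so there is no per-database or per-column knob to reach for:

    initdb --locale=ja_JP.UTF-8   # one collation for every database in the cluster
    SHOW lc_collate;              -- reports that single cluster-wide setting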
dugong has been failing contribcheck repeatably for the last day or so,
with a very interesting symptom: CREATE DATABASE is failing with
ERROR: could not fsync segment 0 of relation 1663/40960/41403: No such file or
directory
ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> >> BTW, it strikes me that there is another hole that we need to plug in
> >> this area, and that's the convert() function. Being able to create
> >> a value of type text that is not in the database encoding is simply
> >> broken. Perhaps we could make it work on bytea instead (providing a…
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> BTW, it strikes me that there is another hole that we need to plug in
>> this area, and that's the convert() function. Being able to create
>> a value of type text that is not in the database encoding is simply
>> broken. Perhaps we could make it work on bytea instead (providing a…
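For the record, 8.3 did end up moving convert() onto bytea, so encoding conversion no longer manufactures mislabeled text (real 8.3 signatures, trivial example values):

    SELECT convert_to('data', 'EUC_JP');                          -- text -> bytea
    SELECT convert_from(convert_to('data', 'EUC_JP'), 'EUC_JP');  -- and back to text
    -- convert(bytea, src_encoding, dest_encoding) likewise returns bytea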
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I'm not sure we are going to be able to catch every path by which
> invalid data can get into the database in one release. I suspect we
> might need two or three goes at this. (I'm just wondering if the
> routines that return cstrings are a possible vector…
Jeff Davis wrote:
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote:
BTW, it strikes me that there is another hole that we need to plug in
this area, and that's the convert() function. Being able to create
a value of type text that is not in the database encoding is simply
broken. Perhaps we could make it work on bytea instead (providing a…
Tatsuo Ishii wrote:
BTW, it strikes me that there is another hole that we need to plug in
this area, and that's the convert() function. Being able to create
a value of type text that is not in the database encoding is simply
broken. Perhaps we could make it work on bytea instead (providing a…
> On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote:
> > > BTW, it strikes me that there is another hole that we need to plug in
> > > this area, and that's the convert() function. Being able to create
> > > a value of type text that is not in the database encoding is simply
> > > broken. Perhaps we could make it work on bytea instead (providing a…
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote:
> > BTW, it strikes me that there is another hole that we need to plug in
> > this area, and that's the convert() function. Being able to create
> > a value of type text that is not in the database encoding is simply
> > broken. Perhaps we could make it work on bytea instead (providing a…
Neil Conway <[EMAIL PROTECTED]> writes:
> I personally find "xact" to be a less intuitive abbreviation of
> "transaction" than "txn", but for the sake of consistency, I agree it is
> probably better to use "xact_start".
Barring other objections, I'll go make this happen.
regards, tom lane
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > If you regard the unicode code point as simply a number, why not
> > regard the multibyte characters as a number too?
>
> Because there's a standard specifying the Unicode code points *as
> numbers*. The mapping from those numbers to UTF8 strings (and other
> representations…
Tatsuo Ishii wrote:
If you regard the unicode code point as simply a number, why not
regard the multibyte characters as a number too? I mean, since 0xC2A9
= 49833, "select chr(49833)" should work fine no?
No. The number corresponding to a given byte pattern depends on the
endianness of…
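A code-point argument sidesteps that: the same integer names the same character on every architecture, e.g. in a UTF8 database under the semantics being argued for:

    SELECT chr(169);          -- '©' (U+00A9), stored as the UTF8 bytes C2 A9
    SELECT ascii(chr(169));   -- 169, regardless of machine endianness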
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> If you regard the unicode code point as simply a number, why not
> regard the multibyte characters as a number too?
Because there's a standard specifying the Unicode code points *as
numbers*. The mapping from those numbers to UTF8 strings (and other
representations…
> Tatsuo Ishii wrote:
> >
> > I don't understand the whole discussion.
> >
> > Why do you think that employing the Unicode code point as the chr()
> > argument could avoid endianness issues? Are you going to represent
> > Unicode code point as UCS-4? Then you have to specify the endianness
> > anyway.
On Mon, 2007-09-10 at 21:04 -0400, Tom Lane wrote:
> I have just noticed that a column "txn_start" has appeared in
> pg_stat_activity since 8.2. It's a good idea, but who chose the name?
Me.
> I'm inclined to rename it to "xact_start", which is an abbreviation
> that we *do* use in the code, and…
I have just noticed that a column "txn_start" has appeared in
pg_stat_activity since 8.2. It's a good idea, but who chose the name?
We do not use that abbreviation for "transaction" anywhere else in
Postgres, certainly not in any user-exposed places.
I'm inclined to rename it to "xact_start", which is an abbreviation that
we *do* use in the code, and…
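With the rename, the monitoring query reads (pg_stat_activity as of this era):

    SELECT procpid, xact_start, query_start
    FROM pg_stat_activity;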
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I am unsure if I should backpatch to 8.1: the code in cluster.c has
> changed, and while it is relatively easy to modify the patch, this is a
> rare bug and nobody has reported it in CLUSTER (not many people cluster
> temp tables, it seems). Should I…
Avery Payne wrote:
> >I thought maybe we can call it COAST, Column-oriented attribute storage
> technique, :-)
>
> I like it. :-) I just wish I would have read this before applying for a
> project name at pgfoundry, the current proposal is given as "pg-cstore".
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > I'm not sure I follow. Are you suggesting adding a new function,
> > similar to pg_class_ownercheck, which additionally checks for temp-ness?
>
> No, I was just suggesting adding the check for temp-ness in cluster()
> and cluster_rel()…
>ISTM we would be able to do this fairly well if we implemented
>Index-only columns. i.e. columns that don't exist in the heap, only in
>an index.
>Taken to the extreme, all columns could be removed from the heap and
>placed in an index(es). Only the visibility information would remain on
>the heap.
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> Is anyone working on providing basic regression tests for the different
>> dictionary types? Seems like the main stumbling block is providing…
> I make some small tests (http://www.sigaev.ru/misc/ispell_samples.tgz). So,
> what is better practice to builtin it? Make it installable with regular
> proc…
Is anyone working on providing basic regression tests for the different
dictionary types? Seems like the main stumbling block is providing…
I make some small tests (http://www.sigaev.ru/misc/ispell_samples.tgz). So,
what is better practice to builtin it? Make it installable with regular
proc…
FYI, this has been committed by Tom.
---
ITAGAKI Takahiro wrote:
> The GucContext of log_autovacuum is PGC_BACKEND in the CVS HEAD,
> but should it be PGC_SIGHUP? We cannot modify the variable on-the-fly
> because the parameter…
On Mon, Sep 10, 2007 at 11:48:29AM -0400, Tom Lane wrote:
> BTW, I'm sure this was discussed but I forgot the conclusion: should
> chr(0) throw an error? If we're trying to get rid of embedded-null
> problems, seems it must.
It is pointed out on Wikipedia that Java sometimes uses the byte pair C0 80
to represent an embedded NUL…
Tom Lane wrote:
"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
Currently, we do not assume that either the childXids array, nor the xid
cache in the proc array are sorted by ascending xid order. I believe that
we could simplify the code, further reduce the locking requirements, and
enabled a transaction to…
Tom Lane wrote:
Alvaro Herrera <[EMAIL PROTECTED]> writes:
Maybe we should lower the autovac naptime too, just to make it do some
more stuff (and to see if it breaks something else just because of being
running).
Well, Andrew has committed the pg_regress extension to allow buildfarm…
"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
> Currently, we do not assume that either the childXids array, nor
> the xid cache in the proc array are sorted by ascending xid order.
> I believe that we could simplify the code, further reduce the locking
> requirements, and enabled a transaction to…
Tom Lane wrote:
OK. Looking back, there was also some mention of changing chr's
argument to bigint, but I'd counsel against doing that. We should not
need it since we only support 4-byte UTF8, hence code points only up to
21 bits (and indeed even 6-byte UTF8 can only have 31-bit code points…
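The arithmetic behind that: the largest 4-byte UTF8 code point is U+10FFFF = 1114111, far below int4's ceiling of 2^31 - 1 = 2147483647, so a plain integer argument is enough:

    SELECT x'10FFFF'::int;   -- 1114111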
Hi,
I tried to understand how ts_rank works, but I failed. What does Cover
function do? How does it work? What is the DocRepresentation data
structure like? I can see the definition of the struct, and the
get_docrep function to convert to that format, but by reading those I
can't figure out what the…
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I think Teodor's solution is wrong as it stands, because if the subquery
>> finds matches for mapcfg and maptokentype, but none of those rows
>> produce a non-null ts_lexize result, it will instead emit one row with a
>> null result, which is not what should happen.
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> BTW, I'm sure this was discussed but I forgot the conclusion: should
>> chr(0) throw an error?
> I think it should, yes.
OK. Looking back, there was also some mention of changing chr's
argument to bigint, but I'd counsel against doing that.
I think Teodor's solution is wrong as it stands, because if the subquery
finds matches for mapcfg and maptokentype, but none of those rows
produce a non-null ts_lexize result, it will instead emit one row with a
null result, which is not what should happen.
But concatenation with NULL will have…
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Uh, how will that help? AFAICS it still has to call ts_lexize with
>> every dictionary.
> No, ts_lexize is no longer in the seq scan filter, but in the sort key
> that's calculated only for those rows that match the filter 'map…
Tom Lane wrote:
BTW, I'm sure this was discussed but I forgot the conclusion: should
chr(0) throw an error? If we're trying to get rid of embedded-null
problems, seems it must.
I think it should, yes.
cheers
andrew
Tatsuo Ishii wrote:
I don't understand the whole discussion.
Why do you think that employing the Unicode code point as the chr()
argument could avoid endianness issues? Are you going to represent
Unicode code point as UCS-4? Then you have to specify the endianness
anyway. (see the UCS-4 standard…)
On Tue, Sep 11, 2007 at 12:30:51AM +0900, Tatsuo Ishii wrote:
> Why do you think that employing the Unicode code point as the chr()
> argument could avoid endianness issues? Are you going to represent
> Unicode code point as UCS-4? Then you have to specify the endianness
> anyway. (see the UCS-4 standard…)
BTW, I'm sure this was discussed but I forgot the conclusion: should
chr(0) throw an error? If we're trying to get rid of embedded-null
problems, seems it must.
regards, tom lane
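Assuming the check lands as agreed, the behaviour is simply:

    SELECT chr(0);
    -- ERROR: null character not permitted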
On Mon, 2007-09-10 at 10:21 -0400, Tom Lane wrote:
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
> > On Mon, 10 Sep 2007, Simon Riggs wrote:
> >> Can we include that functionality now?
>
> > This could be realized very easily using dict_strict, which returns
> > only known words, and mapping contains only this dictionary. So, feel
> > free to write it and submit…
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > The reason we are prepared to make an exception for Unicode is precisely
> > because the code point maps to an encoding pattern independently of
> > architecture, ISTM.
>
> Right --- there is a well-defined standard for the numerical value of
> each character…
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote:
> I think that if the case of >1 entry per hash becomes common enough to
> be significant, and the key is stored in the hash, that a btree will
> perform equal or better, and there is no point in pursuing such a hash
> index model…
Mark Mielke wrote:
> Simon Riggs wrote:
>> ISTM we would be able to do this fairly well if we implemented
>> Index-only columns. i.e. columns that don't exist in the heap, only in
>> an index.
>> Taken to the extreme, all columns could be removed from the heap and
>> placed in an index(es). Only the visibility information would remain on
>> the heap.
Tom Lane wrote:
> Teodor Sigaev <[EMAIL PROTECTED]> writes:
>>> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
>>> $1). That means that it will call ts_lexize on every dictionary, which
>>> will try to load every dictionary. And loading danish_stem dictionary
>>> fails in latin2 encoding, because of the problem with the stopword file.
Simon Riggs wrote:
ISTM we would be able to do this fairly well if we implemented
Index-only columns. i.e. columns that don't exist in the heap, only in
an index.
Taken to the extreme, all columns could be removed from the heap and
placed in an index(es). Only the visibility information would remain on
the heap.
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
>> $1). That means that it will call ts_lexize on every dictionary, which
>> will try to load every dictionary. And loading danish_stem dictionary
>> fails in latin2 encoding, because of the problem with the stopword file.
Kenneth Marshall wrote:
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall <[EMAIL PROTECTED]> writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one…
More random thoughts:
- Hash-Indices are best for unique keys, but every table needs a new hash
key, which means one more random page access. Is there any way to build
multi-_table_ indices? A join might then fetch all table rows with a given
unique key after one page fetch for the combined index.
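For context, the single-table form this idea generalizes (hypothetical table and key column):

    CREATE INDEX t_k_hash ON t USING hash (k);
    -- probing it costs one extra random page fetch per table; a multi-table
    -- index would pay that once for all tables sharing the key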
On Mon, 10 Sep 2007, Tom Lane wrote:
Oleg Bartunov <[EMAIL PROTECTED]> writes:
On Mon, 10 Sep 2007, Simon Riggs wrote:
Can we include that functionality now?
This could be realized very easily using dict_strict, which returns
only known words, and mapping contains only this dictionary. So, feel free
to write it and submit…
Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
Perhaps we're talking at cross purposes.
The problem with doing encoding validation in scan.l is that it lacks
context. Null bytes are only the tip of the bytea iceberg, since any
arbitrary sequence of bytes can be valid for a bytea.
Oleg Bartunov <[EMAIL PROTECTED]> writes:
> On Mon, 10 Sep 2007, Simon Riggs wrote:
>> Can we include that functionality now?
> This could be realized very easily using dict_strict, which returns
> only known words, and mapping contains only this dictionary. So,
> feel free to write it and submit…
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Those should be checked already --- if not, the right fix is still to
>> fix it there, not in per-datatype code. I think we are OK though,
>> eg see "need_transcoding" logic in copy.c.
> Well, a little experimentation shows that we…
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> The reason we are prepared to make an exception for Unicode is precisely
> because the code point maps to an encoding pattern independently of
> architecture, ISTM.
Right --- there is a well-defined standard for the numerical value of
each character…
On Mon, 10 Sep 2007, Simon Riggs wrote:
On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote:
On Mon, 10 Sep 2007, Simon Riggs wrote:
On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
On Mon, 10 Sep 2007, Simon Riggs wrote:
It seems possible to write your own functions to support various
possibilities with text search…
On Mon, 10 Sep 2007, Simon Riggs wrote:
On Mon, 2007-09-10 at 16:48 +0400, Teodor Sigaev wrote:
There are clear indications that indexing too many words is a problem
for both GIN and GIST. If people already know what they'll be looking…
GIN is great, sorry if that sounded negative.
GIN doesn't depend strongly on number of words…
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Perhaps we're talking at cross purposes.
> The problem with doing encoding validation in scan.l is that it lacks
> context. Null bytes are only the tip of the bytea iceberg, since any
> arbitrary sequence of bytes can be valid for a bytea.
If you think…
Albe Laurenz wrote:
I'd like to repeat my suggestion for chr() and ascii().
Instead of the code point, I'd prefer the actual encoding of
the character as argument to chr() and return value of ascii().
[snip]
Of course, if it is generally perceived that the code point
is more useful than…
On Mon, 2007-09-10 at 16:48 +0400, Teodor Sigaev wrote:
> > There are clear indications that indexing too many words is a problem
> > for both GIN and GIST. If people already know what they'll be looking…
GIN is great, sorry if that sounded negative.
> GIN doesn't depend strongly on number of words. It has log(N) behaviour
> for numbers of words because of using B-Tree over words.
On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote:
> On Mon, 10 Sep 2007, Simon Riggs wrote:
>
> > On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
> >> On Mon, 10 Sep 2007, Simon Riggs wrote:
> >>
> >>> It seems possible to write your own functions to support various
> > >>> possibilities with text search.
Tom Lane wrote:
> I don't know enough about ispell to
> understand what its config files look like. (There's a problem of
> missing documentation here, too...)
Yeah :(. The file format that ispell accepts is kind of ad hoc. It
> accepts hunspell and ispell and myspell variants, but only a subset of…
There are clear indications that indexing too many words is a problem
for both GIN and GIST. If people already know what they'll be looking…
GIN doesn't depend strongly on number of words. It has log(N) behaviour for
numbers of words because of using B-Tree over words.
--
Teodor Sigaev
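In 8.3 syntax, the structure Teodor describes sits underneath an index like (hypothetical table and column):

    CREATE INDEX docs_fts ON docs USING gin (to_tsvector('english', body));
    -- GIN keeps a B-tree over the distinct words, hence the log(N) behaviour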
On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
> On Mon, 10 Sep 2007, Simon Riggs wrote:
>
> > It seems possible to write your own functions to support various
> > possibilities with text search.
> >
> > One of the more common thoughts is to have a list of words that you
> > would like to include, i.e. the opposite of a stop word list.
How does that allow me to limit the number of words to a known list?
If all dictionaries return NULL for a token then this token will not be
indexed at all.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
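Spelled out in 8.3 syntax, the "known list" recipe is roughly (hypothetical dictionary and configuration names):

    CREATE TEXT SEARCH CONFIGURATION known_only (COPY = english);
    ALTER TEXT SEARCH CONFIGURATION known_only
        ALTER MAPPING FOR asciiword, word WITH known_words_dict;
    -- tokens for which known_words_dict returns NULL are not indexed at all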
Is anyone working on providing basic regression tests for the different
dictionary types? Seems like the main stumbling block is providing…
I'll do some tests for dictionaries, but it will be a synthetic dictionary.
Original ispell files are rather big, so I'll make a rather simple and small
one.
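A synthetic dictionary of that sort, assuming small sample files installed as ispell_sample.dict and ispell_sample.affix under tsearch_data (dictionary name hypothetical):

    CREATE TEXT SEARCH DICTIONARY sample_ispell (
        TEMPLATE = ispell,
        DictFile = ispell_sample,
        AffFile = ispell_sample
    );
    SELECT ts_lexize('sample_ispell', 'booking');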
On Mon, 10 Sep 2007, Simon Riggs wrote:
On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
On Mon, 10 Sep 2007, Simon Riggs wrote:
It seems possible to write your own functions to support various
possibilities with text search.
One of the more common thoughts is to have a list of words that you would
like to include, i.e. the opposite of a stop word list.
On Mon, 10 Sep 2007, Simon Riggs wrote:
On Mon, 2007-09-10 at 12:58 +0100, Heikki Linnakangas wrote:
Simon Riggs wrote:
It seems possible to write your own functions to support various
possibilities with text search.
One of the more common thoughts is to have a list of words that you
> would like to include, i.e. the opposite of a stop word list.
On Mon, 2007-09-10 at 12:58 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > It seems possible to write your own functions to support various
> > possibilities with text search.
> >
> > One of the more common thoughts is to have a list of words that you
> > would like to include, i.e. the opposite of a stop word list.
Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
In the short run it might be best to do it in scan.l after all.
I have not come up with a way of doing that and handling the bytea case.
AFAICS we have no realistic choice other than to reject…
Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
$1). That means that it will call ts_lexize on every dictionary, which
will try to load every dictionary. And loading danish_stem dictionary
fails in latin2 encoding, because of the problem with the stopword file.
Attached…
On Mon, 10 Sep 2007, Simon Riggs wrote:
It seems possible to write your own functions to support various
possibilities with text search.
One of the more common thoughts is to have a list of words that you
would like to include, i.e. the opposite of a stop word list.
There are clear indications that indexing too many words is a problem for
both GIN and GIST…
Simon Riggs wrote:
> It seems possible to write your own functions to support various
> possibilities with text search.
>
> One of the more common thoughts is to have a list of words that you
> would like to include, i.e. the opposite of a stop word list.
>
> There are clear indications that indexing too many words is a problem…
On Fri, 2007-09-07 at 13:52 -0700, Avery Payne wrote:
> So I've been seeing/hearing all of the hoopla over vertical databases
> (column stores), and how they'll not only slice bread but also make
> toast, etc. I've done some quick searches for past articles on
> "C-Store", "Vertica", "Column S
It seems possible to write your own functions to support various
possibilities with text search.
One of the more common thoughts is to have a list of words that you
would like to include, i.e. the opposite of a stop word list.
There are clear indications that indexing too many words is a problem for
both GIN and GIST…
--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke
<[EMAIL PROTECTED]> wrote:
Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
> various workloads…
>>> I think the concern is when they use only one slash, like:
>>> E'\377\000\377'::bytea
>>> which, as I mentioned before, is not correct anyway.
>
> Wait, why would this be wrong? How would you enter the three byte bytea
> consisting of those three bytes described above?
Either as
E'\\377\\000\\377'::bytea…
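With the backslashes doubled once for the string parser and once more consumed by the bytea input routine, the three-byte value survives intact (escape syntax of this era):

    SELECT E'\\377\\000\\377'::bytea;                 -- \377\000\377
    SELECT octet_length(E'\\377\\000\\377'::bytea);   -- 3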
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Jeff Davis <[EMAIL PROTECTED]> writes:
>
>> I think the concern is when they use only one slash, like:
>> E'\377\000\377'::bytea
>> which, as I mentioned before, is not correct anyway.
Wait, why would this be wrong? How would you enter the three byte bytea
consisting of those three bytes described above?
Tom Lane wrote:
>> . for chr() under UTF8, it seems to be generally agreed
>> that the argument should represent the codepoint and the
>> function should return the correspondingly encoded character.
>> If so, possible the argument should be a bigint to
>> accommodate the full range of possible codepoints…