Andrew Dunstan <[EMAIL PROTECTED]> writes:
> One issue I notice is that it mangles the log message to add a tab
> character before each newline. We do this in standard text logs to make
> them more readable for humans, but the whole point of having CSV logs is
> to make them machine readable, an
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] To all the pgsql developers..Have a look at the
> operators proposed by me in my researc
> > On Sat, Jun 02, 2007 at 01:37:19PM +, Tasneem Memon wrote:
> > We can make the syste
> From: [EMAIL PROTECTED]
> To: pgsql-hackers@postgresql.org
> Tasneem,
> > > The margins to the op2, i.e. m1 and m2, are added dynamically on
> > > both the sides, considering the value it contains. To keep this
> > > margin big is important for a certain reason discussed later.
> > >
Andrew Dunstan wrote:
Now that we've fixed the partial/interleaved log line issue, I have
returned to trying to get the CSV log patch into shape. Sadly, it still
needs lots of work, even after Greg Smith and I both attacked it, so I
am now going through it with a fine tooth comb.
One issue
Now that we've fixed the partial/interleaved log line issue, I have
returned to trying to get the CSV log patch into shape. Sadly, it still
needs lots of work, even after Greg Smith and I both attacked it, so I
am now going through it with a fine tooth comb.
One issue I notice is that it ma
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> In utils/adt/tid.c, there's two mysterious functions with no comments,
>> and no-one calling them inside backend code AFAICT: currtid_byreloid and
>> currtid_byrelname. What do they do/did?
>
> IIRC, the O
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> In utils/adt/tid.c, there's two mysterious functions with no comments,
> and no-one calling them inside backend code AFAICT: currtid_byreloid and
> currtid_byrelname. What do they do/did?
IIRC, the ODBC driver uses them, or once did, to deal with
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> In utils/adt/tid.c, there's two mysterious functions with no comments, and
> no-one calling them inside backend code AFAICT: currtid_byreloid and
> currtid_byrelname. What do they do/did?
The comments for heap_get_latest_tid() seem to apply. The
Heikki Linnakangas wrote:
In utils/adt/tid.c, there's two mysterious functions with no comments,
and no-one calling them inside backend code AFAICT: currtid_byreloid
and currtid_byrelname. What do they do/did?
If you have a look at the CVS annotations (
http://developer.postgresql.org/cvs
If index lookup speed or packing truly was the primary concern, people
would use a suitably sized SEQUENCE. They would not use UUID.
I believe the last time I calculated this, the result was that you
could fit 50% more entries in the index if you use a 32-bit sequence
number instead of a 128-bit U
On Jun 14, 2007, at 7:21 AM, Heikki Linnakangas wrote:
We have these GUC variables that define a fraction of something:
#autovacuum_vacuum_scale_factor = 0.2 # fraction of rel size before
# vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of rel
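For readers unfamiliar with how such a fractional setting is applied, here is a minimal sketch. It assumes, as I understand the autovacuum documentation, that the scale factor is multiplied by the table's tuple count and added to a fixed base threshold. The function names are hypothetical, and the base threshold of 50 is an illustrative value, not something taken from the message above.

```python
def vacuum_threshold(reltuples, scale_factor=0.2, base_threshold=50):
    """Dead-tuple count above which a table would be considered for vacuum.

    scale_factor mirrors autovacuum_vacuum_scale_factor (0.2 in the snippet).
    base_threshold stands in for autovacuum_vacuum_threshold; 50 is only an
    illustrative value chosen for this sketch.
    """
    return base_threshold + scale_factor * reltuples


def needs_vacuum(dead_tuples, reltuples, scale_factor=0.2, base_threshold=50):
    """True when the dead-tuple count exceeds the computed threshold."""
    return dead_tuples > vacuum_threshold(reltuples, scale_factor, base_threshold)
```

With these assumed values, a 1000-row table would qualify once it has more than about 250 dead tuples, which shows why the setting is a fraction of the relation size rather than an absolute count.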
In utils/adt/tid.c, there's two mysterious functions with no comments,
and no-one calling them inside backend code AFAICT: currtid_byreloid and
currtid_byrelname. What do they do/did?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Jun 15, 2007 at 11:05:01AM -0400, Robert Wojciechowski wrote:
> Also, treating UUIDs as time based is completely valid -- that is the
> point of version 1 UUIDs. They have quite a few advantages over random UUIDs.
It's a leap from extracting the UUID as time, to sorting by UUID for
results
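The gap between "a version 1 UUID contains a timestamp" and "sorting by UUID sorts by time" can be illustrated with a short sketch. It assumes the standard version 1 field layout, in which the low 32 bits of the timestamp come first, and a plain bytewise comparison of the 16-byte value. The helper name `v1_from_timestamp` and the node value are hypothetical, chosen only for the illustration.

```python
import uuid


def v1_from_timestamp(ts, clock_seq=0, node=0x001122334455):
    """Build a version 1 UUID from a 60-bit timestamp (hypothetical helper)."""
    time_low = ts & 0xFFFFFFFF                       # low 32 bits come FIRST
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1 marker
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Two timestamps one tick apart, straddling a rollover of the low 32 bits.
earlier = v1_from_timestamp(0x1FFFFFFFF)
later = v1_from_timestamp(0x200000000)

by_value = sorted([earlier, later])                      # bytewise order
by_time = sorted([earlier, later], key=lambda u: u.time)  # embedded timestamp
```

Here `by_time` puts `earlier` first, while `by_value` puts `later` first, because the leading bytes of `earlier` are `ffffffff`. Extracting the time is valid; relying on the raw ordering to be chronological is a separate assumption.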
I had the same problem so I tried building with increasingly older versions of
the MinGW runtime. It turns out version 3.9 is the most recent version without
the conflict in sys/time.h.
Looking for a
On Fri, Jun 15, 2007 at 09:40:29AM -0500, Michael Glaesemann wrote:
> On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:
> >For UUID, I
> >would value random access before sequential performance. Why would
> >anybody scan UUID through the index in "sequential" order?
> AIUI, to allow UUID columns
On Fri, 15 Jun 2007 22:28:34 +0200, Gregory Maxwell <[EMAIL PROTECTED]>
wrote:
On 6/15/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
While in theory spreading out the writes could have a detrimental effect I
think we should wait until we see actual numbers. I have a pretty strong
suspicion t
Stephen Frost <[EMAIL PROTECTED]> writes:
> Any chance of this being increased?
No. Changing typmod to something other than int32 would require many
thousands of lines of diffs just in the core distro. I don't even want
to think about how much outside code would break.
r
On 6/15/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
While in theory spreading out the writes could have a detrimental effect I
think we should wait until we see actual numbers. I have a pretty strong
suspicion that the effect would be pretty minimal. We're still doing the same
amount of i/o tota
"Greg Smith" <[EMAIL PROTECTED]> writes:
> On Fri, 15 Jun 2007, Gregory Stark wrote:
>
>> If I understand it right Greg Smith's concern is that in a busier system
>> where even *with* the load distributed checkpoint the i/o bandwidth demand
>> during the checkpoint was *still* being pushed over 1
To support this sanely though wouldn't you need to know which language rule a
tsvector was generated with? Like, have a byte in the tsvector tagging it with
the language rule forever more?
No. As corner case, dictionary might return just a number or a hash value.
What I'm wondering about is
"Teodor Sigaev" <[EMAIL PROTECTED]> writes:
>> Hm, are you trying to say that it's sane to have different tsvectors in
>> a column computed under different language settings? Maybe we're all
>
> Yes, I think so.
>
> That might make sense for closely related languages. Norwegian has two
> dialect
I propose changing the typmodin signature to "typmodin(cstring[]) returns
int4", that is, the typmods will be passed as strings not integers. This
will incur a bit of extra conversion overhead for the normal uses where
the typmods are integers, but I think the gain in flexibility is worth
agree
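A rough sketch of what the proposal implies for the common case: the type's input function receives its typmods as strings and does the integer decoding itself. This is Python standing in for the C code, and both function names are hypothetical; only the "strings in, one int4 out" shape comes from the message above.

```python
from typing import List

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_integer_typmods(args: List[str]) -> List[int]:
    """Decode string typmods into integers, rejecting anything else.

    Hypothetical stand-in for the per-type decoding step that a
    typmodin(cstring[]) function would perform for integer typmods.
    """
    values = []
    for arg in args:
        try:
            value = int(arg.strip())
        except ValueError:
            raise ValueError(f"typmod must be an integer: {arg!r}") from None
        if not INT32_MIN <= value <= INT32_MAX:
            raise ValueError(f"typmod out of int4 range: {arg!r}")
        values.append(value)
    return values


def length_typmodin(args: List[str]) -> int:
    """Hypothetical typmodin for a type that takes one positive length."""
    values = parse_integer_typmods(args)
    if len(values) != 1 or values[0] < 1:
        raise ValueError("expected exactly one positive length")
    return values[0]
```

The extra cost for integer typmods is the string-to-integer conversion in `parse_integer_typmods`; the gain is that a type is free to interpret the strings in some other way.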
Michael Paesold wrote:
Heikki Linnakangas wrote:
Here's an updated WIP version of the LDC patch. It just spreads the
writes, that achieves the goal of smoothing the checkpoint I/O spikes.
I think sorting the writes etc. is interesting but falls in the
category of further development and should
Gregory Stark wrote:
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
Now that the checkpoints are spread out more, the response times are very
smooth.
So obviously the reason the results are so dramatic is that the checkpoints
used to push the i/o bandwidth demand up over 100%. By spreading i
Am Freitag, 15. Juni 2007 18:14 schrieb Tom Lane:
> The current discussion about the tsearch-in-core patch has convinced me
> that there are plausible use-cases for typmod values that aren't simple
> integers. For instance it could be sane for a type to want a locale or
> language selection as a t
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > Would this allow for 'multi-value' typmods for user-defined types?
>
> If you can squeeze them into 31 bits of stored typmod, yes. That
> may mean that you still need the side table (with stored typmod being a
>
Is it worth providing an ArrayGetStringTypmods in core, when it won't
be used by any existing core datatypes?
I don't think so - cstring[] is a set of strings itself. I don't believe that we
could suggest something commonly useful without some real-world examples.
--
Teodor Sigaev
On Fri, 15 Jun 2007, Gregory Stark wrote:
If I understand it right Greg Smith's concern is that in a busier system
where even *with* the load distributed checkpoint the i/o bandwidth
demand during the checkpoint was *still* being pushed over 100% then
spreading out the load would only exacerb
Stephen Frost <[EMAIL PROTECTED]> writes:
> Would this allow for 'multi-value' typmods for user-defined types?
If you can squeeze them into 31 bits of stored typmod, yes. That
may mean that you still need the side table (with stored typmod being a
lookup key for the table). But this gets you out
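The "side table with the stored typmod as a lookup key" idea can be sketched as follows. Everything here is hypothetical (the class, its methods, and the in-memory storage); it only illustrates how arbitrary string typmods could be reduced to a small non-negative integer that fits in the stored typmod.

```python
from typing import Dict, List, Tuple

MAX_STORED_TYPMOD = 2**31 - 1  # 31 usable bits, per the message above


class TypmodSideTable:
    """Hypothetical side table mapping typmod argument lists to integer keys."""

    def __init__(self) -> None:
        self._key_by_args: Dict[Tuple[str, ...], int] = {}
        self._args_by_key: List[Tuple[str, ...]] = []

    def typmodin(self, args: List[str]) -> int:
        """Return the stored typmod (lookup key) for these arguments."""
        value = tuple(args)
        key = self._key_by_args.get(value)
        if key is None:
            key = len(self._args_by_key)
            if key > MAX_STORED_TYPMOD:
                raise OverflowError("no room left in 31 bits of typmod")
            self._key_by_args[value] = key
            self._args_by_key.append(value)
        return key

    def typmodout(self, key: int) -> str:
        """Render the stored typmod back into its textual form."""
        return "(" + ",".join(self._args_by_key[key]) + ")"
```

The same argument list always maps to the same key, so two columns declared with identical typmods would compare equal on the stored integer alone.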
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I propose changing the typmodin signature to "typmodin(cstring[]) returns
>> int4", that is, the typmods will be passed as strings not integers.
> And modify ArrayGetTypmods() to ArrayGetIntegerTypmods()
Right --- the decoding work will only have to ha
On Fri, 15 Jun 2007, Umar Farooq wrote:
Surprisingly, no matter what type of query I execute, when I use strace
to monitor the system calls generated they turn out to be the same for
ALL sorts of queries.
How are you calling strace? The master postgres process forks off new
processes for e
* Tom Lane ([EMAIL PROTECTED]) wrote:
> I propose changing the typmodin signature to "typmodin(cstring[]) returns
> int4", that is, the typmods will be passed as strings not integers. This
> will incur a bit of extra conversion overhead for the normal uses where
> the typmods are integers, but I t
Hello All,
Recently, I have been involved in some work that requires me to
monitor low level performance counters for pgsql. Specifically, when I execute
a particular query I want to be able to tell how many system calls get executed
on behalf of that query and the time of each sys call. The idea is
So, added to my plan
(http://archives.postgresql.org/pgsql-hackers/2007-06/msg00618.php)
n) single encoded files. That will touch snowball, ispell, synonym, thesaurus
and simple dictionaries
n+1) use encoding names instead of locale's names in configuration
Tom Lane wrote:
Teodor Sigaev <[EM
Sure. I'm just assuming that the set of stopwords doesn't need to vary
depending on the encoding you're using for a language --- that is, if
you're willing to convert the encoding then the same stopword list file
should serve for all encodings of a given language. Do you think this
might be w
Teodor Sigaev <[EMAIL PROTECTED]> writes:
> But configuration for different languages might differ, for example
> russian (and any cyrillic-based) configuration differs from
> west-european configuration based on different character sets.
Sure. I'm just assuming that the set of stopwords doe
One possibility is that the user-visible specification is just a name
(eg, "english"), but the actual filename out on the filesystem is,
say, name.encoding.stop (eg, "english.utf8.stop") where we use PG's
names for the encodings. We could just fail if there's not a file
matching the database enco
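The naming scheme described here could be sketched like this. The function names and the directory argument are hypothetical; the `name.encoding.stop` pattern and the `english.utf8.stop` example come from the message, and lowercasing the encoding name is an assumption made to match that example.

```python
from pathlib import Path


def stopword_path(directory, language: str, pg_encoding: str) -> Path:
    """Map a user-visible name plus database encoding to a file name.

    Assumes the name.encoding.stop pattern, with the encoding name
    lowercased (an assumption, to match "english.utf8.stop").
    """
    return Path(directory) / f"{language}.{pg_encoding.lower()}.stop"


def find_stopword_file(directory, language: str, pg_encoding: str) -> Path:
    """Return the matching file, or fail if none exists for this encoding."""
    path = stopword_path(directory, language, pg_encoding)
    if not path.is_file():
        raise FileNotFoundError(
            f"no stop word file for {language!r} in encoding {pg_encoding!r}")
    return path
```

The user only ever writes `english`; the encoding part is filled in from the database, and a missing file is reported as an error rather than silently ignored.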
On Fri, Jun 15, 2007 at 12:14:45PM -0400, Tom Lane wrote:
[snip]
> I propose changing the typmodin signature to "typmodin(cstring[])
> returns int4", that is, the typmods will be passed as strings not
> integers. This will incur a bit of extra conversion overhead for
> the normal uses where the
The current discussion about the tsearch-in-core patch has convinced me
that there are plausible use-cases for typmod values that aren't simple
integers. For instance it could be sane for a type to want a locale or
language selection as a typmod, eg tsvector('ru') or tsvector('sv').
(I'm not sayin
Hm, are you trying to say that it's sane to have different tsvectors in
a column computed under different language settings? Maybe we're all
Yes, I think so.
That might make sense for closely related languages. Norwegian has two dialects
and one of them has advanced rules for compound words,
On Fri, 2007-06-15 at 10:36 -0400, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > Although I'm happy to see tsearch finally hit the big time, I'm a bit
> > disappointed to see so many new datatype-specific SQL commands created.
>
> Per subsequent discussion we are down to just one
Teodor Sigaev <[EMAIL PROTECTED]> writes:
> Hmm. You mean to use language name in configuration, use current encoding to
> define which dictionary should be used (stemmers for the same language are
> different for different encoding) and recode dictionaries file from UTF8 to
> current locale. Did
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> It's the to_tsvector calls
>> that built the tsvector heap column that have a locale specified or
>> implicit. We need some way of annotating the heap column about this.
> It seems too restrictive to advanced users.
Hm, are you trying to say that it's
It's not really the index's problem; IIUC the behavior of the gist and
gin index opclasses is not locale-specific.
Right
It's the to_tsvector calls
that built the tsvector heap column that have a locale specified or
implicit. We need some way of annotating the heap column about this.
It se
The only reason the TS stuff needs an encoding spec is to figure out how
to read an external stop word file. I think my suggestion upthread is a
lot better: have just one stop word file per language, store them all in
UTF8, and convert to database encoding when loading them. The database
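A minimal sketch of the "one UTF8 file per language, converted on load" suggestion. The function name is hypothetical, and Python codec names (such as `koi8_r`) stand in for PostgreSQL encoding names; the real conversion would go through the server's own encoding routines.

```python
from pathlib import Path
from typing import List


def load_stopwords(path, db_codec: str) -> List[bytes]:
    """Read a UTF-8 stop word file and convert each word to db_codec.

    One word per line is assumed; blank lines are skipped. A word that
    cannot be represented in the target encoding raises
    UnicodeEncodeError instead of being dropped silently.
    """
    text = Path(path).read_text(encoding="utf-8")
    words = []
    for line in text.splitlines():
        word = line.strip()
        if word:
            words.append(word.encode(db_codec))
    return words
```

With this approach the file on disk never needs an encoding in its name: the same `russian.stop` would serve a KOI8-R database and a UTF8 database, and would fail loudly for an encoding that cannot hold Cyrillic.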
Hmm.
> I suggest that treating the UUID as anything other than a unique
> random value is a mistake. There should be no assumptions by users
> with regard to how the order is displayed.
You can always use random UUIDs -- that's a choice in UUID generation.
When dealing with random UUIDs you also (by t
On Friday 15 June 2007 00:46, Oleg Bartunov wrote:
> On Thu, 14 Jun 2007, Tom Lane wrote:
> > [ thinks some more... ] If we revived the GENERATED AS patch,
> > you could imagine computing tsvector columns via "GENERATED AS
> > to_tsvector('english'::regconfig, big_text_col)" instead of a
> > trigg
On Thursday 14 June 2007 15:10, Teodor Sigaev wrote:
> Those changes are doable in several days. I'd like to make the changes together
> with replacing the FULLTEXT keyword with TEXT SEARCH as you suggested.
AIUI the discussion on this change took place off list? Can we get a preview
of what the comman
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> It's not really the index's problem; IIUC the behavior of the gist and
>> gin index opclasses is not locale-specific. It's the to_tsvector calls
>> that built the tsvector heap column that have a locale specified
"Tom Lane" <[EMAIL PROTECTED]> writes:
> It's not really the index's problem; IIUC the behavior of the gist and
> gin index opclasses is not locale-specific. It's the to_tsvector calls
> that built the tsvector heap column that have a locale specified or
> implicit. We need some way of annotati
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I'd suggest allowing either full names ("swedish") or the standard
>> two-letter abbreviations ("sv"). But let's stay away from locale names.
> We can use database's encoding name (the same names used in initdb -E)
AFAICS the encoding name shouldn't b
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> My guess right now is that we use a GUC that will default if a
>> pg_catalog configuration name matches the lc_ctype locale name, and we
>> have to throw an error if an accessed index creation GUC doesn't match
>> the current GUC.
> Where will index sto
I'd suggest allowing either full names ("swedish") or the standard
two-letter abbreviations ("sv"). But let's stay away from locale names.
We can use database's encoding name (the same names used in initdb -E)
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
1) Require the configuration to be always specified. The problem with
this is that casting (::tsquery) and operators (@@) have no way to
specify a configuration.
it's not convenient for the most common cases
2) Use a GUC that you can set for the configuration, and perhaps
default it if possible
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Do locale names vary across operating systems?
Yes, which is the fatal flaw in the whole thing. The ru_RU part is
reasonably well standardized, but the encoding part is not. Considering
that encoding is exactly the part of it we don't care about for th
> > When done that way, you're going to see a lot of index B-tree
> > fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> > as described above. With random (version 4) or hashed based (version 3
> > or 5) UUIDs there's nothing that can be done to improve the situation,
> > obvi
Go ahead and make the changes you want, and then I'll work on this.
So, I'm planning for this weekend:
1) rename FULLTEXT to TEXT SEARCH in SQL command
2) rework Snowball stemmers as Tom suggested
3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
4) remove support of default configur
On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:
For UUID, I
would value random access before sequential performance. Why would
anybody scan UUID through the index in "sequential" order?
AIUI, to allow UUID columns to be indexed using BTREE, there needs to
be some ordering defined. So r
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> Although I'm happy to see tsearch finally hit the big time, I'm a bit
> disappointed to see so many new datatype-specific SQL commands created.
Per subsequent discussion we are down to just one new set of commands,
CREATE/ALTER/DROP TEXT SEARCH CONFIGURA
Bruce Momjian wrote:
> My guess right now is that we use a GUC that will default if a
> pg_catalog configuration name matches the lc_ctype locale name, and we
> have to throw an error if an accessed index creation GUC doesn't match
> the current GUC.
>
> So we create a pg_catalog full text configu
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > First, why are we specifying the server locale here since it never
> > changes:
>
> It's poorly described. What it should really say is the language
> that the text-to-be-searched is in. We can actually support multiple
> languages
danish, dutch, finnish, french, german, hungarian, italian, norwegian,
portuguese, spanish, swedish, russian and english
Albe Laurenz wrote:
Tom Lane wrote:
Teodor Sigaev <[EMAIL PROTECTED]> writes:
So, it's needed to change dictinitoption format of snowball dictionaries to
point both stop-w
Tom Lane wrote:
> Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> So, it's needed to change dictinitoption format of snowball dictionaries to
>> point both stop-word file and language's name.
>
> Right.
Is there any chance to get support for other languages than English and
Russian into the tsearch
Teodor Sigaev <[EMAIL PROTECTED]> writes:
> I split the stemmers into two sets because of the regression test. As I
> remember, there were some problems with loadable conversions and
> configure's flag --disable-shared
I'm not worried about supporting --disable-shared installations very
much. They didn't
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> I ran another series of tests, with a less aggressive bgwriter_delay setting,
> which also affects the minimum rate of the writes in the WIP patch I used.
>
> Now that the checkpoints are spread out more, the response times are very
> smooth.
So
Heikki Linnakangas wrote:
Here's an updated WIP version of the LDC patch. It just spreads the
writes, that achieves the goal of smoothing the checkpoint I/O spikes. I
think sorting the writes etc. is interesting but falls in the category
of further development and should be pushed to 8.4.
Why
On Fri, 2007-06-15 at 18:33 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> wrote:
>
> > > tests                     | pgbench | DBT-2 response time (avg/90%/max)
> > > --------------------------+---------+----------------------------------
> > > LDC only                  |
Probably, having default text search configuration is not a good idea
and we could just require it as a mandatory parameter, which could
eliminate much of the confusion around selecting a text search configuration.
Ugh. Having default configuration (by locale or by postgresql.conf or some other
way) simplif
I've done some more work on this point. After looking at the Snowball
code in more detail, I'm thinking it'd be a good idea to keep it at
arm's length in a loadable shared library, instead of incorporating it
I split the stemmers into two sets because of the regression test. As I remember, there
were s
"Simon Riggs" <[EMAIL PROTECTED]> wrote:
> > tests                     | pgbench | DBT-2 response time (avg/90%/max)
> > --------------------------+---------+----------------------------------
> > LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
> > + BM_CHECKPOINT_NEEDED(*) | 187
> > tests                     | pgbench | DBT-2 response time (avg/90%/max)
> > --------------------------+---------+----------------------------------
> > LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
> > + BM_CHECKPOINT_NEEDED(*
On Wed, 2007-06-13 at 18:06 -0400, Bruce Momjian wrote:
> You bring up a very good point. There are fifteen new commands being
> added for full text indexing:
>
> alter-fulltext-config.sgml alter-fulltext-owner.sgml
> create-fulltext-dict.sgml drop-fulltext-dict.sgml
>
Heikki Linnakangas wrote:
Here's results from a batch of test runs with LDC. This patch only
spreads out the writes, fsyncs work as before. This patch also includes
the optimization that we don't write buffers that were dirtied after
starting the checkpoint.
http://community.enterprisedb.com/