Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Sushant Sinha
On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote: > >> Sushant Sinha writes: > >>> Doesn't this force the headline to be taken from the first N words of > >>> the document, independent of where the match was? That seems rather > >>> unworkable,

Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread Sushant Sinha
The rank counts 1/coversize. So bigger covers will not have much impact anyway. What is the need of the patch? -Sushant. On Fri, 2012-01-27 at 18:06 +0200, karave...@mail.bg wrote: > Hello, > > I have developed a variation of cover density ranking functions that > counts only covers that are le
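
To make the "1/coversize" point concrete, here is a toy sketch. It is not the actual ts_rank_cd code: it assumes a score that is simply the sum of 1/size over all covers, and the function name cover_density_score is made up for illustration.

```python
def cover_density_score(cover_sizes):
    """Toy model: every cover contributes 1/size to the score.

    cover_sizes is a list of cover lengths in words. This ignores
    lexeme weights and normalization, which the real ranking
    function also takes into account.
    """
    return sum(1.0 / size for size in cover_sizes if size > 0)


# One tight cover of 2 words, then the same plus a loose 50-word cover.
tight_only = cover_density_score([2])
with_large = cover_density_score([2, 50])
print(tight_only, with_large)
```

Under this assumption the 50-word cover adds only 0.02 to a score of 0.5, which is the sense in which bigger covers have little impact.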

Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 12:41 -0300, Euler Taveira de Oliveira wrote: > On 19-12-2011 12:30, Sushant Sinha wrote: > > I recently upgraded my postgres server from 9.0 to 9.1.2 and I am > > finding a peculiar problem. I have a program that periodically adds > rows > > to

Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 19:08 +0200, Marti Raudsepp wrote: > Another thought -- have you read about the GIN "fast updates" feature? > This existed in 9.0 too. Instead of updating the index directly, GIN > appends all changes to a sequential list, which needs to be scanned in > whole for read queries.

[HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem. I have a program that periodically adds rows to this table using INSERT. Typically the number of rows is just 1-2 thousand when the table already has 500K rows. Whenever the program is adding rows, the perf

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-11-08 Thread Sushant Sinha
. -Sushant. On Tue, 2011-10-25 at 23:45 +0530, Sushant Sinha wrote: > On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote: > > > Assume, for example, that the postgres mailing list archive search used > > tsearch (which I think it does, but I'm not sure). It'd then

Re: [HACKERS] a tsearch issue

2011-11-06 Thread Sushant Sinha
On Fri, 2011-11-04 at 11:22 +0100, Pavel Stehule wrote: > Hello > > I found a interesting issue when I checked a tsearch prefix searching. > > We use a ispell based dictionary > > CREATE TEXT SEARCH DICTIONARY cspell >(template=ispell, dictfile = czech, afffile=czech, stopwords=czech); > CRE

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote: > Assume, for example, that the postgres mailing list archive search used > tsearch (which I think it does, but I'm not sure). It'd then probably make > sense to add "postgres" to the list of stopwords, because it's bound to > appear in near

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote: > On Oct25, 2011, at 17:26 , Sushant Sinha wrote: > > I am currently using the prefix search feature in text search. I find > > that the prefix characters are treated the same as a normal lexeme and > > passed through

[HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. db=# select to_tsquery('english', 's:*'); NOTICE: text-search query contain

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
> > Actually, this code seems probably flat-out wrong: won't every > successful call of hlCover() on a given document return exactly the same > q value (end position), namely the last token occurrence in the > document? How is that helpful? > >regards, tom lane > There is

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
> > Here is a simple patch that limits the number of words during the > > tokenization phase and puts an upper-bound on the headline generation. > > Doesn't this force the headline to be taken from the first N words of > the document, independent of where the match was? That seems rather > unwor

[HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears. Currently the headline generation in postgres follows the following steps: 1. Tokenize the documents and obtain the lexemes 2. Decide on lexemes that should be the part of the head

Re: [HACKERS] PL/Python: No stack trace for an exception

2011-07-21 Thread Sushant Sinha
On Thu, 2011-07-21 at 15:31 +0200, Jan Urbański wrote: > On 21/07/11 15:27, Sushant Sinha wrote: > > I am using plpythonu on postgres 9.0.2. One of my python functions was > > throwing a TypeError exception. However, I only see the exception in the > > database and not the st

[HACKERS] PL/Python: No stack trace for an exception

2011-07-21 Thread Sushant Sinha
I am using plpythonu on postgres 9.0.2. One of my python functions was throwing a TypeError exception. However, I only see the exception in the database and not the stack trace. It becomes difficult to debug if the stack trace is absent in Python. logdb=# select get_words(forminput) from fi;

[HACKERS] pg_trgm: unicode string not working

2011-06-12 Thread Sushant Sinha
I am using pg_trgm for spelling correction as prescribed in the documentation. But I see that it does not work for unicode string. The database was initialized with utf8 encoding and the C locale. Here is the table: \d words Table "public.words" Column | Type | Modifiers +---

Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Sushant Sinha
I agree that it will be a good idea to rewrite the entire thing. However, in the mean time, I sent a proposal earlier http://archives.postgresql.org/pgsql-hackers/2010-08/msg00019.php And a patch later: http://archives.postgresql.org/pgsql-hackers/2010-09/msg00476.php Tom asked me to look into

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2011-01-06 Thread Sushant Sinha
Do not know if this mail got lost in between or no one noticed it! On Thu, 2010-12-23 at 11:05 +0530, Sushant Sinha wrote: Just a reminder that this patch is discussing how to break url, emails etc into its components. > > On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane wrote: > [

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-12-22 Thread Sushant Sinha
Just a reminder that this patch is discussing how to break url, emails etc into its components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane wrote: > [ sorry for not responding on this sooner, it's been hectic the last > couple weeks ] > > Sushant Sinha writes: > >> I

Re: [HACKERS] bug in ts_rank_cd

2010-12-22 Thread Sushant Sinha
Sorry for sounding the false alarm. I was not running the vanilla postgres and that is why I was seeing that problem. Should have checked with the vanilla one. -Sushant On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote: > Sushant Sinha writes: > > There is a bug in ts_rank_cd. It

[HACKERS] bug in ts_rank_cd

2010-12-21 Thread Sushant Sinha
MY PREV EMAIL HAD A PROBLEM. Please reply to this one == There is a bug in ts_rank_cd. It does not correctly give rank when the query lexeme is the first one in the tsvector. Example: select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto

[HACKERS] bug in ts_rank_cd

2010-12-21 Thread Sushant Sinha
There is a bug in ts_rank_cd. It does not correctly give rank when the query lexeme is the first one in the tsvector. Example: select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto_tsquery('english', 'abc')); ts_rank_cd 0 select ts_rank_cd(to_tsvector('english'

Re: [HACKERS] planner row-estimates for tsvector seems horribly wrong

2010-10-24 Thread Sushant Sinha
ski wrote: > On 24/10/10 14:44, Sushant Sinha wrote: > > I am using gin index on a tsvector and doing basic search. I see the > > row-estimate of the planner to be horribly wrong. It is returning > > row-estimate as 4843 for all queries whether it matches zero rows, a > >

[HACKERS] planner row-estimates for tsvector seems horribly wrong

2010-10-24 Thread Sushant Sinha
I am using gin index on a tsvector and doing basic search. I see the row-estimate of the planner to be horribly wrong. It is returning row-estimate as 4843 for all queries whether it matches zero rows, a medium number of rows (88,000) or a large number of rows (726,000). The table has roughly a mi

Re: [HACKERS] Re: [GENERAL] Text search parser's treatment of URLs and emails

2010-10-12 Thread Sushant Sinha
On Tue, 2010-10-12 at 19:31 -0400, Tom Lane wrote: > This seems much of a piece with the existing proposal to allow > individual "words" of a URL to be reported separately: > https://commitfest.postgresql.org/action/patch_view?id=378 > > As I said in that thread, this could be done in a backwards

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-28 Thread Sushant Sinha
Any updates on this? On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha wrote: > > I looked at this patch a bit. I'm fairly unhappy that it seems to be > > inventing a brand new mechanism to do something the ts parser can > > already do. Why didn't you code the

Re: [HACKERS] Configuring Text Search parser?

2010-09-21 Thread Sushant Sinha
Your changes are somewhat fine. It will get you tokens with "_" characters in it. However, it is not nice to mix your new token with existing token like NUMWORD. Give a new name to your new type of token .. probably UnderscoreWord. Then on seeing "_", move to a state that can identify the new token

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-21 Thread Sushant Sinha
> I looked at this patch a bit. I'm fairly unhappy that it seems to be > inventing a brand new mechanism to do something the ts parser can > already do. Why didn't you code the url-part mechanism using the > existing support for compound words? I am not familiar with compound word implementatio

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-08 Thread Sushant Sinha
For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change. -Sushant. On Sat, 2010-09-04 at 13:25 +0530, Sushant Sinha wrote: > Updating the patch with emitting parttoken and registering it with > snowball

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-04 Thread Sushant Sinha
Updating the patch with emitting parttoken and registering it with snowball config. -Sushant. On Fri, 2010-09-03 at 09:44 -0400, Robert Haas wrote: > On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha wrote: > > I have attached a patch that emits parts of a host token, a url token, >

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-31 Thread Sushant Sinha
complicate the patch with that, I wanted to get feedback on any other major problem with the patch. -Sushant. On Mon, 2010-08-02 at 10:20 -0400, Tom Lane wrote: > Sushant Sinha writes: > >> This would needlessly increase the number of tokens. Instead you'd > >> better make

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
On Mon, 2010-08-02 at 09:32 -0400, Robert Haas wrote: > On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha wrote: > > The current text parser already returns url and url_path. That already > > increases the number of unique tokens. I am only asking for adding of > > normal eng

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
> On 08/01/2010 08:04 PM, Sushant Sinha wrote: > > 1. We do not have separate tokens "wikipedia" and "org" > > 2. If we have the two tokens we should have them at adjacent position so > > that a phrase search for "wikipedia org" should work. >

[HACKERS] english parser in text search: support for multiple words in the same position

2010-08-01 Thread Sushant Sinha
Currently the english parser in text search does not support multiple words in the same position. Consider a word "wikipedia.org". The text search would return a single token "wikipedia.org". However if someone searches for "wikipedia org" then there will not be a match. There are two problems here
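
A rough sketch of the behaviour asked for here, written as a stand-alone toy tokenizer rather than as a change to the real parser. The function name tokenize_with_parts and the position scheme (first part shares the compound token's position, later parts take the following positions) are assumptions for illustration only.

```python
import re


def tokenize_with_parts(text):
    """Return (position, token) pairs for a whitespace-separated text.

    A token such as "wikipedia.org" is emitted whole and is also
    broken into its parts, placed at adjacent positions so that a
    phrase query for "wikipedia org" could match.
    """
    tokens = []
    pos = 0
    for raw in text.lower().split():
        pos += 1
        tokens.append((pos, raw))  # the compound token itself
        parts = [p for p in re.split(r"[./@]", raw) if p]
        if len(parts) > 1:
            for offset, part in enumerate(parts):
                tokens.append((pos + offset, part))
            pos += len(parts) - 1  # later words follow the last part
    return tokens


print(tokenize_with_parts("see wikipedia.org now"))
```

With this scheme "wikipedia" lands at position 2 and "org" at position 3, while the original "wikipedia.org" token is still available at position 2.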

[HACKERS] lexeme ordering in tsvector

2009-11-30 Thread Sushant Sinha
It seems like the ordering of lexemes in tsvector has changed from 8.3 to 8.4. For example in 8.3.1, postgres=# select to_tsvector('english', 'quit everytime'); to_tsvector --- 'quit':1 'everytim':2 The lexemes are arranged by length and then by string comparison

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Sushant Sinha
ts_headline calls a ts_lexize equivalent to break the text. Of course there is an algorithm to process the tokens and generate the headline. I would be really surprised if the algorithm to generate the headline is somehow dependent on language (as it only processes the tokens). So Oleg is right when he

Re: [HACKERS] It's June 1; do you know where your release is?

2009-06-02 Thread Sushant Sinha
On Tue, 2009-06-02 at 17:26 -0700, Josh Berkus wrote: > > * possible bug in cover density ranking? > > -- From Teodor's response, this is maybe a doc patch and not a code > patch. Teodor? Oleg? I personally think that this is a bug, because we are assigning very high rank when we are no

Re: [HACKERS] dot to be considered as a word delimiter?

2009-06-02 Thread Sushant Sinha
Thanks, Sushant. On Tue, Jun 2, 2009 at 8:47 AM, Kenneth Marshall wrote: > On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote: > > Sushant Sinha wrote: > > > > > I think that dot should be considered as a word delimiter because > > > when dot is not

[HACKERS] dot to be considered as a word delimiter?

2009-05-29 Thread Sushant Sinha
Currently it seems that dot is not considered as a word delimiter by the english parser. lawdb=# select to_tsvector('english', 'Mr.J.Sai Deepak'); to_tsvector - 'deepak':2 'mr.j.sai':1 (1 row) So the word obtained is "mr.j.sai" rather than three words "

Re: [HACKERS] possible bug in cover density ranking?

2009-05-01 Thread Sushant Sinha
I see this as an open item here http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items Any interest in fixing this? -Sushant. On Thu, 2009-01-29 at 13:54 -0500, Sushant Sinha wrote: > > > On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev > wrote: > Is this what

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2009-04-13 Thread Sushant Sinha
:57 -0400, Tom Lane wrote: > Sushant Sinha writes: > > Sorry for the delay. Here is the patch with FragmentDelimiter option. > > It requires an extra option in HeadlineParsedText and uses that option > > during generateHeadline. > > I did some editing of the docum

Re: [HACKERS] patch for space around the FragmentDelimiter

2009-03-01 Thread Sushant Sinha
Yeah, you are right. I did not know that you can pass space using double quotes. -Sushant. On Sun, 2009-03-01 at 20:49 -0500, Tom Lane wrote: > Sushant Sinha writes: > > FragmentDelimiter is an argument for ts_headline function to separate > > different headline fragments. The de

[HACKERS] patch for space around the FragmentDelimiter

2009-03-01 Thread Sushant Sinha
FragmentDelimiter is an argument for ts_headline function to separate different headline fragments. The default delimiter is " ... ". Currently if someone specifies the delimiter as an option to the function, no extra space is added around the delimiter. However, it does not look good without spac

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
Sorry ... I thought you were running the development branch. -Sushant. On Sat, 2009-02-14 at 16:34 -0500, Tom Lane wrote: > Sushant Sinha writes: > > I think we currently do that. > > ... since about four months ago. > > 2008-10-17 14:05 teodor > > * doc/sr

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
> to '...' || _headline(v1.copy, query, 'MinWords = 17') || '...', but as you > can clearly see this would always occur, and not be intelligent regarding > the fragments. I hope that you're correct and that it is implemented, and > not documented > >

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
I think we currently do that. We add ellipses only when we encounter a new fragment. So there should not be ellipses if we are at the end of the document or if that is the first fragment (includes the beginning of the document). Here is the code in generateHeadline, ts_parse.c that adds the ellipse

Re: [HACKERS] possible bug in cover density ranking?

2009-01-29 Thread Sushant Sinha
On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev wrote: > Is this what is desired? It seems to me that Wdoc is getting a high >> ranking even when we are not sure of the position information. >> > 0.1 is not very high rank, and we could not suggest any reasonable rank in > this case. This document

[HACKERS] possible bug in cover density ranking?

2009-01-28 Thread Sushant Sinha
I am running postgres 8.3.1. In tsrank.c I am looking at the cover density function used for ranking while doing text search: float4 calc_rank_cd(float4 *arrdata, TSVector txt, TSQuery query, int method) Here is the excerpt of code that I think may have a bug when document is big enough to

Re: [HACKERS] text search patch status update?

2009-01-07 Thread Sushant Sinha
m was wrong. ;-) > > --- > > Heikki Linnakangas wrote: > > Sushant Sinha wrote: > > > Patch #2. I think this is a straightforward bug fix. > > > > Yes, I think you're right. In hlCover(), *q is 0 when the only match is > > the first item in th

Re: [HACKERS] text search patch status update?

2008-09-16 Thread Sushant Sinha
AIL PROTECTED] > wrote: > Sushant Sinha escribió: > > Any status updates on the following patches? > > > > 1. Fragments in tsearch2 headlines: > > http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php > > > > 2. Bug in hlCover: > > http://arc

[HACKERS] text search patch status update?

2008-09-15 Thread Sushant Sinha
Any status updates on the following patches? 1. Fragments in tsearch2 headlines: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php 2. Bug in hlCover: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php -Sushant. -- Sent via pgsql-hackers mailing list (pgsql-hacke

Re: [HACKERS] small bug in hlCover

2008-08-03 Thread Sushant Sinha
On Mon, 2008-08-04 at 00:36 -0300, Euler Taveira de Oliveira wrote: > Sushant Sinha escreveu: > > I think there is a slight bug in hlCover function in wparser_def.c > > > The bug is not in the hlCover. In prsd_headline, if we didn't find a > suitable bestlen (i.e. >

Re: [HACKERS] small bug in hlCover

2008-08-03 Thread Sushant Sinha
Has anyone noticed this? -Sushant. On Wed, 2008-07-16 at 23:01 -0400, Sushant Sinha wrote: > I think there is a slight bug in hlCover function in wparser_def.c > > If there is only one query item and that is the first word in the text, > then hlCover does not return any cover. Thi

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-08-02 Thread Sushant Sinha
file that tests different aspects of the new headline generation function. Let me know if anything else is needed. -Sushant. On Thu, 2008-07-24 at 00:28 +0400, Oleg Bartunov wrote: > On Wed, 23 Jul 2008, Sushant Sinha wrote: > > > I guess it is more readable to add cover separator a

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Sushant Sinha
I guess it is more readable to add cover separator at the end of a fragment than in the front. Let me know what you think and I can update it. I think the right place for cover separator is in the structure HeadlineParsedText just like startsel and stopsel. This will enable users to specify their

Re: [HACKERS] phrase search

2008-07-18 Thread Sushant Sinha
I looked at query operators for tsquery and here are some of the new query operators for position based queries. I am just proposing some changes and the questions I have. 1. What is the meaning of such a query operator? foo #5 bar -> true if the document has word "foo" followed by "bar" at 5th p

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-17 Thread Sushant Sinha
008, Sushant Sinha wrote: > > > I will add test queries and their results for the corner cases in a > > separate file. I guess the only thing I am confused about is what should > > be the behavior of headline generation when Query items have words of > > size less than Short

[HACKERS] small bug in hlCover

2008-07-16 Thread Sushant Sinha
I think there is a slight bug in hlCover function in wparser_def.c If there is only one query item and that is the first word in the text, then hlCover does not return any cover. This is evident in this example when ts_headline only generates the min_words: testdb=# select ts_headline('1 2 3 4 5

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-16 Thread Sushant Sinha
ur code: > > =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2'); > ts_headline > -- > ... 2 ... > > and so on > > > Oleg > On Tue, 15 Jul 2008, Sushant Sinha wrote: > > &

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Sushant Sinha
attached are two patches: 1. documentation 2. regression tests for headline with fragments. -Sushant. On Tue, 2008-07-15 at 13:29 +0400, Teodor Sigaev wrote: > > Attached a new patch that: > > > > 1. fixes previous bug > > 2. better handles the case when cover size is greater than the MaxWords

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-14 Thread Sushant Sinha
Attached a new patch that: 1. fixes previous bug 2. better handles the case when cover size is greater than the MaxWords. Basically it divides a cover greater than MaxWords into fragments of MaxWords, resizes each such fragment so that each end of the fragment contains a query word and then evalua
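
The fragment handling described in point 2 can be sketched roughly as follows. This is not the patch itself: split_cover is a hypothetical helper, words are plain strings, and "resizing" is assumed here to mean shrinking each fragment until both ends are query words.

```python
def split_cover(words, query_words, max_words):
    """Split a cover into fragments of at most max_words words.

    Each fragment is then trimmed so that it starts and ends on a
    query word. Fragments that contain no query word are dropped.
    """
    fragments = []
    for start in range(0, len(words), max_words):
        chunk = words[start:start + max_words]
        # indexes of query words inside this chunk
        hits = [i for i, w in enumerate(chunk) if w in query_words]
        if hits:
            fragments.append(chunk[hits[0]:hits[-1] + 1])
    return fragments


cover = "a x b c y d e x f".split()
print(split_cover(cover, {"x", "y"}, max_words=4))
```

Each resulting fragment could then be scored on its own. How the patch evaluates fragments is not shown in the snippet above, so that step is left out.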

Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
You are right. I did not do make clean last time. After make clean, make all, and make install it works fine. -Sushant. On Thu, 2008-07-10 at 17:55 +0530, Pavan Deolasee wrote: > On Thu, Jul 10, 2008 at 5:36 PM, Sushant Sinha <[EMAIL PROTECTED]> wrote: > > > > > > >

[HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
I am trying to generate a patch with respect to the current CVS head. So I rsynced the tree, then did cvs up and installed the db. However, when I did initdb on a data directory it is stuck: It is stuck after printing creating template1 creating template1 database in /home/postgres/data/base/1 ..

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-21 Thread Sushant Sinha
I have attached an updated patch with following changes: 1. Respects ShortWord and MinWords 2. Uses hlCover instead of Cover 3. Does not store norm (or lexeme) for headline marking 4. Removes ts_rank.h 5. Earlier it was counting even NONWORDTOKEN in the headline. Now it only counts the actual w

Re: [HACKERS] phrase search

2008-06-03 Thread Sushant Sinha
On Tue, 2008-06-03 at 22:16 +0400, Teodor Sigaev wrote: > > This is far more complicated than I thought. > >> Of course, phrase search should be able to use indexes. > > I can probably look into how to use index. Any pointers on this? > > src/backend/utils/adt/tsginidx.c, if you invent operation #

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-03 Thread Sushant Sinha
My main argument for using Cover instead of hlCover was that Cover will be faster. I tested the default headline generation that uses hlCover with the current patch that uses Cover. There was not much difference. So I think you are right in that we do not need norms and we can just use hlCover. I

Re: [HACKERS] phrase search

2008-06-02 Thread Sushant Sinha
On Mon, 2008-06-02 at 19:39 +0400, Teodor Sigaev wrote: > > > I have attached a patch for phrase search with respect to the cvs head. > > Basically it takes a phrase (text) and a TSVector. It checks if the > > relative positions of lexeme in the phrase are same as in their > > positions in TSVec

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Sushant Sinha
Efficiency: I realized that we do not need to store all norms. We only need to store norms that are in the query. So I moved the addition of norms from addHLParsedLex to hlfinditem. This should add very little memory overhead to existing headline generation. If this is still not acceptable f

[HACKERS] phrase search

2008-05-31 Thread Sushant Sinha
I have attached a patch for phrase search with respect to the cvs head. Basically it takes a phrase (text) and a TSVector. It checks if the relative positions of lexeme in the phrase are same as in their positions in TSVector. If the configuration for text search is "simple", then this will prod
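
The position check described above can be illustrated with a small stand-alone sketch. It is not the patch: the tsvector is modelled as a plain dict from lexeme to a list of positions, and phrase_matches is a hypothetical name.

```python
def phrase_matches(phrase_lexemes, positions):
    """Check whether the lexemes occur consecutively in the document.

    positions maps each lexeme to the positions where it occurs,
    similar to the position lists kept in a tsvector. The phrase
    matches if some occurrence of the first lexeme is followed by
    the remaining lexemes at the next positions, in order.
    """
    if not phrase_lexemes:
        return False
    for start in positions.get(phrase_lexemes[0], []):
        if all((start + i) in positions.get(lex, [])
               for i, lex in enumerate(phrase_lexemes)):
            return True
    return False


doc = {"quick": [2], "brown": [3], "fox": [4, 9]}
print(phrase_matches(["quick", "brown", "fox"], doc))
print(phrase_matches(["brown", "quick"], doc))
```

This toy version scans the position lists directly and does not use any index.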

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-31 Thread Sushant Sinha
I have attached a new patch with respect to the current cvs head. This produces headline in a document for a given query. Basically it identifies fragments of text that contain the query and displays them. DESCRIPTION HeadlineParsedText contains an array of actual words but not information abou

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-24 Thread Sushant Sinha
ould we have an option to pass TSVector to headline function? -Sushant. On Sat, 2008-05-24 at 07:57 +0400, Teodor Sigaev wrote: > [moved to -hackers, because talk is about implementation details] > > > I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1 >