Re: [HACKERS] english parser in text search: support for multiple words in the same position

2011-01-06 Sushant Sinha
Do not know if this mail got lost in between or no one noticed it! On Thu, 2010-12-23 at 11:05 +0530, Sushant Sinha wrote: Just a reminder that this patch is discussing how to break url, emails etc into its components. > > On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane wrote: > [ sorry for no

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-12-22 Sushant Sinha
Just a reminder that this patch is discussing how to break url, emails etc into its components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane wrote: > [ sorry for not responding on this sooner, it's been hectic the last > couple weeks ] > > Sushant Sinha writes: > >> I looked at this patch a bit.

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-10-03 Tom Lane
[ sorry for not responding on this sooner, it's been hectic the last couple weeks ] Sushant Sinha writes: >> I looked at this patch a bit. I'm fairly unhappy that it seems to be >> inventing a brand new mechanism to do something the ts parser can >> already do. Why didn't you code the url-par

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-29 Robert Haas
On Wed, Sep 29, 2010 at 1:29 AM, Sushant Sinha wrote: > Any updates on this? > > > On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha > wrote: >> >> > I looked at this patch a bit.  I'm fairly unhappy that it seems to be >> > inventing a brand new mechanism to do something the ts parser can >> > alr

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-28 Sushant Sinha
Any updates on this? On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha wrote: > > I looked at this patch a bit. I'm fairly unhappy that it seems to be > > inventing a brand new mechanism to do something the ts parser can > > already do. Why didn't you code the url-part mechanism using the > > ex

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-21 Sushant Sinha
> I looked at this patch a bit. I'm fairly unhappy that it seems to be > inventing a brand new mechanism to do something the ts parser can > already do. Why didn't you code the url-part mechanism using the > existing support for compound words? I am not familiar with compound word implementatio

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-19 Tom Lane
Sushant Sinha writes: > For the headline generation to work properly, email/file/url/host need > to become skip tokens. Updating the patch with that change. I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can alre

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-08 Sushant Sinha
For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change. -Sushant. On Sat, 2010-09-04 at 13:25 +0530, Sushant Sinha wrote: > Updating the patch with emitting parttoken and registering it with > snowball config. > > -Sushan

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-04 Sushant Sinha
Updating the patch with emitting parttoken and registering it with snowball config. -Sushant. On Fri, 2010-09-03 at 09:44 -0400, Robert Haas wrote: > On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha wrote: > > I have attached a patch that emits parts of a host token, a url token, > > an email token

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-03 Robert Haas
On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha wrote: > I have attached a patch that emits parts of a host token, a url token, > an email token and a file token. Further, it makes sure that a > host/url/email/file token and the first part-token are at the same > position in tsvector. You should pr

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-31 Sushant Sinha
I have attached a patch that emits parts of a host token, a url token, an email token and a file token. Further, it makes sure that a host/url/email/file token and the first part-token are at the same position in tsvector. The two major changes are: 1. Tokenization changes: The patch exploits the
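The position rule in the patch description (the whole host/url/email/file token and its first part share one tsvector position) can be sketched in a few lines of Python. This is a hedged illustration, not the patch itself, which changes PostgreSQL's C parser. The delimiter set, the `is_compound` test standing in for the parser's token types, and the choice to let the next ordinary word follow the last part are all assumptions made here.

```python
import re
from typing import List, Tuple

# Hypothetical delimiter set for splitting host/url/email/file tokens.
PART_DELIMS = re.compile(r"[./@:\-_?=&]+")


def is_compound(token: str) -> bool:
    # Crude stand-in for the parser's host/url/email/file token types.
    return PART_DELIMS.search(token) is not None


def assign_positions(tokens: List[str]) -> List[Tuple[str, int]]:
    """Emit (lexeme, position) pairs, 1-based like tsvector positions."""
    out: List[Tuple[str, int]] = []
    pos = 1
    for tok in tokens:
        if not is_compound(tok):
            out.append((tok, pos))
            pos += 1
            continue
        parts = [p for p in PART_DELIMS.split(tok) if p]
        # The whole token and its first part share one position.
        out.append((tok, pos))
        for i, part in enumerate(parts):
            out.append((part, pos + i))
        # Assumption: the next word follows the last part.
        pos += max(len(parts), 1)
    return out


print(assign_positions(["visit", "wikipedia.org", "today"]))
```

With this layout (printed as `visit`@1, `wikipedia.org`@2, `wikipedia`@2, `org`@3, `today`@4), a phrase query for `wikipedia org` finds the parts at adjacent positions, while `visit wikipedia.org` still matches because the whole token does not consume an extra position.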

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Robert Haas
On Mon, Aug 2, 2010 at 10:21 AM, Kevin Grittner wrote: > Sushant Sinha wrote: > >> Yes thats what I am planning to do. I just wanted to see if anyone >> can help me in estimating whether this is doable in the current >> parser or I need to write a new one. If possible, then some idea >> on how to

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Kevin Grittner
Sushant Sinha wrote: > Yes thats what I am planning to do. I just wanted to see if anyone > can help me in estimating whether this is doable in the current > parser or I need to write a new one. If possible, then some idea > on how to go about implementing? The current tsearch parser is a stat

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Tom Lane
Sushant Sinha writes: >> This would needlessly increase the number of tokens. Instead you'd >> better make it work like compound word support, having just "wikipedia" >> and "org" as tokens. > The current text parser already returns url and url_path. That already > increases the number of uniqu

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Sushant Sinha
On Mon, 2010-08-02 at 09:32 -0400, Robert Haas wrote: > On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha wrote: > > The current text parser already returns url and url_path. That already > > increases the number of unique tokens. I am only asking for adding of > > normal english words as well so that

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Robert Haas
On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha wrote: > The current text parser already returns url and url_path. That already > increases the number of unique tokens. I am only asking for adding of > normal english words as well so that if someone types only "wikipedia" > he gets a match. [...] >

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Markus Wanner
Hi, On 08/02/2010 03:12 PM, Sushant Sinha wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. Well, I think I simply turned that off to be able to search for plain words. It still works for complete URLs, those are just treated

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Sushant Sinha
> On 08/01/2010 08:04 PM, Sushant Sinha wrote: > > 1. We do not have separate tokens "wikipedia" and "org" > > 2. If we have the two tokens we should have them at adjacent position so > > that a phrase search for "wikipedia org" should work. > > This would needlessly increase the number of tokens.

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Markus Wanner
Hi, On 08/01/2010 08:04 PM, Sushant Sinha wrote: 1. We do not have separate tokens "wikipedia" and "org" 2. If we have the two tokens we should have them at adjacent position so that a phrase search for "wikipedia org" should work. This would needlessly increase the number of tokens. Instead y

[HACKERS] english parser in text search: support for multiple words in the same position

2010-08-01 Sushant Sinha
Currently the english parser in text search does not support multiple words in the same position. Consider a word "wikipedia.org". The text search would return a single token "wikipedia.org". However if someone searches for "wikipedia org" then there will not be a match. There are two problems here
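A minimal Python sketch of why the query fails, under stated assumptions. A tsvector is modeled as a dict from lexeme to 1-based positions, and a phrase matches only when its words sit at consecutive positions. `phrase_match` and the dict model are hypothetical; real tsvectors also carry weights, and lexemes pass through dictionaries (stemming, stop words) first. The second vector places the whole token and its first part at the same position, as the subject line proposes.

```python
from typing import Dict, List

# Hypothetical tsvector model: lexeme -> list of 1-based positions.
TsVector = Dict[str, List[int]]


def phrase_match(vec: TsVector, words: List[str]) -> bool:
    """Return True if `words` occur in `vec` at consecutive positions."""
    if not words or any(w not in vec for w in words):
        return False
    # Try each position of the first word as the start of the phrase.
    for start in vec[words[0]]:
        if all(start + i in vec[w] for i, w in enumerate(words)):
            return True
    return False


# Current behaviour: "wikipedia.org" is emitted as one token only.
single_token: TsVector = {"visit": [1], "wikipedia.org": [2]}

# Proposed: parts emitted too, the first part sharing the token's position.
with_parts: TsVector = {"visit": [1], "wikipedia.org": [2],
                        "wikipedia": [2], "org": [3]}

query = ["wikipedia", "org"]
print(phrase_match(single_token, query))  # False: no separate parts
print(phrase_match(with_parts, query))    # True: parts are adjacent
```

Because the whole token keeps its original position, a phrase such as `visit wikipedia.org` matches in both vectors, so adding the parts does not break existing queries.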