Hi all,
I was reading a post from Sushant Sinha about the English parser, which does
not consider a dot as a word delimiter. In a follow-up mail it was proposed
to add a patch.
Is there any news about that?
I would enjoy this patch, too ;)
Thanks
--
Paul Fariello
Étudiant ingénieur à l'Universit
On Tue, Jun 02, 2009 at 04:40:51PM -0400, Sushant Sinha wrote:
> Fair enough. I agree that there is a valid need for returning such tokens as
> a host. But I think there is definitely a need to break it down into
> individual words. This will help in cases when a document is missing a space
> in be
Sushant Sinha wrote:
> So what we can do is: return the entire compound word as Host and
> also break it down into individual words.
So, pretty much like we handle hyphenation?
-Kevin
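
As a rough illustration of the proposal quoted above (not PostgreSQL's
actual parser), the Python sketch below emits a dotted compound both as a
single token and as its individual parts, in the spirit of the hyphenation
comparison. The function name `tokenize`, the "host"/"word" labels, and the
punctuation handling are assumptions made only for this example.

```python
def tokenize(text):
    """Return (type, token) pairs for whitespace-separated input.

    Hypothetical behaviour: a token containing interior dots is emitted
    once as a whole ("host") and then once per dot-separated part ("word").
    """
    tokens = []
    for raw in text.split():
        # Drop surrounding punctuation so a sentence-final dot is ignored.
        word = raw.strip(".,;:!?\"'()").lower()
        if not word:
            continue
        parts = [p for p in word.split(".") if p]
        if len(parts) > 1:
            tokens.append(("host", word))               # whole compound
            tokens.extend(("word", p) for p in parts)   # individual words
        else:
            tokens.append(("word", word))
    return tokens


if __name__ == "__main__":
    for kind, token in tokenize("Mr.J.Sai Deepak"):
        print(kind, token)
```

With this approach a search for "sai" could match the document, while the
full "mr.j.sai" token is still kept for cases where it really is a host name.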
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Fair enough. I agree that there is a valid need for returning such tokens as
a host. But I think there is definitely a need to break it down into
individual words. This will help in cases when a document is missing a space
in between the words.
So what we can do is: return the entire compound wor
On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote:
> Sushant Sinha wrote:
>
> > I think that dot should be considered a word delimiter because
> > when a dot is not followed by a space, most of the time it is a
> > typing error. Besides, there are not many valid English words
Sushant Sinha wrote:
> I think that dot should be considered a word delimiter because
> when a dot is not followed by a space, most of the time it is a
> typing error. Besides, there are not many valid English words that
> have a dot in between.
It's not treating it as an English word, but
Currently it seems that dot is not considered a word delimiter
by the English parser.
lawdb=# select to_tsvector('english', 'Mr.J.Sai Deepak');
       to_tsvector
-------------------------
 'deepak':2 'mr.j.sai':1
(1 row)
So the word obtained is "mr.j.sai" rather than three words "