Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Kevin Grittner
Oleg Bartunov wrote: > contrib/test_parser - an example parser code. Using that as a template, I seem to be on track to use the regexp.c code to pick out statute cites from the text in my start function, and recognize when I'm positioned on one in my getlexeme (GETTOKEN) function, delegating ev

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Kevin Grittner
Tom Lane wrote: > "Kevin Grittner" writes: >> Can I use a different set of dictionaries >> for creating the tsquery than I did for the tsvector? > > Sure, as long as the tokens (normalized words) that they produce > match up for words that you want to have match. Once the tokens > come out, t

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Oleg Bartunov
On Tue, 7 Apr 2009, Kevin Grittner wrote: Oleg Bartunov wrote: of course, you can build tsquery yourself, but once your parser can recognize your very own token 'xxx', it'd be much better to have a mapping xxx -> dict_xxx, where dict_xxx knows all semantics. I probably just need to have that "A

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Tom Lane
"Kevin Grittner" writes: > Can I use a different set of dictionaries > for creating the tsquery than I did for the tsvector? Sure, as long as the tokens (normalized words) that they produce match up for words that you want to have match. Once the tokens come out, they're just strings as far as

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Kevin Grittner
Oleg Bartunov wrote: > of course, you can build tsquery yourself, but once your parser can > recognize your very own token 'xxx', it'd be much better to have a > mapping xxx -> dict_xxx, where dict_xxx knows all semantics. I probably just need to have that "Aha!" moment, slap my forehead, and move

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Oleg Bartunov
On Tue, 7 Apr 2009, Kevin Grittner wrote: If the document text contains '341.15(3)' I want to find it with a search string of '341', '341.15', '341.15(3)' but not '341.15(3)(b)', '341.1', or '15'. How do I handle that? Do I have to build my tsquery values myself as text and cast to tsquery, or
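
The matching rule described here ('341.15(3)' is found by '341', '341.15', and '341.15(3)', but not by '341.15(3)(b)', '341.1', or '15') can be illustrated outside the database. The following is only a minimal Python sketch of that rule, not tsearch2 code and not anything posted in the thread: the cite grammar and the names `CITE_RE`, `PART_RE`, `cite_prefixes`, and `cite_matches` are assumptions made for illustration. The idea is that the indexed cite is expanded into every hierarchical prefix, and a search string matches only if it equals one of them exactly.

```python
import re

# Assumed cite grammar (hypothetical): a chapter number, optional dotted
# sections, then optional parenthesized subsections, e.g. 341.15(3).
CITE_RE = re.compile(r"\d+(?:\.\d+)*(?:\([0-9A-Za-z]+\))*")
# One hierarchy level at a time: ".15", "(3)", or the leading "341".
PART_RE = re.compile(r"\.\d+|\([0-9A-Za-z]+\)|\d+")


def cite_prefixes(cite):
    """Return every hierarchical prefix of a cite, shortest first."""
    if not CITE_RE.fullmatch(cite):
        raise ValueError("not a recognized cite: %r" % cite)
    prefixes = []
    current = ""
    for part in PART_RE.findall(cite):
        current += part
        prefixes.append(current)
    return prefixes


def cite_matches(document_cite, search):
    """True if the search string is exactly one level of the document cite."""
    return search in cite_prefixes(document_cite)


if __name__ == "__main__":
    print(cite_prefixes("341.15(3)"))
    for search in ["341", "341.15", "341.15(3)", "341.15(3)(b)", "341.1", "15"]:
        print(search, cite_matches("341.15(3)", search))
```

Because the comparison is exact string equality on whole levels, '341.1' does not match '341.15', which is the case a plain character-prefix search would get wrong.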

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Kevin Grittner
Oleg Bartunov wrote: > contrib/test_parser - an example parser code. Thanks! Sorry I missed that. -Kevin

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-07 Thread Oleg Bartunov
Kevin, contrib/test_parser - an example parser code. On Mon, 6 Apr 2009, Kevin Grittner wrote: Tom Lane wrote: "Kevin Grittner" writes: People are likely to search for statute cites, which tend to have a hierarchical form. I think what you need is a custom parser I've just returned to

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-06 Thread Kevin Grittner
Tom Lane wrote: > regexp substitution I found a way to at least keep the cite in one piece. Perhaps I can do the rest in custom dictionaries, which are more pluggable. select ts_debug ('State Statute pertaining to'); ts_debug -

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-06 Thread Kevin Grittner
Tom Lane wrote: > Perhaps you could pass the texts and the queries through a regexp > substitution that converts digit-dot-digit to digit-dash-digit? This doesn't seem to get me anywhere. For cite '9.125.07(4A)(3)' I got this: select ts_debug('9-125-07-4A-3'); ts_
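
For reference, the substitution being discussed can be sketched as a preprocessing step applied to both texts and queries. This is a hedged Python illustration only: the thread does not show how the conversion was actually performed, and the function names and the handling of parenthesized parts are assumptions chosen to reproduce the '9-125-07-4A-3' form quoted above.

```python
import re

# A dot counts only when it sits between two digits, so a sentence-ending
# period after a cite is left alone.
DOT_BETWEEN_DIGITS = re.compile(r"(?<=\d)\.(?=\d)")
# Assumption: parenthesized subsections are flattened to "-4A" style parts.
PAREN_PART = re.compile(r"\(([0-9A-Za-z]+)\)")


def dots_to_dashes(text):
    """Convert digit-dot-digit to digit-dash-digit."""
    return DOT_BETWEEN_DIGITS.sub("-", text)


def flatten_cite(text):
    """Also turn "(4A)(3)" into "-4A-3" (hypothetical extra step)."""
    return PAREN_PART.sub(r"-\1", dots_to_dashes(text))


if __name__ == "__main__":
    print(dots_to_dashes("9.125.07"))
    print(flatten_cite("9.125.07(4A)(3)"))
```

The second function would also rewrite any other parenthesized alphanumeric word in running text, so in practice it would need to be restricted to recognized cites.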

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-04-06 Thread Kevin Grittner
Tom Lane wrote: > "Kevin Grittner" writes: >> People are likely to search for statute cites, which tend to have a >> hierarchical form. > I think what you need is a custom parser I've just returned to this and after review have become convinced that this is absolutely necessary; once the def

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-03-11 Thread Kevin Grittner
>>> Oleg Bartunov wrote: > On Tue, 10 Mar 2009, Tom Lane wrote: >> "Kevin Grittner" writes: >>> People are likely to search for statute cites, which tend to have a >>> hierarchical form. I'm not sure the prefix approach will work for >>> this. For example, there is a section 939.64 in the stat

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-03-11 Thread Oleg Bartunov
On Tue, 10 Mar 2009, Tom Lane wrote: "Kevin Grittner" writes: People are likely to search for statute cites, which tend to have a hierarchical form. I'm not sure the prefix approach will work for this. For example, there is a section 939.64 in the state statutes dealing with commission of a

Re: [GENERAL] tsearch2 dictionary for statute cites

2009-03-10 Thread Tom Lane
"Kevin Grittner" writes: > People are likely to search for statute cites, which tend to have a > hierarchical form. I'm not sure the prefix approach will work for > this. For example, there is a section 939.64 in the state statutes > dealing with commission of a crime while wearing a bulletproof

[GENERAL] tsearch2 dictionary for statute cites

2009-03-10 Thread Kevin Grittner
I broached this topic last year[1], but the project got tabled until now; so I raise it again. We want to be able to search text (extracted from character-based PDF files) which will contain legal terms and statute cites, and we want to be able to do tsearch2 searches (under 8.3.recent). It's cle