[GENERAL] ts_headline and query with hyphen

2012-12-04 Thread daniel
Hi, I have a question about ts_headline: when the query includes a word like 'on-line', only the 'line' part is highlighted, even though the whole phrase is indexed too. Some details below. PostgreSQL 9.1.6: select token, dictionary, lexemes from ts_debug('play on-line') where alias <> 'blank';

Re: [GENERAL] ts_headline and query with hyphen

2012-12-04 Thread Tom Lane
daniel dochto...@gmail.com writes: I have a question about ts_headline, when the query includes word like 'on-line' - only the 'line' part is highlighted, even though the whole phrase is indexed too, some details below. Part of the reason is that 'on' is a stop word (at least in the default
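The stop-word point can be checked directly. What follows is a minimal sketch, not taken from the thread: it assumes a PostgreSQL server where the built-in english configuration is available, and it passes that configuration explicitly rather than relying on default_text_search_config.

```sql
-- Show how the parser splits the hyphenated word and which tokens
-- produce lexemes. A stop word shows up with an empty lexemes array.
-- Assumption: the 'english' configuration is used (the thread only says "default").
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'play on-line')
WHERE alias <> 'blank';

-- Show the tsquery that is actually built for the hyphenated word,
-- which is what ts_headline later tries to match against the text.
SELECT to_tsquery('english', 'on-line');
```

Comparing the two results side by side shows which parts of 'on-line' survive into the query and are therefore candidates for highlighting.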

Re: [GENERAL] ts_headline and query with hyphen

2012-12-04 Thread daniel
On 12/05/2012 04:49 AM, Tom Lane wrote: daniel dochto...@gmail.com writes: I have a question about ts_headline, when the query includes word like 'on-line' - only the 'line' part is highlighted, even though the whole phrase is indexed too, some details below. Part of the reason is that on is

Re: [GENERAL] ts_headline and query with hyphen

2012-12-04 Thread daniel
As a follow-up to my previous comment, this is a cutting example: select ts_headline('game played on-line', to_tsquery('on-line game'), 'MaxWords=3,MinWords=2,ShortWord=1'); ts_headline --- <b>game</b> played on that can't be right... daniel -- Sent via

Re: [GENERAL] ts_headline

2008-02-23 Thread Oleg Bartunov
On Sat, 23 Feb 2008, Stephen Davies wrote: As it turns out, all I needed was in the doco but the key element - the first config arg to ts_headline - was not in any of the examples so I missed it. Aha, the original ones were based on the default configuration, but then the concept was changed, but the

Re: [GENERAL] ts_headline

2008-02-22 Thread Stephen Davies
Not quite:-( It is the ts_headline with the explicit english configuration that fails rather than the implicit simple. That's what is so weird. As you say, the ts_vector has databas so the english version of ts_headline should work - but it doesn't. The simple version does; despite the

Re: [GENERAL] ts_headline

2008-02-22 Thread Richard Huxton
Stephen Davies wrote: OK. The first level explanation is that my default config is simple. Aha! Actually, that's the whole explanation. This explains the different query results as english reduces database to databas while simple does not reduce it at all. Exactly. The document is

Re: [GENERAL] ts_headline

2008-02-22 Thread Richard Huxton
Stephen Davies wrote: Not quite:-( It is the ts_headline with the explicit english configuration that fails rather than the implicit simple. Hmm... arse. That's what is so weird. As you say, the ts_vector has databas so the english version of ts_headline should work - but it doesn't. The

Re: [GENERAL] ts_headline

2008-02-22 Thread Richard Huxton
Stephen Davies wrote: Unfortunately, my link to the box with the test database is down due to lack of maintenance by our local telco (Telstra) but I think that I also missed the optional config arg to ts_headline. The lack of link also means that I cannot confirm your findings but your logic

Re: [GENERAL] ts_headline

2008-02-22 Thread Stephen Davies
Unfortunately, my link to the box with the test database is down due to lack of maintenance by our local telco (Telstra) but I think that I also missed the optional config arg to ts_headline. The lack of link also means that I cannot confirm your findings but your logic looks good. It begs

Re: [GENERAL] ts_headline

2008-02-22 Thread Oleg Bartunov
On Fri, 22 Feb 2008, Stephen Davies wrote: H! I think I now understand the ts position better, thank you. Part of my problem has been that I am used to the functionality of Open Text's LCS (aka BASIS) product which handles text differently. It includes the position (and context)

Re: [GENERAL] ts_headline

2008-02-22 Thread Stephen Davies
H! I think I now understand the ts position better, thank you. Part of my problem has been that I am used to the functionality of Open Text's LCS (aka BASIS) product which handles text differently. It includes the position (and context) information in the index and does remember how the

Re: [GENERAL] ts_headline

2008-02-22 Thread Stephen Davies
As it turns out, all I needed was in the doco but the key element - the first config arg to ts_headline - was not in any of the examples so I missed it. Would it be possible for ts_headline to work with the pre-parsed ts_vector? I see references to future plans for phrase searching in ts. Is
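For reference, the optional first argument mentioned here is the text search configuration. A minimal sketch, assuming the english configuration and a made-up sample sentence (the text and the option values are illustrative, not from the thread):

```sql
-- ts_headline([config,] document, query [, options])
-- Passing 'english' to both ts_headline and to_tsquery keeps the
-- stemming of the document text and of the query consistent.
SELECT ts_headline(
         'english',
         'The thesaurus is stored in the database together with each statement.',
         to_tsquery('english', 'database'),
         'MinWords=5, MaxWords=15'
       );
```

When the first argument is omitted, ts_headline falls back to default_text_search_config, which may not be the configuration the query was built with.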

Re: [GENERAL] ts_headline

2008-02-21 Thread Richard Huxton
Stephen Davies wrote: I am a bit puzzled by the output of ts_headline (V8.3) for different queries. It seems that the difference is in the number of occurrences of the criterion words. If the number of hits is less than some number, the ts_headline result is correct but if the number of hits

Re: [GENERAL] ts_headline

2008-02-21 Thread Stephen Davies
Attached is the document in question. Searches for norwegian, thesaurus and statement give good results. A search for database gives the plain text from the beginning. Cheers and thanks, Stephen Davies On Thursday 21 February 2008 20:08, Richard Huxton wrote: Stephen Davies wrote: G'day

Re: [GENERAL] ts_headline

2008-02-21 Thread Stephen Davies
G'day Richard. I don't think so. A sample command is: ts_headline(abstract,to_tsquery('english','database'),'minWords = 99, maxWords = 999') I have also tried with smaller maxwords without any visible effect. Cheers, Stephen On Thursday 21 February 2008 19:19, Richard Huxton wrote: Stephen

Re: [GENERAL] ts_headline

2008-02-21 Thread Richard Huxton
Stephen Davies wrote: G'day Richard. I don't think so. A sample command is: ts_headline(abstract,to_tsquery('english','database'),'minWords = 99, maxWords = 999') I have also tried with smaller maxwords without any visible effect. Hmm - a simple test seems to work OK. SELECT ts_headline(

Re: [GENERAL] ts_headline

2008-02-21 Thread Richard Huxton
Stephen Davies wrote: Attached is the document in question. Searches for norwegian, thesaurus and statement give good results. A search for database gives the plain text from the beginning. Seems OK here - might need to look at your configuration settings.

Re: [GENERAL] ts_headline

2008-02-21 Thread Stephen Davies
Interesting. I hadn't seen that section before. As I said in my original post: Is this a bug or am I missing some configuration option. I shall investigate the stuff in 12.8. Any suggestions as to where to start? Thanks, Stephen Davies On Thursday 21 February 2008 20:50, Richard Huxton

Re: [GENERAL] ts_headline

2008-02-21 Thread Stephen Davies
I just spotted the difference between your test and mine. My query says: select ts_headline(abstract,to_tsquery('english','database'),'minWords = 99, maxWords = 999') from document where id=21; where your equivalent does not include the 'english' arg. If I take out the 'english' from this

Re: [GENERAL] ts_headline

2008-02-21 Thread Richard Huxton
Stephen Davies wrote: Interesting. I hadn't seen that section before. As I said in my original post: Is this a bug or am I missing some configuration option. I shall investigate the stuff in 12.8. Any suggestions as to where to start? Well, no-one has been using 8.3 for more than a few

Re: [GENERAL] ts_headline

2008-02-21 Thread Richard Huxton
Stephen Davies wrote: I just spotted the difference between your test and mine. My query says: select ts_headline(abstract,to_tsquery('english','database'),'minWords = 99, maxWords = 999') from document where id=21; where your equivalent does not include the 'english' arg. If I take out

Re: [GENERAL] ts_headline

2008-02-21 Thread Stephen Davies
OK. The first level explanation is that my default config is simple. This explains the different query results as english reduces database to databas while simple does not reduce it at all. The document is parsed/indexed using english explicitly so my queries need to be explicit also (not an
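The difference between the two configurations described above can be seen with a short comparison. This is a sketch assuming the built-in english and simple configurations, using a one-word document for illustration:

```sql
-- Compare how each configuration normalizes the same word.
-- Per the thread, english stems 'database' to 'databas' and simple leaves it as is.
SELECT to_tsvector('english', 'database') AS english_vector,
       to_tsvector('simple',  'database') AS simple_vector;

-- A vector built with one configuration and a query built with the
-- other do not share a lexeme here, so the match operator returns false;
-- with the same configuration on both sides it returns true.
SELECT to_tsvector('english', 'database') @@ to_tsquery('simple',  'database') AS mixed_configs,
       to_tsvector('english', 'database') @@ to_tsquery('english', 'database') AS same_config;
```

SHOW default_text_search_config; reports which configuration is used when none is passed explicitly.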

[GENERAL] ts_headline

2008-02-20 Thread Stephen Davies
I am a bit puzzled by the output of ts_headline (V8.3) for different queries. I have one record in a test documentation table and am applying different queries against that table to check out the ts_headline outputs. The document in question has 2553 words which generate 519 tokens in the