Re: Offsets?

2008-02-26 Thread Ryan McKinley
(maybe a better question for solr-user... but) which offsets are you talking about? The tokens? Are you looking for something like analysis.jsp or SOLR-477? Steve Suppe wrote: Hello, I'm looking into returning the offsets for various fields I've created in a JSON object. Is there some

Re: Offsets?

2008-02-26 Thread Steve Suppe
like I'm drinking from a firehose!). In our case, certain documents will have certain fields attached, and will be returned based on search criteria. We have specific highlighting requirements, and I will have to rely on the actual offsets of those matching fields (as opposed to using the built

Re: Offsets?

2008-02-26 Thread Ryan McKinley
This is a possibility, but I was thinking if I could get SOLR to return that information in the initial JSON, then I could save a step and speed things up immensely. nothing off the shelf to do it... you may want to look at implementing a search component to augment the response with

Re: Offsets?

2008-02-26 Thread Steve Suppe
I appreciate all the help - I think, for now, we'll try and leverage the analysis.jsp approach, as it appears that different approaches might be in the works, and I don't want to muck with any of that just yet :) If I get some time, maybe I'll have better news in the future. Thanks again!

[jira] Reopened: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-22 Thread Ryan McKinley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley reopened SOLR-234: TrimFilter should update the start and end offsets

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Ryan McKinley
...oh, hmm ... you only want to split on - if it has a space on both sides huh? does java regex have a don't be greedy option? ... javadocs say yes (they call it Reluctant vs greedy) so try something like this... pattern=\s*?(\s-\s|--|,|\(|\))\s*? it *almost* works, with:
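As a rough illustration of the reluctant-versus-greedy distinction mentioned above, the sketch below runs the quoted pattern through plain `java.util.regex` via `String.split`, rather than through Solr's tokenizer. The class and method names are made up for this example, and the truncated message does not say what "almost works" referred to; this only shows one way a trailing reluctant `\s*?` can fall short: it is satisfied with zero characters, so whitespace after the delimiter stays on the next piece.

```java
import java.util.Arrays;

public class ReluctantVsGreedy {
    // Delimiters taken from the quoted pattern: " - ", "--", ",", "(" and ")".
    static final String DELIMS = "(\\s-\\s|--|,|\\(|\\))";

    // Reluctant quantifiers, as in the pattern quoted in the thread.
    public static String[] splitReluctant(String s) {
        return s.split("\\s*?" + DELIMS + "\\s*?");
    }

    // Greedy quantifiers: surrounding whitespace is consumed with the delimiter.
    public static String[] splitGreedy(String s) {
        return s.split("\\s*" + DELIMS + "\\s*");
    }

    public static void main(String[] args) {
        String input = "a , b";
        // The trailing \s*? matches nothing, so the second piece keeps its leading space.
        System.out.println(Arrays.toString(splitReluctant(input))); // pieces: "a", " b"
        // The greedy form swallows the whitespace on both sides of the comma.
        System.out.println(Arrays.toString(splitGreedy(input)));    // pieces: "a", "b"
    }
}
```

The greedy form still splits on " - " only when the hyphen has whitespace on both sides, because the leading `\s*` backtracks to give the space back to the `\s-\s` alternative.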

[jira] Assigned: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Ryan McKinley (JIRA)
to the TrimFilter. By default it will *not* modify the offsets. Depending on how the Tokenizer+Analyzer stream is configured it may or may not make sense, so the option seems reasonable. TrimFilter should update the start and end offsets
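To make the trade-off behind that option concrete, here is a minimal sketch of trimming a token with and without adjusting its offsets. It is not the TrimFilter implementation and does not use the Lucene `Token` class: `SimpleToken`, `trim`, and the `updateOffsets` flag name are all stand-ins invented for this illustration.

```java
public class TrimSketch {
    /** Hypothetical stand-in for an analysis token; not the Lucene Token class. */
    public static final class SimpleToken {
        public final String text;
        public final int start; // offset of the first character in the original field value
        public final int end;   // offset one past the last character

        public SimpleToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Trims leading and trailing whitespace from the token text.
     * When updateOffsets is false, the offsets still cover the untrimmed span;
     * when true, they are narrowed to cover only the trimmed text.
     */
    public static SimpleToken trim(SimpleToken t, boolean updateOffsets) {
        String s = t.text;
        int lead = 0;
        while (lead < s.length() && Character.isWhitespace(s.charAt(lead))) {
            lead++;
        }
        int trail = 0;
        while (trail < s.length() - lead
                && Character.isWhitespace(s.charAt(s.length() - 1 - trail))) {
            trail++;
        }
        String trimmed = s.substring(lead, s.length() - trail);
        if (!updateOffsets) {
            return new SimpleToken(trimmed, t.start, t.end);
        }
        return new SimpleToken(trimmed, t.start + lead, t.end - trail);
    }
}
```

For a token `"  foo "` spanning offsets 10 to 16, the default path returns `foo` still marked 10 to 16, while the offset-updating path returns `foo` marked 12 to 15. Narrowing the offsets is only meaningful when the whitespace really came from the original field value at those positions, which is the caveat raised in the thread.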

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Chris Hostetter
: After 1/2 hour of regex hacking... I think I'll stick with a two step : process: split then trim ;) But regex hacking is FUN!! I'm 99% certain this does what you want... tokenizer class=solr.PatternTokenizerFactory

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Ryan McKinley
Chris Hostetter wrote: : After 1/2 hour of regex hacking... I think I'll stick with a two step : process: split then trim ;) But regex hacking is FUN!! I'm 99% certain this does what you want... tokenizer class=solr.PatternTokenizerFactory

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Chris Hostetter
: Incidentally, PatternTokenizerFactory seems to have the annoying limitation : of assuming there is a token prior to each match -- even if the match : explicitly matches on the start of the string (so it creates a 0 width : token) ... that seems like a bug right? : how would you change it? I

[jira] Created: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley (JIRA)
TrimFilter should update the start and end offsets -- Key: SOLR-234 URL: https://issues.apache.org/jira/browse/SOLR-234 Project: Solr Issue Type: Improvement Reporter: Ryan

[jira] Resolved: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley (JIRA)
exactly where in the original stream of data the source of the token was found ... if the token is modified in some way (ie: stemmed, trimmed, etc..) the offsets are supposed to remain the same because regardless of the token text munging, the original location has not actually changed. I'll move

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495147 ] Yonik Seeley commented on SOLR-234: --- Updating the offsets does seem like the right thing to do. I imagine using

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495153 ] Ryan McKinley commented on SOLR-234: Updating the offsets does seem like the right thing to do. My real use

[jira] Updated: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley (JIRA)
()-endOff, t.type() ); t.setPositionIncrement( incr ); //+ start ); TODO? what should happen with the offset } TrimFilter should update the start and end offsets -- Key: SOLR-234 URL

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Yonik Seeley (JIRA)
with spaces at the end. It doesn't make any sense. I'd think that updating the offsets is almost always the right thing to do (and should be the default?), given that spaces will almost always come from the field value itself. -Yonik TrimFilter should update the start and end offsets

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Chris Hostetter
: My real use case is adding the trim filter to the pattern tokenizer. : the 'correct' answer in my case is to update the offsets. hmmm... wouldn't the correct thing to do in that case be to change your pattern so it strips the whitespace when tokenizing? that way the offsets of your tokens

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495202 ] Hoss Man commented on SOLR-234: --- I'd think that updating the offsets is almost always the right thing to do

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley (JIRA)
the token itself ... I get the basic pattern now: Tokenizers determine the start/end offsets and Filters just transform the text along the way. In Ryan's use case he may want his highlighter-esque code to be able to know ... I am fine with either: 1. leave the TrimFilter as is and do

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Ryan McKinley
Chris Hostetter wrote: : My real use case is adding the trim filter to the pattern tokenizer. : the 'correct' answer in my case is to update the offsets. hmmm... wouldn't the correct thing to do in that case be to change your pattern so it strips the whitespace when tokenizing? that way

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Mike Klaas
On 11-May-07, at 5:02 PM, Ryan McKinley wrote: Chris Hostetter wrote: : My real use case is adding the trim filter to the pattern tokenizer. : the 'correct' answer in my case is to update the offsets. hmmm... wouldn't the correct thing to do in that case be to change your pattern so

Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Chris Hostetter
: probably I'm just not very good at regex ;) : :pattern=--|,|\s-\s|\(|\) : : this will split on --, - , (, and ). I can't figure out how to : build the pattern so it will trim each thing on the way out. just make sure you match the whitespace in the pattern, you're already doing that
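The suggestion to match the whitespace inside the split pattern can be sketched with plain `java.util.regex`. The example below is an assumption-laden illustration, not PatternTokenizerFactory itself: the class name, the `tokenize` helper, and the `text[start,end)` output format are invented here. It folds optional whitespace into the delimiter pattern quoted above and records where each token sits in the original string, so the tokens come out already trimmed and their offsets need no later correction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternSplitOffsets {
    // Delimiters from the thread ("--", ",", " - ", "(", ")"),
    // with any surrounding whitespace folded into the delimiter match.
    static final Pattern DELIM = Pattern.compile("\\s*(--|,|\\s-\\s|\\(|\\))\\s*");

    /** Returns each token formatted as text[start,end), offsets relative to the input. */
    public static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = DELIM.matcher(input);
        int start = 0;
        while (m.find()) {
            add(out, input, start, m.start());
            start = m.end(); // the next token begins right after the delimiter
        }
        add(out, input, start, input.length());
        return out;
    }

    private static void add(List<String> out, String input, int start, int end) {
        if (end > start) { // skip zero-width tokens, e.g. when the input starts with a delimiter
            out.add(input.substring(start, end) + "[" + start + "," + end + ")");
        }
    }

    public static void main(String[] args) {
        // Prints: [foo[0,3), bar[7,10), baz[12,15), qux[17,20)]
        System.out.println(tokenize("foo -- bar, baz (qux)"));
    }
}
```

Whitespace at the very start or end of the input is not adjacent to a delimiter, so this sketch leaves it attached to the first or last token; handling that would need either an extra alternative in the pattern or a separate trim step.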

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495213 ] Yonik Seeley commented on SOLR-234: --- offsets point back to the original field value for a particular token

[jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-11 Thread Hoss Man (JIRA)
sense to have an option -- i'm just saying that as a general rule TokenFilters shouldn't be munging offsets ... i don't see a big difference between TrimFilter and StemmingFilter (where the stem of foo and foo is foo). so the option should default to off. TrimFilter should update