Re: String field

2011-03-29 Thread Scott Gonyea
First, make sure your request handler is set to spit out everything. I take it you did, but I hate to assume. Second, I suggest indexing your data twice: once as tokenized text, once as a string. It'll save you from howling at the moon in anguish... Unless you really only do care about pure
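A minimal sketch of the "index it twice" suggestion, in Ruby. The same value is copied into one field meant for tokenized text and one meant for an exact string. The field suffixes `_t` and `_s` are assumptions (they follow the dynamic fields in Solr's example schema), and the helper `with_exact_copy` is hypothetical.

```ruby
# Hypothetical helper: duplicate one source value into two Solr fields.
# ASSUMPTION: "*_t" maps to a tokenized text type and "*_s" to a string
# type in your schema, as in Solr's example schema; adjust to your own.
def with_exact_copy(doc, name)
  value = doc.fetch(name)
  doc.merge("#{name}_t" => value, "#{name}_s" => value)
end

doc = with_exact_copy({ "id" => "1", "title" => "Baffle Prices" }, "title")
puts doc.inspect
```

A `copyField` directive in the schema can make the same copy on the server side, so the client only has to send the value once.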

Advice on Exact Matching?

2010-12-30 Thread Scott Gonyea
Thank you, Scott Gonyea

Solr highlighting is double-quotes-aware?

2010-12-01 Thread Scott Gonyea
RD_STARTbaffleTEST_KEYWORD_END_prices.html"]TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END Is there something about this data that makes the highlighter not want to split it up? Do I have to have Solr tokenize the words by some character that I somehow excluded? Thank you, Scott Gonyea

Re: Dismax Filtering Hyphens? Why is this not working? How do I debug Dismax?

2010-10-04 Thread Scott Gonyea
Wow, that's pretty infuriating. Thank you for the suggestion. I added it to the Wiki, with the hope that if it contains misinformation then someone will correct it and, consequently, save me from another one of these experiences :) (...and to also document that, hey, there is a tokenizer which t

Dismax Filtering Hyphens? Why is this not working? How do I debug Dismax?

2010-10-04 Thread Scott Gonyea
Wow, this is probably the most annoying Solr issue I've *ever* dealt with. First question: How do I debug Dismax, and its query handling? Issue: When I query against this StrField, I am attempting to do an *exact* match... Albeit one that is case-insensitive :). So, 90% exact. It works in a maj

Re: Highlighting match term in bold rather than italic

2010-09-30 Thread Scott Gonyea
Your solrconfig has a highlighting section. You can make that CDATA thing whatever you want. I changed it to . On Thu, Sep 30, 2010 at 2:54 PM, efr...@gmail.com wrote: > Hi all - > > Does anyone know how to produce solr results where the match term is > highlighted in bold rather than italic? >
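As a request-level alternative to editing solrconfig, the highlight markers can usually be passed as query parameters. The sketch below only builds the query string. It assumes the default highlighter, which reads `hl.simple.pre` and `hl.simple.post`; the `<b>` tag, the field name `content`, and the helper name are illustrative choices, not taken from the thread.

```ruby
require 'uri'

# Build highlighting parameters for a Solr select request.
# ASSUMPTION: the default highlighter is in use, which reads
# hl.simple.pre / hl.simple.post; "<b>" is an illustrative choice.
def bold_highlight_params(query, field)
  {
    "q" => query,
    "hl" => "true",
    "hl.fl" => field,
    "hl.simple.pre" => "<b>",
    "hl.simple.post" => "</b>"
  }
end

params = bold_highlight_params("baffle", "content")
puts URI.encode_www_form(params)
```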

Re: How to Index Pure Text into Separate Fields?

2010-09-29 Thread Scott Gonyea
Break your HTML pages into the desired fields, format it as follows: http://wiki.apache.org/solr/UpdateXmlMessages And away you go. You may want to search / review the Wiki. Also, if you're indexing websites and want to place it in Solr, you should look at Nutch. It can do all that work for yo
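A rough sketch of the formatting step, in Ruby: once a page has been broken into fields, each document becomes a `<doc>` of `<field name="...">` elements inside an `<add>` message, as described on the UpdateXmlMessages wiki page. The field names and the helper names here are hypothetical, and the escaping is deliberately minimal.

```ruby
# Escape the characters that are special in XML text and attributes.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Turn an array of field hashes into one <add> update message.
def solr_add_xml(docs)
  body = docs.map do |doc|
    fields = doc.map do |name, value|
      %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{body.join}</add>"
end

puts solr_add_xml([{ "id" => "page-1", "title" => "Q & A", "body" => "Plain text here" }])
```

The resulting string would be POSTed to the update handler, followed by a commit.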

Re: Get all results from a solr query

2010-09-16 Thread Scott Gonyea
lol, note to self: scratch out IPs. Good thing firewalls exist to keep my stupidity at bay. Scott On Thu, Sep 16, 2010 at 2:55 PM, Scott Gonyea wrote: > If you want to do it in Ruby, you can use this script as scaffolding: > require 'rsolr' # run `gem install rsolr` to

Re: Get all results from a solr query

2010-09-16 Thread Scott Gonyea
If you want to do it in Ruby, you can use this script as scaffolding:
    require 'rsolr' # run `gem install rsolr` to get this
    solr  = RSolr.connect(:url => 'http://ip-10-164-13-204:8983/solr')
    total = solr.select({:rows => 0})["response"]["numFound"]
    rows  = 10
    query = {
      :rows   => rows,
      :sta
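Since the preview of the script is cut off, here is a separate, self-contained sketch of the paging idea it starts to set up: ask for the total once, then walk the result set in fixed-size pages. It is not a reconstruction of the original script. `fetch_page` is a hypothetical callable, and a small in-memory array stands in for Solr so the example runs without a server.

```ruby
# Compute the :start offsets needed to walk `total` hits, `rows` at a time.
def page_offsets(total, rows)
  raise ArgumentError, "rows must be positive" unless rows > 0
  (0...total).step(rows).to_a
end

# Collect every document by calling `fetch_page` once per offset.
# `fetch_page` is a hypothetical callable taking (start, rows) and
# returning an array of docs; with rsolr it could wrap solr.select.
def fetch_all(total, rows, fetch_page)
  page_offsets(total, rows).flat_map { |start| fetch_page.call(start, rows) }
end

# Stand-in for a Solr index so the sketch runs without a server.
fake_index = (1..25).map { |i| { "id" => i } }
fetch_page = ->(start, rows) { fake_index[start, rows] || [] }

docs = fetch_all(fake_index.size, 10, fetch_page)
puts docs.size
```

Paging this way keeps each response small, at the cost of one request per page.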

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
table feeling? If it performs worse enough to matter, > then that's why you'd need a custom tokenizer, other than that I'm not sure > anything's undesirable about the PatternTokenizer. > > > Jonathan > > Scott Gonyea wrote: > >> I'd agree with yo

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
it for the demonstrated > > performance advantage. (At least I hope that's what happened, otherwise > > there's no excuse for it!). > > > > Do you know you get a worthwhile performance benefit for what you're > doing? > > If not, why do it? > > >

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
ated > performance advantage. (At least I hope that's what happened, otherwise > there's no excuse for it!). > > Do you know you get a worthwhile performance benefit for what you're doing? > If not, why do it? > > Jonathan > > > Scott Gonyea wrote: > >

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
I went for a different route: https://issues.apache.org/jira/browse/LUCENE-2644 Scott On Tue, Sep 14, 2010 at 11:18 AM, Robert Muir wrote: > On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote: > > > Hi, > > > > I'm tweaking my schema and the LowerCaseTo

LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
Hi, I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create tokens, based solely on lower-casing characters. Is there a way to tell it NOT to drop non-characters? It's amazingly frustrating that the TokenizerFactory and the FilterFactory have two entirely different modes of behav
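The difference being described can be imitated in a few lines of Ruby. This is only a rough model, not the Lucene classes themselves: Lucene documents LowerCaseTokenizer as dividing text at non-letters and lower-casing, whereas a lower-case filter only changes case and leaves token boundaries to whatever tokenizer ran before it (a keyword-style tokenizer is assumed here). The method names are hypothetical.

```ruby
# Rough Ruby imitation of two Lucene analysis chains; not the real classes.

# LowerCaseTokenizer: split at every non-letter, then lower-case.
def lower_case_tokenizer(text)
  text.scan(/\p{L}+/).map(&:downcase)
end

# KeywordTokenizer followed by LowerCaseFilter: keep the input whole,
# only lower-case it.
def keyword_then_lowercase(text)
  [text.downcase]
end

input = "Foo-Bar_42 Baz"
p lower_case_tokenizer(input)
p keyword_then_lowercase(input)
```

In this model the first chain drops the hyphen, underscore, and digits entirely, while the second keeps them inside a single token.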

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-03 Thread Scott Gonyea
to the > original pages for each context > > You may be able to represent your grammar as textual rules instead of code. > Your latency may be minutes instead of milliseconds though... > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > Training

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-02 Thread Scott Gonyea
t all 20. Further, the white-listing can generally be applied to other sites in which they appear. I'd love to get some thoughts on how to tackle this problem, but I think that kicking off separate documents, within Solr, for each specific occurrence... would be the simplest path.

In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-01 Thread Scott Gonyea
sumptions can be painfully expensive. Thank you for reading my bloated e-mail. Again, I'm mostly just looking to be pointed to various pieces of the Lucene / Solr code-base, and am trolling for any insight that people might share. Scott Gonyea