RE: dismax: limiting term match to one field

2010-12-09 Thread jan.kurella
Try setting the tie breaker above 1.0; this will increase the score for dismax matches in fields other than the best field. But this may lead to strange side effects.
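For illustration, a minimal SolrJ sketch of sending a dismax query with an explicit tie parameter; the core URL, field names, and the HttpSolrClient class name are assumptions for newer SolrJ versions, not part of the original post. With tie=0.0 only the best-scoring field counts per term; larger values add the other matching fields' scores as well.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxTieExample {
    public static void main(String[] args) throws Exception {
        // URL and core name are placeholders; adjust to your setup.
        HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery q = new SolrQuery("ipod charger");
        q.set("defType", "dismax");
        q.set("qf", "title^2 body");
        // tie=0.0: only the best field counts; higher values blend in the other fields.
        q.set("tie", "0.1");

        QueryResponse rsp = client.query(q);
        System.out.println(rsp.getResults().getNumFound() + " hits");
        client.close();
    }
}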

RE: how to set maxFieldLength to unlimited

2010-12-01 Thread jan.kurella
You just can't set it to unlimited. What you could do is ignore the positions and put in a filter that sets the position increment for all but the first token to 0 (so the field length is effectively 1, with all tokens stacked on the first position). You could also break per page, so you put each page on a
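A minimal sketch of such a filter, assuming a current Lucene TokenFilter API; the class name StackAllTokensFilter is hypothetical.

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Keeps the position increment of the first token and forces 0 for every
// following token, so all tokens are stacked on a single position.
public final class StackAllTokensFilter extends TokenFilter {
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);
    private boolean first = true;

    public StackAllTokensFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        if (first) {
            first = false;                      // first token keeps its increment
        } else {
            posIncAtt.setPositionIncrement(0);  // stack on the first position
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        first = true;
    }
}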

Re: how to set maxFieldLength to unlimited

2010-12-01 Thread jan.kurella
I don't know about upload limitations, but there are certainly some in the default settings; this could explain the limit of 20MB. Which upload mechanism do you use on the Solr side? I guess this is not a Lucene problem but rather the HTTP layer of Solr. If you manage to stream your PDF and
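For reference, a hedged sketch of streaming a PDF to Solr's extracting request handler via SolrJ; the core URL, the literal.id value, and the exact client class depend on your Solr/SolrJ version and are assumptions here, not taken from the original thread.

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfUploadExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        // /update/extract is the default path of the extracting request handler (Solr Cell).
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("large-document.pdf"), "application/pdf");
        req.setParam("literal.id", "doc-1");   // unique key for the extracted document
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        req.process(client);
        client.close();
    }
}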

RE: Boost on newer documents

2010-11-30 Thread jan.kurella
You could also put a short representation of the date (I suggest days since 01.01.2010) as a payload and calculate the boost with the payload function of the similarity.
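A small sketch of the indexing side of this idea: compute "days since 2010-01-01" and format the field value in the token|payload syntax that DelimitedPayloadTokenFilter (default delimiter '|') understands. The class and method names are hypothetical, and the actual boosting via the similarity's payload hook depends on the Lucene/Solr version, so it is not shown here.

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DatePayloadExample {
    // Reference day suggested in the post: 2010-01-01.
    private static final LocalDate EPOCH = LocalDate.of(2010, 1, 1);

    /**
     * Formats a field value so that a delimited-payload token filter
     * attaches "days since 2010-01-01" as the payload of the token.
     */
    static String withDatePayload(String token, LocalDate docDate) {
        long days = ChronoUnit.DAYS.between(EPOCH, docDate);
        return token + "|" + days;
    }

    public static void main(String[] args) {
        // Prints "article|334" for a document dated 2010-12-01.
        System.out.println(withDatePayload("article", LocalDate.of(2010, 12, 1)));
    }
}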

RE: Good example of multiple tokenizers for a single field

2010-11-30 Thread jan.kurella
We had the same problem for our fields, so we wrote a Tokenizer using the ICU4J library. It breaks tokens at script changes and handles them according to the script and the configured BreakIterators. This works out very well, as we also add the script information to the token so later filter
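This is not the poster's actual tokenizer, but a small sketch of the underlying idea, assuming ICU4J is on the classpath: use UScript to detect where the Unicode script changes and split the text there.

import java.util.ArrayList;
import java.util.List;
import com.ibm.icu.lang.UScript;

public class ScriptSplitExample {

    /** Splits the input wherever the Unicode script of the characters changes. */
    static List<String> splitAtScriptChanges(String text) {
        List<String> parts = new ArrayList<>();
        if (text.isEmpty()) {
            return parts;
        }
        int start = 0;
        int prevScript = UScript.getScript(text.codePointAt(0));
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int script = UScript.getScript(cp);
            // COMMON/INHERITED characters (spaces, digits, punctuation) stay in the current run.
            if (script != UScript.COMMON && script != UScript.INHERITED) {
                if (script != prevScript) {
                    parts.add(text.substring(start, i));
                    start = i;
                }
                prevScript = script;
            }
            i += Character.charCount(cp);
        }
        parts.add(text.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Splits into the Latin run and the Greek run.
        System.out.println(splitAtScriptChanges("Solr αναζήτηση"));
    }
}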

RE: DisMaxQParserPlugin and Tokenization

2010-11-24 Thread jan.kurella
Sorry for the double post. Can someone point me to where the original query given to the DisMaxHandler/QParser is split? Jan

DisMaxQParserPlugin and Tokenization

2010-11-22 Thread jan.kurella
Hi, using the SearchHandler with the defType="dismax" option enables the DisMaxQParserPlugin. From investigating, it seems it just tokenizes by whitespace, although by looking at the code I could not find the place where this behavior is enforced. I only found that for each field

passing arguments to analyzer/filter at runtime

2010-11-22 Thread jan.kurella
Hi, I'm trying to find a solution to search only in a given language. At index time the language is known for each string to be tokenized, so I would like to write a filter that prefixes each token according to its language. First question: what is the best way to pass the language argument to the filter? I'm
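A hedged sketch of such a prefixing filter, assuming a current Lucene TokenFilter API. How the per-document language reaches the analyzer is exactly the open question in the post; here it is simply handed to the filter's constructor, which is an assumption for illustration, and the class name LanguagePrefixFilter is hypothetical.

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Rewrites "token" into "en:token" (etc.) so that searches can be
// restricted to one language by querying the prefixed form.
public final class LanguagePrefixFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final String prefix;

    public LanguagePrefixFilter(TokenStream input, String languageCode) {
        super(input);
        this.prefix = languageCode + ":";
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String term = termAtt.toString();
        termAtt.setEmpty().append(prefix).append(term);
        return true;
    }
}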

RE: passing arguments to analyzer/filter at runtime

2010-11-22 Thread jan.kurella
Hi, yes, this is one of the four options I am going to evaluate. Why your suggestion might be problematic: we have about 12 language-sensitive fields and support about 200 distinct languages, which makes 2400 fields; a multifield/dismax query spanning 2400 fields might become problematic. We will go for this
