Muir, thanks for your response.
I'm indexing Indian language web pages which have a decent amount of
English content mixed in. For the time being I'm not going to use
any stemmers, as we don't have standard stemmers for Indian languages. So
what I want to do is this:
Say I've a web
Hi all.
I just want to tell some people an interesting story. :-)
We had a custom analyser which was implemented like this:
public class NoStopWordsAnalyser extends StandardAnalyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = ne
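(The snippet above is cut off by the archive preview. For reference, a minimal sketch of what such a no-stop-words analyser might look like on the Lucene 2.x TokenStream API; the tokenizer/filter chain below is an assumption, not the original code:)

import java.io.Reader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Hypothetical reconstruction: StandardAnalyzer's usual chain, minus the StopFilter.
public class NoStopWordsAnalyser extends StandardAnalyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);   // standard token cleanup (acronyms, apostrophes)
        result = new LowerCaseFilter(result);  // case folding
        return result;                         // deliberately no StopFilter
    }
}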
On Wed, Jun 3, 2009 at 7:34 PM, Mark Miller wrote:
> Max Lynch wrote:
>
>> Well what happens is if I use a SpanScorer instead, and allocate it like
>> such:
analyzer = StandardAnalyzer([])
tokenStream = analyzer.tokenStream("contents",
luc
Max Lynch wrote:
Well what happens is if I use a SpanScorer instead, and allocate it like
such:
analyzer = StandardAnalyzer([])
tokenStream = analyzer.tokenStream("contents",
lucene.StringReader(text))
ctokenStream = lucene.CachingTokenFilter(tokenStre
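(This preview is also truncated. A hedged sketch of the usual CachingTokenFilter plus SpanScorer highlighting setup the snippet appears to be building, written here in Java against the Lucene 2.4-era highlighter API; the PyLucene calls map onto it directly. The field name "contents", fragment count, and separator are assumptions:)

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.SpanScorer;

public class SpanHighlightSketch {
    // Returns the best fragments of `text` matching `query`, joined by "...".
    public static String highlight(Query query, String text) throws IOException {
        Analyzer analyzer = new StandardAnalyzer();
        CachingTokenFilter tokenStream = new CachingTokenFilter(
                analyzer.tokenStream("contents", new StringReader(text)));
        // Building the SpanScorer consumes the cached stream to extract weighted spans,
        SpanScorer scorer = new SpanScorer(query, "contents", tokenStream);
        Highlighter highlighter = new Highlighter(scorer);
        tokenStream.reset();  // so rewind it before handing it to the Highlighter.
        return highlighter.getBestFragments(tokenStream, text, 3, "...");
    }
}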
I would suggest you take a look at Solr -- http://lucene.apache.org/solr
-- which requires essentially no Java knowledge to use. It has a Python
client which at the very least might help with the learning curve. If
you want to try an alternative to JSP/Servlets for your web framework,
there'
Hi all,
I need to develop a website that allows for searching and browsing the
underlying documents collection. I am going to be using Lucene as the
underlying search engine. I am however not very familiar with web
development, and am new to Lucene as well. I have used JSP/Servlets
before, and sin
Sorry, no videos this time. The conversation wasn't very structured... next
month I'll record it :)
On Wed, Jun 3, 2009 at 1:59 PM, Bhupesh Bansal wrote:
> Great Bradford,
>
> Can you post some videos if you have some?
>
> Best
> Bhupesh
>
> On 6/3/09 11:58 AM, "Bradford Stephens" wrote:
Hey everyone!
I just wanted to give a BIG THANKS to everyone who came. We had over a
dozen people, and a few got lost at UW :) [I would have sent this update
earlier, but I flew to Florida the day after the meeting].
If you didn't come, you missed quite a bit of learning on topics such as:
-B
KK, is all of your Latin script text actually English? Is there stuff like
German or French mixed in?
And for your non-English content (your examples have been Indian writing
systems), is it generally true that if you have Devanagari, you can assume
it's Hindi? Or is there stuff like Marathi mixed i
Hi All,
I'm indexing some non-English content, but the pages also contain English
content. As of now I'm using WhitespaceAnalyzer for all content and I'm
storing the full webpage content under a single field. Now we need to
support case folding and stemming for the English content intermingled
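(The message is cut off here. For what it's worth, one hedged way to get case folding plus English-only stemming while keeping a single mixed-script field is an analyzer along these lines; the class name is made up, and the Porter stemmer is not language-aware, it simply has little effect on non-Latin tokens:)

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Sketch: whitespace tokenization for every script, then lowercasing and Porter
// stemming, which in practice mainly alter the Latin-script (English) tokens.
public class MixedScriptAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new LowerCaseFilter(result);   // case folding; no effect on Devanagari
        result = new PorterStemFilter(result);  // English stemming; other tokens mostly pass through
        return result;
    }
}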
Just be aware that KeywordAnalyzer won't tokenize at all. That is, if you
expect to index "jack-bauer" and hit on "jack" or "bauer", it won't.
Best
Erick
On Wed, Jun 3, 2009 at 2:25 AM, legrand thomas wrote:
> Hi,
>
> A KeywordAnalyzer solved my problem.
> Luke allowed me to understand the queries
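(A quick hedged illustration of the difference Erick describes, using the Lucene 2.4-era Token API; KeywordAnalyzer emits the whole input as one token, while StandardAnalyzer splits on the hyphen:)

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TokenizeDemo {
    // Print every token the analyzer produces for the given text.
    static void dump(Analyzer analyzer, String text) throws IOException {
        TokenStream stream = analyzer.tokenStream("f", new StringReader(text));
        Token token;
        while ((token = stream.next()) != null) {
            System.out.println(token.term());
        }
    }

    public static void main(String[] args) throws IOException {
        dump(new KeywordAnalyzer(), "jack-bauer");   // one token: "jack-bauer"
        dump(new StandardAnalyzer(), "jack-bauer");  // two tokens: "jack", "bauer"
    }
}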