Stop words filter

2010-06-22 Thread Vinicius Carvalho
Hello there! I've been using Lucene as a full-text search solution for some time, and although I'm familiar with Analyzers and Stemmers, I've never used them directly. I'm running a few experiments on sentiment analysis, and our implementation needs to perform stemming and stop word removal. I thought
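A minimal plain-Java sketch of the stop word removal step mentioned above. It assumes simple lowercase tokenization on non-letter characters and a tiny hypothetical stop list. It does not use Lucene's own analyzers or filters, and it leaves stemming out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical, deliberately tiny English stop list for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "is", "of", "the", "this", "to"));

    // Lowercases the text, splits on runs of non-letter characters,
    // and keeps only the tokens that are not in the stop list.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [best, phone, year]
        System.out.println(removeStopWords("This is the best phone of the year!"));
    }
}
```

In a real pipeline the surviving tokens would then go through a stemmer before being fed to the sentiment classifier.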

Question on number of fields in a document

2010-03-12 Thread Vinicius Carvalho
Hello there! We are indexing metadata for our media. One idea is that each user adds their own metadata, so each document may have a different number/name/type of fields. Is this OK in Lucene? I mean, is Lucene OK with this relaxed approach? Also, considering that each user may define their own meta

Re: Free software for language detection

2009-07-06 Thread Vinicius Carvalho
You can also check Google's language API (http://code.google.com/apis/ajaxlanguage/documentation/reference.html). I'm writing a blog entry on this and hope to post it tomorrow. Here's a snippet of it working (using json-simple to decode: http://code.google.com/p/json-simple/): try { String s =

Re: Termdocs question

2008-06-23 Thread Vinicius Carvalho
> On 20 Jun 2008 at 18:12, Vinicius Carvalho wrote: > > > Hello there! I'm trying to query for a specific document in an efficient way. >> > > Hi Vinicius, > >termDocs = reader.termDocs(term); >> while(termDocs.next()){ >> in

Termdocs question

2008-06-20 Thread Vinicius Carvalho
Hello there! I'm trying to query for a specific document in an efficient way. My index is structured so that I have an id field which is a unique key for the whole index. When updating/removing a document, I was searching for my id using a Searcher and a TermQuery. But reading the list it se

Question about indexing (BrazilianAnalyzer)

2008-06-03 Thread Vinicius Carvalho
Hello there! I'm indexing documents using the BrazilianAnalyzer, and I've noticed that many words are not being indexed. I store and index the entire doc (I'm doing this in order to present fragments in the results; I don't know if it's the best way, especially for large docs. Any ideas?). Well, using l

Single IndexReader vs Single IndexSearcher

2008-05-29 Thread Vinicius Carvalho
Hello there! My application uses multiple indexes, so I create a MultiReader based on my IndexReaders. What I've done is create a Map of readers, and whenever the user needs a reader I iterate over my collection, checking whether each one is current; if not, I reopen it, else I add it to my multirea

Using highlighter

2008-05-29 Thread Vinicius Carvalho
Hello there! When I use a wildcard in my query, for instance java*, Lucene finds the document, but when using the highlighter, getBestFragment() returns null for a fragment that contains, for instance, the word javadoc. Is it possible to use the highlighter with wildcards? One option I

Re: Boosting Search

2008-05-16 Thread Vinicius Carvalho
> > On 16 May 2008 at 19:20, Vinicius Carvalho wrote: > >> >> I know it's a dumb test >> > > There is a lot of initial latency. You want to "warm" the index. > > but what can be done in order to speed things up? >> >

Boosting Search

2008-05-16 Thread Vinicius Carvalho
Hello there! We are starting with Lucene, and in order to justify using it, one of the benefits we want to show is performance. I do know that Lucene (like other full-text search engines) provides many more benefits than using a DBMS. OK, so here's a simple test: I have a table with 17,700 rows. It is stored in MySQL

Chaining analyzers

2008-03-20 Thread Vinicius Carvalho
Hello there! Is it possible to chain analyzers? If I don't know the locale of my document, and considering that all of my docs will always be in English, Spanish, or Portuguese, is it possible to chain analyzers to remove stop words for all those locales? I know that stemming would be a much
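Rather than chaining several analyzers, one option is to merge the per-language stop lists into a single set and filter the text once. The plain-Java sketch below shows that idea with tiny hypothetical lists; it is not Lucene's API and does not use Lucene's real stop word lists. It also shows the main trade-off: a word that is a stop word in one language can be a content word in another (here the Spanish "son", "they are", also removes the English "son").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MultiLanguageStopWords {
    // Tiny hypothetical samples; real stop lists are much longer.
    static final List<String> ENGLISH = Arrays.asList("the", "and", "of", "is");
    static final List<String> SPANISH = Arrays.asList("el", "la", "y", "de", "son");
    static final List<String> PORTUGUESE = Arrays.asList("o", "a", "e", "de", "do");

    // Merges several stop lists into one set so the text is filtered only once.
    @SafeVarargs
    static Set<String> union(List<String>... lists) {
        Set<String> merged = new HashSet<>();
        for (List<String> list : lists) {
            merged.addAll(list);
        }
        return merged;
    }

    // Lowercases, splits on non-letter runs, and drops any token in the merged set.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = union(ENGLISH, SPANISH, PORTUGUESE);
        System.out.println(filter("O livro e a casa", stopWords));    // [livro, casa]
        System.out.println(filter("The son of the king", stopWords)); // [king]
    }
}
```

Merging is cheap and needs no language detection, but for stemming the locale still matters, since each language needs its own stemmer.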

Re: [noobie question] Can't index :(

2008-03-19 Thread Vinicius Carvalho
D'oh! Sorry, never mind: I was returning different IndexWriter instances :P On Wed, Mar 19, 2008 at 7:21 PM, Vinicius Carvalho < [EMAIL PROTECTED]> wrote: > Hello there! This is really a dumb question, but I just need to get things > started :( I'm just trying to get things working

[noobie question] Can't index :(

2008-03-19 Thread Vinicius Carvalho
Hello there! This is really a dumb question, but I just need to get things started :( I'm just trying to get things working here, and I'm not able to index :(. Here's my code: public abstract class AbstractLuceneIndexer implements LuceneIndexer{ protected String INDEX_DIR = ""; pu

Adding new documents to index

2008-03-19 Thread Vinicius Carvalho
Hello there! Since I've just begun with Lucene, some concepts are kind of new to me :). One of them is the whole indexing process. Well, AFAIK, indexing should happen in a batch process, right, to optimize the time spent on this operation. One issue, though, is that our client wants "instant search res

Fwd: Lucene on a cluster environment

2008-03-19 Thread Vinicius Carvalho
index changes. 3. Finally, older versions (1.4 and earlier) of Lucene had problems with having the index on a shared directory. I think most of these issues have been resolved. Good Luck. "Vinicius Carvalho" <[EMAIL PROTECTED]> 03/19/2008 09:17 AM Please respond to java-user@lucene.a

Lucene on a cluster environment

2008-03-19 Thread Vinicius Carvalho
Hello there! I have just started with Lucene. I bought the Lucene in Action book [right now I'm at chapter 4, plus the 10th chapter, a great explanation by Terence from jGuru, really nice stuff], and I'm reading as much as I can on the wiki :) I'm still a bit lost with some stuff, mostly with clusters :) Our