Reverse stemmer?

2009-10-06 Thread David Leangen
Hello, I've been using Lucene in a very basic way for some time now, and only now am I starting to take advantage of some of its linguistic capabilities. I am using the Snowball analyzer for stemming, and it works very well. Question: is there any such thing as a "reverse stemm

Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi guys, The requirement is very simple here: e.g. in the sentence 'The NBA formally announced its new *social media* guidelines Wednesday', I want to treat '*social media*' as a single phrase term. The default English analyzers that come with Lucene all deal with single words, so if you want to get t

Re: Reverse stemmer?

2009-10-06 Thread Erick Erickson
Why do you care? That is, what problem do you want to solve with a reverse stemmer? Note that if you STORE the field, the *original* text is available; storing and indexing are orthogonal. So if all you want is to get the original text back, you can freely index with a stemming analyzer, but jus
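Storing the field, as Erick suggests, is usually the simplest way to get the original words back. If a stem-to-word mapping is really needed, one possible approach (a sketch, not a Lucene feature) is to record which surface forms produced each stem at index time and map a stem back to its most frequent form. The `ReverseStemmer` class and its toy suffix-stripping `stem()` method are hypothetical stand-ins; a real setup would use the Snowball stemmer instead.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical "reverse stemmer": remembers which original words produced
 * each stem and maps a stem back to its most frequent original form.
 */
class ReverseStemmer {
    // stem -> (lowercased surface form -> number of times seen)
    private final Map<String, Map<String, Integer>> forms = new HashMap<>();

    /** Toy stemmer standing in for Snowball: strips a few English suffixes. */
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Call for every token seen at index time; returns its stem. */
    String record(String word) {
        String s = stem(word);
        forms.computeIfAbsent(s, k -> new HashMap<>())
             .merge(word.toLowerCase(), 1, Integer::sum);
        return s;
    }

    /** Most frequent recorded form for a stem, or the stem itself if unseen. */
    String unstem(String stem) {
        Map<String, Integer> counts = forms.get(stem);
        if (counts == null) {
            return stem;
        }
        String best = stem;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Because several words can share a stem, `unstem()` can only guess; the stored field remains the authoritative source of the original text.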

Re: Phase Extraction, mainly for English

2009-10-06 Thread Vasudevan Comandur
Hi, Take the NLP route and use modules such as a POS tagger and an NP chunker. OpenNLP has a stack for the English language. Try using them. Regards Vasu On Tue, Oct 6, 2009 at 5:12 PM, Andrew Zhang wrote: > Hi guys, > > The requirement is very simple here, e.g. for this sentence, 'The NBA > form

Re: Phase Extraction, mainly for English

2009-10-06 Thread Erick Erickson
Maybe I'm missing the problem entirely, but can you use phrase queries, or one of the Span* queries with a slop of 0, when searching? Best Erick On Tue, Oct 6, 2009 at 7:42 AM, Andrew Zhang wrote: > Hi guys, > > The requirement is very simple here, e.g. for this sentence, 'The NBA > formally anno
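A phrase query with a slop of 0 matches only when the terms occur at consecutive positions in the same document. The toy positional index below shows that check for the 'social media' example; it is plain Java, not Lucene's PhraseQuery implementation, and `PositionalIndex` and `exactPhrase()` are hypothetical names. Query terms are expected in lowercase, matching what `add()` stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy positional index illustrating what a slop-0 phrase match checks:
 * every term must occur at consecutive positions in the same document.
 */
class PositionalIndex {
    // term -> (docId -> positions of that term in the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Lowercases and splits on non-word characters, recording positions. */
    void add(int docId, String text) {
        int pos = 0;
        for (String t : text.toLowerCase().split("\\W+")) {
            if (t.isEmpty()) {
                continue; // split() yields "" when text starts with a separator
            }
            postings.computeIfAbsent(t, k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos++);
        }
    }

    /** True if the terms appear in this document as an adjacent sequence. */
    boolean exactPhrase(int docId, String... terms) {
        Map<Integer, List<Integer>> first = postings.get(terms[0]);
        if (first == null || !first.containsKey(docId)) {
            return false;
        }
        for (int start : first.get(docId)) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                Map<Integer, List<Integer>> p = postings.get(terms[i]);
                ok = p != null && p.containsKey(docId)
                        && p.get(docId).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```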

Document loading

2009-10-06 Thread Dragon Fly
Hi, Which of the following methods actually loads the document from disk? (1) Document document = searcher.doc (docId); OR (2) string value = document.get ("FirstNameField"); It's probably searcher.doc but I just want to be sure. Thank you.

Help regarding choosing Indexing Stratergies

2009-10-06 Thread ManjuNS
In my application I currently index each object with one field [ID], which is stored and holds the object's ID, and another field [Content], which is tokenized and holds the object's attributes separated by spaces. When I search for information related to the object I get

Struts2 implementation

2009-10-06 Thread Gary Moore
I'm porting one of my Struts1 Lucene search apps to Struts2. The basics are working, but I need to move the Lucene search service out of the action classes. I'm ready to write an interceptor, but I can perhaps also see using a plug-in, as is done with Tiles. As I'm a Struts2 newbie, any ti

Re: Phase Extraction, mainly for English

2009-10-06 Thread Karl Wettin
Hi Andrew, I think you are looking for the shingle package in contrib/analyzers. karl On 6 Oct 2009, at 13:42, Andrew Zhang wrote: Hi guys, The requirement is very simple here, e.g. for this sentence, 'The NBA formally announced its new *social media* guidelines Wednesday', I want to t

Forwarded: InstantiatedIndex questions

2009-10-06 Thread David Causse
Hi, Karl prefers to answer on the mailing list, so here is the information he asked for about how we use InstantiatedIndex. - Forwarded message from David Causse - Date: Tue, 6 Oct 2009 15:45:57 +0200 From: David Causse To: Karl Wettin Subject: Re: InstatiatedIndex questions Hi, sorry for the delay.

Re: InstantiatedIndex questions

2009-10-06 Thread Karl Wettin
On 6 Oct 2009, at 18:54, David Causse wrote: David, your timing couldn't be better. Just the other day I proposed that we deprecate InstantiatedIndexWriter. The sum of the reasons for this is that I'm a bit lazy. Your mail makes me reconsider. https://issues.apache.org/jira/browse/LUCENE-1948

Re: Struts2 implementation

2009-10-06 Thread Dave Newton
> I'm porting one of my Struts1 Lucene search apps to Struts2. The basics are > working but I need to remove the Lucene search service out of the action > classes. I'm ready to write an interceptor but can perhaps also see using a > plug-in like is done with Tiles. As I'm a Struts2 newbie

Re: How to test if an IndexReader is still open?

2009-10-06 Thread Chris Hostetter
: I figured it might be less expensive if search() (I have extended : IndexSearcher) were to check that the underlying IndexReader is still If you're extending IndexSearcher anyway, you can override the close() method to update a boolean and then add your own isClosed() method. : open - and re
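A minimal sketch of Chris's suggestion: override close() to set a flag and expose an isClosed() method. So that it compiles without Lucene, `SearcherBase` is a hypothetical stand-in for IndexSearcher; in real code `TrackingSearcher` would extend IndexSearcher directly and call its close().

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for Lucene's IndexSearcher so the sketch compiles alone. */
class SearcherBase implements Closeable {
    @Override
    public void close() throws IOException {
        // in real code this is where IndexSearcher.close() would run
    }
}

/** Overrides close() to record that it was called and exposes isClosed(). */
class TrackingSearcher extends SearcherBase {
    // volatile so other threads see the flag as soon as close() has set it
    private volatile boolean closed = false;

    @Override
    public void close() throws IOException {
        super.close();
        closed = true; // only reached if super.close() did not throw
    }

    boolean isClosed() {
        return closed;
    }
}
```

The flag only records closes that go through this searcher; if the IndexReader is closed directly elsewhere, `isClosed()` will not notice.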

Re: document diversity

2009-10-06 Thread Michael Masters
My initial description may have been a little abstract. Maybe I should explain exactly what I'm trying to do. My company has various revenue channels, one of which is per click. If a user does a search, we would like to show results with the greatest revenue, although we don't want people to be abl

Re: document diversity

2009-10-06 Thread Paul Libbrecht
Just as you can add a query that boosts higher-quality items, you can add a query for higher revenue. Basically, the default operator "should" in boolean clauses can be used exactly for that: it does not force this query to match, but raises the boost if there's something tha

Re: document diversity

2009-10-06 Thread Simon Willnauer
Michael, this sounds like a pretty good use case for CustomScoreQuery (http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/function/CustomScoreQuery.html). The org.apache.lucene.search.function package provides flexible programmatic control over document scores. You boost up documen
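The idea behind CustomScoreQuery can be sketched in plain Java: the final score is the normal relevance score multiplied by a per-document factor. The `RevenueRescorer` class, the `revenuePerClick` value, the `alpha` weight and the dampened `log1p` boost are all illustrative assumptions, chosen so that revenue nudges the ranking without overriding relevance. In Lucene 2.9 the per-document factor would instead come from a function query in org.apache.lucene.search.function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of multiplicative rescoring: text relevance times a revenue factor.
 * The revenue field and the dampened log boost are assumptions, not Lucene API.
 */
class RevenueRescorer {
    static final class Hit {
        final String id;
        final float textScore;       // relevance score from the normal query
        final float revenuePerClick; // hypothetical per-document value

        Hit(String id, float textScore, float revenuePerClick) {
            this.id = id;
            this.textScore = textScore;
            this.revenuePerClick = revenuePerClick;
        }
    }

    private final float alpha; // how strongly revenue may influence ranking

    RevenueRescorer(float alpha) {
        this.alpha = alpha;
    }

    /** Relevance multiplied by 1 + alpha * ln(1 + revenue). */
    float score(Hit h) {
        float factor = 1f + alpha * (float) Math.log1p(h.revenuePerClick);
        return h.textScore * factor;
    }

    /** Returns a copy of the hits sorted by combined score, highest first. */
    List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> score(h)).reversed());
        return sorted;
    }
}
```

With `alpha = 0.5`, a document with a text score of 0.1 and a revenue of 100 scores about 0.33, so it still ranks below a document with a text score of 1.0 and no revenue.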

Re: Document loading

2009-10-06 Thread Simon Willnauer
Hi, a call to IndexSearcher.doc(docId) will load the document. Internally this call forwards to IndexReader.document(docId), which can be very expensive because it loads all stored fields of the document. I would recommend having a look at IndexSearcher.doc(docId, FieldSelector). This meth

How to setup a scalable deployment?

2009-10-06 Thread Chris Were
Hi, I've been using Lucene for a project and it works great on the one dev machine. The next step is to investigate the best method of deploying Lucene so that multiple web servers can access the Lucene directory of indexes. I see four potential options: 1) Each web server indexes the content separa

Re: Struts2 implementation

2009-10-06 Thread Gary Moore
Yes, I'm injecting the service now and it works fine. I haven't completely wrapped my head around Struts2 yet, but there would seem to be a considerable advantage to the interceptor/plug-in approach, not least that you wouldn't have to write an action class each time you need to drop search resu

Re: How to setup a scalable deployment?

2009-10-06 Thread Jason Rutherglen
Chris, It sounds like you're on the right track. Have you looked at Solr, which uses the rsync/Java replication method you mentioned? Replication and near realtime in Solr aren't quite there yet; however, it wouldn't be too hard to add. -J On Tue, Oct 6, 2009 at 3:57 PM, Chris Were wrote: > Hi

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Right, Vasu, I think NLP is a good approach; I should take some time to look at it. Thanks. On Tue, Oct 6, 2009 at 8:10 PM, Vasudevan Comandur wrote: > Hi, > > Take the NLP route and use modules like POS tagger and NP chunker. > > OpenNLP has a stack for English language. Try to use them. > > Regards

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi Karl, I think shingles are designed to make phrase search faster; they generate a lot of "seemed like" phrases based on position only and completely disregard the meaning, which is not good enough. Regards, Andrew On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin wrote: > Hi Andrew, > > I think you are looki

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi Erick, to query, you need to know the "phrase" already, right? But I want to discover the phrases, i.e. which words occur together so often, and so naturally, that we treat them as a phrase. On Tue, Oct 6, 2009 at 8:12 PM, Erick Erickson wrote: > Maybe I'm missing the problem entirely, but can you

Re: Phase Extraction, mainly for English

2009-10-06 Thread Karl Wettin
There are many uses for shingles. I've used them to find common phrases in text, which is my understanding of what you are trying to achieve. It works rather well, is a very simple solution, and is easy on resources compared to real semantic analysis. You'll be getting a lot of shingles such as "the
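Karl's approach can be sketched without Lucene: split text into word n-grams (shingles), count them across a collection, and keep the frequent ones. In Lucene, ShingleFilter in contrib/analyzers produces the shingles themselves; the counting, the small stopword list used to discard shingles such as "the ...", and the `ShingleCounter` class are assumptions made for this illustration. As Andrew notes, the result is purely statistical and ignores meaning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Counts word n-grams (shingles) to surface frequently co-occurring words. */
class ShingleCounter {
    // illustrative stopword list; shingles starting or ending with one are dropped
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "its", "of", "and", "to", "in"));

    private final Map<String, Integer> counts = new HashMap<>();
    private final int size; // number of words per shingle

    ShingleCounter(int size) {
        this.size = size;
    }

    /** Tokenizes the text and counts every shingle of the configured size. */
    void add(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        for (int i = 0; i + size <= tokens.size(); i++) {
            List<String> gram = tokens.subList(i, i + size);
            if (STOPWORDS.contains(gram.get(0)) || STOPWORDS.contains(gram.get(size - 1))) {
                continue;
            }
            counts.merge(String.join(" ", gram), 1, Integer::sum);
        }
    }

    /** Shingles seen at least minCount times, sorted alphabetically. */
    List<String> frequent(int minCount) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```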

Re: How to setup a scalable deployment?

2009-10-06 Thread no spam
Have you investigated using Terracotta / Compass? We need real-time updates across the index using multiple web servers. I recently got this up and running and we're going to be doing some performance testing. It's very easy: essentially you just replace your FSDirectoryProvider with a Terracott

Re: How to setup a scalable deployment?

2009-10-06 Thread Jake Mannix
Hi Chris, The answer to your question depends in part on whether your kind of scalability depends on sharding (your index size is expected to grow very large) or just on replication (your query load is large, and you need failover). It sounds like you're mostly thinking about the latter. 1) Each
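The two scaling axes Jake distinguishes can be illustrated with a small router: sharding splits documents across partitions by ID, while replication sends each query to one of several identical copies of a shard. `ClusterRouter`, the hash-modulo scheme and the host names are hypothetical, chosen only to make the distinction concrete.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical router: documents are partitioned across shards by ID hash
 * (sharding); queries for a shard rotate over its replicas (replication).
 */
class ClusterRouter {
    private final List<List<String>> replicasPerShard; // shard -> replica hosts
    private final AtomicInteger nextReplica = new AtomicInteger();

    ClusterRouter(List<List<String>> replicasPerShard) {
        this.replicasPerShard = replicasPerShard;
    }

    /** Sharding: each document ID maps to exactly one shard. */
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), replicasPerShard.size());
    }

    /** Replication: round-robin over the copies of one shard. */
    String replicaFor(int shard) {
        List<String> hosts = replicasPerShard.get(shard);
        // floorMod keeps the index valid even after the counter overflows
        return hosts.get(Math.floorMod(nextReplica.getAndIncrement(), hosts.size()));
    }
}
```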