compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Malgorzata Urbanska
Hello, I've just started using Lucene and I'm not sure which Query classes I should use in my project. My goal is to compare paragraphs of text. Paragraph A is a query and paragraph B is a document for which I would like to calculate a similarity score. The paragraphs A and B can be in some

Re: compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Jack Krupansky
First, start with Solr and use the edismax query parser with the default query operator as OR and set pf, pf2, and pf3, and then simply query by the raw text of the paragraph. This will order the results by how closely the indexed paragraphs match the query paragraph. This is also a good
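A rough sketch of the request parameters this advice implies, assuming a single indexed text field (the field name passed in and the extra qf parameter are illustrative assumptions, not part of the original message). It only builds the URL-encoded parameter string with the JDK; it does not talk to Solr, and whether q.op is honored by edismax depends on the Solr version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class EdismaxParagraphQuery {

    // Builds an encoded parameter string for an edismax request.
    // "field" is a hypothetical field name holding the indexed paragraphs.
    static String build(String paragraph, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax"); // select the edismax query parser
        params.put("q.op", "OR");         // default operator OR, as suggested in the thread
        params.put("qf", field);          // assumption: search the same field that is boosted
        params.put("pf", field);          // boost documents matching the whole phrase
        params.put("pf2", field);         // boost matching word pairs
        params.put("pf3", field);         // boost matching word triples
        params.put("q", paragraph);       // raw paragraph text as the query

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("quick brown fox", "body"));
    }
}
```

The resulting string would be appended to a select URL. Paragraph text containing query-syntax characters (quotes, colons and so on) may still need escaping before it is used as q.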

Re: compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Malgorzata Urbanska
Thanks, I will try it. gosia On Fri, Jun 14, 2013 at 10:33 AM, Jack Krupansky j...@basetechnology.com wrote: First, start with Solr and use the edismax query parser with the default query operator as OR and set pf, pf2, and pf3, and then simply query by the raw text of the paragraph. This will

Re: Seemingly very difficult to wrap an Analyzer with CharFilter

2013-06-14 Thread Steven Schlansker
On Jun 12, 2013, at 5:26 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 6/12/2013 7:02 PM, Steven Schlansker wrote: On Jun 12, 2013, at 3:44 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: You may not have noticed that CharFilter extends Reader. The expected

KStemFilter

2013-06-14 Thread Sirish Vadala
Hello All, I have a new requirement within my text search implementation to perform stemming. I have done some research and implemented Snowball; however, the customers found it too aggressive, and eventually I got them to agree to compromise on the KStem algorithm. Currently my existing code is

RE: KStemFilter

2013-06-14 Thread Uwe Schindler
Look at the javadocs of the analysis package and the Analyzer class, there it is explained how Analyzers are built - the first example is the way to go: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/analysis/Analyzer.html - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen

RE: KStemFilter

2013-06-14 Thread Sirish Vadala
Awesome! Exactly what I was looking for. Thanks Schindler. Uwe Schindler wrote: Look at the javadocs of the analysis package and the Analyzer class, there it is explained how Analyzers are built - the first example is the way to go:

segments and sorting

2013-06-14 Thread Sriram Sankar
Quick question on segments: For my use case of having all docs sorted by a static rank and being able to cut off retrieval after a certain number of docs, I have to sort all my docs using the static rank (and Lucene 4 has a way to do this). When an index has multiple segments, how does this

Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
I have a Solr index built with Solr 1.4 a few years ago and later upgraded to Solr 3.6, and the index now consists of 150 million documents. Now I want to read all values of a DateField from the index. But it turns out that for nearly 100 million documents, document.get('date') return

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Chris Hostetter
: I used solr to query the index, and verified that each document does have a : non-blank date field. I suspect that it's because the lucene-3.6 api I am : using can not read datefield correctly from documents written in lucene 1.4 : format. how did you verify that they all have a non-blank

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
Hoss, I did it in two ways. The first is 1) in your list: q=date:* matches q=*:*. And all fields are stored in the index. I got a doc id (say 3315), ran q=id:3315, and the output contains the date field and value. Anyway, I am 100% sure every doc has a date field and value indexed and stored there.

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
I did System.out.println(d.get('date')), and the output is stored,binary,omitNorms,indexOptions=DOCS_ONLYdate:[B@4cbfea1d Emmm. On Fri, Jun 14, 2013 at 4:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I used solr to query the index, and verified that each document does have a :

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
Figured out the solution. The date field in those documents was stored as binary, so what I should do is Fieldable df = doc.getFieldable(fname); byte[] ary = df.getBinaryValue(); ByteBuffer bb = ByteBuffer.wrap(ary); long num = bb.getLong(); Date dt =
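A minimal JDK-only sketch of the decoding step described above. The message is cut off before the Date is constructed, so treating the decoded long as epoch milliseconds is an assumption here, as are the class and method names. The Lucene calls that fetch the byte[] are left out; only the byte-to-date conversion is shown.

```java
import java.nio.ByteBuffer;
import java.util.Date;

final class BinaryDateDecoder {

    // Assumption: the stored value is an 8-byte big-endian long.
    static long toEpochMillis(byte[] stored) {
        if (stored == null || stored.length < Long.BYTES) {
            throw new IllegalArgumentException("expected at least 8 bytes");
        }
        // ByteBuffer reads in big-endian order by default
        return ByteBuffer.wrap(stored).getLong();
    }

    // Assumption: the long is milliseconds since the epoch.
    static Date toDate(byte[] stored) {
        return new Date(toEpochMillis(stored));
    }

    public static void main(String[] args) {
        // Round-trip a sample timestamp through the same byte layout
        byte[] sample = ByteBuffer.allocate(Long.BYTES).putLong(1371168000000L).array();
        System.out.println(toDate(sample));
    }
}
```

If the field was written with a different layout (for example a byte offset inside a larger array), the wrap call would need the offset and length variant instead.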

Lucene pointing to existing DB Index

2013-06-14 Thread Pradeep B
Hi, I have just started out with Lucene and am experimenting with some possibilities. My goal is to try to exploit an existing database index (which in my case is an inverted index) to serve as a Lucene index. This would help me avoid the need for additional indexing time and storage space. Is this possible?