Norms and Term Vectors in Lucene 4.0

2012-10-29 Thread Scott Smith
While converting some code to Lucene 4.0, it appears that we can no longer set whether we want to store norms or term vectors using the "sugared" Field classes (e.g., StringField and TextField). I gather the defaults are to store norms and not to store term vectors? If I don't want norms on a field,

Re: A large number of files in an index (3.6)

2012-10-29 Thread kiwi clive
Hi Lance, File handles can be a problem, but the instantaneous opening of a great many files at exactly the same time gives a big I/O hit during a query. This is compounded by many indexes on the server that can get hit at the same time. Limiting the number of files per index directory makes a di

RE: lucene 4.0 indexReader is changed

2012-10-29 Thread Scott Smith
OK. I'll take a look at that. Thanks for the help. Scott -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, October 26, 2012 6:07 PM To: java-user@lucene.apache.org Subject: Re: lucene 4.0 indexReader is changed How about DirectoryReader.html#openIf

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
I understand the issue of the Lucene doc id changing. I'll probably look to see if I can delete stuff just based on some field that I have that I know won't change. I've used the doc id for a long time, but maybe it's time for a change. Thanks for all of the input. Scott -Original Mes

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
The Lucene integer doc id. -----Original Message----- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Sunday, October 28, 2012 5:09 PM To: java-user@lucene.apache.org Subject: Re: Lucene 4.0 delete by ID Scott, did you mean the Lucene integer id, or the unique id field? - Original Mess

Re: Term Positions added to one document forward

2012-10-29 Thread Simon Willnauer
You should call currDocsAndPositions.nextPosition() before you call currDocsAndPositions.getPayload(). Payloads are per position, so you need to advance the position first! simon On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote: > Hi Guys, > > I use the following code to index documents and set Pa

Term Positions added to one document forward

2012-10-29 Thread Ivan Vasilev
Hi Guys, I use the following code to index documents and set Payloads to term positions: public class TestPayloads_ { private static final String INDEX_DIR = "E:/Temp/Index"; public static void main(String[] args) throws Exception { IndexWriterConfig iwc = new Ind

RE: Scoring based on document

2012-10-29 Thread Siraj Haider
Thanks Selvakumar, but I do not think it would work in our scenario. Simon mentioned earlier that it is possible in Lucene 4.0. Does somebody have more information on how to accomplish that? regards -Siraj -----Original Message----- From: selvakumar netaji [mailto:vvekselva...@gmail.com] Sent: W

Re: Running Solr Core/ Tika on Azure

2012-10-29 Thread Jack Krupansky
SolrCell includes Tika and SolrCell is included with Solr, at least the standard distribution of Solr. You can stream Office and PDF docs directly to the extracting request handler where Tika will process them. You can also ask SolrCell to "extract only" and return the extracted content. See:

SpanQuery, Filter, BooleanQuery

2012-10-29 Thread Carsten Schnober
Hi, I've got a setup in which I would like to perform an arbitrary query over one field (typically realised through a WildcardQuery) and the matches are returned as a SpanQuery because the result payloads are further processed using Span.next() and Span.getPayload(). This works fine with the follow

Running Solr Core/ Tika on Azure

2012-10-29 Thread Aloke Ghoshal
Hi, Looking for feedback on running Solr Core/ Tika parsing engine on Azure. There's one offering for Solr within Azure from LucidWorks. This offering, however, doesn't mention Tika. We are looking at options to make content from files (doc, Excel, PDFs, etc.) stored within Azure storage search-ab