Re: Lucene Concurrent Search

2013-09-05 Thread Ian Lea
I think that blog post was bleeding edge and the API changed a bit subsequently. I use Directory dir = whatever; SearcherManager sm = new SearcherManager(dir, new SearcherFactory()); to get default behaviour. The javadocs for SearcherFactory explain that you can write your own implementation

Re: Lucene Concurrent Search

2013-09-05 Thread Ian Lea
research in the index? 2013/9/5 Ian Lea ian@gmail.com I think that blog post was bleeding edge and the API changed a bit subsequently. I use Directory dir = whatever; SearcherManager sm = new SearcherManager(dir, new SearcherFactory()); to get default behaviour. The javadocs

Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.

2013-08-30 Thread Ian Lea
do in order to index large files. Say about 30 MB.. I read something MergeFactor and etc. but was not able to set any value for it. Don't even know whether doing that will help the cause.. On 8/29/2013 7:04 PM, Ian Lea wrote: Well, I use neither Eclipse nor your application server and can

Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.

2013-08-29 Thread Ian Lea
So you do get an exception after all, OOM. Try it without this line: doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8")))); I think that will slurp the whole file in one go which will obviously need more memory on larger files than on smaller ones. Or just run

Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.

2013-08-29 Thread Ian Lea
.. Even then no exception occurred!!.. Only write.lock is formed.. Removing contents field is not desirable as this is needed for search to work perfectly... On 8/29/2013 6:17 PM, Ian Lea wrote: So you do get an exception after all, OOM. Try it without this line: doc.add(new TextField

Re: Wildcard in PhraseQuery

2013-08-27 Thread Ian Lea
See the FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F -- Ian. On Tue, Aug 27, 2013 at 5:11 AM, Chuming Chen chumingc...@gmail.com wrote: Hi All, Can I use wildcard in a phrase query in Lucene/Solr? Can anybody point me

Re: Delete content corresponding to FileName from INDEX and corresponding change in INDEXES FOR SUGGESTION.

2013-08-27 Thread Ian Lea
How many lucene indexes do you have? Lucene has no concept of main or child indexes. To remove all docs from an index for a given indexed field writer.deleteDocuments() is the way to do it, although in your case probably by Term, like your code sample: Term term1=new Term("name","D111-123-987.txt")

Re: Boosting potential phrases when using QueryParser

2013-08-27 Thread Ian Lea
I don't think there's a standard solution. Using PhraseQuery as you suggest should work - you could also look at the setSlop(s) method of PhraseQuery. SpanQuery and its friends such as SpanNearQuery are more flexible but not generated by default by QueryParser, although if you are going to be

Re: Question on wildcard queries, filters, scoring and TooManyClauses exception

2013-08-16 Thread Ian Lea
I can't explain all of it and 3.0 is way old ... you might like to think about upgrading. However in your first snippet you don't need the query AND the filter. Either one will suffice. In some circumstances, as you say, filters are preferable but queries and filters are often interchangeable.

Re: Wrong documents in results

2013-08-16 Thread Ian Lea
and != AND? http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#AND It works for or rather than OR because that is the default. If you had a doc with id=or you'd find that too, I think. It looks odd to be escaping the value when you are

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
then, it gives me proper hits.. But for me it should work on Indexes created by Line by Line parsing also. Please guide. On 8/13/2013 4:41 PM, Ian Lea wrote: remedialaction != remedial action? Show us your query. Show a small self-contained sample program or test case that demonstrates

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
GIVEN THE COMPLETE CODE SAMPLE FOR PEOPLE TO WORK ON.. PLEASE GUIDE ME NOW: IN case any further information is required please let me know. On 8/14/2013 7:43 PM, Ian Lea wrote: Well, you have supplied a bit more info - good - but I still can't spot the problem. Unless someone else can I

Re: Boolean Query when indexing each line as a document.

2013-08-14 Thread Ian Lea
normally (without indexing them line by line) I do get HITS.. Still not able to figure out the problem. On 8/14/2013 8:07 PM, Ian Lea wrote: I was rather hoping for something smaller! One suggestion from a glance is that you're using some analyzer somewhere but building a BooleanQuery out

Re: Assistance for Unified Index Process

2013-08-14 Thread Ian Lea
Have one big index holding everything, with a folder indexed field that you can use for filtering? -- Ian. On Wed, Aug 14, 2013 at 10:03 AM, Mark Jason B. Nacional jason.nacio...@icomteq.com wrote: Hi Lucene Developers: I just want to ask some help regarding our new implementation of

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Ian Lea
If I've understood your question correctly, the answer is yes. Assuming the input data is coming from another file the flow will be along the lines of . Open input file for reading . Open output file for writing . Open (or create) lucene index . For each input record - write to output file

Re: Boolean Query when indexing each line as a document.

2013-08-13 Thread Ian Lea
Should be straightforward enough. Work through the tips in the FAQ entry at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F and post back if that doesn't help, with details of how you are analyzing the data and how you are searching. -- Ian. On

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Ian Lea
indexes using this file, I would be able to create the indexes ..??? Won't I get any kind of exception from the file as I am still writing data in that file. ??? Guidance is highly appreciated... On 13-08-2013 PM 02:01, Ian Lea wrote: If I've understood your question correctly, the answer

Re: Boolean Query when indexing each line as a document.

2013-08-13 Thread Ian Lea
values. No problem,. 3. If I fire a Boolean Query with remedialaction and Checking as a must/must , then it is not providing me this document as a hit. 4. I am using StandardAnalyzer both during the indexing and searching time. On 8/13/2013 2:31 PM, Ian Lea wrote: Should be straightforward

Re: Searching within a Search Result

2013-08-07 Thread Ian Lea
, analyzer ).parse(abstract); BooleanQuery booleanQuery = new BooleanQuery(); booleanQuery.add(q1,BooleanClause.Occur.MUST); booleanQuery.add(q2,BooleanClause.Occur.MUST); Hits hits = indexSearcher.search(booleanQuery); Is this right for what I want to do? Thanks. 2013/8/6 Ian Lea ian@gmail.com

Re: Searching within a Search Result

2013-08-06 Thread Ian Lea
The standard way is to combine the searches by label and abstract into one query. If using QueryParser a simple example would look something like label:aaa abstract:bbb abstract:ccc. You can get the same effect, with more flexibility, by building a BooleanQuery in code. Also consider using a

Re: Lucene in Action

2013-07-15 Thread Ian Lea
Have you read and worked through http://lucene.apache.org/core/4_3_1/demo/overview-summary.html? To build and run applications using lucene you need either lucene-4.3.1.tgz or lucene-4.3.1.zip. If you're on unix you might go for the gzipped tar file, windows users might prefer the Zip file. The

Re: Features added after Lucene 4

2013-07-15 Thread Ian Lea
The Changes and Migration Guide on http://lucene.apache.org/core/4_3_1/ (or 4_2_x) should help. They usually link through to JIRA pages which will have more detail. If you want info about lower level stuff such as Codecs, try googling lucene codecs or whatever it is you're interested in. --

Re: TermDocs

2013-07-08 Thread Ian Lea
There's a fair chunk of info on TermDocs and friends in the migration guide. http://lucene.apache.org/core/4_3_1/MIGRATE.html Does that cover your question? -- Ian. On Mon, Jul 8, 2013 at 12:32 PM, Yonghui Zhao zhaoyong...@gmail.com wrote: Hi, What's proper replacement of TermDocs termDocs

Re: Possible location of word inside the file.

2013-07-04 Thread Ian Lea
Sounds like you're indexing each log file as one lucene document. Obvious answer is to index each line in each log file as a separate doc. Searches would then match lines in files and you can display those lines, summarizing counts per file if you want that. If you wanted to be able to show
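The line-per-document idea above can be sketched without Lucene at all. The following is a toy model in plain Java, not Lucene code: the class LinePerDocSketch, its LineDoc type and the naive tokenising are all made up for illustration. Each line becomes its own "document" carrying the file name and line number, so a search returns lines and the per-file counts fall out of the hits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Lucene): every log line is its own "document". */
final class LinePerDocSketch {

    /** One indexed line; stands in for a document with file, lineNo and text fields. */
    static final class LineDoc {
        final String file;
        final int lineNo;
        final String text;

        LineDoc(String file, int lineNo, String text) {
            this.file = file;
            this.lineNo = lineNo;
            this.text = text;
        }
    }

    private final List<LineDoc> docs = new ArrayList<>();

    /** Adds each line of a file as a separate document; line numbers start at 1. */
    void addFile(String file, List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            docs.add(new LineDoc(file, i + 1, lines.get(i)));
        }
    }

    /** Returns the lines that contain the word (case-insensitive, split on non-word chars). */
    List<LineDoc> search(String word) {
        String wanted = word.toLowerCase();
        List<LineDoc> hits = new ArrayList<>();
        for (LineDoc d : docs) {
            for (String token : d.text.toLowerCase().split("\\W+")) {
                if (token.equals(wanted)) {
                    hits.add(d);
                    break; // one hit per line is enough
                }
            }
        }
        return hits;
    }

    /** Summarises hits as a count per file, in first-seen order. */
    static Map<String, Integer> countsPerFile(List<LineDoc> hits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (LineDoc d : hits) {
            counts.merge(d.file, 1, Integer::sum);
        }
        return counts;
    }
}
```

In a real index the linear scan would be replaced by a query against the text field, with file name and line number kept as stored fields on each document.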

Re: Possible location of word inside the file.

2013-07-04 Thread Ian Lea
. This will be very resource intensive and will severely hurt performance. On 7/4/2013 2:04 PM, Ian Lea wrote: Sounds like you're indexing each log file as one lucene document. Obvious answer is to index each line in each log file as a separate doc. Searches would then match lines in files

Re: How to perform Date Range Search

2013-06-27 Thread Ian Lea
Concatenating all your searchable fields into one is certainly what I'd do. Simple and efficient. And yes, you can perform range searches via the query parser - the example you give matches the one in the docs at

Re: What to do with Lucene Version parameter on upgrade

2013-06-20 Thread Ian Lea
Version relates to analyzers and the like rather than to internals such as index format. I don't recall what exactly has changed between 4.0 and 4.3.1 but you're probably safe to change it and use LUCENE_43. Take a look at the javadoc for StandardAnalyzer - that lists some versions and what

Re: Using MultiField query with Boost factor

2013-06-07 Thread Ian Lea
Take a look at BooleanQuery and the setBoost() call on Query, and BooleanClause.Occur for the MUST/SHOULD logic. Something along the lines of this pseudo code BooleanQuery bq = new BooleanQuery(); Query titleq = xxx; titleq.setBoost(somevalue) bq.add(titleq, must|should) Query addressq = yyy

Re: Using MultiField query with Boost factor

2013-06-07 Thread Ian Lea
, Jun 7, 2013 at 7:12 PM, Ian Lea ian@gmail.com wrote: Take a look at BooleanQuery and the setBoost() call on Query, and BooleanClause.Occur for the MUST/SHOULD logic. Something along the lines of this pseudo code BooleanQuery bq = new BooleanQuery(); Query titleq = xxx; titleq.setBoost

Re: IndexReader doc method performance troubles

2013-05-15 Thread Ian Lea
Are the indexes on local disks? Is the same index present on all 6 servers or split or different or what? Do you see the slowdown on all servers/indexes or what? Any IO/memory/CPU problems being reported anywhere? Are you always loading the same fields with approx the same volume of data or do

Re: Find index version with an index reader

2013-05-14 Thread Ian Lea
Take a look at org.apache.lucene.index.CheckIndex. That displays the versions of the segment files. Note the plurals - that's a complication you may need to deal with. Or read|store whatever you want with IndexWriter.get|setCommitData(...). You can get the currently running version via

Re: TrackingIndexWriter.tryDeleteDocument(IndexReader, int) vs deleteDocuments(Query)

2013-05-07 Thread Ian Lea
Does the tryDeleteDocument() call return true or false? The 4.2.1 javadocs for IndexWriter.tryDeleteDocument say "If the provided reader is an NRT reader obtained from this writer ... then the delete succeeds and this method returns true; else, it returns false". Maybe you need

Re: [sort order]how to support sort by max(field1),then field2 desc.

2013-05-07 Thread Ian Lea
I think you'll have to run 2 searches with 2 sorts - the first to get max(field1) and the second sorted by field2. If you don't want the max(field1) doc to appear in the second list you'll have to filter it out somehow. -- Ian. On Tue, May 7, 2013 at 6:49 AM, Jack Liu jack@morningstar.com
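The two-pass approach described above can be illustrated with plain Java collections. This is only a sketch under assumptions: the Doc type and the integer fields field1 and field2 are hypothetical stand-ins, and each in-memory pass stands in for a separate sorted search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model (not Lucene): "max(field1) first, then the rest by field2 descending". */
final class TwoPassSortSketch {

    /** Hypothetical document with two numeric fields. */
    static final class Doc {
        final String id;
        final int field1;
        final int field2;

        Doc(String id, int field1, int field2) {
            this.id = id;
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    /** Pass 1: the doc with the largest field1 (first one wins on ties); null if empty. */
    static Doc maxByField1(List<Doc> docs) {
        Doc best = null;
        for (Doc d : docs) {
            if (best == null || d.field1 > best.field1) {
                best = d;
            }
        }
        return best;
    }

    /** Pass 2: everything except the pass-1 doc, sorted by field2 descending (stable). */
    static List<Doc> restByField2Desc(List<Doc> docs, Doc exclude) {
        List<Doc> rest = new ArrayList<>();
        for (Doc d : docs) {
            if (d != exclude) {
                rest.add(d); // filter the max(field1) doc out of the second list
            }
        }
        rest.sort(Comparator.comparingInt((Doc d) -> d.field2).reversed());
        return rest;
    }

    /** Concatenates the results of the two passes. */
    static List<Doc> combined(List<Doc> docs) {
        List<Doc> out = new ArrayList<>();
        Doc top = maxByField1(docs);
        if (top != null) {
            out.add(top);
        }
        out.addAll(restByField2Desc(docs, top));
        return out;
    }
}
```

Against a real index the exclusion step would more likely be done by the unique id of the first hit than by object identity.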

Re: Indexed numeric fields return indexed() == false

2013-04-26 Thread Ian Lea
Unfortunately you can't read an existing document, modify it and add it to an existing or new index. You'll have to create a new Document, populate it with fields of the relevant types, using values from the source index if they are stored, then add the new Document to the new index. If there

Re: Indexed numeric fields return indexed() == false

2013-04-26 Thread Ian Lea
It doesn't work because lucene doesn't store all the necessary info in the index. It may work for StringField because there isn't really any other info for that field type - it's just a string stored as is - but other fields have tokenization, precision, whatever, which may not be stored, and

Re: high memory usage by indexreader

2013-03-22 Thread Ian Lea
at 6:43 AM, Ian Lea ian@gmail.com wrote: That number of docs is far more than I've ever worked with but I'm still surprised it takes 4 minutes to initialize an index reader. What exactly do you mean by initialization? Show us the code that takes 4 minutes. What version of lucene? What OS

Re: high memory usage by indexreader

2013-03-21 Thread Ian Lea
Gigs. The total virtual memory (VIRT) is 307 Gig. Do you think this is okay? Do you think I should use Solr instead of using lucene core? I have timestamps for documents and so I can split into multiple indexes sorted on chronology. Thanks, Ashwin On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea

Re: high memory usage by indexreader

2013-03-20 Thread Ian Lea
Searching doesn't usually use that much memory, even on large indexes. What version of lucene are you on? How many docs in the index? What does a slow query look like (q.toString()) and what search method are you calling? Anything else relevant you forgot to tell us? Or google lucene

Re: Lucene scoring

2013-03-12 Thread Ian Lea
Sounds like a job for boosting. Document.setBoost() and/or Field.setBoost(). The former has gone away in lucene 4.x. See the migration guide. Or execute 2 searches, restricting the first to the contact docs or whichever you want to be top of the list. -- Ian. On Tue, Mar 12, 2013 at 7:36

Re: Example phrase query with lucene version 4

2013-03-12 Thread Ian Lea
QueryParser qp = new QueryParser(Version.whatever, somefield, new WhateverAnalyzer()); Query q = qp.parse("\"The mouse gnawed the clothes of the king of Rome\""); and q should be a PhraseQuery if I've got the quoting right. Some of those words might be stop words which might cause you problems

Re: Should heap size be proportionate to the size of the index I'm opening?

2013-03-11 Thread Ian Lea
It's not that simple. More to do with number of terms than raw index size. Of course your large index may well have more terms than a smaller one. See http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html and

Re: Bulk indexing and delete old index files

2013-03-05 Thread Ian Lea
That sounds fine. Or just open an IndexWriter with create/overwrite/whatever-it-is set to true. There's rarely a clear best strategy. Do the simplest thing that could possibly work: http://www.xprogramming.com/Practices/PracSimplest.html -- Ian. On Tue, Mar 5, 2013 at 5:10 AM, 장용석

Re: Split index and store

2013-03-01 Thread Ian Lea
Never rely on lucene internal doc ids. Use your own. Lucene searches on unique ids are of course very fast. -- Ian. On Fri, Mar 1, 2013 at 9:51 AM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote: Hello team, I have a query and I am explaining it as below. Objective :

Re: Solr Or Lucene Paging

2013-03-01 Thread Ian Lea
You're probably better off asking Solr questions on the solr list. But if you really need the 20 hits starting at 100 i.e. page number 5 you'd better rethink your requirements and your indexing strategy. -- Ian. On Fri, Mar 1, 2013 at 6:48 AM, dizh d...@neusoft.com wrote: Hi,All:

Re: Lucene filter questions

2013-02-25 Thread Ian Lea
I'm sure that Filters are thread safe. Lucene doesn't have a global caching mechanism as such. But see FieldCache - you might get better performance from FieldCacheTermsFilter than from TermsFilter. See also CachingWrapperFilter and QueryWrapperFilter. -- Ian. On Mon, Feb 25, 2013 at 1:16

Re: SpanQuery.getSpans() with document sorting

2013-02-25 Thread Ian Lea
FieldCache? -- Ian. On Sun, Feb 24, 2013 at 4:46 PM, Igor Shalyminov ishalymi...@yandex-team.ru wrote: A slightly more specific question: Is it possible to load in RAM a single stored field for all the documents in the index via some Lucene data structures? -- Best Regards, Igor

Re: Min/max support in Lucene

2013-02-25 Thread Ian Lea
TermsEnum will give you the first, and the last if you loop through to the end. Generally pretty fast. Or skip through with seekCeil() - might be faster. -- Ian. On Wed, Feb 20, 2013 at 11:31 PM, Vitaly Funstein vfunst...@gmail.com wrote: I know that general questions about aggregate
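Why this works comes down to the terms of a field being held in sorted order. The sketch below is an analogy in plain Java, not the TermsEnum API: a forward-only Iterator plays the role of looping with next(), and NavigableSet.ceiling() plays the role of seekCeil(). The real seekCeil() positions the enum and returns a status rather than the term itself, and the class and method names here are made up.

```java
import java.util.Iterator;
import java.util.NavigableSet;

/** Toy model (not Lucene): min/max and "seek ceiling" over a sorted set of terms. */
final class SortedTermsSketch {

    /**
     * Walks a sorted, forward-only iterator once. The first term seen is the
     * minimum and the last term seen is the maximum. Returns null if empty.
     */
    static String[] firstAndLast(Iterator<String> sortedTerms) {
        if (!sortedTerms.hasNext()) {
            return null;
        }
        String first = sortedTerms.next();
        String last = first;
        while (sortedTerms.hasNext()) {
            last = sortedTerms.next();
        }
        return new String[] {first, last};
    }

    /** Smallest term greater than or equal to target, or null if there is none. */
    static String seekCeil(NavigableSet<String> terms, String target) {
        return terms.ceiling(target);
    }
}
```

The analogy only holds when the indexed form of the values sorts the same way as the values themselves, which is an assumption about how the field was indexed.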

Re: How to add a field to hold a Java map object?

2013-02-14 Thread Ian Lea
, it seems that StringField can't be found and thus not compiled. My lucene is 3.5 On Wed, Feb 13, 2013 at 4:54 AM, Ian Lea ian@gmail.com wrote: Assuming you mean the String representation of a Map, the same way you do any other String: use StringField or an analyzer that keeps

Re: How to add a field to hold a Java map object?

2013-02-14 Thread Ian Lea
found StringField API here, however, it seems that StringField can't be found and thus not compiled. My lucene is 3.5 On Wed, Feb 13, 2013 at 4:54 AM, Ian Lea ian@gmail.com wrote: Assuming you mean the String representation of a Map, the same way you do any other String: use

Re: Can't find LimitTokenCountAnalyzer in 4.1

2013-02-14 Thread Ian Lea
It's in 4.1, just not necessarily in the same place. $ jar -tf lucene-analyzers-common-4.1.0.jar | grep Limit org/apache/lucene/analysis/miscellaneous/LimitTokenCountAnalyzer.class org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.class

Re: IndexReader.reopen() If Index Has Been Rebuilt

2013-02-14 Thread Ian Lea
I've never tried reopen() on a completely new index, but if it works, it works. Try it. I'm not aware of any documentation explicitly mentioning this. The benefit of using reopen() rather than close/open is that if only some segments have changed the reopen is less costly. For a brand new index

Re: Optimal way to index

2013-02-12 Thread Ian Lea
. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Feb 11, 2013 at 10:03 PM, Ian Lea ian@gmail.com wrote: You can certainly use lucene for this, and it will be blindingly fast even if you use a disk based index. Just index documents as you've laid it out

Re: Optimal way to index

2013-02-12 Thread Ian Lea
-' as well. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Feb 12, 2013 at 9:50 PM, Ian Lea ian@gmail.com wrote: From a glance it looks fine. I don't see what you gain by adding dots - you are using a TermQuery which will only do exact matches. Since

Re: Strange behavior of term queries with StoredFields - 4.1

2013-02-11 Thread Ian Lea
Yes, that looks fine. As far as I'm aware the compression is low level and transparent to user code. -- Ian. On Mon, Feb 11, 2013 at 2:59 PM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote: On Mon, Feb 11, 2013 at 7:10 PM, Ian Lea ian@gmail.com wrote: StoredField does indeed

Re: Custom score question

2013-02-08 Thread Ian Lea
The score from the main query is passed to the customScore() methods of CustomScoreProvider so you can tweak that as you will. Or, easier, use document boosting to set a low boost for common titles. How are you going to determine if a title is common or not? Lucene by default will tend to

Re: How to implement Lucene

2013-02-05 Thread Ian Lea
You're probably better off using Solr which is tightly linked with lucene. http://lucene.apache.org/solr/ I'm sure there are installation and getting started guides there. -- Ian. On Tue, Feb 5, 2013 at 12:58 PM, Álvaro Vargas Quezada al...@outlook.com wrote: Hello, I want to implement a

Re: How to properly use updatedocument in lucene.

2013-02-01 Thread Ian Lea
There is no way to update without reindexing the entire document. It might be less confusing if the IndexWriter.updateDocument() methods were called maybe replaceDocument() but they're not. It would also help if lucene could reject attempts to pass a Document read from the index to these methods
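The replace-not-patch behaviour described above can be modelled in a few lines of plain Java. This is a toy sketch, not the IndexWriter API: the map-backed "index" and the updateField helper are hypothetical. The point it shows is that an update keyed on an id throws away the old document, so any field the caller does not resupply is gone.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (not Lucene): an "update" is a delete followed by an add of a whole document. */
final class ReplaceDocumentSketch {

    private final Map<String, Map<String, String>> docsById = new LinkedHashMap<>();

    /** Delete-then-add keyed on id: old fields that are not in newFields are lost. */
    void updateDocument(String id, Map<String, String> newFields) {
        docsById.remove(id);
        docsById.put(id, new HashMap<>(newFields));
    }

    /**
     * Caller-side "field update": rebuild the complete field set from the original
     * source record (not from the index), change one field, then replace the document.
     */
    void updateField(String id, Map<String, String> fullSourceRecord, String field, String value) {
        Map<String, String> rebuilt = new HashMap<>(fullSourceRecord);
        rebuilt.put(field, value);
        updateDocument(id, rebuilt);
    }

    /** Returns the current fields for id, or null if there is no such document. */
    Map<String, String> get(String id) {
        return docsById.get(id);
    }
}
```

Rebuilding from the source record rather than from a document read back out of the index is the safer habit, since the index may not hold everything needed to recreate the fields.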

Re: Migration to Lucene 4.1

2013-01-30 Thread Ian Lea
Have you read the changes and migration docs that come with 4.1? You may also need to look at 3.[123456] javadocs to see deprecations and alternatives for stuff that was present in 3.0 but gone in 4.1. -- Ian. On Tue, Jan 29, 2013 at 7:30 PM, Paul Sitowitz sitow...@gmail.com wrote: Hello, I

Re: Large Index Query Help!

2013-01-29 Thread Ian Lea
Lucene won't load the whole index into memory. See http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html What version of lucene? How are you opening index readers? How are you searching? How much memory are you giving the jvm? What else in your app is using all the memory?

Re: update index for user defined types

2013-01-29 Thread Ian Lea
Please try and phrase your question in terms of lucene. Oracle? What's that? User defined type? What's that? IndexWriter has various updateDocuments() methods. I usually give all docs in my indexes a unique id, supplied by me (primary key in database terminology) and use the method that

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread Ian Lea
I make that about 15Mb of data - trivial. What happens if you make each field 400 chars and index a million or two? If you really have that few docs, what are you worrying about? A doubling of indexing time from 3.0.2 to 4.1 is surprising, but for 40k docs are we talking about it taking 2

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread Ian Lea
of such indexes. So the time will add up with the number of such indexes being open simultaneously and parallel indexing. Arun On Tue, Jan 29, 2013 at 7:09 PM, Ian Lea ian@gmail.com wrote: I make that about 15Mb of data - trivial. What happens if you make each field 400 chars and index

Re: How do I best store my IRC log data in lucene indexes?

2013-01-25 Thread Ian Lea
Unless there's good reason not to (massive size? different systems? conflicting update schedules?) I'd store everything in the one index. Consider a cached filter for fast restriction of searches to particular message types. -- Ian. On Thu, Jan 24, 2013 at 1:06 PM, crocket

Re: How do I best store my IRC log data in lucene indexes?

2013-01-25 Thread Ian Lea
wrote: Do you mean http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/CachingTokenFilter.html by a cached filter? And how would you restrict searches to particular message types fast with a cached filter? I'm a beginner. On Fri, Jan 25, 2013 at 6:51 PM, Ian Lea ian@gmail.com

Re: Re: Re: IndexReader.open and CorruptIndexException

2013-01-25 Thread Ian Lea
at 7:58 AM, Cheng zhoucheng2...@gmail.com wrote: Any example code for this SearcherManager? On Fri, Jan 25, 2013 at 3:59 AM, Ian Lea ian@gmail.com wrote: There will be one file handle for every currently open file. Use SearcherManager and this problem should go away. -- Ian. On Thu

Re: Re: Re: Re: IndexReader.open and CorruptIndexException

2013-01-25 Thread Ian Lea
-with-lucenes.html Mike McCandless http://blog.mikemccandless.com On Fri, Jan 25, 2013 at 7:58 AM, Cheng zhoucheng2...@gmail.com wrote: Any example code for this SearcherManager? On Fri, Jan 25, 2013 at 3:59 AM, Ian Lea ian@gmail.com wrote: There will be one file handle for every currently

Re: Re: Re: Re: Re: Re: IndexReader.open and CorruptIndexException

2013-01-25 Thread Ian Lea
-with-lucenes.html Mike McCandless http://blog.mikemccandless.com On Fri, Jan 25, 2013 at 7:58 AM, Cheng zhoucheng2...@gmail.com wrote: Any example code for this SearcherManager? On Fri, Jan 25, 2013 at 3:59 AM, Ian Lea ian@gmail.com wrote: There will be one file handle for every currently

Re: Filtering top hits based on stored field? And Lucene 1.x - 3.x for Dummies

2013-01-25 Thread Ian Lea
On the specific question, calling doc() is still expensive. You could look at the FieldCache or the new DocValues stuff. See http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values for info on the latter. On the general question, much of your lucene knowledge will still be

Re: about isStored method

2013-01-25 Thread Ian Lea
If you're new to lucene why are you using such an old version? Stored means the value is stored in the index and can be retrieved later e.g. for displaying on a search results page. Not stored means it isn't and can't be. There was a similar question not long ago on this list - check the

Re: Filtering top hits based on stored field? And Lucene 1.x - 3.x for Dummies

2013-01-25 Thread Ian Lea
, Jan 25, 2013 at 9:20 PM, Andrew Gilmartin and...@andrewgilmartin.com wrote: Ian Lea wrote: Thank you for the quick and helpful reply. I had forgotten that Lucene's change document was one of best example of change documents around. I will read it. On the specific question, calling doc

Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Ian Lea
Well, raising the limits is one option but there may be better ones. There's an FAQ entry on this: http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F Take a look at org.apache.lucene.search.SearcherManager Utility class to safely

Re: IndexWriter: IndexWriter.MaxFieldLength.LIMITED setMaxFieldLength(MAX_FIELD_SCAN_LENGTH)

2013-01-24 Thread Ian Lea
See org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer and org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter. Looks like you can use the former with StandardAnalyzer as the delegate and whatever value you want for maxTokenCount. The 3.6.1 javadocs have

Re: Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Ian Lea
There will be one file handle for every currently open file. Use SearcherManager and this problem should go away. -- Ian. On Thu, Jan 24, 2013 at 6:40 PM, zhoucheng2008 zhoucheng2...@gmail.com wrote: What file handlers did you guy refer to? I opened the index directory only. Is this the

Re: Are Search Index directories backward compatible? (when upgrading to latest lucene version)

2013-01-23 Thread Ian Lea
Lucene 4.x cannot read indexes created with 2.x. You can change and recompile your code to 4.x in one go. Since you can reindex, I'd make all the code changes and then recreate the indexes using 4.x. With a bit of testing along the way of course. -- Ian. On Wed, Jan 23, 2013 at 7:34 AM,

Re: Document term vectors in Lucene 4

2013-01-18 Thread Ian Lea
Stewart j...@lightboxtechnologies.com wrote: D'oh Thanks! Does TermsEnum.totalTermFreq() return the per-doc frequencies? It looks like it empirically, but the documentation refers to corpus usage, not document.field usage. Jon On Thu, Jan 17, 2013 at 10:00 AM, Ian Lea ian@gmail.com

Re: How to control the Lucene index storage size?

2013-01-17 Thread Ian Lea
There's no way to set such a limit within lucene that I know of. If you really need this you could implement something outside lucene to monitor the index directory and do something (what???) when the limit was exceeded. Don't forget that disk usage will vary over time as segments are merged,
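A monitor of the kind suggested above needs nothing beyond the JDK. The sketch below only measures and compares: the class name, the byte limit and what to do when the limit is exceeded are all left to the caller, since the thread does not say. It tolerates files disappearing mid-walk, on the assumption that segment merges may delete files while the directory is being scanned.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Sketch: measure an index directory from outside the index library. */
final class IndexDirSizeSketch {

    /** Sums the sizes, in bytes, of all regular files under dir. */
    static long sizeInBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            return 0L; // file vanished between listing and sizing
                        }
                    })
                    .sum();
        }
    }

    /** True if the directory currently uses more than limitBytes. */
    static boolean exceeds(Path dir, long limitBytes) throws IOException {
        return sizeInBytes(dir) > limitBytes;
    }
}
```

Because disk usage rises and falls as segments are merged, a single reading over the limit is a weak signal; sampling on a schedule and reacting to a sustained excess is probably the more useful pattern.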

Re: Document term vectors in Lucene 4

2013-01-17 Thread Ian Lea
typo time. You need doc2.add(...) not 2 doc.add(...) statements. -- Ian. On Thu, Jan 17, 2013 at 2:49 PM, Jon Stewart j...@lightboxtechnologies.com wrote: On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir rcm...@gmail.com wrote: Which statistics in particular (which methods)? I'd like to know

Re: The best way get highest frequency term from index

2013-01-15 Thread Ian Lea
java org.apache.lucene.misc.HighFreqTerms indexdir 1 field That's for 4.0, in lucene-misc-4.0.0.jar. It has been around for ages but may have had a different package name in earlier releases. I've no idea how it works and luckily don't need to. You can look at the source if you need to know.

Re: Query beginning with special characters

2013-01-14 Thread Ian Lea
with leading special characters. I hope the above information helps. 2013/1/11 Ian Lea ian@gmail.com QueryParser has a setAllowLeadingWildcard() method. Could that be relevant? What version of lucene? Can you post some simple examples of what does/doesn't work? Post the smallest possible

Re: Query beginning with special characters

2013-01-14 Thread Ian Lea
In fact I see you are ignoring all spaces between words. Maybe that's deliberate. Break it down into the smallest possible complete code sample that shows the problem and post that. -- Ian. On Mon, Jan 14, 2013 at 11:02 AM, Ian Lea ian@gmail.com wrote: It won't be IndexWriter

Re: Query beginning with special characters

2013-01-14 Thread Ian Lea
} while(Character.getNumericValue(termBuffer[0]) == -1); return true; } 2013/1/14 Ian Lea ian@gmail.com In fact I see you are ignoring all spaces between words. Maybe that's deliberate. Break it down into the smallest possible complete code sample that shows

Re: Lucene release source code

2013-01-14 Thread Ian Lea
lucene-4.0.0-src.tgz from http://www.apache.org/dyn/closer.cgi/lucene/java/4.0.0, linked from http://lucene.apache.org/, with a redirect or two along the way. -- Ian. On Mon, Jan 14, 2013 at 4:26 PM, Igor Shalyminov ishalymi...@yandex-team.ru wrote: Hello! I've checked out Lucene trunk from

Re: Query beginning with special characters

2013-01-11 Thread Ian Lea
QueryParser has a setAllowLeadingWildcard() method. Could that be relevant? What version of lucene? Can you post some simple examples of what does/doesn't work? Post the smallest possible, but complete, code that demonstrates the problem? With any question that mentions a custom version of

Re: Indexing your documents with Lucene!

2013-01-10 Thread Ian Lea
rm -rf works well for number 4. For the others use your favourite search engine with queries like lucene tutorial or lucene getting started. Or start with these: http://lucene.apache.org/core/quickstart.html http://www.lucenetutorial.com/lucene-in-5-minutes.html Good luck. -- Ian. On

Re: how much blocksize is set in lucene.

2013-01-10 Thread Ian Lea
want to know, when the index lib is saved on the disk array, which stripe size will be set. When the index is saved in the file system, what block size will be set? Sent from Huawei Mobile Ian Lea ian@gmail.com wrote: What do you mean by lucene blocksize? What version of lucene are you using? A good

Re: how much blocksize is set in lucene.

2013-01-09 Thread Ian Lea
What do you mean by lucene blocksize? What version of lucene are you using? A good general principle is to start with the defaults and only worry if there is a problem. -- Ian. On Wed, Jan 9, 2013 at 8:51 AM, seacathello huj@gmail.com wrote: now I index very many email files, about 50m

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread Ian Lea
What adjustments did you make? One of them might be to blame. But at a glance the code looks fine to me. In what way is it not working? Care to provide any input/output/details of what does/doesn't work? -- Ian. On Wed, Jan 9, 2013 at 2:03 PM, algebra fabianoc...@gmail.com wrote: I was

Re: how to forcemerge an index library with many segments to another dir?

2012-12-20 Thread Ian Lea
So you want a copy of the merged index on another disk? You could just copy it, before or after the merge, your choice. Or create the new index with an IndexWriter and call one of the addIndexes() methods. From the javadocs they sound to have different merge effects. Try it out and see what

Re: Lucene Indexing on NFS

2012-12-19 Thread Ian Lea
Use SimpleFSLockFactory. See the javadocs about locks being left behind on abnormal JVM termination. There was a thread on this list a while ago about some pros and cons of using lucene on NFS. 2-Oct-2012 in fact. http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/thread --

Re: How to update one field(not stored) of an document in lucene 4.0 ?

2012-12-18 Thread Ian Lea
Not possible. You have to replace the whole document. -- Ian. On Tue, Dec 18, 2012 at 9:14 AM, Bo Zhang bo.zhan...@gmail.com wrote: Hi all, I don't know how to update one field, which is not stored, of a document in lucene 4.0. Can anybody tell me? Thanks! Cheers, --- Bob

Re: Help needed: search is returning no results

2012-12-18 Thread Ian Lea
I think you need TextField rather than StringField. See also http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F -- Ian. On Tue, Dec 18, 2012 at 2:14 PM, Ramon Casha rca...@gmail.com wrote: I have just downloaded and set up Lucene 4.0.0 to implement

Re: Lock Errors within JBoss Environment

2012-12-18 Thread Ian Lea
Is the index on NFS? There are words in the javadocs warning against using NativeFSLockFactory on NFS. -- Ian. On Tue, Dec 18, 2012 at 8:02 PM, Bowden Wise wi...@acm.org wrote: Hi Andrew: Thanks for the reply; I am glad to hear our approach is also being used out there. In our case,

Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

2012-12-11 Thread Ian Lea
The javadoc for SpanFirstQuery says it is a special case of SpanPositionRangeQuery so maybe you can use the latter directly, although you might need to know the position of the last term which might be a problem. Alternatives might include reversing the terms and using SpanFirst or adding a

Re: Deciding how to use reader

2012-12-10 Thread Ian Lea
getting to know and tamper around with lucene itself. Lars-Erik -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: 10 December 2012 13:00 To: java-user@lucene.apache.org Subject: Re: Deciding how to use reader If the index is only updated once an hour I'd create a new

Re: Delete documents base on more than one condition?

2012-12-06 Thread Ian Lea
Or, easier, just pass the Query identifying the docs to IndexWriter.deleteDocuments(Query query). There are variants that take multiple queries and single or multiple terms. See the javadocs. You can't delete docs via IndexReader any more. -- Ian. On Thu, Dec 6, 2012 at 9:19 AM, parnab

Re: Beginning with Lucene

2012-12-05 Thread Ian Lea
Read Lucene in Action. The fundamental concepts and techniques haven't changed. You can keep a copy of the release notes and migration guides for later versions to hand. -- Ian. On Wed, Dec 5, 2012 at 12:55 AM, Mohammad Tariq donta...@gmail.com wrote: Sorry to be a pest of question guys.

Re: Lucene 4.0, Serialization

2012-12-04 Thread Ian Lea
It's in the release notes for 4.0. See https://issues.apache.org/jira/browse/LUCENE-2908 -- Ian. On Tue, Dec 4, 2012 at 9:33 AM, BIAGINI Nathan nathan.biag...@altanis.fr wrote: I need to send a class containing Lucene elements such as `Query` over the network using EJB and of course this

Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()

2012-12-04 Thread Ian Lea
Dawid said that's how it's supposed to work which to me = intended behaviour. -- Ian. On Tue, Dec 4, 2012 at 6:33 AM, Trejkaz trej...@trypticon.org wrote: On Tue, Dec 4, 2012 at 10:09 AM, Vitaly Funstein vfunst...@gmail.com wrote: If you don't need to support case-sensitive search in your

Re: Difference in behaviour between LowerCaseFilter and String.toLowerCase()

2012-11-30 Thread Ian Lea
Sounds like a side effect of possibly different, locale-dependent, results of using String.toLowerCase() and/or Character.toLowerCase(). http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase() specifically mentions Turkish. A Google search for Character.toLowerCase() turkish
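A minimal sketch of the locale effect mentioned above, using only the JDK. The class and method names are made up for illustration, and it says nothing about what LowerCaseFilter does internally; it only shows that String.toLowerCase(Locale) can differ from per-character Character.toLowerCase() when the locale is Turkish.

```java
import java.util.Locale;

public class LowerCaseLocaleDemo {

    // Lower-case a whole string using the rules of the given locale.
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Lower-case one char at a time; Character.toLowerCase(char) ignores the locale.
    static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");
        // In Turkish, upper-case 'I' lower-cases to dotless 'ı' (U+0131).
        System.out.println(lowerWithLocale("TITLE", turkish));
        System.out.println(lowerWithLocale("TITLE", Locale.ROOT));
        System.out.println(lowerPerChar("TITLE"));
    }
}
```

If the default locale of the JVM is Turkish, the no-argument String.toLowerCase() behaves like the first call, so passing an explicit locale is one way to keep the results the same across machines.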

Re: sort by field and score

2012-11-30 Thread Ian Lea
= TopFieldCollector.create(sort, 1000, true, true, true, true); indexSearcher.search(query, topFieldCollector); TopDocs topDocs = topFieldCollector.topDocs(); but I got the same result as with the previous code. Do I need to customize the class TopFieldCollector? Thank you, Ian. 2012/11/27 Ian Lea
