from:"Ian Lea"

Re: sort by field and score

2012-11-27 Thread Ian Lea

What are you getting for the scores? If it's NaN I think you'll need to use a TopFieldCollector. See for example http://www.gossamer-threads.com/lists/lucene/java-user/86309 -- Ian. On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu ukour...@gmail.com wrote: Hi All, Now I want to sort by a field

Re: info on how lucene conducsts a search?

2012-11-27 Thread Ian Lea

http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/package-summary.html#package_description might help. Or Google something like how does lucene work. The question on cores might be better asked on the solr list, assuming you are talking about Solr cores. But I bet the answer

Re: info on how lucene conducsts a search?

2012-11-27 Thread Ian Lea

As you can tell from the title, Lucene In Action is more about using lucene than how it works internally, but yes, it is good and is worth buying. If you're worried about how up to date it is, keep a copy of the release notes and migration guides for later versions to hand. -- Ian. On Tue,

Re: what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-23 Thread Ian Lea

Well, according to the javadoc, PayloadTermQuery factors in the value of the payload located at each of the positions where the Term occurs. Have you read some of the info available from Google by searching for lucene payloads? -- Ian. On Fri, Nov 23, 2012 at 8:32 AM, wgggfiy

Re: Specialized Analyzer for names

2012-11-23 Thread Ian Lea

I'd use StandardAnalyzer, or ClassicAnalyzer. Also depends on how you want to search. You probably want a query for John Smith to match John Smith and Smith, John but maybe not John Brown and Sam Smith. The latter is a problem. You can partially work round it by using a BooleanQuery made up

Re: Get the Last Indexed Date

2012-11-23 Thread Ian Lea

You mean the time that a doc, any doc, was last added to an index? I'm not aware of a way to do that directly. You can store arbitrary data when you commit changes and get it back again somehow. See IndexCommit.getUserData(). Or look at the lastmod timestamps of the files on disk. -- Ian.
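
A minimal sketch of the second suggestion (checking file timestamps), using only the JDK. It assumes the index lives in a local filesystem directory; the class and method names are hypothetical. Bear in mind that merges also rewrite files, so this reports the last write of any kind, not strictly the last document added; storing a timestamp in the commit user data is the more precise route.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Optional;

class IndexLastModified {
    // Returns the newest last-modified time of any regular file in the
    // index directory, or empty if the directory holds no regular files.
    static Optional<FileTime> newestFileTime(Path indexDir) throws IOException {
        FileTime newest = null;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) continue; // skip subdirectories
                FileTime t = Files.getLastModifiedTime(p);
                if (newest == null || t.compareTo(newest) > 0) newest = t;
            }
        }
        return Optional.ofNullable(newest);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(newestFileTime(dir).map(FileTime::toString).orElse("no files"));
    }
}
```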

Re: Excessive mem usage with 32-bit app, on 64-bit server

2012-11-22 Thread Ian Lea

1. Does memory usage go up with multiple simultaneous searches - does it need to load the data structures multiple times? Lucene loads some stuff into RAM, but just once rather than for each search. But there will of course be memory used for each search, more concurrent searches will use

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Ian Lea

everything still works as before. On Tue, Nov 20, 2012 at 12:20 PM, Ian Lea ian@gmail.com wrote: You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader. You'll need to do it in steps, from 2.x to 3.x to 4.x, but should work fine as far as I know. -- Ian

Re: Lucene 3.6.0 high CPU usage

2012-11-09 Thread Ian Lea

Are you getting the same, improved or worse performance/throughput? Has the bottleneck switched from IO to CPU? -- Ian. On Thu, Nov 8, 2012 at 12:40 PM, kiwi clive kiwi_cl...@yahoo.com wrote: Having played with merge parameters and various index parameters, it seems possible to change the

Re: AlreadyClosedException when doing search

2012-11-09 Thread Ian Lea

By far the most likely cause is that something somewhere in your code is closing the searcher or the reader. -- Ian. On Thu, Nov 8, 2012 at 2:39 PM, Bin Lan b...@perimeterusa.com wrote: We recently upgrade our lucene library from 1.9.1 to 3.6.1 and we run into multiple AlreadyClosedException

Re: questions on PerFieldSimilarityWrapper

2012-11-09 Thread Ian Lea

Feels a bit of a hack, but you might be able to make it work by storing the field name when MyPerFieldxxx.get(name) is called and using that in MyPerFieldxxx.queryNorm() and coord() calls to do the right thing, either inline or via the relevant Similarity subclass, identified by the name. --

Re: case-insensitive index and queries

2012-11-07 Thread Ian Lea

From a glance the code looks OK, but there's lots you're not showing that could cause it not to work - whatever you mean by that. Fails to get hits on docs you think are in the index? Look at the index with Luke to see what actually has been indexed. Look at Query.toString() to see how the query

Re: Storing html files in lucene index and get back them

2012-10-25 Thread Ian Lea

A couple of weeks ago Rafał Kuć told you how to store fields, and Document.get(name) is very straightforward. What's the problem? http://lucene.472066.n3.nabble.com/Storing-html-files-in-lucene-index-and-get-back-them-td4012877.html -- Ian. On Thu, Oct 25, 2012 at 1:08 PM, rajputadesh

Re: How to use/create an alias to a field?

2012-10-25 Thread Ian Lea

Did you also find the response to that question? http://mail-archives.apache.org/mod_mbox/lucene-java-user/200801.mbox/%3c81162.81463...@web50303.mail.re2.yahoo.com%3E Hard to think of any other ways than those mentioned there. -- Ian. On Thu, Oct 25, 2012 at 2:26 PM, Willi Haase

Re: Is there some class to iterate on document's term positions in Lucene 4.0.0?

2012-10-25 Thread Ian Lea

From http://lucene.apache.org/core/4_0_0/MIGRATE.html TermPositions is renamed to DocsAndPositionsEnum, and no longer extends the docs only enumerator (DocsEnum). And the link is probably the answer to your second question. -- Ian. On Thu, Oct 25, 2012 at 2:50 PM, Ivan Vasilev

Re: StandardAnalyzer functionality change

2012-10-24 Thread Ian Lea

If you want email addresses, UAX29URLEmailAnalyzer is another alternative. -- Ian. On Wed, Oct 24, 2012 at 3:56 PM, Jack Krupansky j...@basetechnology.com wrote: Yes, by design. StandardAnalyzer implements simple word boundaries (the technical term is Unicode text segmentation), period. As

Re: SortField.STRING

2012-10-24 Thread Ian Lea

SortField.Type.STRING maybe? Can't help with the other question. It's generally best to send one question per message. Looking at the source code might help. -- Ian. On Wed, Oct 24, 2012 at 6:55 PM, Carlos de Luna Saenz cdelunasa...@yahoo.com.mx wrote: I am migrating code from Lucene 3 to

Re: Removing Indexed field data.

2012-10-22 Thread Ian Lea

As Aditya said, you'll need to recreate that document. http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F The fact that you only want to remove one value is irrelevant. -- Ian. On Mon, Oct 22, 2012 at 12:56 PM,

Re: Restrict Lucene search in concrete document ids

2012-10-22 Thread Ian Lea

Exactly what method in which class of which version of lucene are you trying to override? There is no Bits method. There is no indexReader.Documents method. I said this earlier in this thread: Presumably you're aware of the transient nature of lucene internal docids and the per-segment

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Ian Lea

Yes, IndexWriter.updateDocument() deletes and then adds. See the javadocs. So your index will have deleted docs. Why do you care? They'll go away eventually as segments get merged. If you really do care, see IndexWriter.forceMergeDeletes(). See also the javadoc for that: This is often a
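
A toy model, in plain Java and not Lucene code, of the behaviour described above: an update marks the old copy as deleted and appends a new one, so the count that includes deletions grows until a merge compacts the deleted slots away. The method names maxDoc() and numDocs() echo the IndexReader methods of the same names, but the class is hypothetical and greatly simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: one list of slots standing in for an index's documents.
class ToySegmentModel {
    private final List<String> ids = new ArrayList<>();      // external id per slot
    private final List<Boolean> deleted = new ArrayList<>(); // deletion flag per slot

    void add(String id) {
        ids.add(id);
        deleted.add(false);
    }

    // Delete-then-add: flag every live copy of this id, then append a new one.
    void update(String id) {
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i).equals(id)) deleted.set(i, true);
        }
        add(id);
    }

    // Counts every slot, deleted or not.
    int maxDoc() {
        return ids.size();
    }

    // Counts live slots only.
    int numDocs() {
        int n = 0;
        for (boolean d : deleted) if (!d) n++;
        return n;
    }

    // Stand-in for a merge: physically removes the deleted slots.
    void merge() {
        for (int i = ids.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                ids.remove(i);
                deleted.remove(i);
            }
        }
    }
}
```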

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Ian Lea

these segments gets merged, i will have my document count going down right? On Wed, Oct 17, 2012 at 6:33 PM, Ian Lea ian@gmail.com wrote: Yes, IndexWriter.updateDocument() deletes and then adds. See the javadocs. So your index will have deleted docs. Why do you care? They'll go away eventually

Re: Restrict Lucene search in concrete document ids

2012-10-17 Thread Ian Lea

I would expect a filter to be quicker than adding thousands of clauses because Filters are just bit sets and operations are extremely fast. But never take performance predictions, particularly from me, on trust - test it in your app with your index on your hardware. To use a filter here I think
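
A rough illustration of why a filter can be cheap: restricting a set of hits to a set of allowed doc ids is a single bitwise AND over the words of two bit sets. This uses java.util.BitSet, not Lucene's Filter or DocIdSet classes, and the doc ids are made-up values; real Lucene doc ids are internal and can change after merges. As the message says, measure it in your own app.

```java
import java.util.BitSet;

class BitSetFilterSketch {
    // Builds a bit set of the given size with one bit per listed doc id.
    static BitSet bitsFor(int size, int... docIds) {
        BitSet bits = new BitSet(size);
        for (int id : docIds) bits.set(id);
        return bits;
    }

    // Keeps only hits whose bit is also set in allowed; the inputs are not modified.
    static BitSet restrict(BitSet hits, BitSet allowed) {
        BitSet result = (BitSet) hits.clone();
        result.and(allowed);
        return result;
    }
}
```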

Re: Lucene index on NFS

2012-10-02 Thread Ian Lea

You'll certainly need to factor in the performance of NFS versus local disks. My experience is that smallish low activity indexes work just fine on NFS, but large high activity indexes are not so good, particularly if you have a lot of modifications to the index. You may want to install a custom

Re: Lucene index on NFS

2012-10-02 Thread Ian Lea

as possible! (rsync is way more your friend for transporting and replication à la solr should also be considered) paul Le 2 oct. 2012 à 11:10, Ian Lea a écrit : You'll certainly need to factor in the performance of NFS versus local disks. My experience is that smallish low activity indexes

Re: Index size doubles every time when I synchronize the RAM-based index with the FD-based index

2012-09-30 Thread Ian Lea

Are you loading it from disk, adding loads of docs then writing it back to disk? That would do it. How many docs in the memory index? How many on disk? What version of lucene? -- Ian. On Fri, Sep 28, 2012 at 1:56 AM, Cheng zhoucheng2...@gmail.com wrote: Hi, I have a ram based index which

Re: multireader.deleteDocuments transfer to IndexWriter.deleteDocuments ?

2012-09-25 Thread Ian Lea

So you've got a MultiReader over some number of indexes, and in order to delete stuff matched with that MultiReader you need an IndexWriter for the specific index that holds the selected term. In Lucene 4.0. Is that right? There are getSequentialSubReaders() and readerIndex(int docID) methods in
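
A sketch of the offset arithmetic behind mapping a MultiReader doc id back to the sub-index that holds it, assuming you have each sub-reader's starting doc id (its doc base) and that no sub-index is empty. The helper names are hypothetical; once you know the sub-index you would use the IndexWriter for that particular index.

```java
import java.util.Arrays;

class SubReaderLookup {
    // starts[i] is the first global doc id owned by sub-index i (starts[0] == 0).
    // Assumes strictly increasing starts, i.e. no empty sub-indexes.
    static int subIndex(int globalDoc, int[] starts) {
        int pos = Arrays.binarySearch(starts, globalDoc);
        // Exact hit: the doc is the first one in that sub-index. Otherwise
        // binarySearch returns -(insertionPoint) - 1 and the owner is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Converts a global doc id into the id local to its sub-index.
    static int localDoc(int globalDoc, int[] starts) {
        return globalDoc - starts[subIndex(globalDoc, starts)];
    }
}
```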

Re: Why does giving more JVM memory to lucene make queries run it faster?

2012-09-21 Thread Ian Lea

Most programs in all languages like plenty of memory. If you Google lucene memory usage you'll get hits on articles by Lucene developers and plenty more. Some bits may be more or less relevant to specific versions of lucene. As for the minimum memory I must give to Lucene for its optimal

Re: ah, problems with Filter

2012-09-13 Thread Ian Lea

The most likely explanation is simply that your filter doesn't match any docs that do match your query. See also http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F -- Ian. On Thu, Sep 13, 2012 at 8:19 AM, sdr...@sina.com wrote: Hi, problems with

Re: How to create a Lucene in-memory index at webapp deployment time

2012-09-07 Thread Ian Lea

You can do stuff with scopes and contexts and web.xml and whatever (google something like tomcat application scope). Or use some static classes or singletons to look after the single index. -- Ian. On Fri, Sep 7, 2012 at 6:10 AM, Kasun Perera kas...@opensource.lk wrote: I have a web java/jsp

Re: Negative query issue

2012-09-07 Thread Ian Lea

http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F -- Ian. On Wed, Sep 5, 2012 at 4:24 PM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote: Take a look at this query : -HOSTNAME:ram AND SEVERITY:information The above query isn't giving

Re: Packaging lucene

2012-09-07 Thread Ian Lea

tg2exe as in code.google.com/p/tg2exe/ Make TurboGears project to the Stand Alone Windows ...? Are you sure you're posting this question to the correct list? -- Ian. On Wed, Sep 5, 2012 at 3:56 PM, Antony Joseph antonyjosep...@gmail.com wrote: Hello, I have upgraded my lucene from 2.4.0 to

Re: DuplicateFilter filters not only duplicates

2012-08-30 Thread Ian Lea

https://issues.apache.org/jira/browse/LUCENE-2348 suggests there are long-standing and probably still current issues with DuplicateFilter and multiple segments. I'm not sure if this could explain what you are seeing. You could try calling optimize(1) on your index writer and see if that makes a

Re: Personalized ranking using pre-computed scores

2012-08-23 Thread Ian Lea

Using a FieldSelector is likely to speed up the doc.get() calls, but it is still liable to be slow. Can you use the lucene FieldCache? Some other memory cache? Payloads? -- Ian. On Wed, Aug 22, 2012 at 4:39 PM, Sebastian R. egnu...@web.de wrote: Dear all, I am currently trying to

Re: query

2012-08-20 Thread Ian Lea

org.apache.lucene.index.PKIndexSplitter in contrib-misc sounds promising. www.slideshare.net/abial/eurocon2010 Munching crunching - Lucene index post-processing sounds well worth a look too. Or just build new indexes from scratch routing docs to the correct index however you choose. -- Ian.

Re: TermRangeQuery with multiple words

2012-08-20 Thread Ian Lea

This won't work with TermRangeQuery because neither test 1 nor test 3 is a term. test will be a term, output by the analyzer. You'll be able to see the indexed terms in Luke. Sounds very flaky anyway - you'd get term 10 xxx and term 100 xxx as well as term 1 and term 2. If your TEST values are
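
A quick demonstration, in plain Java, of the ordering problem: text terms compare character by character, so 10 and 100 fall between 1 and 2. Zero-padding to a fixed width (width 5 here is an arbitrary assumption) is one common workaround; for genuinely numeric values Lucene's NumericRangeQuery is usually the better tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TermOrderDemo {
    // Returns the terms in [lower, upper] under plain String ordering,
    // the ordering a text range over terms relies on.
    static List<String> inRange(List<String> terms, String lower, String upper) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) out.add(t);
        }
        Collections.sort(out);
        return out;
    }

    // Hypothetical workaround: left-pad to a fixed width so text order matches numeric order.
    static String pad(int n) {
        return String.format("%05d", n);
    }
}
```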

Re: TermRangeQuery with multiple words

2012-08-20 Thread Ian Lea

? Kind regards, Jochen 2012/8/20 Ian Lea ian@gmail.com This won't work with TermRangeQuery because neither test 1 not test 3 are terms. test will be a term, output by the analyzer. You'll be able to see the indexed terms in Luke. Sounds very flaky anyway - you'd get term 10 xxx and term

Re: query

2012-08-20 Thread Ian Lea

No. See the FAQ. http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F There are a couple of ideas floating around e.g. http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/ or

Re: Find documents contained in search term

2012-08-17 Thread Ian Lea

Can't see how you could do it with standard queries, but you could reverse the process and use a MemoryIndex. Add the single target phrase to the memory index then loop round all docs executing a search for each one. Maybe use PrefixQuery although I'd worry about performance. Try it and see.

Re: IndexUpgrader

2012-08-16 Thread Ian Lea

Loads of stuff will have changed between those 2 versions - since you can, I'd just reindex. -- Ian. On Tue, Aug 14, 2012 at 10:59 PM, sunil Kumar Verma sunilkv.ve...@gmail.com wrote: We have recently moved to 3.6 from lucene 2.2 and have seen that the way tokens get indexed are not the

Re: LuceneIndex export to SQL-database

2012-08-16 Thread Ian Lea

Is this a lucene question or a mysql question or what? Since this is the lucene list let's assume you're asking about how to get multiple values for a field from an index. Document.getValues(keyword) looks promising: Returns an array of values of the field specified. -- Ian. On Wed, Aug 15,

Re: Does the string Cla$$War affect Lucene?

2012-08-14 Thread Ian Lea

Sounds extremely unlikely. What is the query? What analyzer? What version of lucene? What about other strings containing $$? -- Ian. On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 zhoucheng2...@gmail.com wrote: Hi, I have a big index, and when I searched it with a title string Cla$$War,

Re: How SynonymFilter works?

2012-08-13 Thread Ian Lea

); sb.add(new CharsRef(base2), new CharsRef(syn2), true); SynonymMap smap = sb.build(); Hope that helps. There may be an easier way. Have you tried looking at the source code/test cases? -- Ian. On Fri, Aug 10, 2012 at 6:24 PM, Ricardo r...@rand.org wrote: Ian Lea ian.lea at gmail.com writes

Re: Analyzer on query question

2012-08-03 Thread Ian Lea

You can add parsed queries to a BooleanQuery. Would that help in this case? SnowballAnalyzer sba = whatever(); QueryParser qp = new QueryParser(..., sba); Query q1 = qp.parse("some snowball string"); Query q2 = qp.parse("some other snowball string"); BooleanQuery bq = new BooleanQuery(); bq.add(q1,

Re: Analyzer on query question

2012-08-03 Thread Ian Lea

. thanks for the help, Bill -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Friday, August 03, 2012 9:32 AM To: java-user@lucene.apache.org Subject: Re: Analyzer on query question You can add parsed queries to a BooleanQuery. Would that help in this case

Re: Analyzer on query question

2012-08-03 Thread Ian Lea

it this way over the original method. I just don't know if the original way I described is wrong or will give me bad results. thanks for the help, Bill -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Friday, August 03, 2012 9:32 AM To: java-user@lucene.apache.org

Re: Small Vocabulary

2012-07-31 Thread Ian Lea

Lucene 4.0 allows you to use custom codecs and there may be one that would be better for this sort of data, or you could write one. In your tests is it the searching that is slow or are you reading lots of data for lots of docs? The latter is always likely to be slow. General performance advice

Re: how to put multiplue proximity search in lucene??

2012-07-25 Thread Ian Lea

If you are using QueryParser use "fear dark"~2 "tight free"~3. See also PhraseQuery.setSlop(n). You could also look at the Span queries e.g. SpanNearQuery. -- Ian. On Wed, Jul 25, 2012 at 6:13 AM, neerajshah84 neerajsha...@gmail.com wrote: how can i put multiplue proximity search in lucene??

Re: how can i search multiple words in line and paragraph?

2012-07-25 Thread Ian Lea

Look into spans and line, or sentence, delimiters and tokens, and position increment gaps. Google will help you. You can do a whole lot of stuff with spans - see http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for a good intro. Lucene 2.9 is ancient. You should upgrade. -- Ian.

Re: Question on ElisionFilter with d'

2012-07-25 Thread Ian Lea

I bet it's expected. From http://en.wikipedia.org/wiki/Elision_(French) In written French, elision (both phonetic and orthographic) is obligatory for the following words: ... the preposition de ... Le père d'Albert vient d'arriver. So surely the removal of d' is correct. -- Ian. On

Re: Question on ElisionFilter with d'

2012-07-25 Thread Ian Lea

is that the filter don't remove d' (and c' too). Shall i open an issue on jira ? On 07/25/2012 04:36 PM, Ian Lea wrote: I bet it's expected. From http://en.wikipedia.org/wiki/Elision_(French) In written French, elision (both phonetic and orthographic) is obligatory for the following words

Re: QueryParser and BooleanQuery

2012-07-23 Thread Ian Lea

QueryParser returns a query. Just add that to the BooleanQuery. QueryParser qp = ...; BooleanQuery bq = new BooleanQuery(); Query parsedq = qp.parse(...); bq.add(parsedq, ...); -- Ian. On Mon, Jul 23, 2012 at 1:16 PM, Deepak Shakya just...@gmail.com wrote: Hey Jack, Can you let me know

Re: Usage of NoMergePolicy and its potential implications

2012-07-23 Thread Ian Lea

I can't answer your questions, but use of lucene's document ids as persistent ids is strongly discouraged, particularly in version 4.x where I think it just won't work at all. There was a related thread a couple of weeks ago. See Uwe's message at

Re: how to deal with multi subject problem?

2012-07-20 Thread Ian Lea

Just add the different subjects to the document e.g. Document doc = new Document(); for (String subject : subjects) { Field f = new Field("subject", subject, ...); doc.add(f); } Or concatenate the subjects and store the one long string. If you don't want a search to potentially match terms from

Re: Multiple sort field

2012-07-18 Thread Ian Lea

Any thoughts on this? Patience ... Is it good to use multiple sort fields? Absolutely, if that's what you need. On the other hand, if you don't need it then it's a bad idea. Using sort on docid will consume any memory? Don't know. Certainly won't use less than not sorting this way. Is

Re: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Ian Lea

The release notice for 4.0-alpha sent to this list says file format backwards compatibility is provided for indexes from the 3.0 series so you won't be able to go straight from 2.x to 4.0. I'm sure that will remain true for all 4.x releases. The comments about waiting for a stable release of 4.0

Re: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Ian Lea

I'd forgotten about IndexUpgrader, but I'd still go for 3.6. I wouldn't want the complexity of shipping two versions of lucene and having to get customers to run an upgrade script. And probably wouldn't want to ship the first stable version of 4.0, even though lucene is very stable and reliable.

Re: how to implement a search engine like gmail?

2012-07-18 Thread Ian Lea

That is one option. See recent thread (yesterday?) about possible problems with that approach, and an alternative or two. I've no idea how Google do it. And I've no idea what you mean by problem with different subjects. -- Ian. On Wed, Jul 18, 2012 at 4:27 PM, 许超前 chaora...@gmail.com wrote:

Re: about some date store

2012-07-16 Thread Ian Lea

So content is a String variable in your program holding a multi-line value, is it? I'd double check exactly what that is holding before you store it in the index. -- Ian. On Mon, Jul 16, 2012 at 4:56 AM, sam hairen...@yahoo.com.cn wrote: I had done that,I used the docment.add(new

Re: many index reader problem

2012-07-16 Thread Ian Lea

OOV or OOM? Always best to post a full stack trace, and version of lucene, and OS. Anyway - give your app more memory? Close searchers after use or some period of inactivity? Best long term solution is probably to merge the many small indexes into one, or a few, larger indexes and restrict

Re: Lucene 3.5 Query Parser Question

2012-07-11 Thread Ian Lea

I think you'll have to build the query up in code. RegexQuery in the contrib queries package should be able to take care of #[0-9]. BooleanQuery bq = new BooleanQuery(); PrefixQuery pq = new PrefixQuery(...); // # RegexQuery rq = new RegexQuery(...); // #[0-9] bq.add(pq, ...); bq.add(rq, ...);

Re: index.merge.scheduler exception - java.io.IOException: Input/output error

2012-07-10 Thread Ian Lea

data loss if it makes it more stable and performant. thanks On Mon, Jul 9, 2012 at 2:28 AM, Ian Lea ian@gmail.com wrote: Is this on a local or remote file system? Is the file system itself OK? Is something else messing with your lucene index at the same time? -- Ian

Re: about some seacher(I'm new hand, thank you for help)

2012-07-09 Thread Ian Lea

You don't know how to split the string containing the data you want to index?? String s = "2012-07-06 11:11:43some message"; String timestamp = s.substring(0, 19); String content = s.substring(19).trim(); is one way. -- Ian. On Mon, Jul 9, 2012 at 3:55 AM, sam
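
The same split wrapped in a small helper, assuming every line starts with a 19-character yyyy-MM-dd HH:mm:ss timestamp. The class name is hypothetical; the length check just makes short lines fail with a clearer message than substring() would give.

```java
class LogLineSplitter {
    // Length of a "yyyy-MM-dd HH:mm:ss" timestamp.
    static final int TIMESTAMP_LEN = 19;

    // Returns {timestamp, content} for a line that starts with a fixed-width timestamp.
    static String[] split(String line) {
        if (line.length() < TIMESTAMP_LEN) {
            throw new IllegalArgumentException("line too short: " + line);
        }
        String timestamp = line.substring(0, TIMESTAMP_LEN);
        String content = line.substring(TIMESTAMP_LEN).trim();
        return new String[] {timestamp, content};
    }
}
```

Each {timestamp, content} pair would then become one document with two fields, as suggested in the earlier reply on this thread.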

Re: index.merge.scheduler exception - java.io.IOException: Input/output error

2012-07-09 Thread Ian Lea

Is this on a local or remote file system? Is the file system itself OK? Is something else messing with your lucene index at the same time? -- Ian. On Sun, Jul 8, 2012 at 8:58 PM, T Vinod Gupta tvi...@readypulse.com wrote: Hi, My log files are showing the below exceptions almost at twice a

Re: about some seacher(I'm new hand, thank you for help)

2012-07-06 Thread Ian Lea

Split the data into 2 fields, timestamp and content. Store one lucene document per line with the 2 fields, timestamp stored and not indexed (unless you want to search on it), content stored and analyzed. Use StandardAnalyzer unless you have special requirements. Then close the IndexWriter, open

Re: Starts with Query - Return like search

2012-07-04 Thread Ian Lea

Where exactly are you using these double quoted strings? QueryParser? It would help if you showed a code snippet. Assuming your real data is more complex and the strings you are searching for aren't necessarily at the start of the text, you'll need some mix of wildcard and proximity searching.

Re: Starts with Query - Return like search

2012-07-04 Thread Ian Lea

ComplexPhraseQueryParser which looks interesting. -- Ian. On Wed, Jul 4, 2012 at 9:51 AM, Ian Lea ian@gmail.com wrote: Where exactly are you using these double quoted strings? QueryParser? It would help if you showed a code snippet. Assuming your real data is more complex and the strings you

Re: QueryParser, double quotes and wilcard inside the double quotes

2012-07-03 Thread Ian Lea

You can use the QueryParser proximity feature e.g. "foo test"~n where n is the max distance you want them to be apart. Or look at the SpanQuery stuff e.g. SpanNearQuery. -- Ian. On Tue, Jul 3, 2012 at 4:59 PM, Jochen Hebbrecht jochenhebbre...@gmail.com wrote: Hi all, Imagine you have the

Re: Re: find meaningful words through Lucene

2012-06-27 Thread Ian Lea

All words are important if they help people find what they want. Maybe you want high frequency terms. See contrib class org.apache.lucene.misc.HighFreqTerms. -- Ian. On Wed, Jun 27, 2012 at 3:04 AM, 齐保元 qibaoy...@126.com wrote: meaningful just means the word is important than others,like

Re: Lucene Query About Sorting

2012-06-27 Thread Ian Lea

suppose document number..i have 2-3 GB index and every day , it goes higher. so i cant use searcher.maxdoc(). So i need this solution. Can you please help me out? On Tue, Jun 26, 2012 at 10:42 PM, Ian Lea ian@gmail.com wrote: Do you mean you want all hits that match B:abc, sorted by field

Re: Question

2012-06-27 Thread Ian Lea

Add imageid as a stored field, no need to index it unless you want to be able to search by it. Add the tags as an analyzed indexed field. no need to store unless you want to read/display the values. StandardAnalyzer will work fine. Then use QueryParser to build a query like tags: car, execute

Re: Lucene Query About Sorting

2012-06-26 Thread Ian Lea

Do you mean you want all hits that match B:abc, sorted by field A? As opposed to the top 100 hits sorted by field A? Just pass a higher value in the search(query, ... 100, ...) call. It will be slower and potentially use more memory but with only 10K docs you probably won't notice. -- Ian.

Re: find meaningful words through Lucene

2012-06-26 Thread Ian Lea

Please define meaningful. -- Ian. On Tue, Jun 26, 2012 at 10:39 AM, 齐保元 qibaoy...@126.com wrote: hi, does anyone knows how to extract meaningful words from Lucene index? - To unsubscribe, e-mail:

Re: Problem querying Lucene after escaping

2012-06-25 Thread Ian Lea

It's probably an issue with analysis and colons and hyphens and dots, maybe lower/upper case as well. Are you using an analyzer? Which? If not, which might be consistent with your usage of TermQuery, how are you storing the multiple values for alt_id? See also the FAQ entry Why am I getting no

Re: how to remove the dash

2012-06-25 Thread Ian Lea

I'm positive that StandardAnalyzer won't change drinks - water to drinks -water. So it must be something in your code. Which you don't show us. Best guess is that the changes you've made to the Flex file have caused the problem. If you created your tokenizer by copying and modifying

Re: how to remove the dash

2012-06-25 Thread Ian Lea

(Query: + query.toString(contents)); TopDocs results = searcher.search(query, 10); Thanks xpete A Segunda, 25 de Junho de 2012 14:37:37 Ian Lea escreveu: I'm positive that StandardAnalyzer won't change drinks - water to drinks -water. So it must be something in your code. Which you don't

Re: Problem querying Lucene after escaping

2012-06-25 Thread Ian Lea

The key thing is to be consistent. You can either replace your TermQuery code with the output from QueryParser.parse, with QP created with StandardAnalyzer, or index alt_id as Index.NOT_ANALYZED and stick with TermQuery. I think the latter will work even with multiple terms/tokens stored for

Re: my rangequery problem

2012-05-30 Thread Ian Lea

Do you mean NumericRangeQuery or a textual range query that happens to be searching on numbers? What exactly is wrong? The rewrite method (are you calling this yourself? why?) does indeed mess around with queries and some may end up wrapped with ConstantScoreQuery. I can't remember what happens

Re: How SynonymFilter works?

2012-05-30 Thread Ian Lea

Did you get an answer to this? Looking at the lucene test cases can be a good way of finding out things like this. Reading Lucene In Action is also highly recommended. May not have the exact answer to this question but will teach you how to find out. -- Ian. On Mon, May 28, 2012 at 7:35 AM,

Re: Lucene Exact Match On Analyzed Field

2012-05-28 Thread Ian Lea

KeywordAnalyzer is the normal thing to use if you want exact matches. -- Ian. On Sat, May 26, 2012 at 11:37 AM, Yogesh patel yogeshpateldai...@gmail.com wrote: Hi I would like to search on any analyzed field of lucene index with Exact Match. Is it possible to search with exact match

Re: CPU usage increased using 3.4.0

2012-05-24 Thread Ian Lea

It's hard to believe that an upgrade from 3.0.3 to 3.4.0 would make that much difference to CPU usage. Are you sure nothing else has changed? Has the crawling/indexing elapsed time gone up in the same proportion? Have you verified that the increased usage is actually in lucene rather than

Re: Lucene Grouping problem

2012-05-24 Thread Ian Lea

I've never come across this GroupingCollector stuff before so know nothing about it apart from looking at the javadocs and may be talking nonsense, but here goes anyway. group by time span/web site: it appears that it will group by single values, not ranges, so it should work fine by website. Just

Re: lucene (search) performance tuning

2012-05-22 Thread Ian Lea

Lots of good tips in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from the FAQ. -- Ian. On Tue, May 22, 2012 at 2:08 AM, Li Li fancye...@gmail.com wrote: something wrong when writing in my android client. if RAMDirectory do not help, i think the bottleneck is cpu. you may

Re: sort question

2012-05-22 Thread Ian Lea

important than sorting. if don't sort, how can we implement this request? I'm stuck here. and the discount has been convert to number already, thanks for your information. Thanks, CQ 2012/5/21 Ian Lea ian@gmail.com I'm not clear what you are asking. Are you saying that you want keyword

Re: Per User filtering of public/common documents

2012-05-21 Thread Ian Lea

Certainly lots of questions, and I can't answer most of them, but a couple of comments/opinions. Collecting all docs will potentially use a lot of memory but isn't necessarily excessively slow. It's generally only doing something like reading field values for all docs that can be prohibitively

Re: sort question

2012-05-21 Thread Ian Lea

I'm not clear what you are asking. Are you saying that you want keyword matching to be more important than sorting? If that's the case, don't sort. Or are you saying that sorting of null values isn't doing what you want? Use an actual value instead of null, whatever makes sense in your

Re: old fashioned.....Too many open files!

2012-05-18 Thread Ian Lea

You may need to cut it down to something simpler, but I can't see any reader.close() calls. -- Ian. On Fri, May 18, 2012 at 5:47 PM, Michel Blase mblas...@gmail.com wrote: This is the code in charge of managing the Lucene index. Thanks for your help! package luz.aurora.lucene; import

Re: Optional Terms

2012-05-17 Thread Ian Lea

Document doc3 = new Document(); doc2.add(new Field(searchText, LMN Takeaway, Field.Store.YES, doc2 != doc3. Boosting by number of occurrences tends to happen automatically. See IndexSearcher.explain() as I think someone already suggested. See also javadocs for

Re: Wildcards in field name

2012-05-15 Thread Ian Lea

No and no. MultiFieldQueryParser is the only thing that comes to mind as being remotely close but you have to tell it the field names. I guess you could use IndexReader.getFieldNames(...) to find indexed fields and pass the output from that through a wildcard regexp and feed the output from that

Re: Memory question

2012-05-15 Thread Ian Lea

In versions from 3.3 onwards MMapDirectory is the default on 64-bit linux. Not sure exactly what that means wrt your questions, but may well be relevant. -- Ian. On Tue, May 15, 2012 at 3:51 PM, Lutz Fechner lfech...@hubwoo.com wrote: Hi, By design memory outside the JVM heap space should

Re: how to convert French letters to English?

2012-05-11 Thread Ian Lea

I don't think there is an out of the box analyzer to do this but you can easily build your own, incorporating org.apache.lucene.analysis.ASCIIFoldingFilter into the chain. -- Ian. On Fri, May 11, 2012 at 11:01 AM, Li Li fancye...@gmail.com wrote: I have some french hotels such as Elysée
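
If you only need to strip accents outside an analyzer chain, the JDK can do something similar: decompose with java.text.Normalizer and drop the combining marks. This is a swapped-in technique, not ASCIIFoldingFilter, and it covers less ground: ligatures such as the French OE ligature are left unchanged. The class name is hypothetical.

```java
import java.text.Normalizer;

class AccentFolder {
    // NFD splits an accented letter into base letter + combining mark
    // (e-acute -> e + combining acute); the regex then drops all marks.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```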

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Ian Lea

Can't spot anything obviously wrong in your code and what you are trying to do should work. Are you positive that what you think is the second doc is really being added second? You only show one doc being added. Are there already 7 docs in the index before you start? -- Ian. On Fri, May 11,

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Ian Lea

, Ian Lea ian@gmail.com wrote: Can't spot anything obviously wrong in your code and what you are trying to do should work. Are you positive that what you think is the second doc is really being added second? You only show one doc being added. Are there already 7 docs in the index before

Re: update/re-add an existing document with numeric fields

2012-05-10 Thread Ian Lea

You can't selectively update fields in docs read from an index, in old or current versions of lucene. I think there are some ideas floating around but nothing usable today as far as I know. You'll need to rebuild the whole doc before passing it to writer.updateDocument(). -- Ian. On Wed, May

Re: use index, big or small?

2012-05-10 Thread Ian Lea

Impossible to say - how big is big? How fast is fast? I'd start with the simplest option and if it's fast enough, stop. -- Ian. On Sat, May 5, 2012 at 12:47 AM, Yang tedd...@gmail.com wrote: I have an index containing all students, now I want to do an index search inside an Apache

Re: Similarity coefficient for more exact matching

2012-05-10 Thread Ian Lea

Similarity.setDefault(new MySimilarity()) is certainly better than the 2 calls I recommended. Thanks. I find it hard to see why one might not want to do this in normal usage but have a vague recollection of someone once outlining some obscure scenarios where different similarities at index and

Re: update/re-add an existing document with numeric fields

2012-05-10 Thread Ian Lea

to rebuild my doc from whole cloth and I'm reasonably sure it is working me :-) Thanks! -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Thursday, May 10, 2012 1:20 AM To: java-user@lucene.apache.org Subject: Re: update/re-add an existing document with numeric fields You

Re: Similarity coefficient for more exact matching

2012-04-27 Thread Ian Lea

You can override org.apache.lucene.search.Similarity/DefaultSimilarity to tweak quite a lot of stuff. computeNorm() may be the method you are interested in. Called at indexing time so be sure to use the same implementation at index and query time, using IndexWriterConfig.setSimilarity() and

Re: PhoneticFilterFactory 's inject parameter

2012-04-26 Thread Ian Lea

that all queries and terms don't contain white spaces. Thanks again. -Elmer On 04/25/2012 02:53 PM, Ian Lea wrote: You seem to be quietly going round in circles, by yourself! I suggest a small self-contained program/test case with a RAM index created from scratch. You can

Re: two fields, the first important than the second

2012-04-26 Thread Ian Lea

If you really mean must and always, you'll probably have to execute 2 searches. First on title alone then on description, or title and description, merging the hit lists as appropriate. -- Ian. On Thu, Apr 26, 2012 at 8:30 PM, Akos Tajti akos.ta...@gmail.com wrote: Jake, we're already
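
A minimal sketch of the merge step for the two-search approach, assuming each hit is identified by a unique stored key (the d1, d2 ids in the check are made up). Title hits keep their order and come first; description hits are appended only if the title search had not already returned them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoPassMerge {
    // Title hits first, in their original order; then description hits
    // not already seen. LinkedHashSet ignores duplicates and keeps insertion order.
    static List<String> merge(List<String> titleHits, List<String> descriptionHits) {
        Set<String> merged = new LinkedHashSet<>(titleHits);
        merged.addAll(descriptionHits);
        return new ArrayList<>(merged);
    }
}
```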

Re: PhoneticFilterFactory 's inject parameter

2012-04-25 Thread Ian Lea

You seem to be quietly going round in circles, by yourself! I suggest a small self-contained program/test case with a RAM index created from scratch. You can then experiment with inject on or off and if you still can't figure it out, post the code and hopefully someone will be able to help you


201 - 300 of 911 matches
