Re: Using Payloads

2009-04-27 Thread Murat Yakici
See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed, http://wiki.apache.org/lucene-java/ImproveSearchingSpeed and http://wiki.apache.org/lucene-java/ConcurrentAccessToIndex. Murat Yakici, Department of Computer Information Sciences, University of Strathclyde, Glasgow, UK

Re: Using Payloads

2009-04-27 Thread liat oren
Yes, I agree with you. I also tried this approach in the past, looping over the term vectors, and it was terribly slow. What I have done is divide the indexing into steps, which of course it would be great to avoid if possible! As for my problem, it was a code problem; I solved it,

Re: no segments* file found: files: Error on opening index

2009-04-27 Thread Paul Taylor
Paul Taylor wrote: Hi, I was using a RAMDirectory and this was working fine, but I have now moved over to a filesystem directory to preserve space. The directory is just initialized once: directory = new RAMDirectory(); directory =

Re: Using Payloads

2009-04-27 Thread liat oren
Thanks a lot! I will definitely go over these thoroughly. 2009/4/27 Murat Yakici murat.yak...@cis.strath.ac.uk: See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed and http://wiki.apache.org/lucene-java/ImproveSearchingSpeed and

Why Lucene phrase searching fail?

2009-04-27 Thread blazingwolf7
Hi, I am trying to perform a search using Lucene. The keyword: national india. This phrase exists inside the content. I tried searching for it using Lucene and it failed to return any results. Then I tried searching for it in Luke, with the quotes, and it also failed to return results. Why is that happening?

Re: Why Lucene phrase searching fail?

2009-04-27 Thread Ian Lea
What does query.toString() say? Are you using standard analyzers with standard lowercasing, stop words etc? Knocking up a very simple program/index that demonstrates the problem usually helps: either it will work and help you spot the problem with your existing code, or if you post it here

Re: no segments* file found: files: Error on opening index

2009-04-27 Thread Michael McCandless
It's fine if one thread is changing the index (w/ IndexWriter) while other threads are opening IndexReaders. Though, if you have IndexWriter opening with create=true at the same time that an IndexReader is attempting to open the index, that would explain this. Is it possible that's happening?

Re: no segments* file found: files: Error on opening index

2009-04-27 Thread Paul Taylor
Michael McCandless wrote: It's fine if one thread is changing the index (w/ IndexWriter) while other threads are opening IndexReaders. Though, if you have IndexWriter opening with create=true at the same time that an IndexReader is attempting to open the index, that would explain this. Is it

Re: Why Lucene phrase searching fail?

2009-04-27 Thread Koji Sekiguchi
Another possible factor: if you are using the omitTf feature, phrase queries will not work. Koji. Ian Lea wrote: What does query.toString() say? Are you using standard analyzers with standard lowercasing, stop words etc? Knocking up a very simple program/index that demonstrates the
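To illustrate why dropping positional data can break phrase matching, here is a minimal, library-free sketch. It is a toy model and not Lucene's implementation: the class `ToyPhraseMatch` and all of its methods are hypothetical names invented for this illustration, and it rests on the assumption that omitTf causes term positions to be left out of the index. With positions, a phrase can be checked by looking for consecutive offsets; with only a set of terms, the best one can do is confirm that every term occurs somewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not part of Lucene.
public class ToyPhraseMatch {

    /** Maps each lowercased, whitespace-separated term to its positions in one document. */
    public static Map<String, List<Integer>> indexPositions(String text) {
        Map<String, List<Integer>> positions = new HashMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    /** True if the terms occur at consecutive positions (expects at least one term). */
    public static boolean matchesPhrase(Map<String, List<Integer>> positions, String... terms) {
        List<Integer> starts = positions.getOrDefault(terms[0], Collections.emptyList());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = positions.getOrDefault(terms[i], Collections.emptyList())
                        .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Without positions, only the presence of each term can be checked. */
    public static boolean containsAllTerms(Set<String> termsOnly, String... terms) {
        return termsOnly.containsAll(Arrays.asList(terms));
    }
}
```

In this model, a document such as "india is a national treasure" contains both terms, yet `matchesPhrase` rejects the phrase "national india" because the positions are not adjacent; an index that kept only the term set could not make that distinction.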

Re: How to search special characters in Lucene

2009-04-27 Thread uday kumar maddigatla
Hi, thanks for your reply. Please suggest what I should do now. I want to index a document which contains multiple languages. I am really hoping to complete this with your help. Please, please help me. Erick Erickson wrote: I'm puzzled why you say By the above out put we can say that

Re: lsi as indexing algorithm with lucene

2009-04-27 Thread adasal
Hello all, following the link to SemanticVectors, under related research there is this link: Magnus Sahlgren. An introduction to random indexing. http://www.sics.se/%7Emange/papers/RI_intro.pdf I would like to point out that Magnus Sahlgren has completed a PhD in this area which is both very readable

no results for query with special characters

2009-04-27 Thread Laura Hollink
Hi, I have built an index using the StandardAnalyzer, and am now querying that index, also with the StandardAnalyzer. However, I get results I don't understand. I know there are a few documents about Brazil in my corpus. My corpus is in Dutch, and the Dutch term used is Brazilië. I query

Re: Getting values with low scores

2009-04-27 Thread Erick Erickson
Well, you can always implement your own HitCollector and just take the end of the list. But perhaps a fuller explanation of why you need to do this would lead to a better answer. Best, Erick. On Sun, Apr 26, 2009 at 11:41 PM, samd sdoyl...@yahoo.com wrote: I have 2500 documents and need to

Re: no results for query with special characters

2009-04-27 Thread Ian Lea
Hi, the problem may well lie with the reading of the queries from disk into your program. Using an InputStreamReader with the correct encoding (UTF-8?) should solve it. -- Ian. On Mon, Apr 27, 2009 at 12:32 PM, Laura Hollink lau...@cs.vu.nl wrote: Hi, I have built an index using the
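The suggestion above can be sketched as follows. This is a minimal example using only the Java standard library, under the assumption that the query file holds one query per line and really is UTF-8 encoded (the thread itself only guesses at the encoding). The class name `QueryFileReader` and the method `readQueries` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads queries from disk with an explicit charset.
public class QueryFileReader {

    /** Reads one query per line, decoding the bytes as UTF-8 and skipping blank lines. */
    public static List<String> readQueries(Path file) throws IOException {
        List<String> queries = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(
                     // Assumption: the file is UTF-8. Naming the charset avoids
                     // silently falling back to the platform default encoding.
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    queries.add(trimmed);
                }
            }
        }
        return queries;
    }
}
```

If the file was actually written in another encoding (for example ISO-8859-1), the charset passed to the InputStreamReader would need to match it; otherwise a character such as the ë in Brazilië is decoded into different characters and the query term no longer equals the indexed term.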

Re: Getting values with low scores

2009-04-27 Thread samd
Because no matter how low the rank they should still be available in search results for our case. For example, if you want to search based on an event name. It shouldn't matter if an event name is of low rank or not, you still want to find a match. Erick Erickson wrote: Well, you can always

Re: Getting values with low scores

2009-04-27 Thread samd
One more thing: it's not just about ranking pieces; everything should be available, no matter what its rank. Erick Erickson wrote: Well, you can always implement your own HitCollector and just take the end of the list. But perhaps a fuller explanation of why you need to do this would

RE: kamikaze

2009-04-27 Thread Michael Mastroianni
Thanks. I have a couple quick questions. 1. when I call DocSetFactory.getDocSetInstance(0, -1, -1, DocSetFactory.FOCUS.SPACE); with very large numbers instead of -1, I get OpenBitSets back, which I found surprising (I get p4 sets back when I pass in the -1 values). Also, is there any

Re: Getting values with low scores

2009-04-27 Thread Erick Erickson
I'm really having a hard time understanding what your requirements are. To have all the results available for event, just search on that field (assuming you have an event field for each of your docs). Use a HitCollector to get all of the results rather than the (deprecated) Hits object, I'd suggest

Searcher#setSimilarity clarifications

2009-04-27 Thread Rakesh Sinha
I am looking into setting custom scoring using Similarity (org.apache.lucene.search.Similarity). Searcher has a method to set the similarity, searcher.setSimilarity(Similarity), and one to retrieve it too. I am looking at a case where I can have just one Searcher instance and use different

Re: Getting values with low scores

2009-04-27 Thread samd
As it turns out, it was more an issue with the tokenization and with how fields that contain numeric values were being stored. I need to be sure that fields are tokenized on numeric values as well. Thanks. Erick Erickson wrote: I'm really having a hard time understanding what your requirements

Yet another NFS Question...

2009-04-27 Thread David Seltzer
Hi everyone, There has been a lot of discussion regarding Lucene+NFS pitfalls. I'm not sure how to proceed with a more distributed operation. I'm trying to take the indexing load off of our search server. I can do this either by building a new server which hosts the Indexer and the Index, or a

Re: Yet another NFS Question...

2009-04-27 Thread Michael McCandless
In theory either solution will work, from Lucene's standpoint. But this is not well-explored territory: you are a pioneer! Please report back on your results :) Usually (and this may be different in your app), search performance trumps indexing performance, so my guess is you'd want the index

RE: Yet another NFS Question...

2009-04-27 Thread Sudarsan, Sithu D.
What is the best way to handle this sort of situation? My inclination is to build a new Search Server (with fast HDDs and lots of Memory for tomcat) and leave the indexer on the old server connected via NFS. - Our current development is along similar lines. Almost no deletes, but only lots of

RE: Wordnet indexing error

2009-04-27 Thread Sudarsan, Sithu D.
Thanks Otis! I've got java APIs that could be used. Sincerely, Sithu D Sudarsan Off: 301-796-2587 sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, April 24, 2009 12:09 PM To:

RE: kamikaze

2009-04-27 Thread Michael Mastroianni
Hi, I just got kamikaze somewhat integrated into a project of mine. I'm having problems growing the DocIdSets, though. Up to the point where the first regrow happens, everything is fine. Once the regrow happens, I get an ArrayIndexOutOfBoundsException. The following unit test will exhibit this

ArrayIndexOutOfBoundsException from TermInfosReader.get (2.3.2)

2009-04-27 Thread Daniel Noll
Hi all. One of our users has seen an error like this:
java.lang.ArrayIndexOutOfBoundsException: -1030685
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:210)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
    at

Re: Why Lucene phrase searching fail?

2009-04-27 Thread blazingwolf7
When I print out the query, it looks like this: (url:terror india^2.0 anchor:terror india^0.0 content:terror india title:terror india^1.5 host:terror india^2.0 site:terror india^10.0) I don't understand this at all; only the phrase query has a problem, even my sloop has no problem at all. I have exact