RE: Regex for legal user search input

2005-07-27 Thread Alex Kiselevski
Erik, Just to make it clear for me: the latest Lucene version supports interior +/-. So what does the query look like when "BOOK NAME" is a keyword field and I want to find "C++ tutorial"? Thanks in advance Alex Kiselevski Development Expert, Amdocs Advanced Technologies +9.729.776.4346 (desk)

Re: Lucene vs Derby (vs MySQL) for spatial indexing

2005-07-27 Thread Otis Gospodnetic
Barry, You may also want to consider PostgreSQL for a few reasons: 1) it's historically known to work well for geo-spatial data, 2) has GIS/geo-spatial data types and such, and 3) it seems that the new versions let you embed Java directly into the database (perhaps something like Oracle's Java-emb

Lucene vs Derby (vs MySQL) for spatial indexing

2005-07-27 Thread Barry Carter
Does Lucene optimize range queries that use Sort and/or limit the number of hits? My situation: I have a listing of 2 million cities, with the name, latitude, longitude, and population of each city. I want to efficiently find the 50 most populous cities between (for example) latitudes 35.2 and 41
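To make the question concrete, here is a minimal plain-Java sketch of the result being asked for: filter by a latitude range, then keep only the N most populous matches with a bounded min-heap. It does not use Lucene and says nothing about how Lucene evaluates range queries; the `City` class, the method name, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopCitiesByLatitude {
    /** Hypothetical record type mirroring the fields named in the post. */
    static final class City {
        final String name;
        final double latitude;
        final double longitude;
        final long population;

        City(String name, double latitude, double longitude, long population) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
            this.population = population;
        }
    }

    /** Returns up to {@code limit} cities with minLat <= latitude <= maxLat, most populous first. */
    static List<City> mostPopulous(List<City> cities, double minLat, double maxLat, int limit) {
        Comparator<City> byPopulation = Comparator.comparingLong((City c) -> c.population);
        // Min-heap: the least populous of the current candidates sits on top and is evicted first.
        PriorityQueue<City> heap = new PriorityQueue<>(byPopulation);
        for (City c : cities) {
            if (c.latitude < minLat || c.latitude > maxLat) {
                continue; // outside the requested latitude band
            }
            heap.offer(c);
            if (heap.size() > limit) {
                heap.poll(); // keep at most `limit` candidates in memory
            }
        }
        List<City> result = new ArrayList<>(heap);
        result.sort(byPopulation.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<City> cities = Arrays.asList(
                new City("A", 36.0, 0.0, 100),
                new City("B", 50.0, 0.0, 900),
                new City("C", 38.0, 0.0, 500),
                new City("D", 40.0, 0.0, 300));
        for (City c : mostPopulous(cities, 35.0, 41.0, 2)) {
            System.out.println(c.name + " " + c.population);
        }
    }
}
```

This scans every city once and holds at most `limit` candidates, so it is only a baseline to compare an index-based approach against.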

Re: Searching a URL with a PrefixQuery / Too Many Clauses (again...)

2005-07-27 Thread Scott Ganyo
Chris, How about indexing the domain as one field and each part of the path as separate terms in another field? I'm sure you've probably already thought of doing this... and maybe discarded the idea because you'd lose the position information. However, even though you can't just simply
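A rough sketch of the splitting step this suggestion implies, in plain Java: take the host as one value and each non-empty path segment as a separate value. The class and method names are hypothetical, and how the values are then added to a Lucene document (field names, field types) is left open because the post does not specify it.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class UrlFieldSplitter {
    /** Host part of the URL, lower-cased, intended for a hypothetical "domain" field. */
    static String domainOf(String url) {
        return URI.create(url).getHost().toLowerCase(Locale.ROOT);
    }

    /** Non-empty path segments in order, intended as separate terms in a hypothetical "path" field. */
    static List<String> pathPartsOf(String url) {
        List<String> parts = new ArrayList<>();
        String path = URI.create(url).getPath();
        if (path == null) {
            return parts;
        }
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                parts.add(segment);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String url = "http://www2.warwick.ac.uk/services/its/";
        System.out.println(domainOf(url));    // www2.warwick.ac.uk
        System.out.println(pathPartsOf(url)); // [services, its]
    }
}
```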

Re: Regex for legal user search input

2005-07-27 Thread Erik Hatcher
What version of Lucene are you using? There was a change that helped with that situation such that interior +/- was not considered an operator. That change is in the 1.4 versions - might you be running a previous version of Lucene? Erik On Jul 27, 2005, at 6:42 PM, Derek Westfa

Re: Searching a URL with a PrefixQuery / Too Many Clauses (again...)

2005-07-27 Thread Erik Hatcher
On Jul 27, 2005, at 4:56 PM, Chris May wrote: Always domain + part of a path e.g. url:http://blogs.warwick.ac.uk/chrismay/* or url:http://www2.warwick.ac.uk/fac/soc/law/ug/prospective/degrees/modules/commonlaw/* or url:http://www2.warwick.ac.uk/services/its/* ... and so on. Part of th

Re: Query text Tokenize issue

2005-07-27 Thread Otis Gospodnetic
Hi, I believe your problem is described on page 121 in the Lucene book: http://www.lucenebook.com/search?query=%22dealing+with+keyword+fields%22 The solution for you may be to write your own Analyzer that knows how to correctly tokenize or not tokenize certain fields in your index. Using PerFiel

Re: URL Stemmer

2005-07-27 Thread Otis Gospodnetic
Hm, not sure why you're emailing [EMAIL PROTECTED] [EMAIL PROTECTED] may be better. Here are 2 ancient classes from 2003 that I once used to normalize URLs, to help me identify URL duplicates. This may get stripped on its way to the list. Otis --- Chris Fraschetti <[EMAIL PROTECTED]> wrote:

Query text Tokenize issue

2005-07-27 Thread Indu Abeyaratna
I have a field indexed as a keyword, and I have two records, "J400-C-V1-S10-T1" and "J400-C-V-S10-T1". When I search for "J400-C-V1-S10-T1", it returns the matching record, but when I search for "J400-C-V-S10-T1" it doesn't return the matching one. Further, I found that "J400-C-V-S10-T1" is incorrectly t

URL Stemmer

2005-07-27 Thread Chris Fraschetti
Writing simple code to trim down a URL is trivial, but to actually trim it down to its most meaningful state is very hard. In some cases the URL parameters actually define the page; in others they are useless babble. I'd like to use the hash of a page's URL as well as a hash of the content data to h
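For reference, the "trivial" part might look like the sketch below: lower-case the scheme and host, drop the fragment, sort the query parameters, and hash the result. It makes no attempt at the hard part (deciding which parameters are meaningful). The class name, the normalization rules chosen, and the use of MD5 are all assumptions, not something taken from the thread.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

public class UrlFingerprint {
    /** Lower-cases scheme and host, drops the fragment, and sorts query parameters. */
    static String normalize(String url) {
        URI uri = URI.create(url.trim());
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;

        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (uri.getPort() != -1) {
            out.append(':').append(uri.getPort());
        }
        out.append(path);

        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            String[] params = query.split("&");
            Arrays.sort(params); // parameter order should not affect the fingerprint
            out.append('?').append(String.join("&", params));
        }
        return out.toString();
    }

    /** Hex-encoded MD5 of the text; MD5 is an arbitrary choice for duplicate detection here. */
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // example.com is a placeholder URL.
        String normalized = normalize("HTTP://Example.COM/a?b=2&a=1#frag");
        System.out.println(normalized); // http://example.com/a?a=1&b=2
        System.out.println(md5Hex(normalized));
    }
}
```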

RE: Hardware Question

2005-07-27 Thread Otis Gospodnetic
Ah - my brain was off. :) In the Lucene book we refer to that index format as "compound index format", while the original format we call "multifile index format" http://www.lucenebook.com/search?query=compound+index http://www.lucenebook.com/search?query=multifile+index Yes, the latter will g

Regex for legal user search input

2005-07-27 Thread Derek Westfall
Is there a way to allow users to use + and - and special operators in free-text searches, but also allow them to search for a last name like Smith-Jones? (which I'd have to escape?) Is there a regular expression to determine/fix this kind of user input so it is queryparser-legal? Ie they can't ju
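One possible shape for such a regular expression, as a plain-Java sketch: escape a + or - only when it sits between two word characters, so "Smith-Jones" is protected while a leading +term or -term still acts as an operator. The class and method names are hypothetical. According to the reply in this thread, Lucene 1.4 already treats interior +/- as non-operators, so this may be unnecessary on that version.

```java
public class QueryInputEscaper {
    /**
     * Backslash-escapes '+' and '-' only when they appear between two word
     * characters; operators at the start of a term are left untouched.
     */
    static String escapeInteriorOperators(String input) {
        // Lookbehind and lookahead require a word character on both sides of the sign.
        return input.replaceAll("(?<=\\w)([+-])(?=\\w)", "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escapeInteriorOperators("Smith-Jones"));         // Smith\-Jones
        System.out.println(escapeInteriorOperators("+Smith-Jones -draft")); // +Smith\-Jones -draft
        System.out.println(escapeInteriorOperators("+java -perl"));         // +java -perl
    }
}
```

This covers only + and -; other special characters of the query syntax would need their own handling.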

Re: Derby + Lucene

2005-07-27 Thread markharw00d
Thanks for the reminder, Otis. I haven't done any more on this since this post: http://archives.devshed.com/a/ml/200501-114586/lucene-query-sql-kind The scalability concerns with the user-defined-functions I created prevented me from taking it any further. A proper solution would need a tight

RE: Hardware Question

2005-07-27 Thread Mark Bennett
My apologies Otis, I should have spelled that out. I'm going to take a stab at answering this. But please, others on the list, chime in with corrections / clarifications. CFS = "compact file system" or "consolidate file system" or something like that. Essentially, each Lucene index segment is a

Re: Hardware Question

2005-07-27 Thread Otis Gospodnetic
Option 1) will most likely give you more, but there are a number of other things you could do before going for monster hardware. Splitting the index, more than 1 disk, ParallelIndexReader, the patch that splits index files into a number of data files, etc. Otis --- Michael Celona <[EMAIL PROTEC

RE: Hardware Question

2005-07-27 Thread Otis Gospodnetic
What's CFS? Cryptographic File System? I'm not being sarcastic here, I'm really curious about what you are referring to. Otis --- Mark Bennett <[EMAIL PROTECTED]> wrote: > Also, non-hardware, have you considered turning off CFS? > > Our client told us this sped up their system. > > -Original

Re: Derby + Lucene

2005-07-27 Thread Otis Gospodnetic
--- Mag Gam <[EMAIL PROTECTED]> wrote: > Anyone here have any luck with integration of Apache Derby and > Lucene? I believe Mark Harwood has done some experiments with Lucene and Derby... here: http://www.google.com/search?q=Lucene+derby+harwood Otis

Re: another problem with Multisearcher

2005-07-27 Thread Otis Gospodnetic
Some changes were made to the MultiSearcher version that is in the SVN repository. Which version of Lucene are you using, and can you provide an index and a query that cause this exception? Otis --- Daniel Cortes <[EMAIL PROTECTED]> wrote: > I don't know why, but all this problems that I shared wi

Re: Searching a URL with a PrefixQuery / Too Many Clauses (again...)

2005-07-27 Thread Chris May
Always domain + part of a path e.g. url:http://blogs.warwick.ac.uk/chrismay/* or url:http://www2.warwick.ac.uk/fac/soc/law/ug/prospective/degrees/modules/commonlaw/* or url:http://www2.warwick.ac.uk/services/its/* ... and so on. Part of the problem is that we may need to go an arbitrar

RE: Hardware Question

2005-07-27 Thread Michael Celona
I am retrieving the documents using "hits.doc(i)". I put in some timing output. Here are the results: Before Search 1122497423976 After Search 1122497426795 After Build 1122497426839 (after I retrieve 10 results from hits ) What is CFS? Thanks, Michael -Original Message- From: M
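Assuming the three values above are millisecond timestamps (for example from System.currentTimeMillis(), which the post does not state), the elapsed times work out as in this small sketch. The class and method names are made up for illustration.

```java
public class SearchTiming {
    /** Elapsed time between two millisecond timestamps. */
    static long elapsedMillis(long start, long end) {
        return end - start;
    }

    public static void main(String[] args) {
        long beforeSearch = 1122497423976L;
        long afterSearch = 1122497426795L;
        long afterBuild = 1122497426839L;

        // Time spent in the search itself.
        System.out.println("search ms: " + elapsedMillis(beforeSearch, afterSearch)); // 2819
        // Time spent retrieving the 10 results from hits.
        System.out.println("build ms:  " + elapsedMillis(afterSearch, afterBuild));   // 44
    }
}
```

Read this way, the search accounts for roughly 2.8 seconds and retrieving the 10 documents for about 44 milliseconds.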

Re: searchable mailing list archive

2005-07-27 Thread Alex Krohn
Hi, > > I've added the Lucene mailing lists to our searchable archive found > > here: > > > > http://www.gossamer-threads.com/lists/lucene/ > > > > The search is, of course, powered by Lucene. =) I hope you find it > > useful, and thanks for the great work! If you have any questions or >

Re: hit count within categories

2005-07-27 Thread markharw00d
I posted the code I use to do this (based on a single index) here: http://marc.theaimsgroup.com/?l=lucene-dev&m=111044178212335&w=2 Cheers Mark

Re: Quick newbie question

2005-07-27 Thread Erik Hatcher
On Jul 27, 2005, at 4:32 PM, Scott Ganyo wrote: Actually, I believe the correct answer is an empty result set. Oops I really screwed that one up! :O You're absolutely right, my apologies. Erik On Jul 27, 2005, at 12:14 PM, Erik Hatcher wrote: On Jul 27, 2005, at 12:40 PM, Pete

Re: Searching a URL with a PrefixQuery / Too Many Clauses (again...)

2005-07-27 Thread Erik Hatcher
Could you give some examples of the types of PrefixQuery you'd like to use? Is it always at a granularity of domain and path? Or do you want to do a prefix on pieces of the domain and path? Erik On Jul 27, 2005, at 3:47 PM, Chris May wrote: First, apologies for what seems to be s

Re: Quick newbie question

2005-07-27 Thread Scott Ganyo
Actually, I believe the correct answer is an empty result set. On Jul 27, 2005, at 12:14 PM, Erik Hatcher wrote: On Jul 27, 2005, at 12:40 PM, Peter Gelderbloem wrote: I wonder what would happen An exception :) Peter Gelderbloem -Original Message- From: Erik Hatcher [mailto:[EM

hit count within categories

2005-07-27 Thread Tim Johnson
I'm working on a problem where I need to search over 160 million documents. I know Lucene can do this no sweat; my problem is that these documents are grouped in more than 500 categories. I need to get a count of documents that match a given query, within each category. There is no need for scori
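A minimal sketch of the counting step in plain Java, under two assumptions that are not from the post: the matching document numbers are available as a BitSet, and each document belongs to exactly one category whose ordinal is stored in an array indexed by document number. The names are hypothetical, and this is not the code referenced elsewhere in the thread.

```java
import java.util.Arrays;
import java.util.BitSet;

public class CategoryHitCounter {
    /**
     * Counts matching documents per category.
     *
     * @param matches       bit i is set when document i matches the query
     * @param categoryOfDoc categoryOfDoc[i] is the category ordinal of document i (assumed: one per document)
     * @param categoryCount total number of categories
     */
    static int[] countByCategory(BitSet matches, int[] categoryOfDoc, int categoryCount) {
        int[] counts = new int[categoryCount];
        // Visit only the set bits, so the cost is proportional to the number of matches.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            counts[categoryOfDoc[doc]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] categoryOfDoc = {0, 0, 1, 1, 2}; // five made-up documents in three categories
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(2);
        matches.set(4);
        System.out.println(Arrays.toString(countByCategory(matches, categoryOfDoc, 3))); // [1, 1, 1]
    }
}
```

No scoring is involved, which fits the requirement as stated; the category array costs one int per document and could be narrowed to a smaller type if memory is tight.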

Searching a URL with a PrefixQuery / Too Many Clauses (again...)

2005-07-27 Thread Chris May
First, apologies for what seems to be something of an FAQ. However, I've not been able to find an answer either in LIA or in the relevant section of the FAQ (http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831). My setup is as follows: I have an inde

RE: Hardware Question

2005-07-27 Thread Mark Bennett
Also, non-hardware, have you considered turning off CFS? Our client told us this sped up their system. -Original Message- From: Chris Lamprecht [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 27, 2005 11:52 AM To: java-user@lucene.apache.org Subject: Re: Hardware Question It depends on

Re: Hardware Question

2005-07-27 Thread Chris Lamprecht
It depends on your usage. When you search, does your code also retrieve the docs (using Searcher.document(n), for instance). If your index is 8GB, part of that is the "indexed" part (searchable), and part is just "stored" document fields. It may be as simple as adding more RAM (try 4, 6, and 8G

another problem with Multisearcher

2005-07-27 Thread Daniel Cortes
I don't know why, but all the problems that I shared with you are caused by my use of MultiSearcher. Now I've got this one; has anyone had this problem before? java.lang.IndexOutOfBoundsException: Index: 43, Size: 12 at java.util.ArrayList.RangeCheck(ArrayList.java:507) at java

Hardware Question

2005-07-27 Thread Michael Celona
I am going over ways to increase overall search performance. Currently, I have a dual Xeon with 2 GB of RAM dedicated to Java, searching an 8 GB index on one 7200 rpm drive. Which will give the greatest payoff? 1) Going to a 64-bit server and giving more memory to Java with faster drive

Re: Quick newbie question

2005-07-27 Thread Erik Hatcher
On Jul 27, 2005, at 12:40 PM, Peter Gelderbloem wrote: I wonder what would happen An exception :) Peter Gelderbloem -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 27 July 2005 17:36 To: java-user@lucene.apache.org Subject: Re: Quick newbie question On Jul 27,

RE: Quick newbie question

2005-07-27 Thread Peter Gelderbloem
I wonder what would happen Peter Gelderbloem -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 27 July 2005 17:36 To: java-user@lucene.apache.org Subject: Re: Quick newbie question On Jul 27, 2005, at 12:22 PM, Andrew Boyd wrote: > Of course you can do the inverse o

Re: Quick newbie question

2005-07-27 Thread Erik Hatcher
On Jul 27, 2005, at 12:22 PM, Andrew Boyd wrote: Of course you can do the inverse of what Erik said. That is search for a term that you know is not in the index and use the NOT operator. Ummm... no you can't. A purely negative query is not allowed with Lucene. Erik Andrew ---

Re: Quick newbie question

2005-07-27 Thread Andrew Boyd
Of course you can do the inverse of what Erik said. That is search for a term that you know is not in the index and use the NOT operator. Andrew -Original Message- From: Erik Hatcher <[EMAIL PROTECTED]> Sent: Jul 27, 2005 10:49 AM To: java-user@lucene.apache.org Subject: Re: Quick newb

Re: Quick newbie question

2005-07-27 Thread Erik Hatcher
On Jul 27, 2005, at 11:07 AM, Federico Tonioni wrote: Hi all! I have just a simple question How can I retrieve all documents in an index by using QueryParser? I thought Query query = QueryParser.parse("*", "contents", new StandardAnalyzer()); might be the solution, but it

Quick newbie question

2005-07-27 Thread Federico Tonioni
Hi all! I have just a simple question: how can I retrieve all documents in an index by using QueryParser? I thought Query query = QueryParser.parse("*", "contents", new StandardAnalyzer()); might be the solution, but it's not :) Thanks in advance, fede

Derby + Lucene

2005-07-27 Thread Mag Gam
Anyone here have any luck with integration of Apache Derby and Lucene?