RE: document & field boosting

2002-07-30 Thread Halácsy Péter
It's amazing! In September I'll implement our news archive, where we will try to score documents based on text relevancy and the relative frequency of article downloads/reads (it's an e-magazine). peter > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Monday, J

RE: CachedSearcher

2002-07-16 Thread Halácsy Péter
> > I want to change the way TermQuery does its scoring. > > Could you please make a proposal to the lucene-dev list of > which methods and > classes should be made public or protected or non-final, and > what documentation > should be added? > > Thanks, > > Doug > 1. all package-protected

RE: LARM as an Avalon Phoenix application

2002-07-16 Thread Halácsy Péter
Sounds great! I've read this topic and I hope that we don't reinvent the wheel. I think Avalon is one of the best frameworks that could be used for such a large-scale application. http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00754.html peter > -Original Message- > From

RE: VOTE: Possible features for next release

2002-05-23 Thread Halácsy Péter
> -Original Message- > From: none none [mailto:[EMAIL PROTECTED]] > Sent: Thursday, May 23, 2002 7:39 PM > To: Lucene Developers List > Subject: RE: VOTE: Possible features for next release > > > > -- their relationship. > > 4.Keep the index searcher opened inside the servlet or js

RE: VOTE: Possible features for next release

2002-05-23 Thread Halácsy Péter
ndexSearcher, IndexWriter, Searcher, Directory, FSDirectory, RAMDirectory can be subclassed from outer package. It would be very useful to make for example a ManagedSearcher class that is returned by an IndexAccessControl class: http://www.mail-archive.com/cgi-bin/htsearch?method=and&format=sh

RE: Call for features in next release

2002-05-22 Thread Halácsy Péter
Some others (if you don't mind): 1. make the package-protected abstract methods of org.apache.lucene.search.Searcher public (I'd like to be able to make subclasses of Searcher, IndexWriter, IndexReader ) 2. add a lastModified() method to Directory, FSDirectory and RAMDirectory (so it could be cached

RE: Web Crawler

2002-04-24 Thread Halácsy Péter
> -Original Message- > From: Clemens Marschner [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, April 24, 2002 11:14 PM > To: Lucene Developers List; [EMAIL PROTECTED] > Subject: Re: Web Crawler > >> Another thing I have in mind is to compress the URLs in > memory. First of > all, the URL

RE: Normalization of Documents

2002-04-11 Thread Halácsy Péter
Extracting concepts is not an easy thing, and I don't think you can implement a language/context/document-type independent solution. Filtering only the important terms of a text (rather than indexing all of the text, as modern full-text indexing systems do) is one of the most important areas of IR. A lot of project

not public abstract methods

2002-04-09 Thread Halácsy Péter
Hello, when I wrote a subclass of Searcher (the ManagedSearcher mentioned in my previous mail) I faced a problem in org.apache.lucene.search.Searcher: it has package-protected abstract methods such as: abstract TopDocs search(Query query, Filter filter, int n) throws IOException; Why? That me

RE: Searcher/Reader/Writer Management

2002-04-09 Thread Halácsy Péter
Hello Scott, I've attached a new version of IndexAccessControl and a TEST file. First of all, I think I've found a failure in your code: 1. get a searcher 2. use it 3. release it 4. get another searcher 5. use it In step 3. -- since there is only one reference to the searcher -- the real searcher

RE: QueryParser for default AND

2002-04-08 Thread Halácsy Péter
Web" - as a phrase I prefer option II because I think a user doesn't like to get too many results even if the first ones are great. If I search for World Wide Web I don't want to get results about country wide promotion world bank web publishing &

QueryParser for default AND

2002-04-06 Thread Halácsy Péter
19 Febr. Doug wrote: >> How could I modify the queryParser to implement >> default AND logic? >I haven't tested this, but it should be as simple as changing line 318 of >QueryParser.jj to: > int ret = MOD_REQ; >Unfortunately, I think this would end up disabling OR, so the proper change >is mor

RE: Searcher/Reader/Writer Management

2002-04-03 Thread Halácsy Péter
> -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, April 03, 2002 5:30 PM > To: 'Lucene Developers List' > Subject: RE: Searcher/Reader/Writer Management > > > > Yes, but then you rely on the user to make sure that they > > don't create two > > on t

RE: Searcher/Reader/Writer Management

2002-04-03 Thread Halácsy Péter
> -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, April 03, 2002 4:44 PM > To: 'Lucene Developers List' > Subject: RE: Searcher/Reader/Writer Management > > > (quotes clipped for brevity) > > > 1. Why don't we have as many control/manager object as

RE: Searcher/Reader/Writer Management

2002-04-02 Thread Halácsy Péter
Hello, > -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, April 02, 2002 10:29 PM > To: Lucene-Dev (E-mail) > Subject: Searcher/Reader/Writer Management > > > It seems that a lot of people run into the same set of problems with > maintaining readers, wr

lucene & avalon (was: Proposal for Lucene / new component)

2002-04-02 Thread Halácsy Péter
Hello, more than a month ago I promised to write an Avalon example application. Now in my project I need some Avalon components so I "avalonized" Lucene. I published the package as a zip file: www.extra.hu/halacsyp/lucelon.zip The main idea is to make two manager components one for Searches and

RE: Proposal for Lucene / new component

2002-03-02 Thread Halácsy Péter
> -Original Message- > From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, February 26, 2002 2:13 PM > To: Lucene Developers List > Subject: Re: Proposal for Lucene / new component > > > Humm. Well said. I'm not against using Avalon. My approach to > software is this t

RE: Proposal for Lucene / crawler

2002-03-02 Thread Halácsy Péter
Mercator is the HTTP crawler used in the AltaVista search engine 3.0 (it's not a state-of-the-art product) http://research.compaq.com/SRC/mercator/papers/www/paper.html I hope some of it can be used. And something about Google (I've already posted it): http://www7.scu.edu.au/programme/fullpapers/1921/

RE: Lucene Query Structure

2002-02-19 Thread Halácsy Péter
> -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, February 19, 2002 5:05 PM > To: 'Lucene Developers List'; Lucene Users List > Subject: RE: Lucene Query Structure > > > > Good analogies for the semantics of BooleanQuery are most > internet search

RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Halácsy Péter
Hello, I think my problem is something similar. > -Original Message- > From: Julien Nioche [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, February 13, 2002 6:09 PM > To: Lucene Developers List > Subject: Re : How does Lucene handle phrases containing words > that are not indexed? > > Ph

RE: Proposal for Lucene / new component

2002-02-10 Thread Halácsy Péter
Hello, I've read your proposal (and all the emails related to it). One thing I'd like to advise is to distinguish between the crawler and the loader component. The crawler is responsible for gathering documents from several sources. The loader (or indexer) is responsible for loading the gathered documents to t

RE: Antwort: RE: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)

2001-10-31 Thread Halácsy Péter
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Hi again, > > i think i have to correct my previous statement. it seems, > that the new > token definitions introduce other problems. The "+" or "-" prefixes to > search terms do not work any longer. > > Ex

RE: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)

2001-10-30 Thread Halácsy Péter
Hello, > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > > > > I think IDENTIFIER_CHAR doesn't need to be the first char so my > > proposal is: > > > "*", "?", > > "~", "{", "}

RE: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)

2001-10-27 Thread Halácsy Péter
Hello, I think the token definition list has a problem that causes a ParseException if a term starts with any non-English character. Joanne's solution helps in the case of three other chars but does not help for the others. A TERM is defined as: ( ~["\"", " ", "\t", "(", ")", ":", "&"