Re: SloppyMath license

2015-09-19 Thread Earl Hood
On Sat, Sep 19, 2015 at 11:14 AM, Robert Muir wrote:
> There is nothing unusual about public domain code. If your lawyers do
> not understand that, tell them to go back to school.

Actually, the code in question is not in the public domain, even though the term "public domain" appears in the comments

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-05 Thread Earl Hood
On Tue, Feb 4, 2014 at 6:05 PM, Michael Sokolov wrote:
> Thanks for the feedback. I think it's difficult to know what to do about
> attribute value highlighting in the general case - do you have any
> suggestions?

That is a challenging one since one has to know how attribute data will be transfo

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Earl Hood
On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote:
> You might be interested in looking at Lux, which layers XML services like
> XQuery on top of Lucene and Solr, and includes an XML-aware highlighter:
> https://github.com/msokolov/lux/blob/master/src/main/java/lux/search/highlight/XmlHighligh

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Earl Hood
On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote:
> I'm trying to find a precise and reasonably efficient way to highlight
> all occurrences of terms in the query, only highlighting fields which
> match the corresponding fields used in the query. This seems like it
> would be a fairly common require

Re: Scanning through inverted index

2013-11-27 Thread Earl Hood
On Wed, Nov 27, 2013 at 3:31 PM, Michael Berkovsky wrote:
> My goal is to simply store records term->[doc1, doc2, ] on disk. I
> tried to get these records through docsEnum but it was too slow. Not sure
> if it is possible to get them faster, hence the reason for my enquiry. (Perhaps
> there is

Performance/scoring impacts with multiple occurrences of a field

2013-10-07 Thread Earl Hood
Using Lucene 3.

I know Lucene supports multiple occurrences of a field, and if one searches on that field, all fields are checked for hits. One question I have is whether there is a performance difference between representing all the data I want to index as a single field vs multiple fields of t

Re: Is Analyzer used when calling IndexWriter.addIndexesNoOptimize()?

2012-12-05 Thread Earl Hood
On Wed, Dec 5, 2012 at 8:24 AM, Jack Krupansky wrote:
> These are operations on indexes, so analysis is no longer relevant. Analysis
> is performed BEFORE data is placed in an index. You still need to perform
> analysis for queries though.

This is what I thought. Just wanted to get confirmation.

Is Analyzer used when calling IndexWriter.addIndexesNoOptimize()?

2012-12-04 Thread Earl Hood
Lucene version: 3.0.3

Does IndexWriter use the analyzer when adding indexes via addIndexesNoOptimize()? What about for optimize()? I am examining some existing code and trying to determine what effects there may be when combining multiple indexes into a single index, but each index may have had

Re: Restricting search results to a dynamic slice of documents

2012-05-05 Thread Earl Hood
On Sat, May 5, 2012 at 11:38 AM, Erick Erickson wrote:
> On the face of it, it looks like one of the subclasses of lucene.search.Filter
> should be what you're looking for. Or is the "dynamic slice" something
> you couldn't formulate into a query?

The query route is possible, but it would make for

Restricting search results to a dynamic slice of documents

2012-05-04 Thread Earl Hood
I require the ability to perform a search on a dynamic slice of documents in an index. For a given event, only a select set of documents should be considered when performing a query. Looking at the API, it appears that I can use a Collector during the search to filter out any documents that do no

Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index

2012-04-23 Thread Earl Hood
On Mon, Apr 23, 2012 at 10:31 AM, Jong Kim wrote:
> Is there any good way to solve this design problem? Obviously, an
> alternative design would be to split the index into two, and maintain
> static (and large) data in one index and the other dynamic part in the
> other index. However, this approa

Re: any tips for upgrading Lucene 3.0.3 -> 3.5.0?

2012-01-19 Thread Earl Hood
On Thu, Jan 19, 2012 at 4:59 PM, Uwe Schindler wrote:
> Lucene 3.5 can read any index going back to 2.0. The IndexUpgrader is only
> needed to "forcefully" upgrade indexes for maximum performance and safe
> migration to Lucene 4.0 (that can only read indexes >= 3.0).

Question: Will Lucene 3.5 auto

Re: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Earl Hood
On Mon, Nov 14, 2011 at 11:09 AM, Zhang, Lisheng wrote:
> We plan to upgrade lucene from 2.3.2 to 3.1.0, from reading "Lucene In
> Action" I learned
> that we should "warm up" IndexSearcher and do not expect the initial few queries
> to be fast.

Make sure to QA things first. When we went from 2.4

Re: Questions regarding upgrading from 2.2.x -> 2.9.x -> 3.1.x

2011-09-14 Thread Earl Hood
On Wed, Sep 14, 2011 at 3:08 PM, Charlie Hubbard wrote:
> I posted some questions to stackoverflow regarding how to upgrade from 2.2.x
> to 3.1.x. Hadn't gotten a response so I thought I'd try here. Would repost
> the full question here, but it looks prettier over there:
>
> http://stackoverflow.

Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

2011-03-14 Thread Earl Hood
On Mon, Mar 14, 2011 at 11:46 PM, shrinath.m wrote:
> I used Jericho and found it extremely simple to start with ...
>
> Just wanted to clarify one thing though.
> Is there some tool that does extract text from HTML without creating the DOM

Looks like Jericho does what you want already: http://je

Re: [REINDEX] Note: re-indexing required !

2011-01-23 Thread Earl Hood
On Sat, Jan 22, 2011 at 11:14 PM, Shai Erera wrote:
> Under LUCENE-2720 the index format of both trunk and 3x has changed. You
> should re-index any indexes created with either of these code streams.

Does the "3x" refer to the 3.x development branch? I.e. For those of us using the stable 3.x releas

Re: search on a field that is NOT_ANALYZED

2011-01-19 Thread Earl Hood
On Wed, Jan 19, 2011 at 2:11 PM, Paul Libbrecht wrote:
> I think you should use a TermQuery.

How about IndexReader.termDocs()?

>> I am trying to use
>> *IndexSearcher

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Earl Hood
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)

--ewh

Re: lucene locking

2010-12-16 Thread Earl Hood
On Thu, Dec 16, 2010 at 8:36 AM, Donld Hill wrote:
> Can I safely upgrade to a newer version, do I need to perform any updates on
> the actual indices?

From my personal experience, upgrading from v2 to v3 required rebuilds of indices since we experienced some problems performing queries against

Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 1:41 PM, Chris Hostetter wrote:
> files with the same names should be the same, files with different names
> should be very different -- but if your binary diff tool is finding
> commonalities between files in new segments as the index grows over time,
> and you feel like yo

Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 7:49 AM, Doron Cohen wrote:
> Perhaps I'll change my mind after understanding the scenario that creates
> this, but for now I'd rather not ignore the file name differences.

It may be possible to control the data generation process, so the filenames are consistent. Chan

Re: Forcing specific index file names

2010-12-14 Thread Earl Hood
On Tue, Dec 14, 2010 at 9:45 AM, Erick Erickson wrote:
> Lucene never changes an existing segments file once it is committed.
> It only merges segments then deletes the old ones. So if the file names
> are different, then it seems that renaming them wouldn't be what you
> really want.
>
> So eithe

Re: Forcing specific index file names

2010-12-14 Thread Earl Hood
On Tue, Dec 14, 2010 at 12:53 AM, Chris Hostetter wrote:
>
> : Is it possible to always have Lucene end up with the
> : same set of index filenames for each index generation
> : process?
>
> this smells like an XY problem why do you care what the file names
> are? that's an implementation deta

Forcing specific index file names

2010-12-13 Thread Earl Hood
Is it possible to always have Lucene end up with the same set of index filenames for each index generation process?

I have an application that creates an index for a set of files, and generally, the index files created are the following:

  _0.cfs
  segments_2
  segments.gen

However, it appears somet