Re: Spliting the Lucene

2006-12-08 Thread Andrzej Bialecki
true splitter. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.co

Re: Spliting the Lucene

2006-12-08 Thread Andrzej Bialecki
howard chen wrote: On 12/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: howard chen wrote: > Hi, > > A friend from Hadoop told me someone in the list has code for spliting > the Lucene index, can anyone point me to the right place? You probably refer to the emails we exc

ANN: Luke 0.7 released

2007-02-21 Thread Andrzej Bialecki
many new Field flags * new plugin for term analysis (contributed by Mark Harwood) * many other usability and functionality improvements. Have fun!

Re: ANN: Luke 0.7 released

2007-02-21 Thread Andrzej Bialecki
l. I can't find it in the Lucene source tree. Yes, that's part of Luke - thanks for the report, I'll upload a fixed jar in a moment. It seems the build.xml didn't package this jar correctly.

Re: ANN: Luke 0.7 released

2007-02-21 Thread Andrzej Bialecki
Antony Bowesman wrote: With the luke.jar download, it throws an Exception java.lang.NoClassDefFoundError: org/apache/lucene/index/IndexGate Fixed - I uploaded an updated jar. Sorry for the problem.

Re: [jira] Updated: (LUCENE-834) Payload Queries

2007-03-17 Thread Andrzej Bialecki
http://www.nabble.com/Performance-optimization-for-Nutch-index---query-tf3276316.html Thanks in advance!

Re: [jira] Updated: (LUCENE-834) Payload Queries

2007-03-18 Thread Andrzej Bialecki
Grant Ingersoll wrote: You know, the Nutch Dev mailing list was my last holdout for subscriptions to the Lucene mailing lists! :-) I barely can keep up with Lucene Java! I will try to have a read soon, but can't promise I can add anything meaningful. Yes, I know what you mean ... I'd be g

[ANN] Luke 0.7.1 released

2007-06-22 Thread Andrzej Bialecki
distribution analysis plugin by Mark Harwood. * Fixed IndexGate class to correctly show deletable files.

Re: [ANN] Luke 0.7.1 released

2007-06-22 Thread Andrzej Bialecki
Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at

Re: [ANN] Luke 0.7.1 released

2007-06-22 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Mark Miller wrote: I think it was probably compiled with Java 1.6. 1.5 does not work, but 1.6 does. Ah, yes - sorry, I forgot that 1.6 is the default in my environment. There is nothing specific in Luke that would require 1.6 - I'll recompile it and upload an up

Re: [ANN] Luke 0.7.1 released

2007-06-22 Thread Andrzej Bialecki
Steven Rowe wrote: Hi Andrzej, Andrzej Bialecki wrote: Luke still requires 1.5, because that's what Lucene requires. Lucene core requires 1.4, not 1.5. Indeed! I had a vague recollection that it requires 1.5, probably due to the gdata-server contrib module ... but I just checked it

Optimize and internal document order

2007-08-30 Thread Andrzej Bialecki
t this time I need to figure out how to preserve the old->new mapping during the optimization. So, here's the question: is this scenario feasible? If so, then in the trunk/ version of Lucene, is there any way to figure out (predictably) how internal document numbers

Re: Optimize and internal document order

2007-08-31 Thread Andrzej Bialecki
Karl Wettin wrote: 30 aug 2007 kl. 22.50 skrev Andrzej Bialecki: I think this is possible to achieve by using a FilterIndexReader, which keeps a map of updated documents, and re-maps old doc ids to the new ones on the fly. From time to time I'd like to optimize the "aux" i

Re: Optimize and internal document order

2007-09-03 Thread Andrzej Bialecki
n skipTo() that document id-s are monotonically increasing (which seems to be a part of the contract).

Re: Updating Lucene Index with Unstored fields

2008-01-30 Thread Andrzej Bialecki
a merged index.

[ANN] Luke 0.8 released

2008-02-04 Thread Andrzej Bialecki
m an index. Instead this column now reads "Norms" and shows the fieldNorm value of a field. Have fun!

Re: [ANN] Luke 0.8 released

2008-02-05 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Hi all, I just released Luke 0.8, the Lucene Index Toolbox. As usual, you can get it here: There was a minor issue with this release - Snowball analyzers were not included in the lukeall.jar. I fixed this and uploaded the new binary (no change in the version

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Andrzej Bialecki
sts for frequent terms and phrases - at least that's what I suspect after reading this paper (not by Google folks, but very enlightening): http://citeseer.ist.psu.edu/724464.html

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Andrzej Bialecki
at we use versioning, and that we have a "shard manager" that knows the latest versions of each shard among the whole active set - or that clients discover this dynamically by querying the shard servers every

Index with payloads needed

2008-02-12 Thread Andrzej Bialecki
Hi all, I'm testing the payloads support in Luke, and I need a small index with payloads - if you happen to have one, please contact me off the list. Thank you!

Re: Index with payloads needed

2008-02-12 Thread Andrzej Bialecki
information. Great, thanks - that's exactly what I needed.

[ANN] Luke 0.8.1 released

2008-02-12 Thread Andrzej Bialecki
etable files, which caused a ClassCastException. * Some query types may have been skipped when displaying Explanation. Have fun!

Re: [jira] Created: (LUCENE-1285) WeightedSpanTermExtractor doesn'

2008-05-15 Thread Andrzej Bialecki
Andrzej Bialecki (JIRA) wrote: WeightedSpanTermExtractor doesn' Key: LUCENE-1285 URL: https://issues.apache.org/jira/browse/LUCENE-1285 Project: Lucene - Java Issue Type: Bug Components: co

Re: [jira] Commented: (LUCENE-1290) Deprecate Hits

2008-05-21 Thread Andrzej Bialecki
to e.g. top 100 results, and throw exceptions beyond that number. This way users that only ever need the top 100 results can still use Hits, but it will be obvious for others that they should move to using the HitColle

A few interesting papers from WWW2008

2008-06-16 Thread Andrzej Bialecki
e benefits over other well-known algorithms. * http://www2008.org/papers/pdf/p1213-ding.pdf "Using Graphics Processors for High-Performance IR Query Processing", discusses the application of GPU for posting list decompression and intersection.

Re: lucene scoring

2008-08-07 Thread Andrzej Bialecki
f the level is query-independent, then it's a constant factor, which you can put in a field during the index creation - and then you could use a Filter or FunctionQuery to exclude documents with this factor below the threshold.

Analyzer and Fieldable, different stored and indexed values

2008-08-27 Thread Andrzej Bialecki
ike this:
public TokenStream tokenStream(String fieldName, Fieldable field) {
  Reader r = field.readerValue();
  if (r == null) {
    String s = field.stringValue();
    r = new StringReader(s);
  }
  return tokenStream(fieldName, r);
}

Re: Analyzer and Fieldable, different stored and indexed values

2008-08-27 Thread Andrzej Bialecki
al cost in compatibility, and likely no cost in performance.

Re: RMI, Searchable and RemoteSearchable

2008-09-26 Thread Andrzej Bialecki
r to keep all search-related core classes Serializable ;) I recall a single situation when I had a use for remote searchable, and due to the operational issues with running rmiregistry we went with a custom RPC anyway.

Similarity.lengthNorm and positionIncrement=0

2008-10-07 Thread Andrzej Bialecki
ongs to Similarity, and should be specific to a field, so perhaps a new method in Similarity like this would do:
public float lengthNorm(String fieldName, int numTokens, int numOverlappingTokens) {
  return lengthNorm(fieldName, numTokens);
}
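To make the proposal above a bit more concrete, here is a rough, standalone sketch. These are not Lucene's Similarity classes: both class names are made up for illustration, and the 1/sqrt(numTokens) formula is only assumed to mirror the usual default length normalization. The base class keeps the backwards-compatible behaviour from the post (overlapping tokens are ignored by the three-argument method), and the subclass shows one way an application could discount tokens with positionIncrement=0.

```java
/** Hypothetical stand-in for a Similarity-like class; not part of Lucene. */
class DefaultLengthNorm {
    /** Assumed default: shorter fields get a higher norm, 1/sqrt(numTokens). */
    public float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    /** As proposed in the post: by default the overlap count is simply ignored. */
    public float lengthNorm(String fieldName, int numTokens, int numOverlappingTokens) {
        return lengthNorm(fieldName, numTokens);
    }
}

/** Hypothetical variant that does not count overlapping tokens toward the length. */
class DiscountOverlapsLengthNorm extends DefaultLengthNorm {
    @Override
    public float lengthNorm(String fieldName, int numTokens, int numOverlappingTokens) {
        // Tokens with positionIncrement=0 (e.g. injected synonyms) are subtracted;
        // the result is clamped to 1 to avoid dividing by zero.
        int effective = Math.max(1, numTokens - numOverlappingTokens);
        return lengthNorm(fieldName, effective);
    }
}
```

With six tokens of which two are overlapping, the default variant normalizes by six tokens while the discounting variant normalizes by four.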

Re: Similarity.lengthNorm and positionIncrement=0

2008-10-12 Thread Andrzej Bialecki
ecase is that users submit queries consisting of a single synonym, then the proposed method works better. I'll create a JIRA issue and prepare a patch.

Re: [VOTE] Relax backwards-compatibility policy for package-protected APIs

2008-10-22 Thread Andrzej Bialecki
grade to a different version of Lucene. I've been forewarned, and I won't complain. :)

[ANN] Luke 0.9 released

2008-11-13 Thread Andrzej Bialecki
, although I tested all functionality to make sure that there is no data loss. HOWEVER, if you work with precious data, it's always a good idea to use the "Read-only" option. As usual, bug reports or suggestions for improvements, or, even better, patches, are welcome!

Luke bugs (Re: [jira] Commented: (LUCENE-1454) Corrupted index produced by lucene 2.4)

2008-11-16 Thread Andrzej Bialecki
I may miss reports like this.

[ANN] Luke 0.9.1 - bugfix release

2008-11-23 Thread Andrzej Bialecki
ll commits" option was specified. Reported by Mark Harwood. o Empty index with no fields was reported as invalid. Discovered by Andrew Zhang and Michael McCandless (LUCENE-1454). Thank you!

Re: TrieRangeQuery for contrib?

2008-11-25 Thread Andrzej Bialecki
class. But this is the same with current DateUtils from Lucene (but they are more readable :-) ) Recent versions of Luke can present field contents using different decoders. I can add a DateUtils decoder, as well as TrieRange decoder once it becomes a Lucene contrib module.

Re: Realtime Search

2008-12-26 Thread Andrzej Bialecki

BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
e, plus a trivial hashing scheme for setting and probing bits in that long value. Sorry if this sounds too vague, it's just some food for thought ...
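For what it's worth, the idea of setting and probing bits in a single long can be sketched as below. This is only an illustration under assumptions: the class name, the use of String.hashCode(), and the choice of two bit positions per key are all made up here and are not taken from the original message. With only 64 bits the false-positive rate rises quickly, so this is a toy, not a tuned filter.

```java
/** Hypothetical sketch: a tiny Bloom filter packed into one 64-bit long. */
final class LongBloomFilter {
    private LongBloomFilter() {}

    /** Bit mask for a key: up to two positions derived from its hashCode (assumed scheme). */
    static long mask(String key) {
        int h = key.hashCode();
        long m = 1L << (h & 63);          // first position: low 6 bits of the hash
        m |= 1L << ((h >>> 6) & 63);      // second position: next 6 bits of the hash
        return m;
    }

    /** Returns the filter value with the key's bits set. */
    static long add(long filter, String key) {
        return filter | mask(key);
    }

    /** False means definitely absent; true means possibly present (false positives happen). */
    static boolean mightContain(long filter, String key) {
        long m = mask(key);
        return (filter & m) == m;
    }
}
```

Usage would be along the lines of `long f = LongBloomFilter.add(0L, "lucene");` followed by `LongBloomFilter.mightContain(f, "lucene")`, which always reports true for a key that was added.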

Re: BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
markharw00d wrote: Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too having been concerned about some of the big arrays that can end up in a typical Lucene app. Aside from providing space-efficient lookups, another application for BloomFilters is in similarity

Re: g...@clef

2009-03-01 Thread Andrzej Bialecki
a GSoC project then even better, but even if not I still think Lucene project should participate in this effort.

[ANN] Luke 0.9.2 release

2009-03-19 Thread Andrzej Bialecki
ounts per field in Overview - contributed by Mark Harwood. o Improved the Analysis plugin to show all token information, and highlight whenever a token is selected from the list. * Bug fixes: o (None)

Re: List Moderators

2009-03-24 Thread Andrzej Bialecki
you're a list moderator for dev/user, please stand up. I presume the question was related to Lucene java-user and java-dev, and not sub-projects? FYI, I'm a moderator for Nutch user/dev lists.

Re: renaming fields in index

2005-03-07 Thread Andrzej Bialecki
API, that is. Because you can change the content of the *.fnm file appropriately, right?

Re: Document proximity

2005-03-30 Thread Andrzej Bialecki
etieve documents that contain a given term (similar to a primary key), let's say "John 1:12". You could also add a field to flag a given document as the "end of chapter", or "end of book". I would be more than happy to help you find a good solution - I'm a bor

Re: Document proximity

2005-03-30 Thread Andrzej Bialecki
restricted searches. Today, we do the restriction after the search. This would need some testing, but I would suggest splitting this into two fields: one would be the book name, the other would be a combined chapter/verse, as an integer.
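A combined chapter/verse integer could look roughly like the sketch below. The encoding (chapter * 1000 + verse) and the class name are assumptions made for illustration; the original message does not say how the two numbers should be combined. The point is only that the combined value sorts in reading order, so a restriction such as "1:12 through 2:5" becomes a plain numeric range.

```java
/** Hypothetical encoding of chapter and verse into one sortable integer. */
final class VerseKey {
    /** Assumed upper bound on verses per chapter; any bound larger than the real maximum works. */
    static final int VERSE_SLOTS = 1000;

    private VerseKey() {}

    /** Example: chapter 1, verse 12 becomes 1012. */
    static int encode(int chapter, int verse) {
        if (chapter < 0 || verse < 0 || verse >= VERSE_SLOTS) {
            throw new IllegalArgumentException("chapter/verse out of range");
        }
        return chapter * VERSE_SLOTS + verse;
    }

    static int chapter(int key) {
        return key / VERSE_SLOTS;
    }

    static int verse(int key) {
        return key % VERSE_SLOTS;
    }

    /** True if key lies in the inclusive range [from, to]. */
    static boolean inRange(int key, int from, int to) {
        return key >= from && key <= to;
    }
}
```

The book name would stay in its own field, as suggested above, so a restricted search would combine an exact match on the book with a range over this integer.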

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Andrzej Bialecki
Erik Hatcher wrote: Oh, and one other thing Paul's code relies on JDK 1.4's assert Erhm.. you meant 1.5 (five), right?

Re: ParallelReader

2005-04-28 Thread Andrzej Bialecki
to augment already existing indices, and without reindexing the main index. I can see a lot of possibilities... Should it go into the core or in contrib? I would vote for the core, if I could...

Re: Topic Maps/Clustering + Lucene... how?

2005-07-15 Thread Andrzej Bialecki
I want it automatically ;) Clustering - use Carrot2 (http://sourceforge.net/projects/carrot2/) Topic Maps - use treemap (http://www.cs.umd.edu/hcil/treemap/)

Re: Question about BooleanQuesry.maxClauseCount

2005-11-09 Thread Andrzej Bialecki
emulate the old behaviour by calling the new setter methods using the system properties as values. I hear you :-) This will be fixed in the upcoming release (as soon as 1.9 is out).

Performance issues with ConjunctionScorer

2005-11-22 Thread Andrzej Bialecki
is to index it and then build the summaries. Please see the profiles here: http://www.getopt.org/nutch/profile/index.html

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Hi, I've been profiling a Nutch installation, and to my surprise the largest amount of throwaway allocations and the most time spent was not in Nutch specific code, or IPC, but in Lucene ConjunctionScorer.doNext() method. This method operates on a LinkedList,

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread Andrzej Bialecki
except for academic study. Pity.

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread Andrzej Bialecki
ode, lest you become "tainted" ... ;-)

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-29 Thread Andrzej Bialecki
tories, and for good reasons. The only thing they/we could do would be to ask nicely to re-license it under ASL.

Re: wildcard search with variable length

2006-02-22 Thread Andrzej Bialecki
more compatible with the use of "?" in other contexts).

Re: Lazy Field Loading

2006-03-31 Thread Andrzej Bialecki
n requesting all hit's "metadata" - again, using the same index. So, option #4 would be really useful.

Re: Test corpus

2006-04-02 Thread Andrzej Bialecki
same tests. At least 1000 docs, a few hundred words each. Any suggestions? 20 newsgroups or the old Reuters corpus are freely available, and contain a sufficient number of documents.

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
warm-up. In short, I think the numbers for Lucene are not to be trusted. The indexing times seem strange, too - couple minutes for other engines, and > 4 hours for Lucene? Something's wrong here

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
, indeed I have ... It was so long ago I nearly forgot about it. :) I need to dust it off and see if it's of any use. It used the 20newsgroups corpus (~19,000 items). It could use the Reuters corpus, just the parser would have to be implemented.

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
Marvin Humphrey wrote: On May 29, 2006, at 10:34 AM, Andrzej Bialecki wrote: It could use the Reuters corpus Has anyone used existing categorization data associated with the Reuters corpus to build a benchmarker that measured IR precision and/or recall? That would be RCV1 or RCV2

Re: Lucene and Java 1.5

2006-05-30 Thread Andrzej Bialecki
ous switch to use java.util.* collections instead of Vectors and Hashtables. So, it's -0.5 from me ... ;)

Luke - in need of maintainer

2006-05-31 Thread Andrzej Bialecki
same time to exercise their GUI coding skills, you are welcome.

Re: Luke - in need of maintainer

2006-06-01 Thread Andrzej Bialecki
iXML is still too much Swing-like for my liking, I like the direction that JAXX is taking but it's still very young and requires the descriptors to be compiled ...

Re: Luke - in need of maintainer

2006-06-01 Thread Andrzej Bialecki
, he had a well-advanced port, perhaps it just needs a little polishing (Polish-ing? :) .

Re: Luke - in need of maintainer

2006-06-01 Thread Andrzej Bialecki
so I still would have to host this part.

Re: Empty doc objects & "The handle is invalid" IOExceptions

2006-08-23 Thread Andrzej Bialecki
n throughout the life of the Hits object - documents are lazy-loaded into Hits only when you request them, and behind the scenes Lucene is reading them from the IndexReader.

Re: Combining search steps without re-searching

2006-08-28 Thread Andrzej Bialecki
ery, ... umm, guys, wouldn't a series of QueryFilter's work much better in this case? If some of the clauses are repeatable, then filtering results through a cached BitSet in such filtered query would work nicely, right?
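The caching idea can be illustrated without any Lucene classes. The sketch below is an assumption-laden stand-in: the class name, the string clause keys, and the loader function are all hypothetical, and the real QueryFilter machinery is not reproduced here. It only shows the mechanism: each repeatable clause is evaluated once into a java.util.BitSet of matching document numbers, the bit set is cached, and later search steps intersect cached bit sets instead of re-running the clause.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of per-clause document bit sets; not a Lucene class. */
final class CachedBitSetFilters {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;
    private int loads = 0;

    /** loader computes the matching documents for a clause (assumed to be expensive). */
    CachedBitSetFilters(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the cached bit set for a clause, computing it on first use only. */
    BitSet bitsFor(String clause) {
        BitSet bits = cache.get(clause);
        if (bits == null) {
            bits = loader.apply(clause);
            cache.put(clause, bits);
            loads++;
        }
        return bits;
    }

    /** Documents matching all clauses; cached bit sets are cloned, never modified. */
    BitSet intersect(String... clauses) {
        BitSet result = null;
        for (String clause : clauses) {
            BitSet bits = bitsFor(clause);
            if (result == null) {
                result = (BitSet) bits.clone();
            } else {
                result.and(bits);
            }
        }
        return result == null ? new BitSet() : result;
    }

    /** How many times the loader actually ran. */
    int loadCount() {
        return loads;
    }
}
```

Combining a new search step with earlier ones then costs one bitwise AND per cached clause, which is the effect the message above is hoping for.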

Re: Parallel incremental indexing

2009-08-30 Thread Andrzej Bialecki
-java/ParallelIncrementalIndexing I'm curious what is the relationship of this proposal to the design described in a CIKM '08 paper "Supporting Sub-Document Updates and Queries in an Inverted Index" (nota bene coming from IBM folks)?

Re: Lucene icon and Ohloh

2009-09-01 Thread Andrzej Bialecki
its.ohloh.net/attachments/23787/lucene_tiny.png Looks like the logos I've been using in Luke for the last few years ;) for the same reason. I propose to add these to the official logos - the need arises often enough.

[ANN] Luke 0.9.9 release

2009-09-29 Thread Andrzej Bialecki
Chris Pimlott and others. Enjoy! :)

Re: Optimization and Corruption Issues

2009-10-01 Thread Andrzej Bialecki
after optimizing with Luke 7.0 is 633103800023469057. This is just a timestamp, so it doesn't say what version of Lucene created the index. If you open the index with Luke, in the Overview tab there is a line that tells what is the index format version.

Omit positions but not TF

2009-11-07 Thread Andrzej Bialecki
e may want to add a separate flag for this and bump the format version.

Re: Omit positions but not TF

2009-11-08 Thread Andrzej Bialecki
right term for it, I haven't looked at the details of oal.index.* since 2.4-ish or so ... we'll see ;) Probably this should wait until 3.1. +1.

Re: Omit positions but not TF

2009-11-09 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Michael McCandless wrote: +1 I guess we'd add a Fieldable.setOmitPositions? And then save that in FieldInfos, and fix the postings writing/reading to respect it? Ie, we can just change the index format. Encoding as negative numbers Yes, that's what I had

Re: Distributing index over N disks

2009-11-25 Thread Andrzej Bialecki
. This works well with static indexes (no updates, no merges), and doesn't require code modifications in existing apps. Seriously, though, I agree that FileSwitchDirectory is the way to go. -- Best regards, Andrzej Bia

Re: Announcement: Boilerplate removal library

2009-12-04 Thread Andrzej Bialecki
ge-level (local) methods can work well in such case, and it can be only solved by using the global-level (site or area of site) methods, which are more cumbersome to use in practice... I'm looking forward to experimenting with your implementation!

Re: [DISCUSS] Do away with Contrib Committers and make core committers

2010-03-14 Thread Andrzej Bialecki
mitters know that. +1.

Re: lucene and solr trunk

2010-03-16 Thread Andrzej Bialecki
giant understatement...) to handle branching and merging in git, both between git branches and syncing with external svn.

Re: Set IDF value manually on a search query

2010-03-22 Thread Andrzej Bialecki
this approach implemented here: https://issues.apache.org/jira/browse/SOLR-1632 though this contains some Solr-specific scaffolding, too.

Re: Incremental Field Updates

2010-03-29 Thread Andrzej Bialecki
id=1458171

Re: AW: Incremental Field Updates

2010-03-29 Thread Andrzej Bialecki
On 2010-03-29 15:11, Uwe Goetzke wrote: They filed this as patent, too: http://www.freepatentsonline.com/y2009/0228528.html .. which is not granted yet, right? It's a patent application. Besides, I live in EU ;)

Re: official GIT repository / switch to GIT?

2010-04-17 Thread Andrzej Bialecki
it-svn and the problem is solved (for you).

[jira] Created: (LUCENE-811) Public API inconsistency

2007-02-22 Thread Andrzej Bialecki (JIRA)
release Reporter: Andrzej Bialecki Priority: Minor org.apache.lucene.index.SegmentInfos is public, and contains public methods (which is good for expert-level index manipulation tools such as Luke). However, SegmentInfo class has package visibility. This leads to a

[jira] Commented: (LUCENE-811) Public API inconsistency

2007-02-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475193 ] Andrzej Bialecki commented on LUCENE-811: -- I'm fine with making these classes package-private -

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2008-01-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562765#action_12562765 ] Andrzej Bialecki commented on LUCENE-997: -- I believe this version of the p

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2008-01-26 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562936#action_12562936 ] Andrzej Bialecki commented on LUCENE-997: -- Indeed, thanks for the correction

[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1285: -- Description: Given a BooleanQuery with multiple clauses, if a term occurs both in a

[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1285: -- Attachment: highlighter.patch A patch to fix the issue. > WeightedSpanTermExtrac

[jira] Created: (LUCENE-1285) WeightedSpanTermExtractor doesn'

2008-05-15 Thread Andrzej Bialecki (JIRA)
sions: 2.4 Reporter: Andrzej Bialecki Fix For: 2.4 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [

[jira] Created: (LUCENE-1396) Improve PhraseQuery.toString()

2008-09-19 Thread Andrzej Bialecki (JIRA)
Reporter: Andrzej Bialecki Fix For: 2.4 PhraseQuery.toString() is overly simplistic, in that it doesn't correctly show phrases with gaps or overlapping terms. This may be misleading when presenting phrase queries built using complex analyzers and filters.

[jira] Updated: (LUCENE-1396) Improve PhraseQuery.toString()

2008-09-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1396: -- Attachment: phraseQuery.patch This patch improves toString(), and adds a unit test

[jira] Created: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Andrzej Bialecki (JIRA)
Affects Versions: 2.3.3, 2.9 Reporter: Andrzej Bialecki Fix For: 2.3.3, 2.9 Calculation of lengthNorm factor should in some cases take into account the number of tokens with positionIncrement=0. This should be made optional, to support two different scenarios

[jira] Updated: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1420: -- Attachment: similarity.patch This patch adds Similarity,length(fieldName, numTokens

[jira] Updated: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1420: -- Attachment: similarity-v2.patch Patch that uses FieldInvertState, as suggested

[jira] Commented: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643231#action_12643231 ] Andrzej Bialecki commented on LUCENE-1420: --- Thanks for a thorough review.

[jira] Created: (LUCENE-1452) Binary field content lost during optimize

2008-11-13 Thread Andrzej Bialecki (JIRA)
Versions: 2.4, 2.9 Environment: Ubuntu 8.04, x86_64 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode) Reporter: Andrzej Bialecki Scenario: * create an index with arbitrary content, and close it * open IndexWriter again, and add a document with binary field (stored

[jira] Updated: (LUCENE-1452) Binary field content lost during optimize

2008-11-13 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1452: -- Attachment: binaryField-junit.patch Test case to illustrate the problem. This happens

[jira] Created: (LUCENE-1464) FSDirectory.getDirectory always creates index path

2008-11-21 Thread Andrzej Bialecki (JIRA)
Affects Versions: 2.4, 2.9 Reporter: Andrzej Bialecki This was reported to me as a Luke bug, but going deeper it proved to be a non-intuitive (broken?) behavior of FSDirectory. If you use FSDirectory.getDirectory(File nonexistent) on a nonexistent path, but one that is located

[jira] Commented: (LUCENE-1464) FSDirectory.getDirectory always creates index path

2008-11-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649940#action_12649940 ] Andrzej Bialecki commented on LUCENE-1464: --- The patch looks fine to me in

[jira] Commented: (LUCENE-1464) FSDirectory.getDirectory always creates index path

2008-11-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649950#action_12649950 ] Andrzej Bialecki commented on LUCENE-1464: --- Well, if it's the Inde
