Re: [ANNOUNCE] Solr wiki editing change

2013-03-25 Thread Andrzej Bialecki
AndrzejBialecki to this group. Thank you! -- Best regards, Andrzej Bialecki http://www.sigram.com, blog http://www.sigram.com/blog ___.,___,___,___,_._. __<>< [___||.__|__/|__||\/|: Information Retrieval, System Integration ___|||__||..\|..||..|: Contact: info at s

Re: What is the "docs" number in Solr explain query results for fieldnorm?

2012-05-25 Thread Andrzej Bialecki
ample retrieve stored fields of this document. As shown in the Explanations, it can only be used to coordinate parts of the query that matched the same document number.

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Andrzej Bialecki
://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf http://research.google.com/pubs/archive/37365.pdf

Re: Solr Lucene Index Version

2011-12-08 Thread Andrzej Bialecki
some parts missing - see LUCENE-3622.

Re: Solr Lucene Index Version

2011-12-08 Thread Andrzej Bialecki
there.

Re: Backup with lukeall XMLExporter.

2011-10-05 Thread Andrzej Bialecki
chanism to pull in a copy of the index from a running Solr instance.

Re: Can I delete the stored value?

2011-07-10 Thread Andrzej Bialecki
See LUCENE-1812 for another practical application of this concept.

Re: Feed index with analyzer output

2011-07-05 Thread Andrzej Bialecki
/browse/SOLR-1535

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Andrzej Bialecki
o use that excess of memory, but it won't be available for OS-level disk IO. Therefore, reducing the heap size may actually improve your performance.

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread Andrzej Bialecki
rease dramatically (and performance will then drop). Modern OSes try to keep as much data in memory as possible, so memory usage by itself is not very informative; instead, check the page-in/page-out rates at 32 vs. 64 cores.

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
On 4/8/11 9:55 PM, Andy wrote: > --- On Fri, 4/8/11, Andrzej Bialecki wrote: >> :) If you don't need the new functionality in 4.x, you don't need the performance improvements, > What performance improvements does 4.x have over 3.1? Ah... well, many - take a look at the C

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
reindexing cycles are long (indexes tend to stay around) then 3.1 is a safer bet. If you need a dozen or so new exciting features (e.g. results grouping) or top performance, or if you need LucidWorks with Click and other goodies, then use 4.x and be prepared for an occasional full

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
orum http://www.lucidimagination.com/forum/ .

Re: Lucid Works

2011-04-07 Thread Andrzej Bialecki
contrib/patch that was applied? At the moment it's proprietary. I will have a talk at the Lucene Revolution conference that describes the Click tools in detail.

Re: Detecting an empty index during start-up

2011-03-25 Thread Andrzej Bialecki
. For now it's better to pass openNew=false and be prepared to get a null.

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 13:37, Toke Eskildsen wrote: > On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: >> * there is an exact solution to this problem, namely to make two >> distributed calls instead of one (first call to collect per-shard IDFs >> for given query terms, se

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
e in scores across shards, or whether you want to bear the cost of an additional distributed RPC for every query... To summarize, I would qualify your statement with: "...if the composition of your shards is drastically different". Otherwise the cost of

Re: Different analyzers for dfferent documents in different languages?

2010-09-22 Thread Andrzej Bialecki
1536, it contains an example of a tokenizing chain that could use a language detector to create different fields (or tokenize differently) based on this decision.

Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Andrzej Bialecki
On 2010-09-06 22:03, Dennis Gearon wrote: What is a 'simple MOD'? md5(docId) % numShards
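A minimal Java sketch of the "simple MOD" routing in that reply: hash the document ID with MD5 and take the result modulo the number of shards. The class and method names (`Md5ShardRouter`, `shardFor`) are illustrative assumptions, not part of Solr or SolrCloud.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: shard = md5(docId) % numShards
class Md5ShardRouter {
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be > 0");
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(docId.getBytes(StandardCharsets.UTF_8));
            // Treat the 128-bit digest as a non-negative number so the modulus is never negative
            BigInteger value = new BigInteger(1, digest);
            return value.mod(BigInteger.valueOf(numShards)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the hash is deterministic, the same ID always lands on the same shard; the trade-off is that changing `numShards` remaps most IDs.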

Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Andrzej Bialecki
tions where a simple MOD won't do ;) so I think it would be good to hide this strategy behind an interface/abstract class. It costs nothing, and gives you flexibility in how you implement this mappin
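To illustrate hiding the mapping behind an interface, here is a small Java sketch. `ShardMappingPolicy`, `HashModPolicy` and `PinnedPolicy` are hypothetical names; the hash-mod policy uses `String.hashCode()` instead of MD5 for brevity, and the pinned policy is just one made-up example of a case where a simple MOD won't do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy interface: callers only ask "which shard?"
interface ShardMappingPolicy {
    int shardFor(String docId, int numShards);
}

// Default strategy: simple MOD over a hash of the ID
class HashModPolicy implements ShardMappingPolicy {
    public int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even if hashCode() is negative
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

// Example of a non-MOD strategy: some IDs are pinned to a fixed shard, the rest fall back
class PinnedPolicy implements ShardMappingPolicy {
    private final Map<String, Integer> pinned = new HashMap<>();
    private final ShardMappingPolicy fallback;

    PinnedPolicy(ShardMappingPolicy fallback) { this.fallback = fallback; }

    PinnedPolicy pin(String docId, int shard) {
        pinned.put(docId, shard);
        return this;
    }

    public int shardFor(String docId, int numShards) {
        Integer shard = pinned.get(docId);
        return shard != null ? shard : fallback.shardFor(docId, numShards);
    }
}
```

The indexing code depends only on `ShardMappingPolicy`, so swapping strategies does not touch it.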

Re: How to retrieve the full corpus

2010-09-06 Thread Andrzej Bialecki
other core (instead of using the current sub-index hack).

SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Andrzej Bialecki
s committed, I'm sure people will follow up with user-level convenience components that will make it easier.

Re: anyone use hadoop+solr?

2010-09-06 Thread Andrzej Bialecki
tegrated with Nutch). SolrCloud is not far away from hitting the trunk (right, Mark? ;) ), so medium-term I think this is your best bet.

Re: Analyser depending on field's value

2010-08-16 Thread Andrzej Bialecki
xt into different fields, which can then be analyzed differently.

Re: Auto-suggest internal terms

2010-06-03 Thread Andrzej Bialecki
On 2010-06-03 13:38, Michael Kuhlmann wrote: > On 03.06.2010 13:02, Andrzej Bialecki wrote: >> ..., and deploy this >> index in a separate JVM (to benefit from other CPUs than the one that >> runs your Solr core) > > Every known web server is multithreaded by de

Re: Auto-suggest internal terms

2010-06-03 Thread Andrzej Bialecki
atching terms as the > values. That would consume an awful lot of RAM... see SOLR-1316 for some measurements.

Re: Importing large datasets

2010-06-02 Thread Andrzej Bialecki
On 2010-06-02 13:12, Grant Ingersoll wrote: > > On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote: > >> On 2010-06-02 12:42, Grant Ingersoll wrote: >>> >>> On Jun 1, 2010, at 9:54 PM, Blargy wrote: >>> >>>> >>>> We have a

Re: Importing large datasets

2010-06-02 Thread Andrzej Bialecki
he absolute fastest way > that I know of to index is via multiple threads sending batches of documents > at a time (at least 100). Often, from DBs one can split up the table via SQL > statements that can then be fetched separately. You may want to wri
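The quoted advice (several threads, each sending batches of at least 100 documents) can be sketched in plain Java as follows. `BatchIndexer` and the `sendBatch` callback are hypothetical; in a real setup the callback would wrap whatever client call actually posts the documents to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class BatchIndexer {
    // Splits docs into batches of batchSize and submits each batch to a fixed-size
    // thread pool; sendBatch is a stand-in for the real client call.
    // Returns the number of batches sent.
    static <T> int indexInBatches(List<T> docs, int batchSize, int threads,
                                  Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                // Copy the slice so each task owns an independent list
                List<T> batch = new ArrayList<>(
                        docs.subList(start, Math.min(start + batchSize, docs.size())));
                pending.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }
}
```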

Re: Autosuggest

2010-05-15 Thread Andrzej Bialecki
On 2010-05-15 02:46, Blargy wrote: > > Thanks for your help and especially your analyzer.. probably saved me a > full-import or two :) > Also, take a look at this issue: https://issues.apache.org/jira/browse/SOLR-1316

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-31 Thread Andrzej Bialecki
On 2010-03-31 06:14, Andy wrote: --- On Tue, 3/30/10, Andrzej Bialecki wrote: From: Andrzej Bialecki Subject: Re: SOLR-1316 How To Implement this autosuggest component ??? To: solr-user@lucene.apache.org Date: Tuesday, March 30, 2010, 9:59 AM On 2010-03-30 15:42, Robert Muir wrote: On Mon

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-30 Thread Andrzej Bialecki
n they correspond to the frequency of terms/phrases in the query logs ... TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-29 Thread Andrzej Bialecki
ut then it's nearly equivalent to the TermsComponent; or from a list of frequent queries - but you need to build that list yourself).

Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-24 Thread Andrzej Bialecki
gRequestHandler to actually parse the streams, and then you combine the results arbitrarily in your handler, eventually sending an AddUpdateCommand to the update processor. You can obtain both the update processor and the SolrCell instance from req.getCore().

Re: wikipedia and teaching kids search engines

2010-03-24 Thread Andrzej Bialecki
contrib/ is a quick and perhaps acceptable solution ...

Re: Features not present in Solr

2010-03-23 Thread Andrzej Bialecki
tely can't do. Could you perhaps elaborate a bit on this functionality? Your description sounds intriguing - it reminds me of ParallelReader, but I'm probably completely wrong ...

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-19 Thread Andrzej Bialecki
that shows fragments of XML config files.

Re: Update Index : Updating Specific Fields

2010-03-04 Thread Andrzej Bialecki
E document consists of 4 fields, F1, F2, F3, F4. Now I want to update the value of field F2, so if I send the update xml to SOLR, can it keep the old field values for F1, F3, F4 and update the new value specified for F2? Best Regards, Kranti K K Parisa

Re: If you could have one feature in Solr...

2010-02-28 Thread Andrzej Bialecki
aven't got the books in front of me). Kullback-Leibler divergence?

Re: term frequency vector access?

2010-02-11 Thread Andrzej Bialecki
Lucene - but in practice this may be too costly.

Re: Can Solr be forced to return all field tags for a document even if the field is empty?

2010-01-27 Thread Andrzej Bialecki
ow those values out from the response... You can also implement a SearchComponent that post-processes results and, based on the schema, adds an empty node to the result for each missing field.

Re: How to Split Index file.

2010-01-10 Thread Andrzej Bialecki
On 2010-01-10 01:55, Lance Norskog wrote: Make two copies of the index. In each copy, delete the records you do not want. Optimize. ... which is essentially what the MultiPassIndexSplitter does, only it avoids the initial copy (by deleting in the source index).
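The "delete what this part should not keep" idea can be illustrated without Lucene: for each output part, compute which document numbers to delete from that part's copy. `SplitPlan` is a hypothetical helper, and round-robin versus contiguous-range assignment are shown as two possible policies; this is not the MultiPassIndexSplitter source.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPlan {
    // Round-robin: part p keeps documents whose number % numParts == p, deletes the rest
    static List<Integer> deletionsRoundRobin(int maxDoc, int part, int numParts) {
        List<Integer> del = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc % numParts != part) del.add(doc);
        }
        return del;
    }

    // Sequential: part p keeps the contiguous range [p*maxDoc/numParts, (p+1)*maxDoc/numParts)
    static List<Integer> deletionsSequential(int maxDoc, int part, int numParts) {
        int from = (int) ((long) part * maxDoc / numParts);
        int to = (int) ((long) (part + 1) * maxDoc / numParts);
        List<Integer> del = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc < from || doc >= to) del.add(doc);
        }
        return del;
    }
}
```

In both policies every document is kept by exactly one part.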

Re: restore space between words by spell checker

2009-11-28 Thread Andrzej Bialecki
is is to index compound words, i.e. when producing a spellchecker dictionary add a record "tommyhitfiger" with a field that points to "tommy hitfiger". Details vary depending on what spellc
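A toy Java sketch of that idea: alongside the normal dictionary, store each multi-word phrase under its spaces-removed form so that a run-together query can be mapped back to the spaced phrase. `CompoundDictionary` is a hypothetical name, and a plain `HashMap` stands in for the spellchecker index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CompoundDictionary {
    // joined form (e.g. "tommyhitfiger") -> spaced form (e.g. "tommy hitfiger")
    private final Map<String, String> joinedToSpaced = new HashMap<>();

    CompoundDictionary(List<String> phrases) {
        for (String phrase : phrases) {
            String spaced = normalize(phrase);
            // Only multi-word phrases need a compound record
            if (spaced.contains(" ")) {
                joinedToSpaced.put(spaced.replace(" ", ""), spaced);
            }
        }
    }

    // Returns the spaced phrase if the query, with spaces removed, matches a compound record
    Optional<String> restoreSpaces(String query) {
        return Optional.ofNullable(joinedToSpaced.get(normalize(query).replace(" ", "")));
    }

    // Lowercase and collapse runs of whitespace to single spaces
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```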

Re: Index Splitter

2009-11-25 Thread Andrzej Bialecki
tions, etc. The cost for this flexibility is that it needs to read index files multiple times (hence "multi-pass").

Re: how to get the autocomplete feature in solr 1.4?

2009-11-23 Thread Andrzej Bialecki
take a look at SOLR-1316; there are patches there that implement such a component using prefix trees.
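For illustration, a minimal prefix-tree (trie) suggester in Java, in the spirit of the approach mentioned for SOLR-1316 but not taken from those patches. `PrefixTreeSuggester` is a hypothetical name; weights are assumed to come from something like query-log frequencies, and for simplicity the sketch collects every completion under the prefix and then sorts, rather than keeping precomputed top-k lists per node.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixTreeSuggester {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        long weight = -1; // -1 = no phrase ends here; otherwise the phrase weight
    }

    private final Node root = new Node();

    // Inserts a phrase with a non-negative weight (e.g. its frequency in a query log)
    void add(String phrase, long weight) {
        if (weight < 0) throw new IllegalArgumentException("weight must be >= 0");
        Node n = root;
        for (char c : phrase.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.weight = Math.max(n.weight, weight);
    }

    // Returns up to k stored phrases that start with prefix, highest weight first
    List<String> suggest(String prefix, int k) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return new ArrayList<>();
        }
        List<Map.Entry<String, Long>> found = new ArrayList<>();
        collect(n, new StringBuilder(prefix), found);
        // Sort by descending weight; break ties alphabetically
        found.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, found.size()); i++) out.add(found.get(i).getKey());
        return out;
    }

    // Depth-first walk that records every phrase ending at or below node n
    private void collect(Node n, StringBuilder path, List<Map.Entry<String, Long>> out) {
        if (n.weight >= 0) out.add(new AbstractMap.SimpleEntry<>(path.toString(), n.weight));
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```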

Re: leading and trailing wildcard query

2009-11-05 Thread Andrzej Bialecki
erms and rotates the query term appropriately.
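The fragment suggests indexing rotated forms of each term and rotating the query to match; one classic scheme that works this way is the permuterm index. Assuming that is the technique meant, here is a small in-memory Java sketch: every rotation of `term + "$"` is stored in a sorted map, and a wildcard query is rotated so that the `*` ends up at the end, turning it into a prefix lookup. `PermutermIndex` is hypothetical, and only patterns with a single `*`, or of the form `*X*`, are handled.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PermutermIndex {
    private static final char END = '$'; // end-of-term marker; assumed never to occur inside terms
    // Every rotation of (term + END), mapped back to the original term
    private final TreeMap<String, String> rotations = new TreeMap<>();

    void add(String term) {
        String t = term + END;
        for (int i = 0; i < t.length(); i++) {
            rotations.put(t.substring(i) + t.substring(0, i), term);
        }
    }

    // Supports: exact "X", "X*", "*X", "X*Y" (single '*'), and "*X*"
    Set<String> search(String pattern) {
        Set<String> out = new TreeSet<>();
        int star = pattern.indexOf('*');
        if (star < 0) {
            // No wildcard: the unrotated form "X$" exists only for the term X itself
            if (rotations.containsKey(pattern + END)) out.add(pattern);
            return out;
        }
        String key;
        if (pattern.length() >= 2 && pattern.startsWith("*") && pattern.endsWith("*")) {
            key = pattern.substring(1, pattern.length() - 1);                    // *X*  -> prefix "X"
        } else {
            key = pattern.substring(star + 1) + END + pattern.substring(0, star); // X*Y -> prefix "Y$X"
        }
        // All rotations starting with key form a contiguous range of the sorted map
        out.addAll(rotations.subMap(key, key + Character.MAX_VALUE).values());
        return out;
    }
}
```

The cost is index size: a term of length n contributes n + 1 rotations.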

Re: Solr Cell on web-based files?

2009-10-27 Thread Andrzej Bialecki

Re: QTime always a multiple of 50ms ?

2009-10-23 Thread Andrzej Bialecki
INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104 INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52 ... Is this a known issue? It may be an issue with System.currentTimeMillis() resolution on some platforms (e.g. Windows)?
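One way to check the timer-resolution hypothesis is to measure how far `System.currentTimeMillis()` jumps when its value changes. This is a rough sketch (`ClockGranularity` is a hypothetical name): on a platform with a coarse timer the reported step would be large, and elapsed times measured with that clock would cluster on multiples of it. `System.nanoTime()` is the usual choice for measuring elapsed time.

```java
class ClockGranularity {
    // Spins until System.currentTimeMillis() changes, records the jump, and returns
    // the smallest jump seen over `samples` tries (an estimate of the clock's resolution).
    static long observedMillisStep(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long start = System.currentTimeMillis();
            long next;
            do {
                next = System.currentTimeMillis();
            } while (next == start);
            smallest = Math.min(smallest, next - start);
        }
        return smallest;
    }
}
```

Usage: `System.out.println(ClockGranularity.observedMillisStep(10) + " ms");`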

Re: Is negative boost possible?

2009-10-13 Thread Andrzej Bialecki
Yonik Seeley wrote: On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki wrote: Solr never discarded non-positive hits, and now Lucene 2.9 no longer does either. Hmm ... The code that I pasted in my previous email uses Searcher.search(Query, int), which in turn uses search(Query, Filter, int

Re: Passing request to another handler

2009-10-13 Thread Andrzej Bialecki
(with its defaults/invariants configured in a way you can't control) to delegate to. Indeed - thanks.

Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki
Yonik Seeley wrote: On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki wrote: BTW, standard Collectors collect only results with positive scores, so if you want to collect results with negative scores as well then you need to use a custom Collector. Solr never discarded non-positive hits, and

Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki
.32427183 = queryNorm 0.15342641 = (MATCH) fieldWeight(a:b in 0), product of: 1.0 = tf(termFreq(a:b)=1) 0.30685282 = idf(docFreq=1, numDocs=1) 0.5 = fieldNorm(field=a, doc=0) bsh %
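The `fieldWeight` line in that output is the product tf * idf * fieldNorm. Assuming Lucene's classic `DefaultSimilarity` formulas of that era (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), the numbers can be reproduced in a few lines of Java; `ClassicScoreCheck` is a hypothetical helper.

```java
class ClassicScoreCheck {
    // tf(freq) = sqrt(freq)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(int freq, int docFreq, int numDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
    }
}
```

With freq = 1, docFreq = 1, numDocs = 1 and fieldNorm = 0.5 this gives idf = 1 + ln(0.5) = 0.30685282 and fieldWeight = 0.15342641, matching the explanation.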

Re: Passing request to another handler

2009-10-11 Thread Andrzej Bialecki
Shalin Shekhar Mangar wrote: On Fri, Oct 9, 2009 at 10:53 PM, Andrzej Bialecki wrote: Hi, What's the canonical way to pass an update request to another handler? I'm implementing a handler that has to dispatch its result to different update handlers based on its internal process

Passing request to another handler

2009-10-09 Thread Andrzej Bialecki
ementation dependent on deployment paths defined in solrconfig.xml. Using SolrCore.getRequestHandlers(handler.class) often returns the LazyRequestHandlerWrapper, from which it's not possible to retrieve the wrapped instance of the handler...

Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Andrzej Bialecki
>> Yes. Care should be taken that the query analyzer chain produces the same forward tokens, because the code in QueryParser that optionally reverses tokens acts on tokens that it receives _after_ all other query analyzers have run on the query.

Re: Adding data from nutch to a Solr index

2009-09-30 Thread Andrzej Bialecki
f that process on the Solr side.)

Re: Number of terms in a SOLR field

2009-09-30 Thread Andrzej Bialecki
lable fields and term counts per field".

Re: Number of terms in a SOLR field

2009-09-29 Thread Andrzej Bialecki
- just get the IndexReader.terms() enumeration and traverse it.
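The traversal works because the term dictionary is sorted by field and then by term text, so all terms of one field form a contiguous run. The sketch below models that in plain Java with a `TreeSet` (`FieldTerm` and `TermDictionary` are hypothetical stand-ins, not Lucene classes): seek to the first term of the field and count until the field name changes. In the Lucene API of that time, the analogous seek was `IndexReader.terms(Term)`.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical (field, text) pair standing in for a Lucene term
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

class TermDictionary {
    // Sorted by field name, then term text, like a term dictionary
    private final TreeSet<FieldTerm> terms = new TreeSet<>(
            Comparator.comparing((FieldTerm t) -> t.field).thenComparing(t -> t.text));

    void add(String field, String text) {
        terms.add(new FieldTerm(field, text));
    }

    // Seek to (field, ""), the smallest possible term of that field, and
    // count terms until the enumeration moves on to a different field
    int countTerms(String field) {
        int count = 0;
        for (FieldTerm t : terms.tailSet(new FieldTerm(field, ""), true)) {
            if (!t.field.equals(field)) break;
            count++;
        }
        return count;
    }
}
```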

Re: How to get a stack trace

2009-08-08 Thread Andrzej Bialecki
ype of problem, I would generate a heap dump on OOM (using the JVM flag -XX:+HeapDumpOnOutOfMemoryError) and then use a tool like HAT to find the largest objects and the references to them.

Re: Language Detection for Analysis?

2009-08-07 Thread Andrzej Bialecki
/apache/nutch/analysis/lang/LanguageIdentifier.html

Re: Solr Search probem w/ phrase searches, text type, w/ escaped characters

2009-08-03 Thread Andrzej Bialecki
queries don't match unrelated text. Phrase queries that you can construct using QueryParser can't match two tokens separated by a hole, unless you set a slop value > 0.

Re: what crawler do you use for Solr indexing?

2009-03-10 Thread Andrzej Bialecki
impact of spam pages and to limit the size of LinkDb. If a page hits this limit then indeed the symptoms that you observe are missing (dropped) links.

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Andrzej Bialecki
reation of such docs ;)

Re: Integrating Solr and Nutch

2009-02-27 Thread Andrzej Bialecki
ed to reindex your segments using the solrindex command, and change the searcher configuration. See nutch-default.xml for details.

Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Andrzej Bialecki
Otis Gospodnetic wrote: You should be fine on either Linux or FreeBSD (or any other UNIX flavour). Running on Solaris would probably give you access to goodness like dtrace, but you can live without it. There's dtrace on FreeBSD, too.

Re: Please help me integrate Nutch with Solr

2008-12-29 Thread Andrzej Bialecki
integrated within a couple of days - please monitor this issue, and when it's done just download the patched code.

Re: [VOTE] Community Logo Preferences

2008-11-27 Thread Andrzej Bialecki
/apache_solr_c_blue.jpg

Re: TextProfileSigature using deduplication

2008-11-20 Thread Andrzej Bialecki
n the proceedings of SIGIR-08, which presents an interesting and relatively simple algorithm that yields excellent results. Who has some spare CPU cycles to implement this? ;) http://ilpubs.stanford.edu:8090/860/

Re: TextProfileSigature using deduplication

2008-11-18 Thread Andrzej Bialecki
f more than 1 word, up to N (e.g. 5) - this should work in your case. Ultimately, what you are probably looking for is a shingle-based algorithm, but it's relatively costly and requires multiple pas
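A bare-bones Java sketch of shingle-based comparison: split each text into overlapping word n-grams (shingles) and compare the sets with Jaccard similarity. `ShingleSimilarity` is hypothetical and the simple word tokenization is an assumption; note that comparing full shingle sets pairwise gets expensive on large collections.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ShingleSimilarity {
    // Word-level shingles: every run of w consecutive lowercase word tokens
    static Set<String> shingles(String text, int w) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        Set<String> out = new HashSet<>();
        for (int i = 0; i + w <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + w)));
        }
        return out;
    }

    // Jaccard similarity: size of the intersection divided by size of the union
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return intersection.size() / (double) union.size();
    }
}
```

For example, "the quick brown fox jumps" and "the quick brown fox leaps" share 3 of 5 distinct 2-word shingles, giving a similarity of 0.6.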

Re: maxFieldLength

2008-11-07 Thread Andrzej Bialecki
rease the length of posting lists, which leads to increased memory/CPU consumption during decoding and traversing of the lists. Also, the overall increased number of positions will have an impact on the index size.

Re: Advice on analysis/filtering?

2008-10-16 Thread Andrzej Bialecki
model that for any given soundexed phrase can generate the most probable original phrases. Also, knowing the context in which a query is asked may help, but usually you don't have this information (queries are short).

Re: Adding bias to Distributed search feature?

2008-09-15 Thread Andrzej Bialecki
It sounds straightforward, and relieves you from the need to de-duplicate your collection.

Re: Extending Solr with custom filter

2008-09-12 Thread Andrzej Bialecki
algorithmic stemming, it provides dictionary-based stemming, and these two methods complement each other nicely.

Re: Solr Logo thought

2008-08-06 Thread Andrzej Bialecki
t lost in logos of small size - or come up with logos of reduced complexity for smaller size versions
* avoid large splashes of uniform strong color - these look bad on large logos, like poster-sized.

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Andrzej Bialecki
at we use versioning, and that we have a "shard manager" that knows the latest versions of each shard among the whole active set - or that clients discover this dynamically by querying the shard servers every

Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

2008-01-02 Thread Andrzej Bialecki
nt server).

Re: multilingual list of stopwords

2007-10-18 Thread Andrzej Bialecki
should first perform language identification, and then apply the correct stopword list.

Re: solr, snippets and stored field in nutch...

2007-10-15 Thread Andrzej Bialecki
ver going to request the summaries for those documents). That is the case I was referring to below. This is the case for which the Nutch architecture is optimized.