AndrzejBialecki to this group. Thank you!
--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
___.,___,___,___,_._. __<><
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at s
ample retrieve stored fields of this
document.
As the Explanations show, it can only be used to coordinate
parts of the query that matched the same document number.
--
://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf
http://research.google.com/pubs/archive/37365.pdf
--
some parts missing - see LUCENE-3622.
--
there.
--
chanism to pull in a copy of the index from a running Solr instance.
--
See LUCENE-1812 for another practical application of this concept.
--
/browse/SOLR-1535
--
o use that excess of memory, but it
won't be available for OS-level disk IO. Therefore reducing the heap
size may actually increase your performance.
--
rease
dramatically (and performance will then drop). Modern OSes try to
keep as much data in memory as possible, so memory usage by itself is
not that informative; check the page-in/page-out rates instead when
you start hitting 32 vs. 64 cores.
--
On 4/8/11 9:55 PM, Andy wrote:
--- On Fri, 4/8/11, Andrzej Bialecki wrote:
:) If you don't need the new functionality in 4.x, you don't
need the performance improvements,
What performance improvements does 4.x have over 3.1?
Ah... well, many - take a look at the C
reindexing cycles are long (indexes tend
to stay around) then 3.1 is a safer bet. If you need a dozen or so new
exciting features (e.g. results grouping) or top performance, or if you
need LucidWorks with Click and other goodies, then use 4.x and be
prepared for an occasional full
orum
http://www.lucidimagination.com/forum/ .
--
contrib/patch that was applied?
At the moment it's proprietary. I will have a talk at the Lucene
Revolution conference that describes the Click tools in detail.
--
. For now it's better to pass openNew=false
and be prepared to get a null.
--
On 2010-10-25 13:37, Toke Eskildsen wrote:
> On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
>> * there is an exact solution to this problem, namely to make two
>> distributed calls instead of one (first call to collect per-shard IDFs
>> for given query terms, se
e in scores across
shards, or whether you want to bear the cost of an additional
distributed RPC for every query...
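A minimal sketch of the first of the two distributed calls, in Python for brevity. It assumes each shard can report its document count and per-term document frequencies; the dictionary layout and the function names are hypothetical, and the IDF formula is the classic Lucene default rather than anything taken from this thread. The second call would pass the resulting global values to every shard along with the query.

```python
import math


def collect_global_stats(shards, terms):
    """First pass: sum document counts and document frequencies over all shards.

    `shards` is a hypothetical list of dicts of the form
    {"num_docs": int, "doc_freq": {term: int}}.
    """
    num_docs = sum(shard["num_docs"] for shard in shards)
    doc_freq = {
        term: sum(shard["doc_freq"].get(term, 0) for shard in shards)
        for term in terms
    }
    return num_docs, doc_freq


def global_idf(num_docs, doc_freq):
    """Classic Lucene-style IDF computed from the merged statistics."""
    return {
        term: 1.0 + math.log(num_docs / (df + 1))
        for term, df in doc_freq.items()
    }
```

With shards of similar composition the global values stay close to the per-shard ones, which is why the extra round trip often buys little.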
To summarize, I would qualify your statement with: "...if the
composition of your shards is drastically different". Otherwise the cost
of
1536, it contains an example of a tokenizing chain
that could use a language detector to create different fields (or
tokenize differently) based on this decision.
--
On 2010-09-06 22:03, Dennis Gearon wrote:
What is a 'simple MOD'?
md5(docId) % numShards
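A minimal sketch of that mapping, in Python for brevity. The class and method names are hypothetical; the abstract base class only illustrates the idea of hiding the strategy behind an interface so that it can be swapped out where a simple MOD won't do.

```python
import hashlib
from abc import ABC, abstractmethod


class ShardMapper(ABC):
    """Maps a document id to a shard index in [0, num_shards)."""

    @abstractmethod
    def shard_for(self, doc_id: str, num_shards: int) -> int:
        ...


class Md5ModShardMapper(ShardMapper):
    """The 'simple MOD' strategy: md5(docId) % numShards."""

    def shard_for(self, doc_id: str, num_shards: int) -> int:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        # Interpret the 16-byte MD5 digest as one big unsigned integer.
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % num_shards
```

The mapping is deterministic, so the same docId always lands on the same shard as long as numShards does not change.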
--
tions where a simple MOD won't do ;) so I
think it would be good to hide this strategy behind an
interface/abstract class. It costs nothing, and gives you flexibility in
how you implement this mappin
other core
(instead of using the current sub-index hack).
--
s committed
I'm sure people will follow up with user-level convenience components
that will make it easier.
--
tegrated
with Nutch).
SolrCloud is not far away from hitting the trunk (right, Mark? ;) ), so
medium-term I think this is your best bet.
--
xt into
different fields, which can then be analyzed differently.
--
On 2010-06-03 13:38, Michael Kuhlmann wrote:
> On 03.06.2010 13:02, Andrzej Bialecki wrote:
>> ..., and deploy this
>> index in a separate JVM (to benefit from other CPUs than the one that
>> runs your Solr core)
>
> Every known webserver is multithreaded by de
atching terms as the
> values.
That would consume an awful lot of RAM... see SOLR-1316 for some
measurements.
--
On 2010-06-02 13:12, Grant Ingersoll wrote:
>
> On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote:
>
>> On 2010-06-02 12:42, Grant Ingersoll wrote:
>>>
>>> On Jun 1, 2010, at 9:54 PM, Blargy wrote:
>>>
>>>>
>>>> We have a
he absolute fastest way
> that I know of to index is via multiple threads sending batches of documents
> at a time (at least 100). Often, from DBs one can split up the table via SQL
> statements that can then be fetched separately. You may want to wri
On 2010-05-15 02:46, Blargy wrote:
>
> Thanks for your help and especially your analyzer.. probably saved me a
> full-import or two :)
>
Also, take a look at this issue:
https://issues.apache.org/jira/browse/SOLR-1316
--
On 2010-03-31 06:14, Andy wrote:
--- On Tue, 3/30/10, Andrzej Bialecki wrote:
From: Andrzej Bialecki
Subject: Re: SOLR-1316 How To Implement this autosuggest component ???
To: solr-user@lucene.apache.org
Date: Tuesday, March 30, 2010, 9:59 AM
On 2010-03-30 15:42, Robert Muir
wrote:
On Mon
n they correspond to
the frequency of terms/phrases in the query logs ...
TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.
--
ut then it's nearly equivalent
to the TermsComponent; or from a list of frequent queries - but you need
to build that list yourself).
--
gRequestHandler to actually parse the streams, and then you
combine the results arbitrarily in your handler, eventually sending an
AddUpdateCommand to the update processor. You can obtain both the update
processor and SolrCell instance from req.getCore().
--
contrib/ is a quick and perhaps
acceptable solution ...
--
tely can't do.
Could you perhaps elaborate a bit on this functionality? Your
description sounds intriguing - it reminds me of ParallelReader, but I'm
probably completely wrong ...
--
that shows fragments of XML config files.
--
E document consists of 4 fields, F1, F2, F3, F4
Now I want to update the value of field F2, so if I send the update xml to
SOLR, can it keep the old field values for F1,F3,F4 and update the new value
specified for F2?
Best Regards,
Kranti K K Parisa
--
aven't got the books in front of me).
Kullback-Leibler divergence?
--
Lucene - but in
practice this may be too costly.
--
ow those values out from the response...
You can also implement a SearchComponent that post-processes the results
and, based on the schema, adds an empty node to the result for each
field that is missing.
--
On 2010-01-10 01:55, Lance Norskog wrote:
Make two copies of the index. In each copy, delete the records you do
not want. Optimize.
... which is essentially what the MultiPassIndexSplitter does, only it
avoids the initial copy (by deleting in the source index).
--
is is to index compound words, i.e. when
producing a spellchecker dictionary add a record "tommyhitfiger" with a
field that points to "tommy hitfiger". Details vary depending on what
spellc
tions, etc. The cost for this flexibility is
that it needs to read index files multiple times (hence "multi-pass").
--
take a look at SOLR-1316, there are patches there that implement
such component using prefix trees.
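To illustrate the prefix-tree idea in general terms, here is a small sketch in Python. It is not the code from the SOLR-1316 patches; the class names and the use of a numeric weight (for example, a query-log frequency) are assumptions made for the example.

```python
class TrieNode:
    __slots__ = ("children", "weight")

    def __init__(self):
        self.children = {}
        self.weight = None  # set only on nodes that end a complete term


class PrefixTree:
    def __init__(self):
        self.root = TrieNode()

    def add(self, term, weight):
        """Insert a term with a weight, e.g. its frequency in query logs."""
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.weight = weight

    def suggest(self, prefix, limit=5):
        """Return up to `limit` terms starting with `prefix`, heaviest first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        matches = []
        stack = [(node, prefix)]
        while stack:  # walk the whole subtree below the prefix node
            current, text = stack.pop()
            if current.weight is not None:
                matches.append((text, current.weight))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        matches.sort(key=lambda m: (-m[1], m[0]))
        return [text for text, _ in matches[:limit]]
```

Lookup cost depends on the prefix length and the size of the matching subtree, not on the total number of terms, which is what makes this structure attractive for autosuggest.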
--
erms
and rotates the query term appropriately.
--
INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104
INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52
...
Is this a known issue ?
It may be an issue with System.currentTimeMillis() resolution on some
platforms (e.g. Windows)?
--
Yonik Seeley wrote:
On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki wrote:
Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.
Hmm ... The code that I pasted in my previous email uses
Searcher.search(Query, int), which in turn uses search(Query, Filter, int
(with its defaults/invariants configured in a way you can't
control) to delegate to.
Indeed - thanks.
--
Yonik Seeley wrote:
On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki wrote:
BTW, standard Collectors collect only results
with positive scores, so if you want to collect results with negative scores
as well then you need to use a custom Collector.
Solr never discarded non-positive hits, and
.32427183 = queryNorm
0.15342641 = (MATCH) fieldWeight(a:b in 0), product of:
1.0 = tf(termFreq(a:b)=1)
0.30685282 = idf(docFreq=1, numDocs=1)
0.5 = fieldNorm(field=a, doc=0)
bsh %
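The numbers in this explanation can be reproduced by hand. The sketch below, in Python for brevity, assumes the classic default similarity of that Lucene generation: tf is the square root of the term frequency and idf is 1 + ln(numDocs / (docFreq + 1)). The function names are illustrative only.

```python
import math


def tf(term_freq):
    """Classic Lucene term-frequency factor: sqrt(termFreq)."""
    return math.sqrt(term_freq)


def idf(doc_freq, num_docs):
    """Classic Lucene inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, num_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explanation output."""
    return tf(term_freq) * idf(doc_freq, num_docs) * field_norm
```

For termFreq=1, docFreq=1, numDocs=1 and fieldNorm=0.5 this gives idf = 1 + ln(0.5), about 0.30685282, and fieldWeight of about 0.15342641, matching the lines above.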
--
Shalin Shekhar Mangar wrote:
On Fri, Oct 9, 2009 at 10:53 PM, Andrzej Bialecki wrote:
Hi,
What's the canonical way to pass an update request to another handler? I'm
implementing a handler that has to dispatch its result to different update
handlers based on its internal process
ementation dependent on deployment paths defined in solrconfig.xml.
Using SolrCore.getRequestHandlers(handler.class) often returns the
LazyRequestHandlerWrapper, from which it's not possible to retrieve the
wrapped instance of the handler.
--
>>
Yes. Care should be taken that the query analyzer chain produces the
same forward tokens, because the code in QueryParser that optionally
reverses tokens acts on tokens that it receives _after_ all other query
analyzers have run on the query.
--
f that process on the Solr side.)
--
lable
fields and term counts per field".
--
- just get
IndexReader.terms() enumeration and traverse it.
--
ype of problems when I would
generate a heap dump on OOM (it's a JVM flag) and then use a tool like
HAT to find largest objects and references to them.
--
/apache/nutch/analysis/lang/LanguageIdentifier.html
--
queries don't match unrelated text.
Phrase queries that you can construct using QueryParser can't match two
tokens separated by a hole, unless you set a slop value > 0.
--
impact of spam
pages and to limit the size of LinkDb. If a page hits this limit then
indeed the symptoms that you observe are missing (dropped) links.
--
reation of such docs ;)
--
ed to reindex your segments using the solrindex command, and
change the searcher configuration. See nutch-default.xml for details.
--
Otis Gospodnetic wrote:
You should be fine on either Linux or FreeBSD (or any other UNIX
flavour). Running on Solaris would probably give you access to
goodness like dtrace, but you can live without it.
There's dtrace on FreeBSD, too.
--
integrated within a couple days - please monitor this
issue, and when it's done just download the patched code.
--
/apache_solr_c_blue.jpg
--
n the proceedings of SIGIR-08, which presents an interesting and
relatively simple algorithm that yields excellent results. Who has some
spare CPU cycles to implement this? ;)
http://ilpubs.stanford.edu:8090/860/
--
f more than 1 word, up to N (e.g. 5) - this should work in
your case.
Ultimately, what you are probably looking for is a shingle-based
algorithm, but it's relatively costly and requires multiple pas
rease the length of posting lists, which leads
to increased memory/CPU consumption during decoding and traversing of
the lists. Also, the overall increased number of positions will have an
impact on the index size.
--
model that for any given soundexed phrase can generate the
most probable original phrases.
Also, knowing the context in which a query is asked may help, but
usually you don't have this information (queries are short).
--
It sounds straightforward, and relieves you of the need to
de-duplicate your collection.
--
algorithmic stemming it provides a dictionary-based
stemming, and these two methods nicely complement each other.
--
t lost in logos of small
size - or come up with logos of reduced complexity for smaller size versions
* avoid large splashes of uniform strong color - these look bad on large
logos, like poster-sized.
--
at we use versioning, and that we have a "shard manager"
that knows the latest versions of each shard among the whole active set
- or that clients discover this dynamically by querying the shard
servers every
nt server).
--
should first perform language
identification, and then apply the correct stopword list.
--
ver going to request the summaries for those documents).
That is the case I was referring to below.
This is the case for which Nutch architecture is optimized.
--
77 matches