also take a look at:
http://wiki.apache.org/solr/HierarchicalFaceting
+ SOLR-64, SOLR-792
+ http://markmail.org/message/jxbw2m5a6zq5jhlp
Regards,
Peter.
> Take a look at "Mastering the Power of Faceted Search with Chris
> Hostetter"
> (http://www.lucidimagination.com/solutions/webcasts/faceting).
I think this is possible, at the Lucene level.
You want a mix of PrefixQuery and SpanFirstQuery (hmm we don't have a
SpanLastQuery...).
I believe you can do this by creating a PrefixQuery and then
customizing the rewrite method to create a BQ (SHOULD clauses) of
SpanFirstQuery instead of TermQuery.
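The rewrite step can be sketched without Lucene; the sketch below only simulates the prefix-to-terms expansion that the customized rewrite would perform before wrapping each match in a SpanFirstQuery (the helper and term list are hypothetical stand-ins, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rewrite idea: enumerate indexed terms matching a prefix,
// as a PrefixQuery rewrite would, before wrapping each match in a
// SpanFirstQuery and combining them as SHOULD clauses of a BooleanQuery.
public class PrefixExpansion {
    // Stand-in for Lucene's term enumeration over the index.
    static List<String> expandPrefix(List<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("week", "weekly", "weeks", "report");
        // In Lucene each match would become:
        //   new SpanFirstQuery(new SpanTermQuery(term), end)
        System.out.println(expandPrefix(terms, "week"));
    }
}
```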
Hi
were you successful in trying SOLR-1604 to allow wildcard queries in
phrases?
Also, does this plugin allow us to use proximity with wildcards?
* "solr mail*"~10 *
Is this the right approach to support these functionalities?
thanks
Mark
On Wed, Aug 4, 2010 at 2:24
There is a ticket for this request:
https://issues.apache.org/jira/browse/SOLR-1980
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 5 Oct 2010, at 08:39, Maddy.Jsh wrote:
>
> Hi,
>
> I have 2 documents with following values.
> Doc1
> Subject: Weekly transport
>
Hi Mark,
unfortunately it's still on my to-do list... :(
I don't know if it allows "solr mail*"~10 . I hope so, as I'll need that too
in the future.
Frederico
From: Mark N [mailto:nipen.m...@gmail.com]
Sent: Tue 05-10-2010 11:29
Para: solr-user@lucene.apache
> Also does this plugin allow us to use proximity with wild
> card
> * "solr mail*"~10 *
>
Yes, it supports "solr mail*"~10 kinds of queries without any problem.
Currently it throws an exception with "mail*" kind of queries, but those are
not valid phrase queries, because there is only one
Thanks, Ahmet
Is it also possible to search for documents having a field ENDING with
"week*"?
The query should return documents with a field ending with week and its
derivatives such as weekly, weeks.
So above query should return
"this week"
"Past three weeks"
"Report weekly"
thanks
chandan
On Tue
> Is it also possible to search the document having a
> field ENDING with
> "week*"
>
> query should return documents with a field ending
> with week and its
> derivatives such as weekly,weeks
>
> So above query should return
>
> "this week"
> "Past three weeks"
> "Report weekly"
No this is n
--- On Tue, 10/5/10, Mark N wrote:
> From: Mark N
> Subject: Re: wildcard and proximity searches
> To: solr-user@lucene.apache.org
> Date: Tuesday, October 5, 2010, 2:30 PM
> Thanks ahmet
>
> Is it also possible to search the document having a
> field ENDING with
> "week*"
>
> query should
Hi All
I want to index a large number of PDF documents. I have found a reference by
searching on Google that it could be done with the Apache Tika project, but
unfortunately I didn't find any reference stating how to configure Apache Tika
with Solr. Can anybody state how I could do this? I have downloaded
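For what it's worth, Tika support ships with Solr 1.4 as the ExtractingRequestHandler (Solr Cell), so no separate Tika configuration is needed beyond enabling the handler; a minimal sketch, assuming the extraction contrib jars are on the classpath:

```xml
<!-- solrconfig.xml: enable the Tika-based Solr Cell handler -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body into the "text" field -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```

A PDF can then be posted with, for example, `curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@report.pdf"` (the URL, id, and field mapping here are illustrative).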
Information about Lucid's Certified Distribution is at:
http://www.lucidimagination.com/Downloads/LucidWorks-for-Lucene
It's available at no cost and includes:
* Complete version of Apache Lucene v3.0.1
* Additional bug fixes from the current trunk of the Lucene project
*
Just to clarify, the effect I was looking for was this.
Sneakers
Men (22)
Women (43)
AFTER a user filters by one of those, they would be presented with a NEW
facet field such as
Sneakers
Men
Size 7
Size 7
Size 7
Vincent Vu Nguyen
-Original Message-
From: Otis
Nguyen, Vincent (CDC/OD/OADS) (CTR) would like to recall the message, "multi
level faceting".
Just to clarify, the effect I was looking for was this.
Sneakers
Men (22)
Women (43)
AFTER a user filters by one of those, they would be presented with a NEW
facet field such as
Sneakers
Men
Size 7 (10)
Size 8 (11)
Size 9 (23)
Vincent Vu Nguyen
-Original Message-
I've a similar problem with a project I'm working on now. I am holding out for
either SOLR-64 or SOLR-792 being a bit more mature before I need the
functionality but if not I was thinking I could do multi-level faceting by
indexing the data as a "String" like this:
id: 1
SHOE: Sneakers|Men|Siz
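One way to make that flattened "String" work across levels is to emit one token per hierarchy level at index time, so filtering on one level's token exposes the next level's counts; a minimal sketch (the path separator and token format are assumptions, not code from SOLR-64/SOLR-792):

```java
import java.util.ArrayList;
import java.util.List;

// Build one facet token per hierarchy level from a category path,
// e.g. "Sneakers/Men/Size 7" ->
//   ["Sneakers", "Sneakers|Men", "Sneakers|Men|Size 7"].
// Filtering (fq) on "Sneakers|Men" and faceting on tokens with that
// prefix then yields the next level's counts.
public class FacetPath {
    static List<String> levelTokens(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (current.length() > 0) {
                current.append('|');
            }
            current.append(part);
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(levelTokens("Sneakers/Men/Size 7"));
    }
}
```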
Can anyone explain to me the practical difference (i.e. in terms of results)
between query slop and phrase slop?
Removing those components is not likely to impact performance very much, if
at all. I would focus on other areas when tuning performance, such as
looking at memory usage and configuration, query design, etc. But there isn't
any harm in removing them either. Why not do some load tests with the
components
> Can anyone explain to me the
> practical difference (i.e. in terms of results)
> between query slop and phrase slop?
I think you are asking about dismax's parameters, right?
ps (phrase slop) applies to the pf parameter.
qs (query phrase slop): you cannot use the tilde operator with dismax, so this
parameter
Thank you. I am talking about dismax's parameters.
This is how I understand things, please tell me where I'm wrong:
Query slop (qs) = how many words you can move the query to match the text.
Phrase slop (ps) (when used in conjunction with &pf=text - is there another
possibility?) = how many words
> There are EdgeNGramFilterFactory
> & EdgeNGramTokenizerFactory.
>
> Likewise there are StandardFilterFactory &
> StandardTokenizerFactory.
>
> LowerCaseFilterFactory & LowerCaseTokenizerFactory.
>
> Seems like they always come in pairs.
>
> What are the differences between FilterFactory and
Hi all,
At some point we will need to re-build an index that totals about 3 terabytes
in size (split over 12 shards). At our current indexing speed we estimate that
this will take about 4 weeks. We would like to reduce that time. It appears
that our main bottleneck is disk I/O during index merging
Thank you once again Betsy!
Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329
-Original Message-
From: Burton-West, Tom [mailto:tbur
Nguyen, Vincent (CDC/OD/OADS) (CTR) would like to recall the message,
"Experience with large merge factors".
Good Evening and Morning.
I noticed that if I do a facet search on a field whose value contains
umlauts (öäü),
the facet list returned converts the value of the field into normal
characters (oau).
How do I prevent this from happening?
I can't seem to find the configuration for faceting in
Hi Yonik,
I tried the fix suggested in your comments (using "solr.TrieDateField"),
and it loaded up 130 cores in 1 minute with 1.3GB of memory (a little more than
1GB when turning off the static warm cache, and much less than the 6.5GB when
using 'solr.DateField').
Will this have any impact on first query or per
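For reference, the switch amounts to changing the field type class in schema.xml; a minimal sketch (the type and field names here are illustrative, and precisionStep is a tunable assumption):

```xml
<!-- schema.xml: TrieDateField is far cheaper for range queries and warming -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
           omitNorms="true" positionIncrementGap="0"/>
<field name="timestamp" type="tdate" indexed="true" stored="true"/>
```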
Hi Max,
why don't you use WordDelimiterFilterFactory directly? I'm doing the same
stuff inside my own analyzer:
final Map<String, String> args = new HashMap<String, String>();
args.put("generateWordParts", "1");
args.put("generateNumberParts", "1");
args.put("catenateWords", "0");
args.put("catenateNumbers", "0");
args.put("ca
I am still trying to learn Solr. My Solr configuration is based on the
default example schema.xml (I haven't customized the field types). I am
using text for the fields that I want highlighting on.
I am searching for "medication", but I see that "Medical" is highlighted.
Should this be the case?
I guess I missed the init() method. I was looking at the factory and
thought I saw config-loading stuff (like getInt), which I assumed meant it
needed to have schema.xml available.
Thanks!
-Max
On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter wrote:
> Hi Max,
>
> why don't you use WordDelimiterFilt
I am developing a new schema. It has a pattern filter that trims
leading and trailing punctuation from terms.
It is resulting in empty terms, because there are situations in the
analyzer stream where a term happens to be composed of nothing but
punctuation. This problem is not happening in
I'm not sure if this is the best approach but a LengthFilter will stop blank
terms.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
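The trim-then-length-check chain can be sketched outside Solr (the regex and sample terms are illustrative; the length predicate mirrors what a LengthFilter with min=1 would do):

```java
import java.util.List;
import java.util.stream.Collectors;

// Trimming leading/trailing punctuation can leave empty terms;
// a minimum-length check (what LengthFilterFactory provides) drops them.
public class TrimThenFilter {
    static List<String> analyze(List<String> terms) {
        return terms.stream()
                .map(t -> t.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""))
                .filter(t -> t.length() >= 1)   // LengthFilter: min=1
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "--" trims down to nothing and is filtered out
        System.out.println(analyze(List.of("hello,", "--", "(world)")));
    }
}
```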
-Original message-
From: Shawn Heisey
Sent: Wed 06-10-2010 00:25
To: solr-user@lucene.apache.org;
Subject: PatternRe
On Oct 5, 2010, at 6:24pm, Shawn Heisey wrote:
I am developing a new schema. It has a pattern filter that trims
leading and trailing punctuation from terms.
It is resulting in empty terms, because there are situations in the
analyzer stream where a term happens to be composed of nothing
Actually, it might be a good idea to add an optional setting to the
PatternTokenizer that doesn't emit blank terms. Perhaps an allowBlanks="false"
attribute would be a pleasant addition to the PatternTokenizer, so the
additional LengthFilter can be left out, sparing CPU cycles and some memory.
Hello,
It seems that your analysis process removes punctuation and therefore
indexes terms without it. What you see in the faceted result is the text
that has been indexed.
If you select a Tokenizer/Token Filter which preserves punctuation you
should be able to see what you want.
Cheers,
-- Savv
It is a good practice (for many cases, as seen on the list) to search (usually
with fq) on analyzed fields but return the facet list based on the unanalyzed
counterparts.
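In schema.xml that pattern looks roughly like the following sketch (the field names are illustrative):

```xml
<!-- Search on the analyzed field; facet on the raw string copy -->
<field name="subject"       type="text"   indexed="true" stored="true"/>
<field name="subject_facet" type="string" indexed="true" stored="false"/>
<copyField source="subject" dest="subject_facet"/>
```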
-Original message-
From: Savvas-Andreas Moysidis
Sent: Wed 06-10-2010 00:46
To: solr-user@lucene.apache.org;
Subjec
Alternatively, you can use "+" instead of "*" in your regular expressions so
that you don't match them at all...
I think the PatternTokenizer is doing the right thing if your expression
says that a blank term is acceptable.
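The difference is that `*` matches an empty run of punctuation (so a blank term is a legal match) while `+` requires at least one character; a quick stdlib check of the `+` form (input and pattern are illustrative):

```java
import java.util.Arrays;

// Splitting on "\p{Punct}+" never produces empty tokens from a
// punctuation run, whereas a "*" pattern can match the empty string
// and so admits blank terms.
public class PlusVsStar {
    static String[] tokenize(String input, String pattern) {
        return input.split(pattern);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("foo--bar", "\\p{Punct}+")));
    }
}
```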
On Tue, Oct 5, 2010 at 6:39 PM, Markus Jelsma wrote:
> Actually, it migh
(10/10/06 4:41), Khai Doan wrote:
I am still trying to learn Solr. My Solr configuration is based on the
default example schema.xml (I haven't customize the field types). I am
using text for the fields that I want highlighting on.
I am searching for "medication", but I see that "Medical" is hi
Good point,
so you could have an unanalyzed counterpart field populated with a copyField
and facet on that..
On 5 October 2010 23:49, Markus Jelsma wrote:
> It is a good practice (for many cases as seen on the list) to search
> (usually with fq) on analzyed fields but return the facet list based on the
> unan
4 weeks is a depressingly long time to re-index!
Do you use multiple threads for indexing? A large RAM buffer size is
also good, but I think perf peaks out maybe around 512 MB (at least
based on past tests)?
Believe it or not, merging is typically compute bound. It's costly to
decode & re-encode
I see that medication got reduced to medic.
Thanks,
Khai
On Tue, Oct 5, 2010 at 3:57 PM, Koji Sekiguchi wrote:
> (10/10/06 4:41), Khai Doan wrote:
>
>> I am still trying to learn Solr. My Solr configuration is based on the
>> default example schema.xml (I haven't customize the field types). I
Hi Siva,
Have a look at the Solr Wiki, there is some good info on that topic there. See
http://search-lucene.com/?q=search+performance&fc_project=Solr&fc_type=wiki
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
-
What is your index analyzer chain? In other words, are you sure the
token '1' is actually indexed?
"Marsh1" is, depending on the default operator, probably searching
for 'marsh OR 1', and hitting on only "marsh". But that's a guess.
HTH
Erick
On Mon, Oct 4, 2010 at 10:24 PM, javaxmlsoapdev w
On 10/5/2010 6:34 PM, Ken Krugler wrote:
Is there any existing way to remove empty terms during analysis? I
tried TrimFilterFactory but that made no difference.
You could use LengthFilterFactory to restrict terms to being at least
one character long.
Is this a bug in PatternReplaceFilter
On 10/5/2010 6:28 PM, Markus Jelsma wrote:
I'm not sure if this is the best approach but a LengthFilter will stop blank
terms.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
Two people with the answer I needed. Thank you!
Shawn
You could do periodic small optimizes. The optimize command now
includes 'maxSegments', which limits the target number of segments.
It is possible to write a Lucene program that collects a bunch of
segments and anoints them as an index. This gives you a way to collect
segments after you write them w
Faceting on analyzed text can eat a lot of RAM. This strategy might not scale.
On Tue, Oct 5, 2010 at 4:00 PM, Savvas-Andreas Moysidis
wrote:
> Good point,
>
> so you could have an unanalyzed counterpart field set with a
> and facet on that..
>
> On 5 October 2010 23:49, Markus Jelsma wrote:
>
Hi Mahesh,
here your AlchemyAPI calls are failing; in fact their status is ERROR (sent
by the AlchemyAPI webservice itself), so you should try your service call
outside Solr/UIMA, for example from their website, and see if and why it's
failing with the text you're trying to enrich.
However you can post h