Re: multi level faceting

2010-10-05 Thread Peter Karich
also take a look at: http://wiki.apache.org/solr/HierarchicalFaceting + SOLR-64, SOLR-792 + http://markmail.org/message/jxbw2m5a6zq5jhlp Regards, Peter. > Take a look at "Mastering the Power of Faceted Search with Chris > Hostetter" > (http://www.lucidimagination.com/solutions/webcasts/faceting).

Re: Begins with and ends with word

2010-10-05 Thread Michael McCandless
I think this is possible, at the Lucene level. You want a mix of PrefixQuery and SpanFirstQuery (hmm we don't have a SpanLastQuery...). I believe you can do this by creating a PrefixQuery, and then customizing the rewrite method to create a BQ (SHOULD clauses) of SpanFirstQuery, instead of TermQu

Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Hi, were you successful in trying SOLR-1604 to allow wildcard queries in phrases? Also, does this plugin allow us to use proximity with wildcards, e.g. "solr mail*"~10? Is this the right approach to support these functionalities? Thanks, Mark On Wed, Aug 4, 2010 at 2:24

Re: Begins with and ends with word

2010-10-05 Thread Jan Høydahl / Cominvent
There is a ticket for this request: https://issues.apache.org/jira/browse/SOLR-1980 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 5 Oct. 2010, at 08:39, Maddy.Jsh wrote: > > Hi, > > I have 2 documents with following values. > Doc1 > Subject: Weekly transport >

RE: wildcard and proximity searches

2010-10-05 Thread Frederico Azeiteiro
Hi Mark, unfortunately it's still on my to-do list... :(. I don't know if it allows "solr mail*"~10. I hope so, as I'll need that too in the future. Frederico From: Mark N [mailto:nipen.m...@gmail.com] Sent: Tue 05-10-2010 11:29 To: solr-user@lucene.apache

Re: wildcard and proximity searches

2010-10-05 Thread Ahmet Arslan
> Also does this plugin allow us to use proximity with wild > card > * "solr mail*"~10 * > Yes, it supports "solr mail*"~10 kind of queries without any problem. Currently it throws an exception with "mail*" kind of queries, but they are not valid phrase queries. Because there is only one

Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Thanks, Ahmet. Is it also possible to search for documents having a field ENDING with "week*"? The query should return documents with a field ending with "week" and its derivatives such as "weekly" and "weeks". So the above query should return "this week", "Past three weeks", "Report weekly". Thanks, Chandan On Tue

Re: wildcard and proximity searches

2010-10-05 Thread Ahmet Arslan
> Is it also possible to search the document having a  > field ENDING with > "week*" > > query should return documents with a field ending > with  week and its > derivatives such as weekly,weeks > > So above query should return > > "this week" > "Past three weeks" > "Report weekly" No this is n

Re: wildcard and proximity searches

2010-10-05 Thread Ahmet Arslan
--- On Tue, 10/5/10, Mark N wrote: > From: Mark N > Subject: Re: wildcard and proximity searches > To: solr-user@lucene.apache.org > Date: Tuesday, October 5, 2010, 2:30 PM > Thanks ahmet > > Is it also possible to search the document having a  > field ENDING with > "week*" > > query should

Configuration of ExtractRequestHandler

2010-10-05 Thread Ahson Iqbal
Hi all, I want to index a large number of PDF documents. Searching on Google, I found a reference saying that this could be done with the Apache Tika project, but unfortunately I didn't find any reference stating how to configure Apache Tika with Solr. Can anybody explain how I could do this? I have downloa

Re: Different between Lucid dist. & Apache dist. ?

2010-10-05 Thread mbohlig
Information about Lucid's Certified Distribution is at: http://www.lucidimagination.com/Downloads/LucidWorks-for-Lucene It's available at no cost and includes: * Complete version of Apache Lucene v3.0.1 * Additional bug fixes from the current trunk of the Lucene project *

RE: multi level faceting

2010-10-05 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Just to clarify, the effect I was looking for was this. Sneakers Men (22) Women (43) AFTER a user filters by one of those, they would be presented with a NEW facet field such as Sneakers Men Size 7 Size 7 Size 7 Vincent Vu Nguyen -Original Message- From: Otis

Recall: multi level faceting

2010-10-05 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Nguyen, Vincent (CDC/OD/OADS) (CTR) would like to recall the message, "multi level faceting".

RE: multi level faceting

2010-10-05 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Just to clarify, the effect I was looking for was this. Sneakers Men (22) Women (43) AFTER a user filters by one of those, they would be presented with a NEW facet field such as Sneakers Men Size 7 (10) Size 8 (11) Size 9 (23) Vincent Vu Nguyen -Original Message-

RE: multi level faceting

2010-10-05 Thread Dyer, James
I've a similar problem with a project I'm working on now. I am holding out for either SOLR-64 or SOLR-792 being a bit more mature before I need the functionality but if not I was thinking I could do multi-level faceting by indexing the data as a "String" like this: id: 1 SHOE: Sneakers|Men|Siz
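
A rough sketch of the path-string idea above, in Python rather than Solr configuration. The message is cut off, so only the "|" separator comes from the thread; the function names, the sample documents, and the way child counts are computed are assumptions made for illustration. In Solr itself, restricting to one branch would probably be done with the facet.prefix parameter instead of the in-memory loop shown here.

```python
from collections import Counter

SEP = "|"  # separator used in the message; everything else is hypothetical


def path_tokens(path):
    """Return one token per level, e.g. ['Sneakers', 'Sneakers|Men', ...]."""
    return [SEP.join(path[:depth]) for depth in range(1, len(path) + 1)]


def child_facets(docs, selected):
    """Count the values exactly one level below the selected path."""
    depth = len(selected)
    counts = Counter()
    for path in docs:
        # Only documents under the selected branch that go one level deeper.
        if len(path) > depth and path[:depth] == selected:
            counts[SEP.join(path[:depth + 1])] += 1
    return counts


# Hypothetical documents, each described by its category path.
docs = [
    ["Sneakers", "Men", "Size 7"],
    ["Sneakers", "Men", "Size 8"],
    ["Sneakers", "Women", "Size 7"],
    ["Sneakers", "Men", "Size 7"],
]

top_level = child_facets(docs, ["Sneakers"])
second_level = child_facets(docs, ["Sneakers", "Men"])
```

Selecting "Sneakers" yields counts for Men and Women; selecting "Sneakers", "Men" then yields the size counts, which is the drill-down behaviour described earlier in the thread.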

Query slop vs. phrase slop

2010-10-05 Thread David Boxenhorn
Can anyone explain to me the practical difference (i.e. in terms of results) between query slop and phrase slop?

Re: Tuning Solr

2010-10-05 Thread Jay Hill
Removing those components is not likely to impact performance very much, if at all. I would focus on other areas when tuning performance, such as looking at memory usage and configuration, query design, etc. But there isn't any harm in removing them either. Why not do some load tests with the componen

Re: Query slop vs. phrase slop

2010-10-05 Thread Ahmet Arslan
> Can anyone explain to me the > practical difference (i.e. in terms of results) > between query slop and phrase slop? I think you are asking about dismax's parameters, right? ps (Phrase Slop) is about pf parameter. qs (Query Phrase Slop) : You cannot use tilde operator with dismax, so this par
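
To make the two parameters concrete, here is a small sketch that only builds a dismax query string; it does not contact a server. The field name "subject" and the slop values are made up for illustration. As far as the dismax parser goes, qs is the slop applied to phrases the user typed in quotes inside q, while ps is the slop for the implicit phrase query built from the whole input for the pf boost.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical request parameters for a dismax query.
params = {
    "defType": "dismax",
    "q": '"weekly transport" report',  # contains one explicit user phrase
    "qf": "subject",                   # fields searched for the query terms
    "pf": "subject",                   # fields used for phrase boosting
    "qs": 1,                           # slop for the quoted phrase in q
    "ps": 3,                           # slop for the pf boost phrase
}

query_string = urlencode(params)
decoded = parse_qs(query_string)
```

With these values, "weekly transport" must match within a slop of 1 to count as a hit at all, whereas documents where all three words occur within a slop of 3 merely score higher.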

Re: Query slop vs. phrase slop

2010-10-05 Thread David Boxenhorn
Thank you. I am talking about dismax's parameters. This is how I understand things, please tell me where I'm wrong: Query slop (qs) = how many words you can move the query to match the text. Phrase slop (ps) (when used in conjunction with &pf=text - is there another possibility?) = how many words

Re: Differences between FilterFactory and TokenizerFactory?

2010-10-05 Thread Ahmet Arslan
> There are EdgeNGramFilterFactory > & EdgeNGramTokenizerFactory. > > Likewise there are StandardFilterFactory & > StandardTokenizerFactory. > > LowerCaseFilterFactory & LowerCaseTokenizerFactory. > > Seems like they always come in pairs. > > What are the differences between FilterFactory and

Experience with large merge factors

2010-10-05 Thread Burton-West, Tom
Hi all, At some point we will need to re-build an index that totals about 3 terabytes in size (split over 12 shards). At our current indexing speed we estimate that this will take about 4 weeks. We would like to reduce that time. It appears that our main bottleneck is disk I/O during index m

RE: Experience with large merge factors

2010-10-05 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Thank you once again Betsy! Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329 -Original Message- From: Burton-West, Tom [mailto:tbur

Recall: Experience with large merge factors

2010-10-05 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Nguyen, Vincent (CDC/OD/OADS) (CTR) would like to recall the message, "Experience with large merge factors".

Umlaut in facet name attribute

2010-10-05 Thread alexander sulz
Good evening and morning. I noticed that if I do a facet search on a field whose value contains umlauts (öäü), the returned facet list has the value converted into plain characters (oau). How do I prevent this from happening? I can't seem to find the configuration for faceting in

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-05 Thread Renee Sun
Hi Yonik, I tried the fix suggested in your comments (using "solr.TrieDateField"), and it loaded up 130 cores in 1 minute, 1.3GB memory (a little more than 1GB when turning off static warm cache, and much less than 6.5GB when using 'solr.DateField'). Will this have any impact on first query or per

RE: Using Solr Analyzers in Lucene

2010-10-05 Thread Mathias Walter
Hi Max, why don't you use WordDelimiterFilterFactory directly? I'm doing the same stuff inside my own analyzer: final Map args = new HashMap(); args.put("generateWordParts", "1"); args.put("generateNumberParts", "1"); args.put("catenateWords", "0"); args.put("catenateNumbers", "0"); args.put("ca

Should "Medical" be highlighted when user search for "medication"?

2010-10-05 Thread Khai Doan
I am still trying to learn Solr. My Solr configuration is based on the default example schema.xml (I haven't customized the field types). I am using text for the fields that I want highlighting on. I am searching for "medication", but I see that "Medical" is highlighted. Should this be the case?

Re: Using Solr Analyzers in Lucene

2010-10-05 Thread Max Lynch
I guess I missed the init() method. I was looking at the factory and thought I saw config loading stuff (like getInt) which I assumed meant it needed to have schema.xml available. Thanks! -Max On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter wrote: > Hi Max, > > why don't you use WordDelimiterFilt

PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Shawn Heisey
I am developing a new schema. It has a pattern filter that trims leading and trailing punctuation from terms. It is resulting in empty terms, because there are situations in the analyzer stream where a term happens to be composed of nothing but punctuation. This problem is not happening in

RE: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Markus Jelsma
I'm not sure if this is the best approach but a LengthFilter will stop blank terms. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory   -Original message- From: Shawn Heisey Sent: Wed 06-10-2010 00:25 To: solr-user@lucene.apache.org; Subject: PatternRe
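
The effect is easy to reproduce outside Solr. The sketch below is plain Python, not the Solr filters themselves: the regular expression and both function names are stand-ins chosen for illustration, since the original schema is not shown in the thread. It trims leading and trailing punctuation, which leaves empty strings for punctuation-only terms, and then drops them with a minimum-length check in the spirit of LengthFilterFactory's min attribute.

```python
import re

# Stand-in for the punctuation-trimming pattern; the real one is not shown.
EDGE_PUNCT = re.compile(r"^\W+|\W+$")


def trim_punctuation(term):
    """Strip non-word characters from both ends of a term."""
    return EDGE_PUNCT.sub("", term)


def length_filter(terms, min_len=1, max_len=100):
    """Keep only terms whose length lies within [min_len, max_len]."""
    return [t for t in terms if min_len <= len(t) <= max_len]


raw_terms = ["(hello)", "--", "world!", "..."]
trimmed = [trim_punctuation(t) for t in raw_terms]
kept = length_filter(trimmed)
```

Here "--" and "..." collapse to empty strings after trimming, and the length check is what removes them from the stream.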

Re: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Ken Krugler
On Oct 5, 2010, at 6:24pm, Shawn Heisey wrote: I am developing a new schema. It has a pattern filter that trims leading and trailing punctuation from terms. It is resulting in empty terms, because there are situations in the analyzer stream where a term happens to be composed of nothing

RE: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Markus Jelsma
Actually, it might be a good idea to add an optional setting to the PatternTokenizer that doesn't emit blank terms. Perhaps an allowBlanks="false" option would be a pleasant addition to the PatternTokenizer so an additional LengthFilter can be left out and thus spare CPU cycles and some memory.

Re: Umlaut in facet name attribute

2010-10-05 Thread Savvas-Andreas Moysidis
Hello, It seems that your analysis process removes punctuation and therefore indexes terms without it. What you see in the faceted result is the text that has been indexed. If you select a Tokenizer/Token Filter which preserves punctuation you should be able to see what you want. Cheers, -- Savv

RE: Re: Umlaut in facet name attribute

2010-10-05 Thread Markus Jelsma
It is a good practice (for many cases as seen on the list) to search (usually with fq) on analyzed fields but return the facet list based on the unanalyzed counterparts.   -Original message- From: Savvas-Andreas Moysidis Sent: Wed 06-10-2010 00:46 To: solr-user@lucene.apache.org; Subjec
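
A small illustration of why the facet values differ, using plain Python instead of a Solr analysis chain. The fold_to_ascii function is only a stand-in for whatever folding the analyzer performs, and the sample values are invented. Facet counts are computed over indexed terms, so an analyzed field reports the folded form while an unanalyzed copy keeps the original spelling.

```python
import unicodedata
from collections import Counter


def fold_to_ascii(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical stored values for one field across three documents.
values = ["Müller", "Müller", "Köln"]

# What faceting sees on an analyzed (folded, lowercased) field.
analyzed_facets = Counter(fold_to_ascii(v).lower() for v in values)

# What faceting sees on an unanalyzed counterpart of the same field.
raw_facets = Counter(values)
```

Searching can still use the folded field, so that "muller" finds "Müller", while the facet list shown to the user comes from the raw values.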

Re: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Robert Muir
Alternatively, you can use "+" instead of "*" in your regular expressions so that you don't match them at all... I think the PatternTokenizer is doing the right thing, if your expression says that a blank term is acceptable. On Tue, Oct 5, 2010 at 6:39 PM, Markus Jelsma wrote: > Actually, it migh
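
The difference between the two quantifiers can be seen with any regex engine. The snippet below uses Python's re module rather than Java's, and the sample text is arbitrary; the point is only that a pattern which may match zero characters will also match the empty string, while "+" requires at least one character.

```python
import re

text = "a, b"

# "*" allows zero-length matches, so empty strings appear among the tokens.
tokens_star = re.findall(r"\w*", text)

# "+" needs at least one word character, so no empty tokens are produced.
tokens_plus = re.findall(r"\w+", text)
```

With "+" the result contains just the two words, which is why rewriting the expression avoids the blank terms without needing an extra filter.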

Re: Should "Medical" be highlighted when user search for "medication"?

2010-10-05 Thread Koji Sekiguchi
(10/10/06 4:41), Khai Doan wrote: I am still trying to learn Solr. My Solr configuration is based on the default example schema.xml (I haven't customize the field types). I am using text for the fields that I want highlighting on. I am searching for "medication", but I see that "Medical" is hi

Re: Re: Umlaut in facet name attribute

2010-10-05 Thread Savvas-Andreas Moysidis
Good point, so you could have an unanalyzed counterpart field set with a and facet on that.. On 5 October 2010 23:49, Markus Jelsma wrote: > It is a good practice (for many cases as seen on the list) to search > (usually with fq) on analzyed fields but return the facet list based on the > unan

Re: Experience with large merge factors

2010-10-05 Thread Michael McCandless
4 weeks is a depressingly long time to re-index! Do you use multiple threads for indexing? Large RAM buffer size is also good, but I think perf peaks out maybe around 512 MB (at least based on past tests)? Believe it or not, merging is typically compute bound. It's costly to decode & re-encode

Re: Should "Medical" be highlighted when user search for "medication"?

2010-10-05 Thread Khai Doan
I see that medication got reduced to medic. Thanks, Khai On Tue, Oct 5, 2010 at 3:57 PM, Koji Sekiguchi wrote: > (10/10/06 4:41), Khai Doan wrote: > >> I am still trying to learn Solr. My Solr configuration is based on the >> default example schema.xml (I haven't customize the field types). I

Re: Solr admin level configurations for production

2010-10-05 Thread Otis Gospodnetic
Hi Siva, Have a look at the Solr Wiki, there is some good info on that topic there. See http://search-lucene.com/?q=search+performance&fc_project=Solr&fc_type=wiki Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -

Re: Numeric search in text field

2010-10-05 Thread Erick Erickson
What is your index analyzer chain? In other words, are you sure the token '1' is actually indexed? "Marsh1" is, depending on the default operator, probably searching for 'marsh OR 1', and hitting on only "marsh". But that's a guess. HTH, Erick On Mon, Oct 4, 2010 at 10:24 PM, javaxmlsoapdev w

Re: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Shawn Heisey
On 10/5/2010 6:34 PM, Ken Krugler wrote: Is there any existing way to remove empty terms during analysis? I tried TrimFilterFactory but that made no difference. You could use LengthFilterFactory to restrict terms to being at least one character long. Is this a bug in PatternReplaceFilter

Re: PatternReplaceFilterFactory creating empty string as a term

2010-10-05 Thread Shawn Heisey
On 10/5/2010 6:28 PM, Markus Jelsma wrote: I'm not sure if this is the best approach but a LengthFilter will stop blank terms. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory Two people with the answer I needed. Thank you! Shawn

Re: Experience with large merge factors

2010-10-05 Thread Lance Norskog
You could do periodic small optimizes. The optimize command now includes 'maxSegments' which limits the target number of segments. It is possible to write a Lucene program that collects a bunch of segments and anoints it as an index. This gives you a way to collect segments after you write them w

Re: Re: Umlaut in facet name attribute

2010-10-05 Thread Lance Norskog
Faceting on analyzed text can eat a lot of RAM. This strategy might not scale. On Tue, Oct 5, 2010 at 4:00 PM, Savvas-Andreas Moysidis wrote: > Good point, > > so you could have an unanalyzed counterpart field set with a > and facet on that.. > > On 5 October 2010 23:49, Markus Jelsma wrote: >

Re: Solr UIMA integration

2010-10-05 Thread Tommaso Teofili
Hi Mahesh, here your AlchemyAPI calls are failing, in fact their status is ERROR (sent by AlchemyAPI webservice itself) so you should try your service call outside Solr/UIMA, for example from their website and see if and why it's failing with the text you're trying to enrich. However you can post h