Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-13 Thread Renee Sun
OK, I dug more into this and realized the file extensions can vary depending on the schema, right? For instance, we don't have *.tvx, *.tvd, *.tvf (not using term vectors)... and I suspect the file extensions may change with future Lucene releases? Now it seems we can't just count the file using any

Is it possible to create a duplicate field ?

2011-04-13 Thread shrinath.m
For example, I am storing email ids of a person. If the person has 3 email ids, I want to store them as email = 'x...@whatever.com' email = 'a...@blah.com' email = 'p...@moreblah.com' How can we do this ? I know someone will come up with why don't you store it like email1, email2, email3 and

Re: Is it possible to create a duplicate field ?

2011-04-13 Thread William Bell
Just set up your schema with a string multivalued field... On Wed, Apr 13, 2011 at 12:47 AM, shrinath.m shrinat...@webyog.com wrote: For example, I am storing email ids of a person. If the person has 3 email ids, I want to store them as email = 'x...@whatever.com' email = 'a...@blah.com'
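
A minimal sketch of what the update message might look like once the field is multivalued. It assumes the schema declares `email` as a string field with `multiValued="true"`; the `id` field name, the document id and the addresses are made up for illustration. The point is only that the same field name is repeated once per value.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, emails):
    """Return a Solr <add> message with one <field name="email"> per address."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" as the unique key field name is an assumption about the schema.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    # A multivalued field is sent by repeating the same field name.
    for address in emails:
        field = ET.SubElement(doc, "field", name="email")
        field.text = address
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical person with three addresses.
    print(build_add_xml("person-1", ["first@example.com",
                                     "second@example.com",
                                     "third@example.com"]))
```

The resulting XML can be posted to the update handler like any other document.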

Re: Is it possible to create a duplicate field ?

2011-04-13 Thread shrinath.m
Bill Bell wrote: Just set up your schema with a string multivalued field... I've this in my schema: Worked.. Thanks... . -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-create-a-duplicate-field-tp2815029p2815061.html Sent from the Solr - User

ExtractingRequestHandler and Solr 3.1

2011-04-13 Thread Liam O'Boyle
Afternoon, After an upgrade to Solr 3.1 which has largely been very smooth and painless, I'm having a minor issue with the ExtractingRequestHandler. The problem is that it's inserting metadata into the extracted content, as well as mapping it to a dynamic field. Previously the same

Re: Updates during Optimize

2011-04-13 Thread stockii
The current limitation or pause is when the ram buffer is flushing to disk - when an optimize starts and is running ~4 hours, you say, that DIH is flushing the doc`s during this pause into the index ? - --- System One

Re: function query apply only in the subset of the query

2011-04-13 Thread Marco Martinez
No, this query returns a few more documents than if I do it with the Lucene query parser. I'm going to write another query parser that sends a simple term query and see what the output is; when I have it, I will report back by mail. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida

Allowing looser matches

2011-04-13 Thread Mark Mandel
Not sure if the title explains it all, or if what I want is even possible, but figured I would ask. Say I have a series of products I'm selling, and a search for: Blue Wool Rugs comes in. This returns 0 results, as Blue and Rugs match terms that are indexed, Wool does not. Is there a way to

RE: Allowing looser matches

2011-04-13 Thread Pierre GOSSE
For (a) I don't think anything exists today providing this mechanism. But (b) is a good description of the dismax handler with a MM parameter of 66%. Pierre -Original Message- From: Mark Mandel [mailto:mark.man...@gmail.com] Sent: Wednesday, 13 April 2011 10:04 To:
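
As a rough illustration of the dismax suggestion, the request parameters could be assembled like this. `defType`, `qf` and `mm` are standard Solr request parameters; the field names `name` and `description` are hypothetical, and the sketch only builds the query string (it does not contact a server).

```python
from urllib.parse import urlencode, parse_qs


def dismax_query_string(user_query, query_fields, min_should_match="66%"):
    """Build a query string that asks the dismax parser for looser matching."""
    params = {
        "q": user_query,
        "defType": "dismax",           # select the dismax query parser
        "qf": " ".join(query_fields),  # fields to search (hypothetical names)
        "mm": min_should_match,        # minimum "should match" setting
    }
    # urlencode escapes the percent sign and spaces for use in a URL.
    return urlencode(params)


if __name__ == "__main__":
    print(dismax_query_string("Blue Wool Rugs", ["name", "description"]))
```

The string would be appended to the select URL of the Solr instance; how many terms actually have to match for a given `mm` value is best checked against the DisMax documentation.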

RE: Searching during postcommit

2011-04-13 Thread Reeza Edah Tally
Thanks, I changed my searching to be triggered on a newSearcher event instead and use the new searcher to retrieve the documents. This works. Btw can I assume that a new searcher will always be created soon after a commit? Regards, Reeza -Original Message- From: Otis Gospodnetic

Re: Allowing looser matches

2011-04-13 Thread Mark Mandel
Thanks! I searched high and low for that, couldn't see it in front of my face! Mark On Wed, Apr 13, 2011 at 6:32 PM, Pierre GOSSE pierre.go...@arisem.com wrote: For (a) I don't think anything exists today providing this mechanism. But (b) is a good description of the dismax handler with a MM

Re: Allowing looser matches

2011-04-13 Thread lboutros
If you are using the Dismax query parser, perhaps you could take a look at the minimum should match parameter 'mm': http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 Ludovic. 2011/4/13 Mark Mandel [via Lucene] ml-node+2815186-149863473-383...@n3.nabble.com

Re: SolrException: Unavailable Service

2011-04-13 Thread Phong Dais
Erick, I was under the misconception that a solr transaction is ACID. From what you said, I guess solr transactions are not Isolated. Thanks, Phong On Tue, Apr 12, 2011 at 2:54 PM, Erick Erickson erickerick...@gmail.com wrote: See below: On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais

Re: Searching during postcommit

2011-04-13 Thread Erick Erickson
Yes, you can assume this since that's the only way new content will be searchable, as you've discovered. Best, Erick On Wed, Apr 13, 2011 at 4:42 AM, Reeza Edah Tally re...@nova-hub.com wrote: Thanks, I changed my searching to be triggered on a newSearcher event instead and use the new

Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Hi there, Just a quick question that the wiki page ( http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to answer very well. Given an analyzer that has zero or more Char Filter Factories, one Tokenizer Factory, and zero or more Token Filter Factories, which value(s) are

Re: Analysing all tokens in a stream

2011-04-13 Thread Ahmet Arslan
I would like to build a component that during indexing analyses all tokens in a stream and adds metadata to a new field based on my analysis. I have different tasks that I would like to perform, like basic classification and certain more advanced phrase detections. How would I do this? A

Re: ExtractingRequestHandler and Solr 3.1

2011-04-13 Thread Grant Ingersoll
On Apr 13, 2011, at 12:06 AM, Liam O'Boyle wrote: Afternoon, After an upgrade to Solr 3.1 which has largely been very smooth and painless, I'm having a minor issue with the ExtractingRequestHandler. The problem is that it's inserting metadata into the extracted content, as well as

Re: Field Analyzers: which values are indexed?

2011-04-13 Thread Koji Sekiguchi
Or is it only the final value after completing the whole chain that is indexed? Yes. Koji -- http://www.rondhuit.com/en/

jetty update

2011-04-13 Thread ramires
Hi, how do I update Jetty 6 to Jetty 7?

Result order when score is the same

2011-04-13 Thread kenf_nc
I'm using version 1.4.1. It appears that when several documents in a result set have the same score, the secondary sort is by 'indexed_at' ascending. Can this be altered in the config xml files? If I wanted the secondary sort to be indexed_at descending for example, or by a different field, say

strange behavior of echoParams

2011-04-13 Thread Bernd Fehling
Dear list, after setting echoParams to none wildcard search isn't working. Only if I set echoParams to explicit then wildcard is possible. http://wiki.apache.org/solr/CoreQueryParameters states that echoParams is for debugging purposes. We use Solr 3.1.0. Snippet from solrconfig.xml:

Re: strange behavior of echoParams

2011-04-13 Thread Erik Hatcher
What does the parsed query look like with debugQuery=true for both scenarios? Any difference? Doesn't make any sense that echoParams would have an effect, unless somehow your search client is relying on parameters returned to do something with them.?! Erik On Apr 13, 2011, at 09:57

Re: function query apply only in the subset of the query

2011-04-13 Thread Marco Martinez
It seems that this is a problem with my own query. Now I need to investigate whether there is something different between a normal query and my implementation of the query, because if you use it alone, it works properly. Thanks, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de

phpnative response writer in SOLR 3.1 ?

2011-04-13 Thread Ralf Kraus
Hello, I just updated to SOLR 3.1 and am wondering if the phpnative response writer plugin is part of it? ( https://issues.apache.org/jira/browse/SOLR-1967 ) When I try to compile the source files I get some errors : PHPNativeResponseWriter.java:57:

DIH : Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'st'

2011-04-13 Thread Rosa (Anuncios)
Hi, I'm getting an error when I import an XML file with DIH. In this file my id is a URL which looks like this: http://www.example.com/?cp=30_sst=ac=655 Apparently the issue is with the = character? Is there any workaround? Error trace: rows processed:0 Processing Document # 849 at

Re: DIH : Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'st'

2011-04-13 Thread Markus Jelsma
This is invalid XML. Entities must be encoded or embedded within CDATA tags. On Wednesday 13 April 2011 16:10:51 Rosa (Anuncios) wrote: Hi I'm having an error when i import an xml file with DIH. In this file my id is an url wich looks like this : http://www.example.com/?cp=30_sst=ac=655
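
To illustrate the encoding point: a bare `&` in a URL starts an entity reference, so an XML parser expects a semicolon before it sees `=`. A small sketch of escaping the value before writing it into the file follows. The URL is a made-up example (the one in the original post is garbled in the archive), and `field_xml` is a hypothetical helper, not part of DIH.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def field_xml(name, value):
    """Return a <field> element as text, with the value safely escaped."""
    # escape() turns &, < and > into &amp;, &lt; and &gt;
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


if __name__ == "__main__":
    raw_url = "http://www.example.com/?a=1&st=2"  # hypothetical URL
    encoded = field_xml("id", raw_url)
    print(encoded)
    # Parsing gives back the original, unescaped URL.
    print(ET.fromstring(encoded).text)
```

Wrapping the value in a CDATA section is the other option mentioned above; either way the fix belongs in whatever produces the XML file.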

Re: strange behavior of echoParams

2011-04-13 Thread Bernd Fehling
Hi Erik, never mind. Can't reproduce this strange behavior. Obviously stopping and starting Solr solved this. Thanks, Bernd On 13.04.2011 16:00, Erik Hatcher wrote: What does the parsed query look like with debugQuery=true for both scenarios? Any difference? Doesn't make any sense that

Re: function query apply only in the subset of the query

2011-04-13 Thread Yonik Seeley
On Wed, Apr 13, 2011 at 10:00 AM, Marco Martinez mmarti...@paradigmatecnologico.com wrote: Its seems that is a problem of my own query, now i need to investigate if there is something different between a normal query and my implementation of the query, because if you use it alone, its works

Indexing Question for large dataset

2011-04-13 Thread Joshua Bouchair
We have an e-commerce application (B2C/B2B) with a large number of price lists, 2000+ and growing. They want to index price to have facets and sorting. That seems like it would be a lot of columns to index, example below: INDEX COLUMN: NamePrice

Re: Indexing Question for large dataset

2011-04-13 Thread kenf_nc
Indexing isn't a problem, it's just disk space and space is cheap. But, if you do facets on all those price columns, that gets put into RAM which isn't as cheap or plentiful. Your cache buffers may get overloaded a lot and performance will suffer. 2000 price columns seems like a lot, could the

Re: Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Thanks both for your replies. Eric, Yep, I use the Analysis page extensively, but what I was directly looking for was whether all or only the last line of values given by the analysis page were eventually indexed. I think we've concluded it's only the last line. Cheers, Ben On Wed, Apr 13,

RE: Indexing Question for large dataset

2011-04-13 Thread Joshua Bouchair
Don't know of any other way to organize the documents. We need to have the specific price that belongs to the user, so I don't think that the facets would be the issue. The facet querying would be modified to the corresponding price list field for that user. Let's say the customer belongs to

Re: jetty update

2011-04-13 Thread Sam Granieri
I found this link after googling for a few minutes. http://wiki.eclipse.org/Jetty/Howto/Upgrade_from_Jetty_6_to_Jetty_7 I hope that helps Also, a question like this may be more appropriate for a jetty mailing list. On Wed, Apr 13, 2011 at 8:44 AM, ramires uy...@beriltech.com wrote: hi  how to

Re: phpnative response writer in SOLR 3.1 ?

2011-04-13 Thread Chris Hostetter
: Subject: phpnative response writer in SOLR 3.1 ? : References: : 15647_1302703023_zzh0o1kefjfix.00_4da5abae.5070...@uni-bielefeld.de : 0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com : In-Reply-To: 0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com

Re: jetty update

2011-04-13 Thread stockii
is it necessary to update for solr ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx -

Re: jetty update

2011-04-13 Thread Sam Granieri
Is your current solr installation with Jetty 6 working well for you in a production environment? I dont know enough about Jetty to help you further on this question. On Wed, Apr 13, 2011 at 10:47 AM, stockii stock.jo...@googlemail.com wrote: is it necessary to update for solr ? -

RE: Indexing Question for large dataset

2011-04-13 Thread kenf_nc
Is NAME a product name? Why would it be multivalue? And why would it appear on more than one document? Is each 'document' a package of products? And the pricing tiers are on the package, not individual pieces? So sounds like you could, potentially, have a PriceListX column for each user. As your

RE: Indexing Question for large dataset

2011-04-13 Thread Joshua Bouchair
Name equals the product name. Each separate product can have 1 to n prices based upon pricelist. A single document represents that single product. <doc> <field name="id">1</field> <field name="name">The product name.</field> <field name="price">1.00</field> <field

Regarding filterquery

2011-04-13 Thread soumya rao
Hi, I am a newbie to solr. I could see that the queries are not cached. Would like to apply filterCache to queries in ruby. Can anyone provide me the syntax for this please? Thanks.

RE: Regarding filterquery

2011-04-13 Thread Joshua Bouchair
Uncomment the following section in solrconfig.xml. <!-- An optimization that attempts to use a filter to satisfy a search. If the requested sort does not include score, then the filterCache will be checked for a filter matching the query. If found, the filter will be

Re: Regarding filterquery

2011-04-13 Thread soumya rao
Thanks for the reply Josh. And where should I make changes in ruby to add filters? Soumya On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair joshuabouch...@wasserstrom.com wrote: Uncomment solrconfig.xml at the following location. !-- An optimization that attempts to use a filter to

how to get lots fields this way?

2011-04-13 Thread Floyd Wu
Hi, As far as I know, using fl=*,score means that all fields and the score are returned in the search results, and if a field is stored, all of its text is returned as part of the result. Now I have 2x fields, some of the field names have no prefix or fixed naming rule and the names cannot be predicted. I

Re: Updates during Optimize

2011-04-13 Thread Mark Miller
Not cleanly currently. SOLR-2193: Re-architect Update Handler, should take care of this though. - Mark On Apr 12, 2011, at 8:21 AM, stockii wrote: Hello. When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copies the whole index and

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is sort order when 'score' is the same a Lucene thing? Should I ask on the Lucene forum?

Re: Result order when score is the same

2011-04-13 Thread Rob Casson
you could just explicitly send multiple sorts...from the tutorial: sort=inStock asc, price desc cheers. On Wed, Apr 13, 2011 at 2:59 PM, kenf_nc ken.fos...@realestate.com wrote: Is sort order when 'score' is the same a Lucene thing? Should I ask on the Lucene forum? -- View this
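
A small sketch of sending an explicit tie-breaker with each request, following the `sort=inStock asc, price desc` pattern from the tutorial. `indexed_at` is the field named earlier in this thread; the query text is made up, and the sketch assumes the sort field is indexed and single-valued.

```python
from urllib.parse import urlencode, parse_qs


def sorted_query_string(user_query, sort_clauses):
    """Build a query string with a multi-level sort.

    sort_clauses is a list of (field, direction) pairs, applied in order.
    """
    sort_value = ", ".join("%s %s" % (field, direction)
                           for field, direction in sort_clauses)
    return urlencode({"q": user_query, "sort": sort_value})


if __name__ == "__main__":
    # Relevancy first, then newest-indexed first among equal scores.
    print(sorted_query_string("pizza", [("score", "desc"),
                                        ("indexed_at", "desc")]))
```

If the same sort is wanted on every request, it may be simpler to set it once as a default on the request handler in solrconfig.xml rather than in each client call.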

Re: Result order when score is the same

2011-04-13 Thread Jonathan Rochkind
In real life though, it seems unlikely that the relevancy score will ever be identical, so the second sort field will never be used. Is relevancy score ever identical? Rarely at any rate. On 4/13/2011 3:22 PM, Rob Casson wrote: you could just explicitly send multiple sorts...from the

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Au contraire, I have almost 4 million documents, representing businesses in the US. And having the score be the same is a very common occurrence. It is quite clear from testing that if score is the same, then it sorts on indexed_at ascending. It seems silly to make me add a sort on every query,

Re: Regarding filterquery

2011-04-13 Thread Li
You should just ask me. Sent from my iPhone On Apr 13, 2011, at 11:27 AM, soumya rao soumrao...@gmail.com wrote: Thanks for the reply Josh. And where should I make changes in ruby to add filters? Soumya On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair

Curl bulk XML

2011-04-13 Thread Li
Hey guys, how do you curl update all the XML inside a folder from A-D? Example: curl http://localhost:8080/solr update

Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma
If you omitNorms and omitTermFreqAndPositions on the query field(s) and use no funky boost functions, all results will have identical score in AND-queries (or queries with one search term). IDF has no meaning because of AND, queryNorm is the same across the resultset, fieldNorm is 1 and TF is

Re: Curl bulk XML

2011-04-13 Thread Markus Jelsma
Either put all documents in a large file or loop over them with a simple shell script. Hey guys, how do you curl update all the XML inside a folder from A-D? Example: curl http://localhost:8080/solr update Sent from my iPhone
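
The loop could look roughly like the sketch below, written in Python rather than shell. It assumes that "A-D" means files whose names start with the letters A to D and that the update handler lives at a URL such as `http://localhost:8080/solr/update` (an assumption based on the host and port in the question). The function names are made up.

```python
import glob
import os
import urllib.request

XML_HEADERS = {"Content-Type": "text/xml; charset=utf-8"}


def select_xml_files(folder, first_letters="ABCD"):
    """Return sorted paths of *.xml files whose names start with A-D."""
    wanted = tuple(first_letters.upper())
    paths = glob.glob(os.path.join(folder, "*.xml"))
    return sorted(p for p in paths
                  if os.path.basename(p).upper().startswith(wanted))


def post_xml(update_url, body):
    """POST one XML message to the update handler, return the HTTP status."""
    request = urllib.request.Request(update_url, data=body, headers=XML_HEADERS)
    with urllib.request.urlopen(request) as response:
        return response.status


def post_folder(update_url, folder):
    """Post every selected file, then send a single commit."""
    for path in select_xml_files(folder):
        with open(path, "rb") as handle:
            post_xml(update_url, handle.read())
    return post_xml(update_url, b"<commit/>")
```

Calling `post_folder("http://localhost:8080/solr/update", "/path/to/xml")` would send the files one by one and commit once at the end, which is usually cheaper than committing per file.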

Re: Curl bulk XML

2011-04-13 Thread Ezequiel Calderara
From post.jar, I think you can do something like... java -jar post.jar A*.xml java -jar post.jar B*.xml java -jar post.jar C*.xml java -jar post.jar D*.xml (I'm on Windows) On Wed, Apr 13, 2011 at 4:41 PM, Markus Jelsma markus.jel...@openindex.io wrote: Either put all documents in a

Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma
Sorting a large set is costly, the more fields you sort on, the more memory is consumed (and likely cached). If i remember correctly the result set will be ordered according to Lucene DocID's if there's nothing to sort on. If i read correctly, you don't want to specify those fixed sort

RE: Regarding filterquery

2011-04-13 Thread Joshua Bouchair
You have to specify the query. In the query you will have the fq parameter, which means filter query. http://wiki.apache.org/solr/solr-ruby -Original Message- From: soumya rao [mailto:soumrao...@gmail.com] Sent: Wednesday, April 13, 2011 2:27 PM To: solr-user@lucene.apache.org Subject: Re:
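
For illustration, this is roughly how `fq` (filter query) parameters are attached to a request. The sketch is in Python, but the parameter names are the same from any client, including Ruby. The field names `color` and `inStock` are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def filtered_query_string(user_query, filter_queries):
    """Build a query string with one fq parameter per filter."""
    # A list of pairs lets the same parameter name appear several times.
    params = [("q", user_query)] + [("fq", fq) for fq in filter_queries]
    return urlencode(params)


if __name__ == "__main__":
    print(filtered_query_string("rugs", ["color:blue", "inStock:true"]))
```

Each `fq` value is cached separately in the filterCache, so splitting independent restrictions into separate `fq` parameters tends to give better cache reuse than putting everything into `q`.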

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is a new DocID generated everytime a doc with the same UniqueID is added to the index? If so, then docID must be incremental and would look like indexed_at ascending. What I see (and why it's a problem for me) is the following. a search brings back the first 5 documents in a result set of say 60.

tika/pdfbox knobs levers

2011-04-13 Thread Jay Luker
Hi all, I'm wondering if there are any knobs or levers i can set in solrconfig.xml that affect how pdfbox text extraction is performed by the extraction handler. I would like to take advantage of pdfbox's ability to normalize diacritics and ligatures [1], but that doesn't seem to be the default

Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma
Is a new DocID generated everytime a doc with the same UniqueID is added to the index? If so, then docID must be incremental and would look like indexed_at ascending. What I see (and why it's a problem for me) is the following. Yes, Solr removes the old and inserts a new when updating an

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-13 Thread Jay Hill
As Hoss mentioned earlier in the thread, you can use the statistics page from the admin console to view the current number of segments. But if you want to know by looking at the files, each segment will have a unique prefix, such as _u. There will be one unique prefix for every segment in the
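
Counting by prefix rather than by extension could be sketched like this. It assumes segment files start with an underscore followed by lowercase letters and digits (for example `_u.fdt` or `_u_1.del`), and that files such as `segments_2` and `segments.gen` are not segments themselves. Naming details may differ between Lucene versions, so treat the pattern as an assumption.

```python
import re

# "_" plus a lowercase alphanumeric name, followed by "." or "_"
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_prefixes(file_names):
    """Return the set of distinct segment prefixes among the file names."""
    prefixes = set()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


if __name__ == "__main__":
    # Hypothetical index directory listing.
    names = ["_u.fdt", "_u.fdx", "_u_1.del", "_v.cfs",
             "segments_2", "segments.gen"]
    print(len(segment_prefixes(names)))
```

Against a real index, `os.listdir()` on the index directory would supply the file names. The statistics page in the admin console remains the more reliable source.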

Re: tika/pdfbox knobs levers

2011-04-13 Thread Markus Jelsma
Hi, I'm not sure how Solr allows for adjusting these Tika settings to get the desired output. At least a few desirable Tika subsystems cannot be called from the ExtractingRequestHandler such as Tika's BoilerPlateContentHandler. I'm also not really sure if it's a good idea to normalize

Re: Result order when score is the same

2011-04-13 Thread Jonathan Rochkind
all documents. But, I would want the sort to be at the system level, I don't want the overhead of sorting every query I ever make. How would 'doing it at the system level' avoid the 'overhead of sorting every query'? Every query has to be sorted, if you want it sorted. Beyond setting a

Re: how to get lots fields this way?

2011-04-13 Thread Otis Gospodnetic
Floyd, You need to explicitly list all fields in fl=... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Floyd Wu floyd...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, April

Re: Result order when score is the same

2011-04-13 Thread Otis Gospodnetic
Hi Ken, It sounds like you want to just sort by time changed/added (reverse chrono order). I would not worry about issues just yet unless you have some reasons to think this is going to cause problems (e.g. giant index, low RAM). Jonathan is right about commits, and the NRT-ness of search

Seattle Solr/Lucene User Group?

2011-04-13 Thread Gary Yngve
Hi all, Does anyone know if there is a Solr/Lucene user group / birds-of-feather that meets in Seattle? If not, I'd like to start one up. I'd love to learn and share tricks pertaining to NRT, performance, distributed solr, etc. Also, I am planning on attending the Lucene Revolution! Let's

DIH CachedSqlEntityProcessor null exception

2011-04-13 Thread Zac Smith
I have come across an issue with the DIH where I get a null exception when pre-caching entities. I expect my entity to have null values so this is a bit of a roadblock for me. The issue was described more succinctly in this discussion:

Re: Seattle Solr/Lucene User Group?

2011-04-13 Thread Chris Hostetter
: Does anyone know if there is a Solr/Lucene user group / : birds-of-feather that meets in Seattle? I don't live in Seattle, but this group used to send meeting announcements to solr-user promoting Seattle Hadoop/Lucene/NoSQL Meetups. They still list solr in their keywords, but not in their

Re: how to get lots fields this way?

2011-04-13 Thread Floyd Wu
Can Solr list fields in fl=... this way? fl=!fieldName,score Floyd 2011/4/14 Otis Gospodnetic otis_gospodne...@yahoo.com Floyd, You need to explicitly list all fields in fl=... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

Re: jetty update

2011-04-13 Thread Bill Bell
There is a patch that fixes UTF-8 and performance issues with Jetty. So I would recommend you use the patched version in 3.1/4.0. On 4/13/11 9:47 AM, stockii stock.jo...@googlemail.com wrote: is it necessary to update for solr ? - --- System

Re: my index has 500 million docs ,how to improve solr search performance?

2011-04-13 Thread lu.rongbin
5G memory per JVM