Expunging deletes from a very large index

2011-06-06 Thread Simon Wistow
Due to some emergency maintenance I needed to run delete on a large number of documents in a 200Gb index. The problem is that it's taking an inordinately long amount of time (2+ hours so far and counting) and is steadily eating up disk space - presumably up to 2x index size which is getting

Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread pravesh
Since you are using expand=true, every time a matching synonym entry is found, the analyzer expands the term with all of its synonyms in the index. This may cause the index to grow in size.

Re: Feature: skipping caches and info about cache use

2011-06-06 Thread pravesh
Solr 1.3+ logs only fresh queries. If you re-run the same query, it is served from cache and not printed in the logs (unless the caches are not warmed or the searcher is reopened). So Otis's proposal would definitely help in baselining search benchmarks :)

Re: Solr Field name restrictions

2011-06-06 Thread Marc SCHNEIDER
Hi, Using Solr 3.1 I'm getting errors when trying to sort on fields containing dashes in the name... So that's true: stay away from dashes if you can. Marc. On Sun, Jun 5, 2011 at 3:46 PM, Erick Erickson erickerick...@gmail.com wrote: I'd stay away from dashes too. It's too easy for the query

Travel Assistance applications now open for ApacheCon NA 2011

2011-06-06 Thread Simon Willnauer
The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is now accepting applications for ApacheCon North America 2011, 7-11 November in Vancouver BC, Canada. The TAC is seeking individuals from the Apache community at-large --users, developers, educators, students, Committers,

Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread Ahmet Arslan
Is there a way I can apply all those files to the same tag, separated by some delimiter? Like this: <filter class="solr.SynonymFilterFactory" synonyms="BODYTaxonomy.txt , ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt" ignoreCase="true" expand="true"/> Yes, you can perfectly feed

Re: Expunging deletes from a very large index

2011-06-06 Thread Michael McCandless
You can drop your mergeFactor to 2 and then run expungeDeletes? This will make the operation take longer but (assuming you have 3 segments in your index) should use less transient disk space. You could also make a custom merge policy, that expunges one segment at a time (even slower but even
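For readers unfamiliar with the operation discussed here, the following is a minimal sketch of how an expungeDeletes commit might be issued through Solr's XML update handler. The base URL and the /update path are assumptions for a default single-core install, and the helper name is hypothetical; the function only builds the request and does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_expunge_request(base_url="http://localhost:8983/solr"):
    """Build (but do not send) a commit request with expungeDeletes=true.

    base_url is an assumed default; adjust it for your own install.
    """
    # <commit expungeDeletes="true"/> asks Solr to merge away deleted docs.
    commit = ET.Element("commit", {"expungeDeletes": "true"})
    body = ET.tostring(commit)
    return urllib.request.Request(
        base_url + "/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


req = build_expunge_request()
print(req.full_url)
print(req.data.decode())
```

Passing the request to urllib.request.urlopen would actually trigger the merge, which on an index of this size is the long-running, disk-hungry step the thread is about.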

Re: synonyms problem

2011-06-06 Thread Erick Erickson
What does "call synonym methods in Java" mean? That is, what are you trying to accomplish and from where? Best, Erick On Sun, Jun 5, 2011 at 9:48 PM, deniz denizdurmu...@gmail.com wrote: well i have changed it into text... but still confused about how to use synonyms... and also I want to know

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
1. About the commit strategy, all the ExtractingRequestHandler (request handler that uses Tika to extract content from the input file) will do is extract the content of your file and add it to a SolrInputDocument. The commit strategy should not change because of this, compared to other documents

Re: java.io.IOException: The specified network name is no longer available

2011-06-06 Thread Erick Erickson
Yep, but note the discussion. It's not at all clear that Solr is the place to deal with an unreliable network, and it sounds like that's the root of your issue. It doesn't look like anyone's hot to change Solr's behavior here, and it's arguable that Solr isn't the place to compensate for an

Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread Erick Erickson
Have you considered query-time expansion rather than index-time expansion? In general this will lead to more complex queries, but smaller indexes. Take a look at the analysis page available from the admin page to see exactly what happens. What is the high-level problem you're trying to solve?
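The trade-off described above can be illustrated with a toy model. This is only a sketch, not Solr's SynonymFilter (which also handles token positions and multi-word synonyms); the synonym group and tokens are hypothetical.

```python
def expand_tokens(tokens, synonym_groups):
    """Toy model of expand=true: each token that belongs to a synonym
    group is replaced by every member of that group."""
    lookup = {}
    for group in synonym_groups:
        for term in group:
            lookup[term] = group
    expanded = []
    for token in tokens:
        expanded.extend(lookup.get(token, [token]))
    return expanded


groups = [["tv", "television", "telly"]]  # hypothetical synonym set
doc_tokens = ["cheap", "tv", "stand"]

# Index-time expansion: the extra terms are stored for every document.
index_time = expand_tokens(doc_tokens, groups)
# Query-time expansion: the extra terms exist only in the query.
query_time = expand_tokens(["tv"], groups)

print(len(doc_tokens), "->", len(index_time))
print(query_time)
```

With index-time expansion the growth is paid once per matching token in every document; with query-time expansion the index stays the same size and only the query gets longer.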

problem: zooKeeper Integration with solr

2011-06-06 Thread Mohammad Shariq
Hi folks, I am using Solr to index around 100 million docs. Now I am planning to move to cluster-based Solr so that I can scale the indexing and searching process. Since SolrCloud is in the development stage, I am trying to index in a shard-based environment using ZooKeeper. I followed the steps from

RE: Solr performance tuning - disk i/o?

2011-06-06 Thread Demian Katz
Thanks once again for the helpful suggestions! Regarding the selection of facet fields, I think publishDate (which is actually just a year) and callnumber-first (which is actually a very broad, high-level category) are okay. authorStr is an interesting problem: it's definitely a useful facet

Auto-scaling solr setup

2011-06-06 Thread Akshay
So I am trying to set up an auto-scaling search system of EC2 Solr slaves which scales up as the number of requests increases, and vice versa. Here is what I have: 1. A Solr master and underlying slaves (scalable), and an elastic load balancer to distribute the load. 2. The ec2-auto-scaling setup fires nodes

Need query help

2011-06-06 Thread Denis Kuzmenok
For now I have a collection with: id (int), price (double, multivalue), brand_id (int), filters (string, multivalue). I need to get the available brand_id, filters, and price values and the list of ids for the current query. For example, now I'm doing queries with facet.field=brand_id/filters/price: 1) to

Master Slave help

2011-06-06 Thread Rohit Gupta
Hi, I have configured my master slave server and everything seems to be running fine; the replication completed the first time it ran. But every time I go to the replication link in the admin panel after restarting the server or on server startup, I notice the replication starting from scratch or

Re: Search with Synonyms in two fields

2011-06-06 Thread Jonathan Rochkind
On 6/5/2011 3:36 AM, occurred wrote: Ok, thx for the answer. My idea now is to store both field-values in one field and pre- and suffix the values from field2 with something very special. Also then the synonyms have to have the special pre- and suffixes. What are you actually trying to do?

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
On 5 June 2011 14:42, Erick Erickson erickerick...@gmail.com wrote: See: http://wiki.apache.org/solr/SchemaXml By adding ' multiValued=true ' to the field, you can add the same field multiple times in a doc, something like <add> <doc> <field name="mv">value1</field> <field name="mv">value2</field>
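As an illustration of the multiValued pattern quoted above, here is a minimal sketch that builds an XML add message repeating one field. The field name "mv" and the values are placeholders, and the field is assumed to be declared with multiValued=true in the schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(field_name, values):
    """Build a Solr XML <add> message in which one field occurs once per value.

    Assumes field_name is declared multiValued="true" in schema.xml.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for value in values:
        field = ET.SubElement(doc, "field", {"name": field_name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_message("mv", ["value1", "value2"])
print(xml_message)
```

The resulting string could be posted to the update handler; a real document would normally also carry its unique key field.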

Re: Solr performance tuning - disk i/o?

2011-06-06 Thread Erick Erickson
Polling interval was in reference to slaves in a multi-machine master/slave setup, so probably not a concern just at present. A warmup time of 0 is not particularly normal; I'm not quite sure what's going on there, but you may want to look at the firstSearcher, newSearcher and autowarm parameters in

How to get default result?

2011-06-06 Thread richardr
Dear list, I have a question regarding my address search. I am searching for address data. If one address field is not defined (in this case the housenumber) for the specific query (e.g. city = a, street = b, housenumber=14), I get no result. For every street there exists at least

Default query parser operator

2011-06-06 Thread Brian Lamb
Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of

Re: How to get default result?

2011-06-06 Thread Tomás Fernández Löbbe
Hi Richard, are you setting the value to 0 at index time when the housenumber is not present? If you are, this would be as simple as modifying the query at the application layer to city = a, street= b, housenumber=(14 OR 0). If you are not doing anything at index time with the not present
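The application-layer rewrite suggested above might look like the following sketch. The field names come from the question; the helper name and the fallback value of 0 (indexed when no housenumber is present) are assumptions, and no query escaping is done.

```python
def build_address_query(city, street, housenumber, fallback="0"):
    """Build a query that matches the requested housenumber or the
    fallback value indexed for streets without one (assumed to be 0).

    Note: values are not escaped; real input would need escaping.
    """
    house_clause = "housenumber:({} OR {})".format(housenumber, fallback)
    return " AND ".join([
        "city:{}".format(city),
        "street:{}".format(street),
        house_clause,
    ])


print(build_address_query("a", "b", 14))
```

Because both alternatives sit in one clause, a street with no entry for number 14 still returns its default record instead of an empty result.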

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Naveen Gupta
Hi Tomas, 1. Regarding SolrInputDocument: we are not using the Java client; rather, we are using PHP Solr. I am not sure how to wrap content in a SolrInputDocument in the PHP client. In this case, we need the Tika-related jars to get the metadata such as content... we certainly don't want to handle

Re: SolrJ and Range Faceting

2011-06-06 Thread Jamie Johnson
Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter, I have logic which does this in my class which works, any chance of getting something like this added to

Re: SolrJ and Range Faceting

2011-06-06 Thread Jamie Johnson
Small error, shouldn't be using this.start but should instead be using Double.parseDouble(this.getValue()); and sdf.parse(count.getValue()); respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was

Re: Auto-scaling solr setup

2011-06-06 Thread Erick Erickson
The HTTP interface (http://wiki.apache.org/solr/SolrReplication#HTTP_API) can be used to control lots of parts of replication. As to warmups, I don't know of a good way to test that. I don't know whether getting the current status on the slave includes whether warmup is completed or not. At
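A small sketch of how the replication HTTP API mentioned above can be addressed from a script. The slave host name is hypothetical and the helper is illustrative; the command values shown (details, fetchindex, disablepoll, enablepoll) are the ones documented on the SolrReplication wiki page.

```python
from urllib.parse import urlencode


def replication_url(slave_base_url, command, **extra):
    """Build a URL for the /replication handler of a Solr instance.

    slave_base_url is a hypothetical host; extra holds optional parameters.
    """
    params = {"command": command}
    params.update(extra)
    return "{}/replication?{}".format(
        slave_base_url.rstrip("/"), urlencode(params)
    )


slave = "http://slave1:8983/solr"  # assumed slave address
for command in ("details", "fetchindex", "disablepoll", "enablepoll"):
    print(replication_url(slave, command))
```

Fetching the details URL and inspecting the response is one way a bootstrap script could check whether a new slave has finished replicating; as the message notes, that does not necessarily say anything about warmup.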

RE: Solr performance tuning - disk i/o?

2011-06-06 Thread Demian Katz
All of my cache autowarmCount settings are either 1 or 5. maxWarmingSearchers is set to 2. I previously shared the contents of my firstSearcher and newSearcher events -- just a queries array surrounded by a standard-looking listener tag. The events are definitely firing -- in

Re: Auto-scaling solr setup

2011-06-06 Thread Akshay
Yes, sadly... I too don't have much of a clue about AWS. The SolrReplication API doesn't give me exactly what I want. For the time being I have hacked my way into the Amazon image, bootstrapping the replication check in a shell script (curl and awk, a very dirty way). Once the check succeeds I enable the

Re: Need query help

2011-06-06 Thread Alexey Serba
See the "Tagging and excluding Filters" section: http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters 2011/6/6 Denis Kuzmenok forward...@ukr.net: For now i have a collection with: id (int) price (double) multivalue brand_id (int) filters (string) multivalue I need
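A sketch of how the tagging and excluding approach from the wiki section could apply to the fields in the question. The tag name "brand", the selected brand id, and the helper name are assumptions made for illustration.

```python
from urllib.parse import urlencode


def facet_params(query, selected_brand_id):
    """Request parameters that filter on one brand while still counting
    every brand in the brand_id facet.

    The tag name "brand" is arbitrary; it only has to match in tag= and ex=.
    """
    return [
        ("q", query),
        # Tag the filter so that a facet can opt out of it.
        ("fq", "{!tag=brand}brand_id:%d" % selected_brand_id),
        ("facet", "true"),
        # Exclude the tagged filter when counting brand_id values.
        ("facet.field", "{!ex=brand}brand_id"),
        ("facet.field", "filters"),
        ("facet.field", "price"),
    ]


params = facet_params("*:*", 5)
print(urlencode(params))
```

With this shape a single request can return the matching ids plus facet counts, instead of one request per facet with different filters applied.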

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
On Mon, Jun 6, 2011 at 1:47 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi Tomas, 1. Regarding SolrInputDocument, We are not using java client, rather we are using php solr, wrapping content in SolrInputDocument, i am not sure how to do in PHP client? In this case, we need tika related

Re: Solr Indexing Patterns

2011-06-06 Thread Erick Erickson
#Everybody# (including me) who has any RDBMS background doesn't want to flatten data, but that's usually the way to go in Solr. Part of whether it's a good idea or not depends on how big the index gets, and unfortunately the only way to figure that out is to test. But that's the first approach

Re: Solr performance tuning - disk i/o?

2011-06-06 Thread Erick Erickson
If you're seeing results, things must be OK. It's a little strange, though, I'm seeing warmup times of 1 on the trivial reload of the example documents. But I wouldn't worry too much here. Those are pretty high autowarm counts, you might have room to reduce them but absent long autowarm times

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
Thanks On 6 June 2011 19:32, Erick Erickson erickerick...@gmail.com wrote: #Everybody# (including me) who has any RDBMS background doesn't want to flatten data, but that's usually the way to go in Solr. Part of whether it's a good idea or not depends on how big the index gets, and

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
I do think that Solr would be better served if there was a *best practice section* of the site. Looking at the majority of emails to this list, they revolve around "how do I do X?". Seems like tutorials with real world examples would serve Solr no end of good. I still do not have an example of the

Re: Solr Indexing Patterns

2011-06-06 Thread Jonathan Rochkind
This is a start, for many common best practices: http://wiki.apache.org/solr/SolrRelevancyFAQ Many of the questions in there have an answer that involves de-normalizing. As an example. It may be that even if your specific problem isn't in there, I myself anyway found reading through there

How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread Gabriele Kahlout
Hello, I've seen that through boosting it's possible to influence the scoring function, but what I would like is sort of a boolean property. In some way it's to search only the indexed documents by that keyword (or the intersection/union) rather than the whole set. Is this supported in any way?

SpellCheckComponent performance

2011-06-06 Thread Demian Katz
I'm continuing to work on tuning my Solr server, and now I'm noticing that my biggest bottleneck is the SpellCheckComponent. This is eating multiple seconds on most first-time searches, and still taking around 500ms even on cached searches. Here is my configuration: searchComponent

Re: How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread Erick Erickson
I'm having a hard time understanding what you're driving at, can you provide some examples? This *looks* like filter queries, but I think you already know about those... Best Erick On Mon, Jun 6, 2011 at 4:00 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I've seen that through

Re: SpellCheckComponent performance

2011-06-06 Thread Erick Erickson
Hmmm, how are you configuring your spell checker? The first-time slowdown is probably due to cache warming, but subsequent 500 ms slowdowns seem odd. How many unique terms are there in your spellcheck index? It'd probably be best if you showed us your fieldtype and field definition... Best, Erick

Re: synonyms problem

2011-06-06 Thread deniz
Well, I was trying to say that I have changed the config files for synonyms and so on, but nothing happens, so I thought I needed to do something in Java code too... I was trying to ask about that... - Smart, but doesn't work... If he worked, he'd do it...

Re: Master Slave help

2011-06-06 Thread Jayendra Patil
Do you mean the replication happens every time you restart the server? If so, you would need to modify the events on which you want replication to happen. Check for the replicateAfter tag and remove the startup option, if you don't need it. requestHandler name=/replication

Re: problem: zooKeeper Integration with solr

2011-06-06 Thread bmdakshinamur...@gmail.com
Instead of integrating ZooKeeper, you could create shards over multiple machines and specify the shards while you are querying Solr. E.g.: http://localhost:8983/solr/select?shards=<Machine:Port/Solr Path>,<Machine:Port/Solr Path>&indent=true&q=query On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq
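A minimal sketch of building such a distributed query URL. The shard host names and the query are hypothetical; the shards parameter takes a comma-separated list of host:port/path entries without the http:// prefix.

```python
from urllib.parse import urlencode


def sharded_select_url(coordinator, shards, query):
    """Build a /select URL that fans the query out to the given shards.

    coordinator is the Solr instance receiving the request; shards are
    host:port/path strings (all hypothetical here).
    """
    params = {"shards": ",".join(shards), "indent": "true", "q": query}
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return "{}/select?{}".format(coordinator, urlencode(params, safe=":/,"))


url = sharded_select_url(
    "http://localhost:8983/solr",
    ["host1:8983/solr", "host2:8983/solr"],
    "title:solr",
)
print(url)
```

Any one of the shard machines can act as the coordinator; it forwards the query to every listed shard and merges the results.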