Re: solre scores remains same for exact match and nearly exact match

2013-04-03 Thread Gora Mohanty
On 3 April 2013 10:52, amit amit.mal...@gmail.com wrote: Below is my query http://localhost:8983/solr/select/?q=subject:session management in php&fq=category:[*%20TO%20*]&fl=category,score,subject [...] Add debugQuery=on to your Solr URL, and you will get an explanation of the score. Your
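A minimal sketch of the suggestion above, in Python: it appends debugQuery=on to a query like the one quoted. The helper name build_debug_url is hypothetical; the host, handler path, and parameters are copied from the quoted message, and nothing here contacts a Solr server.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, params):
    """Return base_url with the given params plus debugQuery=on.

    build_debug_url is a hypothetical helper used only for illustration.
    """
    query = dict(params)          # copy so the caller's dict is untouched
    query["debugQuery"] = "on"    # asks Solr to include a score explanation
    return base_url + "?" + urlencode(query)


url = build_debug_url(
    "http://localhost:8983/solr/select/",
    {
        "q": "subject:session management in php",
        "fq": "category:[* TO *]",
        "fl": "category,score,subject",
    },
)
print(url)
```

urlencode takes care of escaping the spaces and brackets, so the query does not have to be percent-encoded by hand.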

Re: Out of memory on some faceting queries

2013-04-03 Thread Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote: Most of the time I facet on one field that has about twenty unique values. They are likely to be disk cached so warming those for 9M documents should only take a few seconds. However, once per day I would like to facet on the text field,

maxWarmingSearchers in Solr 4.

2013-04-03 Thread Dotan Cohen
I have been dragging the same solrconfig.xml from Solr 3.x to 4.0 to 4.1, with no customization (bad, bad me!). I'm now looking into customizing it and I see that the Solr 4.1 solrconfig.xml is much simpler and shorter. Is this simply because many of the examples have been removed? In particular,

Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 6:26 PM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: warmupTime is available on the admin page for each type of cache (in milliseconds) : http://solr-box:8983/solr/#/core1/plugins/cache Or if you are only interested in the total :

Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 10:11 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: However, once per day I would like to facet on the text field, which is a free-text field usually around 1 KiB (about 100 words), in order to determine what the top keywords / topics are. That query would take up

Re: Solr 4.2.0 results links

2013-04-03 Thread zeroeffect
Thanks for the response. I found the issue. The data was being ingested correctly; it was just being echoed incorrectly. While inspecting the final HTML output I was able to find that the richtext-doc.vm file was used to display my data. The code in this file generated the links to local files. I did

Query parser cuts last letter from search term.

2013-04-03 Thread vsl
Hi, I have a strange problem with a Solr query. I added a new document to my Solr index with the word behave! in its content. When I tried to search for this document using the search term behave, it was not found. Only behave! returns a result. Additionally, the search debug returns the following information: debug: {

RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-03 Thread DC tech
Thanks David - I suppose it is an AWS question and thank you for the pointers. As a further input to the MLT question - it does seem that 3.6 behavior is different from 4.2 - the issue seems to be more in terms of the raw query that is generated. I will do some more research and revert back with

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Upayavira
This is called 'stemming', and is caused by this: <filter class="solr.SnowballPorterFilterFactory" language="English"/> It means that all of these terms would match: behave behaving behaved (and possibly more) because they would all stem down to 'behav'. This stemming will happen at
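To illustrate the idea of terms collapsing to a shared stem, here is a toy sketch in Python. It is NOT the Snowball/Porter algorithm used by SnowballPorterFilterFactory; toy_stem and its short suffix list are assumptions made up for this example, chosen only so that the three words from the message reduce to 'behav'.

```python
def toy_stem(word):
    """Strip one of a few common English suffixes.

    Illustration only: this is not the Snowball algorithm, just a
    hypothetical stand-in that shows how different surface forms can
    end up as the same indexed term.
    """
    for suffix in ("ing", "ed", "es", "e", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


terms = ["behave", "behaving", "behaved"]
stems = {term: toy_stem(term) for term in terms}
print(stems)
```

Because the same reduction is applied to the query and to the indexed text, a search for any one of these forms can match documents containing the others.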

Re: Query parser cuts last letter from search term.

2013-04-03 Thread vsl
So why does Solr not return the proper document? -- View this message in context: http://lucene.472066.n3.nabble.com/Query-parser-cuts-last-letter-from-search-term-tp4053432p4053435.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flow Chart of Solr

2013-04-03 Thread Furkan KAMACI
So, all in all, is there anybody who can write down just the main steps of Solr (including parsing, stemming, etc.)? 2013/4/2 Furkan KAMACI furkankam...@gmail.com I think about myself as an example. I started researching Solr just a few weeks ago. I have learned Solr and its related

Words being duplicated with highlighting DictionaryCompoundWordTokenFilterFactory

2013-04-03 Thread Philtjens, Raf
I'm having issues with highlighting DictionaryCompoundWordTokenFilterFactory in Solr 3.6.1/3.6.2. It's duplicating/adding words in the highlighted snippet. For example, my dictionary (dutch) has the following words: premie, beter, ring. If I search for 'verbetering', results with

Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Hi all, I have a running Hadoop + HBase cluster and the HBase cluster is running its own ZooKeeper (HBase manages ZooKeeper). I would like to deploy my SolrCloud cluster on a portion of the machines on that cluster. My question is: Should I have any trouble / issues deploying an additional

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Clearing out its tlogs before starting it again may help. - Mark On Apr 2, 2013, at 10:07 PM, Jamie Johnson jej2...@gmail.com wrote: I brought the bad one down and back up and it did nothing. I can clear the index and try 4.2.1. I will save off the logs and see if there is anything else odd

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
No, not that I know of, which is why I say we need to get to the bottom of it. - Mark On Apr 2, 2013, at 10:18 PM, Jamie Johnson jej2...@gmail.com wrote: Mark, is there a particular jira issue that you think may address this? I read through it quickly but didn't see one that jumped out On

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
Sure, yes. But... it comes down to what level of detail you want and need for a specific task. In other words, there are probably a dozen or more levels of detail. The reality is that if you are going to work at the Solr code level, that is very, very different than being a user of Solr, and at

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Jack Krupansky
The standard tokenizer recognizes ! as a punctuation character, so it will be treated as white space. You could use the white space tokenizer if punctuation is considered significant. -- Jack Krupansky -Original Message- From: vsl Sent: Wednesday, April 03, 2013 6:25 AM To:
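The difference described above can be sketched in Python. Both functions are rough approximations written for this illustration, not Lucene's StandardTokenizer or WhitespaceTokenizer: the first treats anything that is not a word character as a separator, the second splits on whitespace only.

```python
import re


def standard_like_tokens(text):
    """Approximate a tokenizer that treats punctuation like white space.

    Assumption: a plain \\w+ regex stands in for the real tokenizer rules.
    """
    return re.findall(r"\w+", text)


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays attached to terms."""
    return text.split()


text = "Please behave! now"
print(standard_like_tokens(text))  # the '!' is dropped from the term
print(whitespace_tokens(text))     # the '!' stays part of 'behave!'
```

With the first approach a query for behave! and a query for behave produce the same term; with the second they are different terms, which is what you want only if punctuation is significant.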

RE: Confusion over Solr highlight hl.q parameter

2013-04-03 Thread Van Tassell, Kristian
Thank you for the response, unfortunately it didn't change that I'm still getting no highlighting hits for this query. ...hl.q={!dismax}text_it_IT:l'assieme... -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, April 02, 2013 9:00 PM To:

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0 and then I'll try on 4.2.1 On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller markrmil...@gmail.com wrote: No, not that I know of, which is why I say we need to get

Re: is there a way we can build spell dictionary from solr index such that it only take words leaving all`special characters

2013-04-03 Thread Rohan Thakur
Hi Upayavira, you mean to say that I don't have to follow this: http://wiki.apache.org/solr/SpellCheckComponent and can directly create a spell check field with copyField and use it? I don't have to build a dictionary on the field, just use copyField for spell suggestions? Thanks, regards, Rohan

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Shawn Heisey
On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and

Re: Synonyms problem

2013-04-03 Thread Shawn Heisey
On 3/29/2013 12:14 PM, Plamen Mihaylov wrote: Can I ask you another question: I have Magento + Solr and have a requirement to create an admin magento module, where I can add/remove synonyms dynamically. Is this possible? I searched google but it seems not possible. If you change the synonym

Question on Exact Matches - edismax

2013-04-03 Thread Sandeep Mestry
Hi All, I have a requirement wherein exact matches for 2 fields (Series Title, Title) should be ranked higher than the partial matches. The configuration looks like below: <requestHandler name="assetdismax" class="solr.SearchHandler"> <lst name="defaults"> <str name="defType">edismax</str>

Re: solre scores remains same for exact match and nearly exact match

2013-04-03 Thread amit
Thanks. I added a copy field and that fixed the issue. On Wed, Apr 3, 2013 at 12:29 PM, Gora Mohanty-3 [via Lucene] ml-node+s472066n4053412...@n3.nabble.com wrote: On 3 April 2013 10:52, amit [hidden email] wrote: Below is my query

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Something interesting that I'm noticing as well, I just indexed 300,000 items, and somehow 300,020 ended up in the index. I thought perhaps I messed something up so I started the indexing again and indexed another 400,000 and I see 400,064 docs. Is there a good way to find possible duplicates?

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Michael Della Bitta
Hello, Amit: My guess is that, if HBase is working hard, you're going to have more trouble with HBase and Solr on the same nodes than HBase and Solr sharing a Zookeeper. Solr's usage of Zookeeper is very minimal. Michael Della Bitta Appinions 18

Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
There are three books on Solr, two with that in the title, and one, Taming Text, each of which have been very valuable in understanding Solr. Jack On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky j...@basetechnology.com wrote: Sure, yes. But... it comes down to what level of detail you want and

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Trouble in what way? If I have enough memory - HBase RegionServer 10GB and maybe 2GB for Solr ? - or you mean CPU / disk ? On Wed, Apr 3, 2013 at 5:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hello, Amit: My guess is that, if HBase is working hard, you're going to

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Michael Della Bitta
Solr heavily uses RAM for disk caching, so depending on your index size and what you intend to do with it, 2 GB could easily not be enough. We run with 6 GB heaps on 34 GB boxes, and the remaining RAM is there solely to act as a disk cache. We're on EC2, though, so unless you're using the SSD

Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?

2013-04-03 Thread Shawn Heisey
On 4/1/2013 12:19 PM, feroz_kh wrote: Hi Shawn, I tried optimizing using this command... curl 'http://localhost:/solr/update?optimize=true&maxSegments=10&waitFlush=true' And I got this response within secs... <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
Yes... the str.. / is what I see in the admin console when I perform a search for the document. Currently, I am using solrj and the addBean() method to update the core. What's strange is that in our QA env, the document indexed correctly. But in prod, I see hash symbols and thus any user search

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
And another one on the way: http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, April 03, 2013 11:25 AM To:

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Jack Krupansky
Show us the exact query URL as well as the request handler defaults. Make sure to try to do an explicit query on the field that has the # value. QA and prod may differ because maybe QA got completely reindexed more recently and maybe prod hasn't gotten fully reindexed recently. Maybe the

SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
So we have 3 servers in a SolrCloud cluster. http://lucene.472066.n3.nabble.com/file/n4053506/Cloud1.png We have 2 shards for our collection (classic_bt) with a shard on each of the first two servers as the picture shows. The third server has replicas of the first 2 shards just for high

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/1/2013 3:02 PM, Furkan KAMACI wrote: I want to separate my cloud into two logical parts. One of them is indexer cloud of SolrCloud. Second one is Searcher cloud of SolrCloud. My first question is that. Does separating my cloud system make sense about performance improvement. Because I

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Since I don't have that many items in my index I exported all of the keys for each shard and wrote a simple java program that checks for duplicates. I found some duplicate keys on different shards, a grep of the files for the keys found does indicate that they made it to the wrong places. If you
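The check described above (the original was a Java program that is not shown) might look roughly like this in Python. This is a sketch under assumptions: the keys for each shard have already been exported into memory, and the function name and sample shard names are hypothetical.

```python
from collections import defaultdict


def find_cross_shard_duplicates(keys_by_shard):
    """Return {key: [shards]} for every key that appears on more than one shard.

    keys_by_shard maps a shard name to an iterable of document keys,
    e.g. loaded from one exported key file per shard (an assumption).
    """
    shards_for_key = defaultdict(set)
    for shard, keys in keys_by_shard.items():
        for key in keys:
            shards_for_key[key].add(shard)
    return {
        key: sorted(shards)
        for key, shards in shards_for_key.items()
        if len(shards) > 1
    }


# Hypothetical sample data: 'doc-2' was routed to both shards.
exported = {
    "shard1": ["doc-1", "doc-2", "doc-3"],
    "shard2": ["doc-2", "doc-4"],
}
print(find_cross_shard_duplicates(exported))
```

Any key reported here exists on two or more shards, which would also explain an index count higher than the number of items submitted.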

SolrException: Error opening new searcher

2013-04-03 Thread Van Tassell, Kristian
We're suddenly seeing an error when trying to do updates/commits. This is on Solr 4.2 (Tomcat, solr war deployed to webapps, on Linux SuSE 11). Based off of some initial searching on things related to this issue, I have set ulimit in Linux to 'unlimited' and verified that Tomcat has enough

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
I looked at the text via the admin analysis tool. The text appeared to be ok! Unfortunately, the description is client data... so I can't post it here, but I do not see any issues when running the analysis tool.

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Walter underwood
It will be limited by disk IO until you get the caches full. Then it will be limited by CPU. wunder On Apr 3, 2013, at 8:55 AM, Amit Sela am...@infolinks.com wrote: Trouble in what way? If I have enough memory - HBase RegionServer 10GB and maybe 2GB for Solr ? - or you mean CPU / disk ?

Re: maxWarmingSearchers in Solr 4.

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:48 AM, Dotan Cohen wrote: I have been dragging the same solrconfig.xml from Solr 3.x to 4.0 to 4.1, with no customization (bad, bad me!). I'm now looking into customizing it and I see that the Solr 4.1 solrconfig.xml is much simpler and shorter. Is this simply because many of the

Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
Jack, Is that new book up to the 4.+ series? Thanks The other Jack On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote: And another one on the way: http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 Hopefully that will help a lot as well.

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
We're using the 4.x branch code as the basis for our writing. So, effectively it will be for at least 4.3 when the book comes out in the summer. Early access will be in about a month or so. O'Reilly will be showing a galley proof for 200 pages of the book next week at Big Data TechCon next

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
no, my thought was wrong, it appears that even with the parameter set I am seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I will try this on 4.2.1. to see if I see the same behavior On Wed,

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Upayavira
On Wed, Apr 3, 2013, at 11:36 AM, vsl wrote: So why Solr does not return proper document? You're gonna have to give us a bit more than that. What is wrong with the documents it is returning? Upayavira

Re: Solr Multiword Search

2013-04-03 Thread skmirch
I have been trying to use the MultiWordSpellingQueryConverter.java since I need to be able to find the documents that correspond to the suggested collations. At the moment it seems to be producing collations based on word matches and arbitrary words from the field are picked up to form collation

Re: Out of memory on some faceting queries

2013-04-03 Thread Shawn Heisey
On 4/2/2013 3:09 AM, Dotan Cohen wrote: I notice that this only occurs on queries that run facets. I start Solr with the following command: sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
Hello Vytenis, What exactly do you mean by aren't distributing across the shards? Do you mean that POSTs against the server for shard 1 never end up resulting in documents saved in shard 2? Michael Della Bitta Appinions 18 East 41st Street, 2nd

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
Here is a query that should return 2 documents... but it only returns 1. /solr/m7779912/select?indent=on&version=2.2&q=description%3Agateway&fq=&start=0&rows=10&fl=description&qt=&wt=&explainOther=&hl.fl= Oddly enough, the descriptions of the two documents are exactly the same. Except one is indexed

Solr Tika Override

2013-04-03 Thread JerryC
I am researching Solr and seeing if it would be a good fit for a document search service I am helping to develop. One of the requirements is that we will need to be able to customize how file contents are parsed beyond the default configurations that are offered out of the box by Tika. For

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a collection is created - each shard gets a range, which is stored in zookeeper. You should not be able to end up with the same id on different shards - something very odd going on. Hopefully I'll have some time to try

RE: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-03 Thread Van Tassell, Kristian
I just posted a similar error and discovered that decreasing the Xmx fixed the problem for me. The free command/top, etc. indicated I was flying just below the threshold for my allowed memory, and with swap/virtual space available, so I'm still confused as to what the issue is, but you may try

RE: Solr Multiword Search

2013-04-03 Thread Dyer, James
You have specified spellcheck.q in your query. The whole purpose of spellcheck.q is to bypass any query converter you've configured giving it raw keywords instead. But possibly a custom query converter is not your best answer? I agree that charles charlie is an edit distance of 2, so if

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote Hello Vytenis, What exactly do you mean by aren't distributing across the shards? Do you mean that POSTs against the server for shard 1 never end up resulting in documents saved in shard 2? So we indexed a set of 33010 documents on server01 which are now in

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Where is this information stored in ZK? I don't see it in the cluster state (or perhaps I don't understand it ;) ). Perhaps something with my process is broken. What I do when I start from scratch is the following ZkCLI -cmd upconfig ... ZkCLI -cmd linkconfig but I don't ever explicitly

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
It should be part of your clusterstate.json. Some users have reported trouble upgrading a previous zk install when this change came. I recommended manually updating the clusterstate.json to have the right info, and that seemed to work. Otherwise, I guess you have to start from a clean zk state.

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Chris Hostetter
: So we indexed a set of 33010 documents on server01 which are now in shard1. : And we kicked off a set of 85934 documents on server02 which are now in : shard2 (as tests). In my understanding of how SolrCloud works, the : documents should be distributed across the shards in the collection. Now

Re: It seems a issue of deal with chinese synonym for solr

2013-04-03 Thread Kuro Kurosaka
On 3/11/13 6:15 PM, 李威 wrote: in org.apache.solr.parser.SolrQueryParserBase, there is a function: protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError The below code can't process chinese rightly. BooleanClause.Occur

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
The router says implicit. I did start from a blank zk state but perhaps I missed one of the ZkCLI commands? One of my shards from the clusterstate.json is shown below. What is the process that should be done to bootstrap a cluster other than the ZkCLI commands I listed above? My process right

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Chris Hostetter-3 wrote I'm not familiar with the details, but i've seen miller respond to a similar question with reference to the issue of not explicitly specifying numShards when creating your collections... http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%

HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi, I am using DIH to index some database fields. These fields contain html formatted text in them. I use the 'HTMLStripTransformer' to remove that markup. This works fine when the text is like for example: <li>Item One</li> or *This is in Bold* However when the text has HTML entity names like in:

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Gora Mohanty
On 4 April 2013 00:30, Ashok ash...@qualcomm.com wrote: [...] Two questions. (1) Is this the expected behavior of DIH HTMLStripTransformer? Yes, I believe so. (2) If yes, is there another transformer that I can employ first to turn these html entities into their usual symbols that can

Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Shawn, thanks for your detailed explanation. My system will work on high load. I mean I will always index something and something always will be queried at my system. That is why I consider about physically separating indexer and query reply machines. I think about that: imagine a machine that

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you to distribute updates. In the past, partitioning was done on the fly - but for shard splitting and perhaps other features, we now divvy up the hash range up front based on numShards and store it in
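The idea of dividing the hash range up front can be sketched in Python as follows. This is an illustration under stated assumptions, not Solr's implementation: zlib.crc32 stands in for whatever hash function Solr actually uses, the range is treated as unsigned 32-bit, and both function names are hypothetical.

```python
import zlib


def partition_hash_range(num_shards, bits=32):
    """Split [0, 2**bits - 1] into num_shards contiguous (start, end) ranges."""
    size = 1 << bits
    step = size // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole range is covered.
        end = size - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_id(doc_id, ranges):
    """Pick the shard whose range contains the hash of doc_id.

    Assumption: crc32 is only a stand-in hash for this sketch.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash outside all ranges")


ranges = partition_hash_range(2)
print(ranges)
print(shard_for_id("doc-1", ranges))
```

Because the ranges are fixed when the collection is created and every id hashes to exactly one range, the same id should never land on two different shards, which is the property the duplicate check in this thread was testing.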

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Well, the database field has text, sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info.

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Ah, interesting... so I need to specify num shards, blow out zk and then try this again to see if things work properly now. What is really strange is that for the most part things have worked right and on 4.2.1 I have 600,000 items indexed with no duplicates. In any event I will specify num

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
With earlier versions of Solr Cloud, if there was any error or warning when you made a collection, you likely were set up for implicit routing which means that documents only go to the shard you're talking to. What you want is compositeId routing, which works how you think it should. Go into the

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Alexandre Rafalovitch
Then, I would say, you have a bigger problem. However, you can probably run a RegEx filter and replace those known escapes with real characters before you run your HTMLStrip filter. Or run HTMLStrip, RegEx, and HTMLStrip again. Regards, Alex. Personal blog: http://blog.outerthoughts.com/
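The strip, unescape, strip-again sequence suggested above can be sketched outside Solr in Python. This does not use DIH or HTMLStripTransformer: a simple regex stands in for tag stripping and the standard library's html.unescape stands in for the escape-replacement step, both as assumptions for illustration.

```python
import html
import re

# Naive tag pattern; good enough for a sketch, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def strip_html_twice(text):
    """Strip tags, turn entities into characters, then strip tags again.

    The second pass catches markup that only becomes visible once
    entities such as &lt; and &gt; have been decoded.
    """
    decoded = html.unescape(strip_tags(text))
    return strip_tags(decoded)


print(strip_html_twice("<li>Item One</li>"))
print(strip_html_twice("&lt;b&gt;This is in Bold&lt;/b&gt;"))
```

One caveat of this approach: text that legitimately contains a literal less-than sign followed later by a greater-than sign would also be removed by the second pass.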

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:13 PM, Furkan KAMACI wrote: Shawn, thanks for your detailed explanation. My system will work on high load. I mean I will always index something and something always will be queried at my system. That is why I consider about physically separating indexer and query reply machines.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Hi Ashok, HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters. What version of Solr are you using? My guess is that it only appears that nothing is happening, since when they are presented in a

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote With earlier versions of Solr Cloud, if there was any error or warning when you made a collection, you likely were set up for implicit routing which means that documents only go to the shard you're talking to. What you want is compositeId routing, which works how

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
answered my own question, it now says compositeId. What is problematic though is that in addition to my shards (which are say jamie-shard1) I see the solr created shards (shard1). I assume that these were created because of the numShards param. Is there no way to specify the names of these

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
If you can work with a clean state, I'd turn off all your shards, clear out the Solr directories in Zookeeper, reset solr.xml for each of your shards, upgrade to the latest version of Solr, and turn everything back on again. Then upload config, recreate your collection, etc. I do it like this,

Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Thanks for your explanation, you explained everything I need. Just one more question. I see that I can not make it with Solr Cloud, but I can do something like that with master-slave replication of Solr. If I use master-slave replication of Solr, can I eliminate (filter) something (something

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
I had thought you could - but looking at the code recently, I don't think you can anymore. I think that's a technical limitation more than anything though. When these changes were made, I think support for that was simply not added at the time. I'm not sure exactly how straightforward it would

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
ok, so that's not a deal breaker for me. I just changed it to match the shards that are auto created and it looks like things are happy. I'll go ahead and try my test to see if I can get things out of sync. On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller markrmil...@gmail.com wrote: I had

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok

do SearchComponents have access to response contents

2013-04-03 Thread xavier jmlucjav
I need to implement some SearchComponent that will deal with metrics on the response. Some things I see will be easy to get, like number of hits for instance, but I am more worried with this: We need to also track the size of the response (as the size in bytes of the whole xml response that is

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
with these changes things are looking good, I'm up to 600,000 documents without any issues as of right now. I'll keep going and add more to see if I find anything. On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson jej2...@gmail.com wrote: ok, so that's not a deal breaker for me. I just changed

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Cool, glad I was able to help. On Apr 3, 2013, at 4:18 PM, Ashok ash...@qualcomm.com wrote: Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok -- View this message in context:

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote If you can work with a clean state, I'd turn off all your shards, clear out the Solr directories in Zookeeper, reset solr.xml for each of your shards, upgrade to the latest version of Solr, and turn everything back on again. Then upload config, recreate your

Re: Question on Exact Matches - edismax

2013-04-03 Thread Jan Høydahl
Can you show us your *_ci field type? Solr does not really have a way to tell whether a match is exact or only partial, but you could hack around it with the fieldType. See https://github.com/cominvent/exactmatch for a possible solution. -- Jan Høydahl, search solution architect Cominvent AS -

Re: do SearchComponents have access to response contents

2013-04-03 Thread Jack Krupansky
The search components can see the response as a namedlist, but it is only when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON or whatever other format (Javabin as well) is generated from the named list for final output in an HTTP response. You probably want a custom query

Re: Solr Tika Override

2013-04-03 Thread Jan Høydahl
You'd probably want to work on the XML output from Tika's PDF parser, from which you can identify which page and context. Personally I would build a separate indexing application in Java and call Tika directly, then build a SolrInputDocument which you pass to solr through SolrJ. I.e. not use

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
From what I can tell, the Collections API has been hardened significantly since 4.2 and now will refuse to create a collection if you give it something ambiguous to do. So if you upgrade to 4.2, things will become more safe. But overall I'd find a way of using the Collections API that works and

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Mark Miller
On Apr 3, 2013, at 5:53 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: From what I can tell, the Collections API has been hardened significantly since 4.2 I did a lot of work here for 4.2.1 - there was a lot to improve. Hopefully there is much less now, but if anyone

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:52 PM, Furkan KAMACI wrote: Thanks for your explanation, you explained every thing what I need. Just one more question. I see that I can not make it with Solr Cloud, but I can do something like that with master-slave replication of Solr. If I use master-slave replication of Solr,

Streaming search results

2013-04-03 Thread Victor Miroshnikov
Is it possible to stream search results from Solr? Seems that this feature is missing. I see two options to solve this: 1. Using search results pagination feature The idea is to implement a smart proxy that will stream chunks from search results using pagination. 2. Implement Solr plugin
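Option 1 above (a proxy that streams chunks using pagination) could be sketched in Python as a generator. This is a sketch, not a Solr feature: fetch_page is a hypothetical callable that would issue one paginated request per call (for example with start and rows parameters) and return a list of documents; here it is replaced by an in-memory fake so the example runs on its own.

```python
def stream_results(fetch_page, rows=100):
    """Yield documents one at a time, fetching them page by page.

    fetch_page(start, rows) is a hypothetical callable that returns the
    list of documents for that page; an empty or short page ends the stream.
    """
    start = 0
    while True:
        docs = fetch_page(start, rows)
        if not docs:
            return
        for doc in docs:
            yield doc
        if len(docs) < rows:
            return          # last page was not full, nothing more to fetch
        start += rows


# In-memory stand-in for a search backend with 10 matching documents.
ALL_DOCS = list(range(10))


def fake_fetch_page(start, rows):
    return ALL_DOCS[start:start + rows]


for doc in stream_results(fake_fetch_page, rows=3):
    print(doc)
```

The consumer sees a continuous stream while only one page is held in memory at a time. Deep offsets can become expensive on the server side, so this works best when the total result set is moderate.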

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Walter Underwood
That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Otis Gospodnetic
It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug, I didn't see anything in the

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Walter Underwood
In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Just an update, I'm at 1M records now with no issues. This looks promising as to the cause of my issues, thanks for the help. Is the routing method with numShards documented anywhere? I know numShards is documented but I didn't know that the routing changed if you don't specify it. On Wed,

RE: Solr Multiword Search

2013-04-03 Thread skmirch
The following query is doing a word search (based on my previous post)... solr/spell?q=(charles+and+the+choclit+factory+OR+(title2:(charles+and+the+choclit+factory)))&spellcheck.collate=true&spellcheck=true&spellcheck.q=charles+and+the+choclit+factory It produces a lot of unwanted matches. In
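
A query string like the one above is easy to get wrong by hand, since each parameter has to be separated and escaped. The following is a small sketch of building it programmatically with Python's standard library. The function name `build_spell_query` is made up for illustration; the `solr/spell` path, the `title2` field, and the spellcheck parameters are taken from the post.

```python
from urllib.parse import urlencode


def build_spell_query(phrase: str, field: str = "title2") -> str:
    """Build the relative request URL for the spellcheck query in the post.

    build_spell_query is an illustrative helper, not part of any Solr client.
    """
    params = [
        # Match the phrase in the default field or in the named field.
        ("q", f"({phrase} OR ({field}:({phrase})))"),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.q", phrase),
    ]
    # urlencode joins the pairs with '&' and escapes spaces, parentheses
    # and colons, so the query syntax cannot leak into the URL structure.
    return "solr/spell?" + urlencode(params)
```

Calling `build_spell_query("charles and the choclit factory")` yields the same four parameters as the hand-written URL, with the parentheses and colon percent-encoded.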

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
I am occasionally seeing this in the log, is this just a timeout issue? Should I be increasing the zk client timeout? WARNING: Overseer cannot talk to ZK Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: null state: Expired type

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Yeah. Are you using the concurrent low pause garbage collector? This means the overseer wasn't able to communicate with zk for 15 seconds - due to load or gc or whatever. If you can't resolve the root cause of that, or the load just won't allow for it, next best thing you can do is raise it to

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
This shouldn't be a problem though, if things are working as they are supposed to. Another node should simply take over as the overseer and continue processing the work queue. It's just best if you configure so that session timeouts don't happen unless a node is really down. On the other hand,

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
I am not using the concurrent low pause garbage collector, I could look at switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC correct? I also just had a shard go down and am seeing this in the log SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
On Apr 3, 2013, at 8:17 PM, Jamie Johnson jej2...@gmail.com wrote: I am not using the concurrent low pause garbage collector, I could look at switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC correct? Right - if you don't do that, the default is almost always the

hl.usePhraseHighlighter defaults to true but Query form and wiki suggest otherwise

2013-04-03 Thread Timothy Potter
Minor issues - It seems that hl.usePhraseHighlighter is enabled by default, which definitely makes sense, but the wiki says its default value is false and the checkbox is unchecked by default on the Query form. This gives the impression this parameter defaults to false. I'm assuming the code
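
While the documented and actual defaults disagree, one way to avoid depending on either is to send the parameter explicitly with every highlighting request. A minimal sketch, assuming the request parameters are assembled client-side; `highlight_params` is a hypothetical helper name, while `hl` and `hl.usePhraseHighlighter` are the Solr parameters discussed above.

```python
from urllib.parse import urlencode


def highlight_params(query: str, use_phrase_highlighter: bool = True) -> str:
    """Return an encoded query string that sets the highlighter flag explicitly.

    highlight_params is an illustrative helper, not a Solr or SolrJ API.
    """
    params = {
        "q": query,
        "hl": "true",
        # Always sent, so the server-side default no longer matters.
        "hl.usePhraseHighlighter": "true" if use_phrase_highlighter else "false",
    }
    return urlencode(params)
```

The resulting string can be appended to a select URL; passing `use_phrase_highlighter=False` makes it straightforward to compare the two behaviours side by side.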

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Thanks I will try that. On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller markrmil...@gmail.com wrote: On Apr 3, 2013, at 8:17 PM, Jamie Johnson jej2...@gmail.com wrote: I am not using the concurrent low pause garbage collector, I could look at switching, I'm assuming you're talking about

Difference Between Indexing and Reindexing

2013-04-03 Thread Furkan KAMACI
OK, this may be a very easy question, but I would like to learn a bit more of the technical detail. When I use Nutch to send documents to Solr for indexing, there are two parameters: -index and -reindex. What does Solr do differently for each one?
