Re: hot deploy of newer version of solr schema in production

2012-01-24 Thread Jan Høydahl
Hi, To be able to do a true hot deploy of a newer schema without reindexing, you must make sure that none of your changes are breaking changes. So you should test the process on your development machine and make sure it works. Adding and deleting fields would work, but not changing the

Re: Highlighting stopwords

2012-01-24 Thread Koji Sekiguchi
(12/01/24 9:31), O. Klein wrote: Let's say I search for "spellcheck solr" on a website that only contains info about Solr, so "solr" was added to stopwords.txt. The parsed (dismax) query will then not contain the term "solr", so fragments won't contain highlights of the term "solr". So

Re: Size of index to use shard

2012-01-24 Thread Vadim Kisselmann
Hi, it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache config (few updates, big caches) and a good hardware infrastructure. In my case I can handle a 250GB index with 100 million docs on an I7

RE: Filtering search results by an external set of values

2012-01-24 Thread John, Phil (CSS)
Thanks for the responses. Groups probably wouldn't work as while there will be some overlap between customers, each will have a very different overall set of accessible resources. I'll try the suggestion about simply reindexing, or using the no-cache option and see how I get on. Failing that,

Re: ExractionHandler/Cell ignore just 2 fields defined in schema 3.5.0

2012-01-24 Thread Wayne W
Ah perfect - thank you Jan so much. :-) On Tue, Jan 24, 2012 at 11:14 AM, Jan Høydahl jan@cominvent.com wrote: Hi, It's because lowernames=true by default in solrconfig.xml, and it will convert any - into _ in field names. So try adding a request parameter lowernames=false or change
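
To illustrate the renaming Jan describes, here is a minimal Python sketch. It is an approximation written for this note, not Solr's own code: the helper name `lowernames` is hypothetical, and it assumes the mapping lowercases each character and replaces anything that is not a letter or digit with an underscore.

```python
def lowernames(field_name: str) -> str:
    """Approximate the effect of lowernames=true on a field name.

    Assumption: letters and digits are lowercased and every other
    character (such as '-') becomes '_'. This mirrors the behaviour
    described in the thread, not the actual Solr implementation.
    """
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in field_name)


if __name__ == "__main__":
    for name in ("Content-Type", "my-field"):
        print(name, "->", lowernames(name))
```

If the schema defines fields with hyphens in their names, the mapped names will no longer match them, which is consistent with the fields being ignored until lowernames=false is set.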

Re: Filtering search results by an external set of values

2012-01-24 Thread Mikhail Khludnev
Phil, Some time ago I posted my thoughts about a similar problem. Scroll to part II. http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201201.mbox/%3CCANGii8egoB1_rXFfwJMheyxx72v48B_DA-6KteKOymiBrR=m...@mail.gmail.com%3E Regards On Tue, Jan 24, 2012 at 1:36 PM, John, Phil (CSS)

Re: Highlighting stopwords

2012-01-24 Thread O. Klein
Ah, I never used hl.q. That did the trick. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3684245.html Sent from the Solr - User mailing list archive at Nabble.com.
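
For readers who have not used hl.q either, a minimal Python sketch of a request that highlights with a separate query follows. The base URL, the function name `build_highlight_url` and the example query strings are assumptions for illustration; only the parameter names (q, defType, hl, hl.q) are taken from Solr itself, and what to put in hl.q depends on your schema and stopword setup.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url: str, user_query: str, highlight_query: str) -> str:
    """Build a select URL whose highlighting uses hl.q instead of q.

    base_url is a hypothetical Solr select endpoint; highlight_query is
    whatever query should drive highlighting (for example one that still
    contains terms the main query parser drops as stopwords).
    """
    params = {
        "q": user_query,
        "defType": "dismax",
        "hl": "true",
        "hl.q": highlight_query,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url(
        "http://localhost:8983/solr/select",  # assumed local instance
        "spellcheck solr",
        "spellcheck solr",
    ))
```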

solr stopwords issue - documents are not matching

2012-01-24 Thread Ankita Patil
Hi, I am using solr-3.4. Part of my schema looks like: <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory"

highlighter not supporting surround parser

2012-01-24 Thread manyutomar
I want to perform span queries using the surround parser and show the results with the highlighter, but the problem is that the highlighter is not working properly with the surround query parser. Are there any plugins or updates available to do it?

Re: index-time over boosted

2012-01-24 Thread remi tassing
Any idea? This is a snippet of my schema.xml now: <?xml version="1.0" encoding="UTF-8" ?> <!-- Licensed to the Apache Software Foundation (ASF) under one or more ... <!-- fields for index-basic plugin --> <field name="host" type="url" stored="false" indexed="true"/> <field name="site" type="string"

Re: Size of index to use shard

2012-01-24 Thread Dmitry Kan
Hi, The article you gave mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than

Re: Advanced stopword handling edismax

2012-01-24 Thread O. Klein
O. Klein wrote: As I understand it with edismax in trunk, whenever you have a query that only contains stopwords then all the terms are required. But when I try this I only get an empty parsedQuery like: (+() () () () () () () () () () ()

Re: Size of index to use shard

2012-01-24 Thread Anderson vasconcelos
Apparently it is not so easy to determine when to break the content into pieces. I'll investigate further the number of documents, the size of each document and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point to begin using the strategy of

Re: index-time over boosted

2012-01-24 Thread Jan Høydahl
That looks right. Can you restart your Solr, do a new search with debugQuery=true and copy/paste the full EXPLAIN output for your query? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 24. jan. 2012, at 13:22, remi tassing

RE: Highlighting more than 1 term

2012-01-24 Thread Tim Hibbs
Nitin and any others who may have followed this item, I resolved the issue, but I'm not exactly sure of the originating cause. I had changed the field types of my text fields to text_en and then re-indexed. Changing to text_en kept highlighting from happening to more than one term in the fields

Re: index-time over boosted

2012-01-24 Thread remi tassing
Hello, thanks for helping out Jan, I really appreciate that! These are full explains of two results: Result#1.-- 3.0412199E-5 = (MATCH) max of: 3.0412199E-5 = (MATCH) weight(content:mobil broadband^0.5 in 19081), product of: 0.13921623 = queryWeight(content:mobil broadband^0.5),

full import is not working and still not showing any errors

2012-01-24 Thread scabra4
Hi all, can anyone help me with this please? I am trying to do a full import and I've done everything correctly. Now when I try the full import, an XML page displays showing the following, and it stays like this no matter how I refresh the page: This XML file does not appear to have any style

Not getting the expected search results

2012-01-24 Thread m0rt0n
Hello, I am a newbie in the Solr world and I am surprised because when I run searches, both with the browser interface and with a Java client, the expected results do not appear. The issue is: 1) I have set up an entity called via in my data-config.xml with 5 fields. I do the

Re: Limiting term frequency in a document to a specific term

2012-01-24 Thread solr user
With the Solr search relevancy functions, a ParseException, unknown function ttf in FunctionQuery. http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers) where contents is a field name, and amplifiers is text in the field name. Just curious why I get a
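
A request like the one in this message can be assembled from separate fl, defType and q parameters so the separators between them cannot get lost. The sketch below is only an illustration: the function name `build_function_query_url` is hypothetical, the host is the local one from the message, and whether ttf is available depends on the Solr version (a reply in this thread guesses it exists only on trunk).

```python
from urllib.parse import urlencode


def build_function_query_url(base_url: str, field: str, term: str) -> str:
    """Build a function-query URL asking for ttf(field, term).

    Parentheses and commas are kept readable via safe="(),"; without it
    urlencode would percent-encode them, which is equally valid.
    """
    params = {
        "fl": "score,documentPageId",
        "defType": "func",
        "q": f"ttf({field},{term})",
    }
    return base_url + "?" + urlencode(params, safe="(),")


if __name__ == "__main__":
    print(build_function_query_url(
        "http://localhost:8983/solr/select/", "contents", "amplifiers"))
```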

analyzing stored fields (removing HTML tags)

2012-01-24 Thread Robert Stewart
Is it possible to configure schema to remove HTML tags from stored field content? As far as I can tell analyzers can only be applied to indexed content, but they don't affect stored content. I want to remove HTML tags from text fields so that returned text content from stored field has no HTML

Re: index-time over boosted

2012-01-24 Thread Jan Høydahl
Hi, Well, I think you are doing it right, but are getting tricked by either editing the wrong file, a typo or browser caching. Why not try to start with a fresh Solr 3.5.0, start the example app, index all exampledocs, search for Podcasts, you get one hit, in fields text and features. Then change

Re: Solr Java client API

2012-01-24 Thread Erick Erickson
It would really help to see the relevant parts of the code you're using to see what you've tried. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jan 23, 2012 at 2:45 PM, jingjung Ng jingjun...@gmail.com wrote: Hi, I implemented the facet using

Re: analyzing stored fields (removing HTML tags)

2012-01-24 Thread darul
You could probably use a sanitizer, as we do here: http://stackoverflow.com/questions/1947021/libs-for-html-sanitizing
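
One way to act on this suggestion is to strip the markup on the client before the document is sent to Solr, so the stored value never contains tags. The following is a minimal sketch using only Python's standard library; the names `_TextExtractor` and `strip_html` are made up for this example, and it is a plain tag stripper, not a security-grade sanitizer (for instance, the contents of script or style elements would be kept as text).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &#174; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks)


def strip_html(markup: str) -> str:
    """Return the text content of markup with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.text().split())


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b></p>"))
```

The cleaned string would then be used as the field value in the update request, while any analysis chain on the field keeps working on the same text.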

Re: Hierarchical faceting in UI

2012-01-24 Thread Yuhao
Darren, One challenge for me is that a term can appear in multiple places of the hierarchy.  So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term.  For example, if the hierarchy is Cardiovascular Diseases

Re: java.net.SocketException: Too many open files

2012-01-24 Thread Michael Kuhlmann
Hi Jonty, no, not really. When we first had such problems, we really thought that the number of open files is the problem, so we implemented an algorithm that performed an optimize from time to time to force a segment merge. Due to some misconfiguration, this ran too often. With the result

Re: java.net.SocketException: Too many open files

2012-01-24 Thread Sethi, Parampreet
Hi Jonty, You can try changing the maximum number of files a process may open with the command: ulimit -n XXX If the number of open files is not increasing over time but is simply a constant number larger than the system default limit, this should fix it. -param On 1/24/12 11:40 AM,
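
Before raising the limit it can help to check whether the open-file count is stable or still growing. The sketch below is a rough aid under stated assumptions: it runs on Linux (it reads /proc/<pid>/fd), the function names are hypothetical, and the limit it reports is that of the Python process running the script, which only matches the Solr JVM if both were started from the same shell or service configuration.

```python
import os
import resource


def open_file_limit():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_files(pid: int) -> int:
    """Count file descriptors currently open in process pid (Linux only).

    Reading another user's process may require elevated permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = open_file_limit()
    print("soft limit:", soft, "hard limit:", hard)
    # Replace os.getpid() with the PID of the servlet container running Solr.
    print("open descriptors:", count_open_files(os.getpid()))
```

Sampling count_open_files a few times under load shows whether the number levels off (in which case a higher ulimit should be enough) or keeps climbing.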

using pre-core properties in dih config

2012-01-24 Thread Robert Stewart
I have a multi-core setup, and for each core I have a shared data-config.xml which specifies a SQL query for data import. What I want to do is have the same data-config.xml file shared between my cores (linked to same physical file). I'd like to specify core properties in solr.xml such that each

Re: Size of index to use shard

2012-01-24 Thread Erick Erickson
Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored=true. These files have
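
To see how much of an index directory is taken up by the stored-field files Erick mentions, something like the following Python sketch can be used. The function name `index_size_breakdown` and the example path are assumptions; only the .fdt and .fdx extensions come from the message above.

```python
import os

# Extensions named in the message as holding stored field data.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def index_size_breakdown(index_dir: str) -> dict:
    """Sum file sizes in index_dir, split into stored-field files and the rest."""
    totals = {"stored": 0, "other": 0}
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        extension = os.path.splitext(name)[1]
        key = "stored" if extension in STORED_FIELD_EXTENSIONS else "other"
        totals[key] += os.path.getsize(path)
    return totals


if __name__ == "__main__":
    index_dir = "/var/solr/data/index"  # hypothetical location, adjust as needed
    if os.path.isdir(index_dir):
        print(index_size_breakdown(index_dir))
```

Comparing the two totals gives a better picture than the raw directory size when deciding whether an index is "large" for search purposes.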

Re: Limiting term frequency in a document to a specific term

2012-01-24 Thread Erick Erickson
At a guess, you're using 3.x and the relevance functions are only on trunk (4.0). Best Erick On Tue, Jan 24, 2012 at 7:49 AM, solr user mvidaat...@gmail.com wrote: With the Solr search relevancy functions, a ParseException, unknown function ttf in FunctionQuery.

phrase auto-complete with suggester component

2012-01-24 Thread Tommy Chheng
I'm testing out the various auto-complete functionalities on the wikipedia dataset. I first tried the facet.prefix and found it slow at times. I'm now looking at the Suggester component. Given a query like new york, I would like to get results like New York or New York City. When I tried using

Re: Hierarchical faceting in UI

2012-01-24 Thread Darren Govoni
Yuhao, Ok, let me think about this. A term can have multiple parents. Each of those parents would be 'different', yes? In this case, use a multivalued field for the parent and add all the parent names or id's to it. The relations should be unique. Your UI will associate the correct parent

SolrCell maximum file size

2012-01-24 Thread Augusto Camarotti
Hi everybody, Does anyone know if there is a maximum file size that can be uploaded to the ExtractingRequestHandler via an HTTP request? Thanks in advance, Augusto Camarotti

HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
You can use LegacyHTMLStripCharFilterFactory to get the previous behavior. See https://issues.apache.org/jira/browse/LUCENE-3690 for more details. -Yonik http://www.lucidimagination.com On Tue, Jan 24, 2012 at 1:34 PM, Mike Hugo m...@piragua.com wrote: We recently updated to the latest build

Re: Hierarchical faceting in UI

2012-01-24 Thread Yuhao
Hi Darren. You said: "Your UI will associate the correct parent id to build the facet query". This is the part I'm having trouble figuring out how to accomplish and some guidance would help. How would I get the value of the parent to build the facet query in the UI, if the value is in another

Re: Size of index to use shard

2012-01-24 Thread Anderson vasconcelos
Thanks for the explanation Erick :) 2012/1/24, Erick Erickson erickerick...@gmail.com: Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
Thanks for the response Yonik, Interestingly enough, changing to the LegacyHTMLStripCharFilterFactory does NOT solve the problem - in fact I get the same result. I can see that the LegacyHTMLStripCharFilterFactory is being applied at startup: Jan 24, 2012 1:25:29 PM

Re: phrase auto-complete with suggester component

2012-01-24 Thread O. Klein
You might wanna read http://lucene.472066.n3.nabble.com/suggester-issues-td3262718.html#a3264740 which contains the solution to your problem.

Indexing failover and replication

2012-01-24 Thread Anderson vasconcelos
Hi, I'm now running a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache in front, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In

Re: phrase auto-complete with suggester component

2012-01-24 Thread Tommy Chheng
Thanks, I'll try out the custom class file. Is there any possibility this class could be merged into Solr? It seems like expected behavior. On Tue, Jan 24, 2012 at 11:29 AM, O. Klein kl...@octoweb.nl wrote: You might wanna read

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Steven A Rowe
Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes: public void testNumericCharacterEntities() throws Exception { final String text = "Bose&#174; &#8482;"; // |Bose® ™| HTMLStripCharFilterFactory htmlStripFactory = new

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Michael Ryan
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
Oops, I didn't read carefully enough to see that you wanted those constructs entirely stripped out. Given that you're seeing numbers indexed, this strongly indicates an escaping bug in the SolrJ client that must have been introduced at some point. I'll see if I can reproduce it in a unit test.

Re: dismax: limiting term match to one field

2012-01-24 Thread astubbs
This seems like a real shame. As soon as you search across more than one field, the mm setting becomes nearly useless.

Re: Size of index to use shard

2012-01-24 Thread Vadim Kisselmann
@Erick thanks :) I agree with you; my load tests show the same. @Dmitry my docs are small too, I think about 3-15KB per doc. I update my index all the time and I have an average of 20-50 requests per minute (20% facet queries, 80% large boolean queries with wildcard/fuzzy). How much

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
Thanks for the responses everyone. Steve, the test method you provided also works for me. However, when I try a more end to end test with the HTMLStripCharFilterFactory configured for a field I am still having the same problem. I attached a failing unit test and configuration to the following

Fw: Problem with SpliBy in Solr 3.4

2012-01-24 Thread Sumit Sen
- Forwarded Message - From: Sumit Sen sumitse...@yahoo.com To: Solr List solr-user@lucene.apache.org Sent: Tuesday, January 24, 2012 3:53 PM Subject: Problem with SpliBy in Solr 3.4 Hi All: I have a very silly problem. I am using Solr 3.4. I have a data import handle for indexing

Re: Do Hignlighting + proximity using surround query parser

2012-01-24 Thread Scott Stults
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting. Should

Re: Solr 3.5.0 can't find Carrot classes

2012-01-24 Thread Christopher J. Bottaro
On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote: SEVERE: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:102) at

Re: Do Hignlighting + proximity using surround query parser

2012-01-24 Thread Ahmet Arslan
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting.

solr not working with magento enterprise 1.11

2012-01-24 Thread vishal_asc
I am integrating Solr 3.5 with Jetty in Magento EE 1.11. I have followed all the necessary steps, and configured and tested the Solr connection in the Magento catalog system config. I have copied the magento/lib/Solr/conf/ content to the Solr installation. I have run the index management and restarted Jetty, but when I

Re: solr not working with magento enterprise 1.11

2012-01-24 Thread David Radunz
Hey, Shouldn't you be asking this question to the Magento people? You have an Enterprise edition, so you have paid for their support. Cheers, David On 25/01/2012 2:57 PM, vishal_asc wrote: I am integrating solr 3.5 with jetty in magento EE 1.11. I have followed all the necessary

Re: Solr Cores

2012-01-24 Thread Sujatha Arun
Thanks Erick. Regards Sujatha On Mon, Jan 23, 2012 at 11:16 PM, Erick Erickson erickerick...@gmail.com wrote: You can have a large number of cores, some people have multiple hundreds. Having multiple cores is preferred over having multiple JVMs since it's more efficient at sharing system

Re: solr not working with magento enterprise 1.11

2012-01-24 Thread vishal_asc
Thanks David. As of now we are configuring it on a local WAMP server and we have only the development version provided by the sales team. Do you know where Solr saves information or pushes the XML docs when we run index management in Magento? I followed this site:

Re: solr not working with magento enterprise 1.11

2012-01-24 Thread David Radunz
Hey, I am using Magento Community Edition, I wrote my own Magento extension to integrate Solr and it works fine. So I really don't know what the Enterprise edition does. On a personal and unrelated note, I would never use Windows for a server; Unreliable and most of the system resources

Re: SpellCheck Help

2012-01-24 Thread vishal_asc
I have installed the same Solr 3.5 with Jetty and am integrating it with Magento 1.11, but it does not seem to be working: my search results do not show a "Did you mean" string when I misspell a word. I followed all the steps necessary for Magento Solr integration. Please help ASAP. Thanks Vishal