Re: Does configuration change requires Zookeeper restart?

2013-09-09 Thread Upayavira
Upload the changed config files to ZooKeeper using the ZooKeeper CLI, which I think is in example/cloud-scripts. Then use the Collections API, over HTTP, to reload the collection. Upayavira On Tue, Sep 10, 2013, at 06:25 AM, Prasi S wrote: > Hi, > I have solrcloud with two collections. I have indexe
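The reload step mentioned above can be sketched as a small URL builder. This is a minimal sketch, assuming the standard Collections API `RELOAD` action; the host, port, and collection name are placeholders that do not come from the thread.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url: str, collection: str) -> str:
    """Build the Collections API URL that asks Solr to reload one collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


# Placeholder host and collection name, for illustration only.
print(reload_collection_url("http://localhost:8983/solr", "collection2"))
```

Fetching that URL (with curl or a browser) after the new config has been uploaded should be enough; nothing in the reply suggests ZooKeeper itself needs a restart.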

Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Alexandre Rafalovitch
I believe one of the admin pages (Solr 4+) shows all the terms and frequencies. You can use that even with the stock example. Try that. If that makes sense, you can explore further. As to other examples, there are a couple of books. I bet Jack's book covers this. Regards, Alex. Personal website: h

Does configuration change requires Zookeeper restart?

2013-09-09 Thread Prasi S
Hi, I have SolrCloud with two collections. I have indexed 100 million docs to the first collection. I need some changes to the Solr configuration files. I'm going to index the new data to the second collection. What are the steps that I should follow? Should I restart ZooKeeper? Pls suggest T

Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Ali, Saqib
Thanks Alexandre. I looked at the wiki page for the TermsComponent. But I am not sure if I follow. Do you have an example or some better documentation? Thanks! :) On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote: > The "phrases" are usually called n-grams or shingles. > > You can probably u

Re: Restrict Parsing duplicate file in Solr

2013-09-09 Thread shabbir
Thanks for the response. My requirement is to detect whether a file is already indexed and, if so, skip it instead of replacing the existing one. -- View this message in context: http://lucene.472066.n3.nabble.com/Restrict-Parsing-duplicate-file-in-Solr-tp4088471p4089023.html Sent from the Solr - Use

Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Alexandre Rafalovitch
The "phrases" are usually called n-grams or shingles. You can probably use ShingleFilterFactory to create your shingles (possibly with outputUnigrams=false) and then use TermsComponent (http://wiki.apache.org/solr/TermsComponent) to list the results. Regards, Alex. Personal website: http://ww
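To make the idea concrete, here is a plain-Python model of what a two-word shingle field plus a document-frequency listing would compute. It is only a sketch, not Solr code: the whitespace tokenizer, the lowercasing, and the sample documents are assumptions made for illustration.

```python
from collections import Counter


def two_word_shingles(text):
    """Return the set of adjacent word pairs (2-shingles) in one document."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def shared_shingles(docs, min_docs=2):
    """List the shingles that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in docs:
        # A set per document means each shingle counts once per document.
        doc_freq.update(two_word_shingles(text))
    return sorted(s for s, n in doc_freq.items() if n >= min_docs)


docs = [
    "Dear Solr Ninja friends",
    "every solr ninja knows shingles",
    "no match here",
]
print(shared_shingles(docs))
```

In Solr the shingles would be produced at index time by the analysis chain, and the "more than one document" condition corresponds to a minimum document frequency of 2 on the terms listing.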

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-09 Thread diyun2008
Thank you very much for your advice. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-4-or-zookeeper-3-4-5-do-not-support-too-many-collections-more-than-600-tp4088689p4089009.html Sent from the Solr - User mailing list archive at Nabble.com.

find all two word phrases that appear in more than one document

2013-09-09 Thread Ali, Saqib
Dear Solr Ninjas, We would like to run a query that returns two-word phrases that appear in more than one document. For example, take the string "Solr Ninja". Since it appears in more than one document in our Solr instance, the query should return it. The query should find all such phrases from

Re: Data import

2013-09-09 Thread Alexandre Rafalovitch
Sounds like you want a custom UpdateRequestProcessor chain that checks if the document already exists with given primary key and does not even bother passing it on to the next processor in the chain. This would make sense as an optimization or as a first step in a complex update chain that perhaps

Re: Data import

2013-09-09 Thread Luis Portela Afonso
But with atomic updates I need to send the information, right? I want Solr to index it automatically, and it is doing that. Can you look at the Solr example in the source? There is an example in the example-DIH folder. Imagine that you run the URL to import the data every 15 minutes. If the same info

Re: Data import

2013-09-09 Thread Chris Hostetter
: With cron job, I do a http request using curl, to the address : http://localhost:port/solr/core/dataimport/?command=full-import&clean=false : : When it runs, if the rss source has a feed that is already indexed on solr, : it updates the existing source. : So if the source has the same informati

Re: Data import

2013-09-09 Thread Luis Portela Afonso
So I'm indexing RSS feeds. I'm running the data import full-import command with a cron job. It runs every 15 minutes and indexes a lot of RSS feeds from many sources. With cron job, I do a http request using curl, to the address http://localhost:port/solr/core/dataimport/?command=full-import&clean

Re: Data import

2013-09-09 Thread Chris Hostetter
: When i run "dataimport/?command=full-import&clean=false", solr add new : documents with the information. But if the same information already : exists with the same uniquekey, it replaces the existing document with a : new one. : It does not update the document, it creates a new one. It's that

Re: Data import

2013-09-09 Thread Chris Hostetter
: Any form of indexing would always "replace" a document and never update it. At a very low level this is true, but Solr does support "Atomic Updates" (aka "Partial Updates") that can be used to allow a client to only specify the values of an existing document they want to change and Solr will
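As an illustration of the atomic update mentioned above, the sketch below builds a JSON update body that uses the `set` modifier. The document id, the field name, and the assumption that the unique key field is called `id` are hypothetical; the body would be posted to the collection's update handler.

```python
import json


def atomic_update(doc_id, changes, id_field="id"):
    """Build a JSON body that sets only the given fields on an existing document."""
    doc = {id_field: doc_id}
    for field, value in changes.items():
        # {"set": value} tells Solr to replace just this field's value.
        doc[field] = {"set": value}
    return json.dumps([doc])


# Hypothetical document id and field, for illustration only.
print(atomic_update("feed-123", {"title": "Updated title"}))
```

Only the unique key and the changed fields are sent; Solr rebuilds the rest of the document from what it already has stored.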

Re: Expunge deleting using excessive transient disk space

2013-09-09 Thread Chris Hostetter
: Looking on the infostream I can see that the first merges do succeed but : older segments are kept in reference thus cannot be deleted until all the : merging are done. I suspect what you are seeing is that filehandles for the older segments are kept open (and thus, the bytes on disk for thos

Re: charfilter doesn't do anything

2013-09-09 Thread Jack Krupansky
Use XML then. Although you will need to escape the XML special characters as I did in the pattern. The point is simply: Quickly and simply try to find the simple test scenario that illustrates the problem. -- Jack Krupansky -Original Message- From: Andreas Owen Sent: Monday, Septem

Re: charfilter doesn't do anything

2013-09-09 Thread Jack Krupansky
Did you at least try the pattern I gave you? The point of the curl was the data, not how you send the data. You can just use the standard Solr simple post tool. -- Jack Krupansky -Original Message- From: Andreas Owen Sent: Monday, September 09, 2013 6:40 PM To: solr-user@lucene.apac

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-09 Thread Yago Riveiro
If you have 15K collections I guess that you are doing custom sharding and not using collection sharding. My first approach was the same as yours. In fact, I have the same issue with a lot of cores. I use -Djute.maxbuffer without any issue. In recent versions, Solr implements a way to do sha

Re: Solr suggest - How to define solr suggest as case insensitive

2013-09-09 Thread Chris Hostetter
: This is probably because your dictionary is made up of all lower case tokens, : but when you query the spell-checker similar analysis doesn't happen. Ideal : case would be when you query the spellchecker you send lower case queries You can init the SpellCheckComponent with a "queryAnalyzerFieldT

Re: Facet sort descending

2013-09-09 Thread Chris Hostetter
: Is there a plan to add a descending sort order for facet queries ? : Best regards Sandro I don't understand your question. if you specify multiple facet.query params, then the constraint counts are returned in the order they were initially specified -- there is no need for server side sortin

Re: charfilter doesn't do anything

2013-09-09 Thread Andreas Owen
i've downloaded curl and tried it in the command prompt and power shell on my win 2008r2 server, that's why i used my dataimporter with a single line html file and copy/pasted the lines into schema.xml On 9. Sep 2013, at 11:20 PM, Jack Krupansky wrote: > Did you in fact try my suggested example?

Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-09 Thread Chris Hostetter
: Subject: Re: unknown _stream_source_info while indexing rich doc in solr : : Error got resolved, thanks a lot Sir. I have been trying for days to : resolve it. Users shouldn't have to worry about problems like this ... i'll try to make this less error prone... https://issues.apache.org/

Re: charfilter doesn't do anything

2013-09-09 Thread Andreas Owen
i tried but that isn't working either, it wants a data stream, i'll have to check how to post json instead of xml On 10. Sep 2013, at 12:52 AM, Jack Krupansky wrote: > Did you at least try the pattern I gave you? > > The point of the curl was the data, not how you send the data. You can just >

Re: Profiling Solr Lucene for query

2013-09-09 Thread Dmitry Kan
Hi Manuel, The frontend solr instance is the one that does not have its own index and is doing merging of the results. Is this the case? If yes, are all 36 shards always queried? Dmitry On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand < manuel.lenorm...@gmail.com> wrote: > Hi Dmitry, > > I h

Re: charfilter doesn't do anything

2013-09-09 Thread Andreas Owen
i index html pages with a lot of lines and not just a string with the body-tag. it doesn't work with proper html files, even though i took all the new lines out. html-file: nav-content nur das will ich sehenfooter-content solr update debug output: "text_html": ["\r\n\r\n\r\n\r\n\r\n\r\nnav-cont

eDismax Phrase Field Boosts on Single Terms

2013-09-09 Thread Jeff Porter
I am curious how the dismax parser handles single term queries and phrase boosts. For example, if I had a query q=bars with the following dismax parameters: qf=categories and pf=categories^100 I would expect that the parser would match on the QF parameter but then also match again on the

Re: charfilter doesn't do anything

2013-09-09 Thread Jack Krupansky
Did you in fact try my suggested example? If not, please do so. -- Jack Krupansky -Original Message- From: Andreas Owen Sent: Monday, September 09, 2013 4:42 PM To: solr-user@lucene.apache.org Subject: Re: charfilter doesn't do anything i index html pages with a lot of lines and not j

Re: Profiling Solr Lucene for query

2013-09-09 Thread Mikhail Khludnev
Hello Manuel, 1 minute of sampling yields too little data. Lowering termindex should help, however I don't know how FST really behaves on it. It definitely helped at 3.x; Would you mind if I ask which OS you have and which Directory implementation is used actually? On Sun, Sep 8, 2013 at 7:56 PM, Manu

Re: Expunge deleting using excessive transient disk space

2013-09-09 Thread Walter Underwood
10% free space is guaranteed to cause problems. That is a faulty installation. Explain to ops that Solr needs double the minimum index size. This is required for normal operation. That isn't "extra", it is required for merges. Solr makes copies instead of doing record locking. The merge design i
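The rule of thumb above can be written as a quick check. This is a sketch of that rule only (free space at least as large as the index, so a full merge copy fits alongside it), not an exact model of Lucene merging; the 300 GB figure is taken from the thread and the helper name is made up.

```python
def has_merge_headroom(disk_gb, index_gb):
    """True when free space is at least the index size (disk >= 2x index)."""
    free_gb = disk_gb - index_gb
    return free_gb >= index_gb


# A 300 GB disk with 10% free holds a 270 GB index: not enough room.
print(has_merge_headroom(300, 270))
# The same disk at 50% free holds a 150 GB index: a full copy fits.
print(has_merge_headroom(300, 150))
```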

Re: Facet Sort with non ASCII Characters

2013-09-09 Thread Yonik Seeley
On Mon, Sep 9, 2013 at 7:16 AM, Sandro Zbinden wrote: > Is there a plan to add support for alphabetical facet sorting with non ASCII > Characters ? The entire unicode range should already work. Can you give an example of what you would like to see? -Yonik http://lucidworks.com

Re: Expunge deleting using excessive transient disk space

2013-09-09 Thread Manuel Le Normand
I can only agree with the 50% free space recommendation. Unfortunately I do not have this at the moment; I'm sitting at 10% free disk (out of 300GB for each server). I'm aware it is very low. Does it seem reasonable to adapt the current merge policy (or write a new one) that would fre

Re: Profiling Solr Lucene for query

2013-09-09 Thread Manuel Le Normand
Hi Dmitry, I have solr 4.3 and every query is distributed and merged back for ranking purpose. What do you mean by frontend solr? On Mon, Sep 9, 2013 at 2:12 PM, Dmitry Kan wrote: > are you querying your shards via a frontend solr? We have noticed, that > querying becomes much faster if resul

Re: How to Manage RAM Usage at Heavy Indexing

2013-09-09 Thread P Williams
Hi, I've been seeing the same thing on CentOS with high physical memory use with low JVM-Memory use. I came to the conclusion that this was expected behaviour. Using top I noticed that my solr user's java process has Virtual memory allocated of about twice the size of the index, actual is within

Re: solr suggestion -

2013-09-09 Thread tamanjit.bin...@yahoo.co.in
Don't do any analysis on the field you are using for suggestion. What is happening here is that at query time and indexing time the tokens are being broken on white space. So effectively, "at" is being taken as one token and "l" is being taken as another token for which you get two different suggestio

Re: Solr suggest - How to define solr suggest as case insensitive

2013-09-09 Thread tamanjit.bin...@yahoo.co.in
This is probably because your dictionary is made up of all lower case tokens, but when you query the spell-checker similar analysis doesn't happen. Ideal case would be when you query the spellchecker you send lower case queries -- View this message in context: http://lucene.472066.n3.nabble.com/

Re: Data import

2013-09-09 Thread tamanjit.bin...@yahoo.co.in
Any form of indexing would always "replace" a document and never update it. If you don't want replacements, don't use a unique key in your schema and sort on time/date etc. But I still don't get one thing: if I have two indexes that I try to merge and both the indexes have some documents with same un

Re: Searching solr on school name during year

2013-09-09 Thread tamanjit.bin...@yahoo.co.in
You could either add two separate fields, one for start year and another for end year. And then facilitate range queries to include all docs. eg. Name - Boris start year - 2001 end year - 2005 Or you could just have one field and put in multivalued years a student has attended the school. name
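The first option (separate start and end year fields) leads to a query made of two open-ended ranges. The sketch below only builds the query string; the field names `name`, `start_year`, and `end_year` are hypothetical stand-ins for whatever the schema actually uses.

```python
def attended_in_year_query(name, year):
    """Build a Solr query matching a person enrolled during the given year."""
    return (
        f'name:"{name}" '
        f"AND start_year:[* TO {year}] "
        f"AND end_year:[{year} TO *]"
    )


# Matches the example record above (start year 2001, end year 2005).
print(attended_in_year_query("Boris", 2003))
```

A document matches when it started in or before the year and ended in or after it, so the Boris record is found for any year from 2001 through 2005.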

Re: How to Manage RAM Usage at Heavy Indexing

2013-09-09 Thread Shawn Heisey
On 9/9/2013 10:35 AM, P Williams wrote: Is it odd that my index is ~16GB but top shows 30GB in virtual memory? Would the extra be for the field and filter caches I've increased in size? This should probably be a new thread, but it might have some applicability here, so I'm replying. I have

Re: Data import

2013-09-09 Thread Luís Portela Afonso
When i run "dataimport/?command=full-import&clean=false", Solr adds new documents with the information. But if the same information already exists with the same uniquekey, it replaces the existing document with a new one. It does not update the document, it creates a new one. Is that possible?

Re: collections api setting dataDir

2013-09-09 Thread mike st. john
hi, i've sorted it all out. basically a few replicas had failed and the counts on the replicas were less than the leader. i basically killed the index on those replicas and let them recover. Thanks for the help. msj On Mon, Sep 9, 2013 at 11:08 AM, Shawn Heisey wrote: > On 9/7/2013 2:2

Re: collections api setting dataDir

2013-09-09 Thread Shawn Heisey
On 9/7/2013 2:25 PM, mike st. john wrote: > yes the collections api ignored it,what i ended up doing, was just > building out some fairness in regards to creating the cores and calling > coreadmin to create the cores, seemed to work ok. Only issue i'm having > now, and i'm still investigating

Stemming and protwords configuration

2013-09-09 Thread csicard.ext
Hi, We have a Solr server using stemming: I would like to query the French words "frais" and "fraise" separately. I put the word "fraise" in protwords.txt file. - When I query the word "fraise", no document indexed with the word "frais" are found. - When I query the word "frais", I've got do

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-09 Thread diyun2008
I just found this option "-Djute.maxbuffer" in the zookeeper admin document. But it's listed under "Unsafe Options". I don't really know what that means. Maybe it will cause stability problems? Does someone have real practical experience using this parameter? I will have at least 15K collections. Or

Re: Connection Established but waiting for response for a long time.

2013-09-09 Thread qungg
10 1 false 5000 65536 1500 false Everything is default except for 5000 65536 Thanks, Q

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-09 Thread diyun2008
Thank you Yago. That seems somewhat strange. Do you know of any official document that details this? I really need more evidence to make a decision. I mean I need to compare the two methods and find out which has more advantages in terms of performance and cost. And I will change my parameter to do more testing. I

Re: How to Manage RAM Usage at Heavy Indexing

2013-09-09 Thread Furkan KAMACI
Is there anything that documents that bug? 2013/8/28 Dan Davis > This could be an operating systems problem rather than a Solr problem. > CentOS 6.4 (linux kernel 2.6.32) may have some issues with page flushing > and I would read up on that. > The VM parameters can be tuned in /etc/sys

SOLR 4 stopwords and token positions

2013-09-09 Thread Fermin Silva
Hi Everyone, I'm migrating from SOLR 3.x to 4.x and I'm required to keep the results as close as possible as before. So I'm running some tests and found some differences. My query is: *title_search_pt:(geladeira/refrigerador)* And the parsed query becomes: *MultiPhraseQuery(title_search_pt:"(refr

Re: Solr Cell Question

2013-09-09 Thread Jamie Johnson
Thanks Erick, This is how I was doing it but when I saw the Solr Cell stuff I figured I'd give it a go. What I ended up doing is the following ModifiableSolrParams params = indexer.index(artifact); params.add("fmap.content", "my_custom_field"); params.add("extractFormat", "text"); ContentS

Re: Ideal Server Environment

2013-09-09 Thread Toke Eskildsen
On Mon, 2013-09-09 at 12:42 +0200, Raheel Hasan wrote: > Also, could you tell me if CentOS or Ubuntu will be better? You are asking for short answers to complex questions. There is nothing inherent in Solr that favours one Linux installation over another. CentOS is aimed at the enterprise, so I

Re: More on topic of Meta-search/Federated Search with Solr

2013-09-09 Thread Jakub Skoczen
Hi Dan, You might want to take a look at pazpar2 [1], an open-source, federated search engine with first-class support for SOLR (with addition to standard information retrieval protocols like Z39.50/SRU). [1] http://www.indexdata.com/pazpar2 On Thu, Sep 5, 2013 at 9:55 PM, Paul Libbrecht wrote

Facet sort descending

2013-09-09 Thread Sandro Zbinden
Dear solr users Is there a plan to add a descending sort order for facet queries ? Best regards Sandro Sandro Zbinden Software Engineer

Facet Sort with non ASCII Characters

2013-09-09 Thread Sandro Zbinden
Dear solr users Is there a plan to add support for alphabetical facet sorting with non ASCII Characters ? Best regards Sandro Sandro Zbinden Software Engineer

Re: Profiling Solr Lucene for query

2013-09-09 Thread Dmitry Kan
are you querying your shards via a frontend solr? We have noticed, that querying becomes much faster if results merging can be avoided. Dmitry On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand < manuel.lenorm...@gmail.com> wrote: > Hello all > Looking on the 10% slowest queries, I get very bad

Re: Ideal Server Environment

2013-09-09 Thread Raheel Hasan
ok thanks for the reply Also, could you tell me if CentOS or Ubuntu will be better? On Mon, Sep 9, 2013 at 3:17 PM, Toke Eskildsen wrote: > On Mon, 2013-09-09 at 09:39 +0200, Raheel Hasan wrote: > > Also, I wonder if Solr will require High processor? High Memory or High > > Storage? > > > >

Re: Ideal Server Environment

2013-09-09 Thread Toke Eskildsen
On Mon, 2013-09-09 at 09:39 +0200, Raheel Hasan wrote: > Also, I wonder if Solr will require High processor? High Memory or High > Storage? > > 1) For Indexing * Processor * Bulk read/write. > 2) For querying * Processor only if you have complex queries * Fast random I/O reads, which can be acc

help regarding custom query which returns custom output

2013-09-09 Thread Rohan Thakur
hi all, I have a requirement: I have implemented fulltext search, autosuggestion and spellcorrection functionality in solr, but they are all running on different cores, so I have to call 3 different request handlers to get the results, which adds unnecessary delay, so I wanted to kno

Re: Dynamic Field

2013-09-09 Thread Alvaro Cabrerizo
Hi: As you posted, a possibility could be to define the fields "jobs" and "batch" as multivalued and use the partial update to add new values to those fields. Hope it helps. On Sun, Sep 8, 2013 at 9:49 PM, anurag.jain wrote: >

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-09 Thread Yago Riveiro
If you want to have more collections you need to configure the -Djute.maxbuffer variable in both zookeeper and solr to override the default limit. In zookeeper you can configure it in the zookeeper-env.sh file. On Solr pass the variable like the others. Note: In both cases the value configured need t

Re: multiple update processor chains.

2013-09-09 Thread mike st. john
You're correct, it's not specifically for the update.chain. my mistake. thanks msj On Mon, Sep 9, 2013 at 3:34 AM, Alexandre Rafalovitch wrote: > Which section in the docs specifically? I thought it was multiple chains > per config file, but you had to choose your specific chain for individual

Re: Ideal Server Environment

2013-09-09 Thread Raheel Hasan
Also, I wonder if Solr will require High processor? High Memory or High Storage? 1) For Indexing 2) For querying On Mon, Sep 9, 2013 at 12:36 PM, Raheel Hasan wrote: > Hi guyz, > > I am trying to setup a LIVE environment for my project that uses Apache > Solr along with PHP/MySQL... > > The in

Ideal Server Environment

2013-09-09 Thread Raheel Hasan
Hi guyz, I am trying to set up a LIVE environment for my project that uses Apache Solr along with PHP/MySQL... The indexing is of heavy data (many GBs). Please can someone recommend the best server for this? Thanks a lot. -- Regards, Raheel Hasan

Re: multiple update processor chains.

2013-09-09 Thread Alexandre Rafalovitch
Which section in the docs specifically? I thought it was multiple chains per config file, but you had to choose your specific chain for individual processors. I might be wrong though. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandre