Re: Semantic search with python numpy and Solr

2014-03-25 Thread Alexandre Rafalovitch
Didn't you ask exactly this question two weeks ago and get some replies saying that you need to do more domain analysis? Have you made any progress since, and do you have more precise Solr-specific questions? Regards, Alex. P.s. http://www.brainyquote.com/quotes/quotes/a/alberteins133991.html Personal

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
Oh, somehow I missed that in your original e-mail. How do you run CheckIndex? Do you pass the -fix option? [1] You may want to try Luke [2] to open the index without opening the IndexReader and run the Tools > Check Index tool from Luke. [1] http://java.dzone.com/news/lucene-and-solrs-checkindex

intersect query

2014-03-25 Thread cmd.ares
my_index (one core):
id,dealer,productName,amount,region
1,A1,iphone4,400,east
2,A1,iphone4s,450,east
3,A2,iphone5s,550,east
..
4,A1,iphone4,400,west
5,A1,iphone4s,450,west
6,A3,iphone5s,550,west
..
I'd like to find which dealers sell the 'iphone' in both the 'east' and the 'west'. pl/sql

Re: Fixing corrupted index?

2014-03-25 Thread zqzuk
Thank you. I tried Luke with IndexReader disabled; however, it seems the index is completely broken, as it complains: ERROR: java.lang.Exception: there is no valid Lucene index in this directory. Sounds like I am out of luck, is it so?

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
1. Luke: if you leave the IndexReader on, does the index even open? Can you access the CheckIndex? 2. The command line CheckIndex: what does the CheckIndex -fix do? On Tue, Mar 25, 2014 at 10:54 AM, zqzuk ziqizh...@hotmail.co.uk wrote: Thank you. I tried Luke with IndexReader disabled,

document migrate

2014-03-25 Thread Cihat güzel
Hi all, I have a test for document migration. I followed this URL: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12Migratedocumentstoanothercollection I am trying it on Solr 4.6.1. I have two collections (collection1 and collection2) and two shards. my collection1

How to index only the pdf content/text

2014-03-25 Thread Croci Francesco Luigi (ID SWS)
I searched for a way to index only the content/text part of a PDF (without all the other fields Tika creates) and I found the solution with uprefix=ignored_ and <dynamicField name="ignored_*" type="ignored" multiValued="true" indexed="false" stored="false" />. The problem is that uprefix works on

Re: Fixing corrupted index?

2014-03-25 Thread zqzuk
1. No, if IndexReader is on I get the same error message from CheckIndex. 2. It doesn't do anything but give the error message I posted before and then quit. The full print of the error trace is: Opening index @ E:\...\zookeeper\solr\collection1\data\index ERROR: could not read any segments

Indexing parts of an HTML file differently

2014-03-25 Thread Michael Clivot
Hello, I have the following issue and need help: One HTML file has different parts for different countries. For example: <!-- Country: FR, BE --> Address for France and Benelux <!-- Country End --> <!-- Country: CH --> Address for Switzerland <!-- Country End --> Depending on a

Re: Indexing parts of an HTML file differently

2014-03-25 Thread Gora Mohanty
On 25 March 2014 15:59, Michael Clivot cli...@netmedia.de wrote: Hello, I have the following issue and need help: One HTML file has different parts for different countries. For example: <!-- Country: FR, BE --> Address for France and Benelux <!-- Country End --> <!-- Country: CH

Re: document migrate

2014-03-25 Thread Jan Høydahl
Migrate is new in Solr 4.7. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 March 2014 at 10:51, Cihat güzel c.guzel@gmail.com wrote: hi all, I have a test for document migrate. I followed this url:

Re: document migrate

2014-03-25 Thread Furkan KAMACI
Hi; I think that we should add which version includes which parameters at Collections API wiki page. A new 'migrate' collection API to split all documents with a route key into another collection is introduced with Solr 4.7.0 Thanks; Furkan KAMACI 2014-03-25 11:51 GMT+02:00 Cihat güzel

Re: solr 4.x reindexing issues

2014-03-25 Thread Jan Høydahl
Hi, It seems you are trying to reindex from one server to the other. Be aware that it could be easier for you to simply copy the whole index folder over to your 4.6.1 server and start Solr as it will be able to read your 3.x index. This is unless you also want to do major upgrades of your schema or

Re: document migrate

2014-03-25 Thread Jan Høydahl
I think that we should add which version includes which parameters at Collections API wiki page. A new 'migrate' collection API to split all documents with a route key into another collection is introduced with Solr 4.7.0 Should not be necessary, since the top of every Confluence page reads

Re: Indexing parts of an HTML file differently

2014-03-25 Thread Jack Krupansky
There is no Solr feature that would break up your HTML file - you will have to do that yourself, either before you send the file to Solr or by developing a custom update processor that extracts the sections and directs each to a specific field for the language. The former is probably easier
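A minimal sketch of the "split it yourself before sending to Solr" option, in Python. It assumes the markers are HTML comments of the form `<!-- Country: FR, BE -->` ... `<!-- Country End -->`, reconstructed from the example in this thread; the function name `split_by_country` is hypothetical, and how each section is then mapped to a Solr field is left to the caller.

```python
import re

# Assumed marker syntax, reconstructed from the example in this thread.
# "-+>" tolerates a closing marker written as "-->" or "--->".
SECTION_RE = re.compile(
    r"<!--\s*Country:\s*(?P<codes>[^>]*?)\s*-+>"
    r"(?P<body>.*?)"
    r"<!--\s*Country End\s*-->",
    re.DOTALL,
)


def split_by_country(html):
    """Return {country_code: text} for each marked section of the page."""
    sections = {}
    for match in SECTION_RE.finditer(html):
        body = match.group("body").strip()
        for code in match.group("codes").split(","):
            code = code.strip()
            if code:
                # Concatenate if a country appears in more than one section.
                sections[code] = (sections.get(code, "") + " " + body).strip()
    return sections
```

Each value in the returned dictionary could then be sent to Solr as its own field or its own document, depending on how the per-country search is meant to work.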

Re: Multilingual indexing, search results, edismax and stopwords

2014-03-25 Thread Jan Høydahl
If using stopwords with edismax, please make sure that ALL fields referred in qf have stopwords defined in the fieldType and also that the stopword dictionary is the SAME for all these. This way you will not encounter the infamous edismax+stopwords bug mentioned in

Re: How to secure Solr admin page?

2014-03-25 Thread Jan Høydahl
Hi, First of all, the wiki page you refer to is *not* the official ref-guide. The official one can be found here https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide The wiki you found is a community-edited wiki, and may talk about ideas or patches. The authentication

Re: intersect query

2014-03-25 Thread Ahmet Arslan
Hi Ares, How about using field collapsing?  https://wiki.apache.org/solr/FieldCollapsing q=+region:(east OR west) +productName:iPhone group=true group.field=dealer If the number of distinct groups is high, CollapsingQueryParser could be used too.
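A rough Python sketch of how the grouping suggestion might be used from a client. The `q`, `group` and `group.field` parameters are the ones quoted above; the function names are hypothetical. Note that the `east OR west` query returns dealers that sell in either region, so the second helper (an assumption about how to finish the job, not something stated in the thread) keeps only dealers seen in every requested region.

```python
from urllib.parse import urlencode


def build_group_params(product, regions, group_field="dealer"):
    """URL-encode the grouping query suggested in the reply above."""
    region_clause = " OR ".join(regions)
    params = [
        ("q", f"+region:({region_clause}) +productName:{product}"),
        ("group", "true"),
        ("group.field", group_field),
    ]
    return urlencode(params)


def dealers_in_all_regions(docs, regions):
    """Client-side intersection: dealers that appear in every region."""
    seen = {}
    for doc in docs:
        seen.setdefault(doc["dealer"], set()).add(doc["region"])
    wanted = set(regions)
    return sorted(dealer for dealer, found in seen.items() if wanted <= found)
```

With the six sample rows from the original question, only dealer A1 appears in both regions.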

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
right. If you have cfs files in the index directory, there is a thread discussing the method of regenerating the segment files: http://www.gossamer-threads.com/lists/lucene/java-user/39744 backup before doing changes! source on SO:

alternate address for solr-user list, subscription confirmation

2014-03-25 Thread Philip Durbin
Thanks for Solr! It's a great product. I've been hanging out in #lucene-dev for a while but I thought I'd join the mailing list. ezmlm seems to pick up an alternate email address of mine in the Return-Path header so I tried to override the default subscription address by emailing

Re: Can the solr dataimporthandler consume an atom feed?

2014-03-25 Thread eShard
Gora! It works now! You are amazing! thank you so much! I dropped the atom: from the xpath and everything is working. I did have a typo that might have been causing issues too. thanks again!

search WITH or WITHOUT accents (selection at runtime) + highlights

2014-03-25 Thread elfu
Hello, I have the following problem to solve using Solr: search WITH or WITHOUT accents (selection at runtime) + highlights. How can I configure the schema to achieve this? For example: inputString aaa près bbb pres A) accent sensitive 1.search for *près* highlight =aaa
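To make the two behaviours concrete, here is a small Python illustration of accent folding outside Solr. It is only a sketch of the matching rule being asked for; the helper names are hypothetical. In Solr itself a common approach (an assumption here, not something confirmed in this thread) is to index the text into two fields, one of them analyzed with ASCIIFoldingFilterFactory, and pick the field at query time.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks so that 'près' becomes 'pres'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term, token, accent_sensitive):
    """Compare a search term with an indexed token under either mode."""
    if accent_sensitive:
        return term == token
    return fold_accents(term) == fold_accents(token)
```

For the input "aaa près bbb pres", an accent-sensitive search for "près" would match only the second token, while the accent-insensitive mode would match both "près" and "pres".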

Re: SolrCloud from Stopping recovery for warnings to crash

2014-03-25 Thread Lukas Mikuckis
Last night the problem occurred again and I have more data. This time the problem happened on only one Solr server and it successfully recovered. solr which had all the leaders: [06:38:58.205 - 06:38:58.222] Stopping recovery for zkNodeName=core_node2core=** *- for all collections*

Re: creating shards on the fly in a single Solr instance (shards query parameter)

2014-03-25 Thread Shalin Shekhar Mangar
Hi Philip, Comments inline: On Tue, Mar 25, 2014 at 8:11 PM, Philip Durbin philip_dur...@harvard.edu wrote: I'm new to Solr and am exploring the idea of creating shards on the fly. Once the shards have been created and populated, I am hoping to use the shards query parameter to combine

Re: Question on highlighting edgegrams

2014-03-25 Thread Software Dev
Bump On Mon, Mar 24, 2014 at 3:00 PM, Software Dev static.void@gmail.com wrote: In 3.5.0 we have the following. fieldType name=autocomplete class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter

Replication (Solr Cloud)

2014-03-25 Thread Software Dev
I see that by default in SolrCloud that my collections are replicating. Should this be disabled in SolrCloud as this is already handled by it? From the documentation: The Replication screen shows you the current replication state for the named core you have specified. In Solr, replication is for

Multiple search analyzers question

2014-03-25 Thread ku3ia
Hi all! Now I have a default search field, defined as field name=Text type=text indexed=true stored=true / ... fieldType name=text class=solr.TextField autoGeneratePhraseQueries=true analyzer type=index tokenizer class=solr.ClassicTokenizerFactory/ filter

Re: Replication (Solr Cloud)

2014-03-25 Thread Shawn Heisey
On 3/25/2014 10:42 AM, Software Dev wrote: I see that by default in SolrCloud that my collections are replicating. Should this be disabled in SolrCloud as this is already handled by it? From the documentation: The Replication screen shows you the current replication state for the named core

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
Thank you very much for responding Mr. Høydahl. I removed the recursion which eliminated the stack overflow exception. However, I am still encountering my main problem with the docs not getting indexed in solr 4.x as I mentioned in my original email. The reason I am reindexing is that with solr 4.x

creating shards on the fly in a single Solr instance (shards query parameter)

2014-03-25 Thread Philip Durbin
I'm new to Solr and am exploring the idea of creating shards on the fly. Once the shards have been created and populated, I am hoping to use the shards query parameter to combine results from multiple shards into a single results set. By following the Testing Index Sharding on Two Local Servers

Re: Re-index Parent-Child Schema

2014-03-25 Thread Vijay Kokatnur
Hello Mikhail, Thanks for the suggestions. It took some time to get to this - 1. FieldsCollapsing cannot be done on Multivalue fields - https://wiki.apache.org/solr/FieldCollapsing 2. Join acts on documents, how can I use it to join multi-value fields in the same document? 3. Block-join

Re: Replication (Solr Cloud)

2014-03-25 Thread Michael Della Bitta
No, don't disable replication! The way shards ordinarily keep up with updates is by sending every document to each member of the shard. However, if a shard goes offline for a period of time and comes back, replication is used to catch up that shard. So you really need it on. If you created your

Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
Thanks for the reply. I'll make sure NOT to disable it.

Re: Solr Cloud collection keep going down?

2014-03-25 Thread Software Dev
Can anyone else chime in? Thanks On Mon, Mar 24, 2014 at 10:10 AM, Software Dev static.void@gmail.com wrote: Shawn, Thanks for pointing me in the right direction. After consulting the above document I *think* that the problem may be too large a heap, which may be affecting GC

Re: Solr Cloud collection keep going down?

2014-03-25 Thread Michael Della Bitta
What kind of load are the machines under when this happens? A lot of writes? A lot of http connections? Do your zookeeper logs mention anything about losing clients? Have you tried turning on GC logging or profiling GC? Have you tried running with a smaller max heap size, or setting

AND not as a boolean operator in Phrase

2014-03-25 Thread abhishek jain
Hi friends, when I search for "A and B" it gives me results for A, B; I am not sure why. Please advise how I can get an exact match when it is within a phrase/quotes. -- Thanks and kind Regards, Abhishek jain

Re: AND not as a boolean operator in Phrase

2014-03-25 Thread Jack Krupansky
What does your field type analyzer look like? I suspect that you have a stop filter which causes "and" to be removed. -- Jack Krupansky -Original Message- From: abhishek jain Sent: Tuesday, March 25, 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a boolean operator

Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
One other question. If I optimize a collection on one node, does this get replicated to all others when finished? On Tue, Mar 25, 2014 at 10:13 AM, Software Dev static.void@gmail.com wrote: Thanks for the reply. I'll make sure NOT to disable it.

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
I am also seeing the following in the log. Is it really committing??? Now I am totally confused about how Solr 4.x indexes. My relevant update config is as shown below updateHandler class=solr.DirectUpdateHandler2 maxPendingDeletes1/maxPendingDeletes autoCommit maxDocs100/maxDocs

Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
Ehh.. found out the hard way. I optimized the collection on 1 machine and when it was completed it replicated to the others and took my cluster down. Shitty On Tue, Mar 25, 2014 at 10:46 AM, Software Dev static.void@gmail.com wrote: One other question. If I optimize a collection on one node,

Re: Replication (Solr Cloud)

2014-03-25 Thread Shawn Heisey
On 3/25/2014 11:59 AM, Software Dev wrote: Ehh.. found out the hard way. I optimized the collection on 1 machine and when it was completed it replicated to the others and took my cluster down. Shitty It doesn't get replicated -- each core in the collection will be optimized. In older

Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
So it's generally a bad idea to optimize, I gather? - In older versions it might have done them all at once, but I believe that newer versions only do one core at a time. On Tue, Mar 25, 2014 at 11:16 AM, Shawn Heisey s...@elyograg.org wrote: On 3/25/2014 11:59 AM, Software Dev wrote: Ehh..

Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
In older versions it might have done them all at once, but I believe that newer versions only do one core at a time. It looks like it did it all at once and I'm on the latest (4.7) On Tue, Mar 25, 2014 at 11:27 AM, Software Dev static.void@gmail.com wrote: So it's generally a bad idea to

Re: solr 4.x reindexing issues

2014-03-25 Thread Lan
Ravi, It looks like you are re-indexing data by pulling data from your solr server and then indexing it back to the same server. I can think of many things that could go wrong with this setup. For example are all your fields stored? Since you are iterating through all documents on the solr server

Re: Replication (Solr Cloud)

2014-03-25 Thread Walter Underwood
Yes, it is generally a bad idea to optimize. The system continually does merges as needed. You generally do not need to force a full merge. wunder On Mar 25, 2014, at 11:27 AM, Software Dev static.void@gmail.com wrote: So it's generally a bad idea to optimize, I gather? - In older

document level security filter solution for Solr

2014-03-25 Thread Philip Durbin
I'm new to Solr and I'm looking for a document level security filter solution. Anonymous users searching my application should be able to find public data. Logged in users should be able to find public data and private data they have access to. Earlier today I wrote about shards as a possible

Re: document level security filter solution for Solr

2014-03-25 Thread Yonik Seeley
Depending on requirements, another option for simple security is to store the security info in the index and utilize a join. This really only works when you have a single shard since joins aren't distributed. # the documents, with permissions id:doc1, perms:public,... id:doc2, perms:group1
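The following Python snippet is not Solr join syntax; it is only an in-memory illustration of the filtering that storing permissions in the index is meant to achieve. The document shape follows the `id`/`perms` example above (the elided permissions on doc1 are left out), and the function name and the `user_groups` argument are hypothetical.

```python
def visible_docs(docs, user_groups=()):
    """Return ids of documents the user may see.

    Anonymous users pass no groups and therefore see only public documents.
    """
    allowed = set(user_groups) | {"public"}
    return [doc["id"] for doc in docs if allowed & set(doc["perms"])]


# Sample documents modelled on the example in the message above.
DOCS = [
    {"id": "doc1", "perms": ["public"]},
    {"id": "doc2", "perms": ["group1"]},
]
```

An anonymous search over `DOCS` would see only doc1, while a user belonging to group1 would see both documents.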

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
I even tried reading from one core A and indexing into core B, and the same issue still persists. On Tue, Mar 25, 2014 at 2:49 PM, Lan dung@gmail.com wrote: Ravi, It looks like you are re-indexing data by pulling data from your solr server and then indexing it back to the same

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
Sorry Guys, really apologize for wasting your time...bone headed coding on my part. Did not set the rows and start to correct values for proper pagination so it was getting the same 10 docs every single time. Thanks Ravi Kiran Bhaskar On Tue, Mar 25, 2014 at 3:50 PM, Ravi Solr
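For anyone hitting the same symptom, a minimal Python sketch of the paging arithmetic. `start` and `rows` are the standard Solr paging parameters mentioned above; the generator name and the way the total count is obtained are assumptions. If `start` is never advanced, every request returns the same first page.

```python
def page_params(total_docs, rows=10):
    """Yield start/rows pairs that walk through a result set exactly once."""
    for start in range(0, total_docs, rows):
        yield {"start": start, "rows": rows}
```

Each yielded dictionary can be merged into the query parameters of one request, with `total_docs` taken from the `numFound` value of an initial query.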

Re: Question on highlighting edgegrams

2014-03-25 Thread Software Dev
Same problem here: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html On Tue, Mar 25, 2014 at 9:39 AM, Software Dev static.void@gmail.com wrote: Bump On Mon, Mar 24, 2014 at 3:00 PM, Software Dev static.void@gmail.com wrote: In 3.5.0

Re: Multiple search analyzers question

2014-03-25 Thread Gora Mohanty
On Mar 25, 2014 10:37 PM, ku3ia dem...@gmail.com wrote: Hi all! Now I have a default search field, defined as field name=Text type=text indexed=true stored=true / ... fieldType name=text class=solr.TextField autoGeneratePhraseQueries=true analyzer type=index tokenizer

DIH dataimport.properties Zulu time

2014-03-25 Thread Kiran J
Hi, Is it possible to set up the data import handler so that it keeps track of the last imported time in Zulu time and not local time? It's not very clear from the documentation how to do it or if it is even possible to do it. Ref:

Memory Problems + java.lang.ref.Finalizer

2014-03-25 Thread Harish Agarwal
In reference to my prior thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3ccac-cpvrzbhizomcdhkrhygqizguerntkwtkxwwx3j1rqcxe...@mail.gmail.com%3E I followed the advice to set unmap=false on my indexes with promising results. Without performing any index updates I

Re: leaks in solr

2014-03-25 Thread harish.agarwal
I'm having a very similar issue to this currently on 4.6.0 (large java.lang.ref.Finalizer usage, many open file handles to long gone files) -- were you able to make any progress diagnosing this issue?

What contributes to disk IO?

2014-03-25 Thread Software Dev
What are the main contributing factors for Solr Cloud generating a lot of disk IO? A lot of reads? Writes? Insufficient RAM? I would think if there was enough disk cache available for the whole index there would be little to no disk IO.

RE: intersect query

2014-03-25 Thread Susheel Kumar
How big is your index? #documents, #size? Thanks, Susheel -Original Message- From: cmd.ares [mailto:cmd.a...@gmail.com] Sent: Tuesday, March 25, 2014 4:50 AM To: solr-user@lucene.apache.org Subject: intersect query my_index(one core): id,dealer,productName,amount,region

Re: AND not as a boolean operator in Phrase

2014-03-25 Thread Koji Sekiguchi
(2014/03/26 2:29), abhishek jain wrote: hi friends, when i search for A and B it gives me result for A , B , i am not sure why? Please guide how can i exact match when it is within phrase/quotes. Generally speaking (w/ LuceneQParser), if you want phrase match results, use quotes, i.e. q="A B".

Re: AND not as a boolean operator in Phrase

2014-03-25 Thread François Schiettecatte
Better to use '+A +B' rather than AND/OR, see: http://searchhub.org/2011/12/28/why-not-and-or-and-not/ François On Mar 25, 2014, at 10:21 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (2014/03/26 2:29), abhishek jain wrote: hi friends, when i search for A and B it gives me result
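The two query shapes discussed in this thread can be sketched as small Python helpers. The function names are hypothetical, and the helpers do no escaping of special query characters, so they are only suitable for plain terms like the A and B used here.

```python
def phrase_query(*terms):
    """Quote the terms so they are matched as a phrase, e.g. "A B"."""
    return '"' + " ".join(terms) + '"'


def require_all(*terms):
    """Prefix each term with '+' so that every term is mandatory."""
    return " ".join("+" + term for term in terms)
```

`phrase_query("A", "B")` produces the quoted form suggested by Koji, and `require_all("A", "B")` produces the `+A +B` form suggested by François.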

Re: DIH dataimport.properties Zulu time

2014-03-25 Thread Gora Mohanty
On 26 March 2014 02:44, Kiran J kiranjuni...@gmail.com wrote: Hi Is it possible to set up the data import handler so that it keeps track of the last imported time in Zulu time and not local time ? [...] Start your JVM with the desired timezone, e.g., java -Duser.timezone=UTC -jar start.jar

Issue with passing local params 4.7

2014-03-25 Thread William Bell
q_score=cancer

Re: Issue with passing local params 4.7

2014-03-25 Thread William Bell
sort: rint(product(sum($p_score,$s_score,$q_score),100)) desc,s_query asc ,tie: 1,q1: $q,q_score: query({!dismax qf=\user_query_edge^1 user_query^0.5 user_query_fuzzy\ v=$q1}), I also tried q1=cancer... It does not work unless I set v='cancer' On Tue, Mar 25, 2014 at 9:12 PM, William Bell