Use a different folder for schema.xml

2012-08-22 Thread Alexander Cougarman
Hi. For our Solr instance, we need to put the schema.xml file in a different location than where it resides now. Is this possible? Thanks. Sincerely, Alex

Re: Co-existing solr cloud installations

2012-08-22 Thread Lance Norskog
ZK has a 'chroot' feature (named after the Unix multi-tenancy feature). http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkSessions https://issues.apache.org/jira/browse/ZOOKEEPER-237 The last I heard, this feature could work for making a single ZK cluster support multiple

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Lance Norskog
How do you separate the documents among the shards? Can you set up the shards such that one collapse group is only on a single shard, so that you never have to do distributed grouping? On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee tchatter...@commvault.com wrote: This won't work, see my

Re: Does DIH commit during large import?

2012-08-22 Thread Lance Norskog
Solr has a separate feature called 'autoCommit'. This is configured in solrconfig.xml. You can set Solr to commit all documents every N milliseconds or every N documents, whichever comes first. If you want intermediate commits during a long DIH session, you have to use this or make your own script
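As a rough sketch of the autoCommit setting described above, a solrconfig.xml fragment might look like the following. The thresholds are arbitrary example values, not recommendations, and the surrounding updateHandler element is assumed to match the stock example configuration.

```xml
<!-- solrconfig.xml (fragment). Threshold values are illustrative assumptions. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit once this many documents have been added since the last commit -->
    <maxDocs>10000</maxDocs>
    <!-- or once this many milliseconds have passed, whichever comes first -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With a block like this in place, a long DIH run would get intermediate commits without any external script.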

Re: How to design index for related versioned database records

2012-08-22 Thread Lance Norskog
Another option is to take the minimum time interval and record every active interval during an employee record. Make a compound key of the employee and the time range. (Look at the SignatureUpdateProcessor for how to do this.) Add one multi-valued field that contains all of the time intervals for

Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-22 Thread Lance Norskog
There is no copyField in the schema. You have to store the parsed text in a field which is stored! Highlighting works on stored fields. There is no text field in the schema. I don't know how the DIH automatically creates it. On Tue, Aug 21, 2012 at 2:10 PM, anarchos78
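A minimal sketch of a stored field that highlighting could work against. The field name text and the type text_general are assumptions about this setup (text_general is defined in the example schema shipped with Solr 3.6); the poster's actual schema is not shown in the thread.

```xml
<!-- schema.xml (fragment). Field name and type are illustrative assumptions. -->
<!-- stored="true" keeps the original text so the highlighter can build snippets from it -->
<field name="text" type="text_general" indexed="true" stored="true"/>
```

A request would then ask for snippets from that field with hl=true and hl.fl=text.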

Re: Use a different folder for schema.xml

2012-08-22 Thread Lance Norskog
It is possible to store the entire conf/ directory somewhere. To store only the schema.xml file, try soft links or the XML include feature: conf/schema.xml includes from somewhere else. On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman acoug...@bwc.org wrote: Hi. For our Solr instance, we
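One way the XML include idea could look is an XInclude-based wrapper, sketched below. The paths and file names are hypothetical, and whether a particular Solr release resolves XInclude in schema.xml should be verified before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- conf/schema.xml: a thin wrapper that pulls the real definitions from another folder. -->
<!-- All paths and file names below are hypothetical. -->
<schema name="example" version="1.5"
        xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- types.xml is assumed to have <types> as its single root element -->
  <xi:include href="file:///opt/shared/solr/types.xml"/>
  <!-- fields.xml is assumed to have <fields> as its single root element -->
  <xi:include href="file:///opt/shared/solr/fields.xml"/>
  <uniqueKey>id</uniqueKey>
</schema>
```

Each included file has to be a well-formed XML document on its own, which is why the sketch includes whole types and fields elements rather than loose fragments.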

Which directories are required in Solr?

2012-08-22 Thread Alexander Cougarman
Hi. Which folders/files can be deleted from the default Solr package (apache-solr-3.6.1.zip) on Windows if all we'd like to do is index/store documents? Thanks. Sincerely, Alex

Highlighting is case sensitive when search with double quote

2012-08-22 Thread vrpar...@gmail.com
When I search with "abc cde", Solr returns results, but the highlighting portion is as below: <lst name="highlighting"> <lst name="1"> </lst> </lst> And when I search with "ABC cde", I get the response below: <lst name="highlighting"> <lst name="1"> <arr name="SearchField"> <str> ... ... ABC cde . </str> </arr> </lst>

display SOLR Query in web page

2012-08-22 Thread Bernd Fehling
Now this is very scary: while searching for solr direct access per docid I got a hit from the US Homeland Security Digital Library. Interested in what they had to tell me about my search, I clicked on the link to the page. At first the page had nothing unusual about it, but why did I get the hit?

RE: Use a different folder for schema.xml

2012-08-22 Thread Alexander Cougarman
Thanks, Lance. Please forgive my ignorance, but what do you mean by soft links/XML include feature? Can you provide an example? Thanks again. Sincerely, Alex -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: 22 August 2012 9:55 AM To: solr-user@lucene.apache.org

Re: Use a different folder for schema.xml

2012-08-22 Thread Ravish Bhagdev
You can include one XML file in another, something like:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE document [ <!ENTITY resourcedb SYSTEM
'file:/some/absolute/path/a.xml'> ]>
<resource>
<childofb>&resourcedb;</childofb>
</resource>
- Ravish On Wed, Aug 22, 2012 at

Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-22 Thread anarchos78
Thanks for your reply. I had tried many things (copyField etc.) with no success. Note that the PDFs are stored as BLOBs in a MySQL database. I am trying to use DIH in order to fetch the binaries from the DB. Is it possible? Thanks! -- View this message in context:

Weighted Search Results / Multi-Value Value's Not Aggregating Weight

2012-08-22 Thread David Radunz
Hey, I have been having some problems getting good search results when using weighting against many fields with multi-values. After quite a bit of testing it seems to me that the problem (at least as far as my query is concerned) is that only one weighting is taken into account

Solr - case-insensitive search do not work

2012-08-22 Thread meghana
I want to apply case-insensitive search to the field *myfield* in Solr. I googled a bit and found that I need to apply *LowerCaseFilterFactory* to the field type, and that the field should be of type solr.TextField. I applied that in my *schema.xml* and re-indexed the data, but even then my search seems to
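For reference, a minimal sketch of a case-insensitive field type. The type name text_lower and the choice of tokenizer are assumptions for illustration; only myfield, solr.TextField and LowerCaseFilterFactory come from the post.

```xml
<!-- schema.xml (fragment). "text_lower" and the tokenizer choice are illustrative assumptions. -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <!-- a single <analyzer> with no type attribute applies at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lower-case every token so "Cloud" and "cloud" match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="myfield" type="text_lower" indexed="true" stored="true"/>
```

Because the same analyzer runs on both sides, the filter only needs to appear once; existing documents have to be re-indexed after the type changes.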

Re: Solr - case-insensitive search do not work

2012-08-22 Thread Ravish Bhagdev
<filter class="solr.LowerCaseFilterFactory"/> is already present in your field type definition (it's there twice now). Are you adding quotes around your query by any chance? Ravish On Wed, Aug 22, 2012 at 11:31 AM, meghana meghana.rav...@amultek.com wrote: I want to apply case-insensitive search for

Re: Solr - case-insensitive search do not work

2012-08-22 Thread meghana
@Ravish Bhagdev, yes, I am adding double quotes around my search, as shown in my post. Like, myfield:"cloud university" -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002610.html Sent from the Solr - User mailing list

Re: Does DIH commit during large import?

2012-08-22 Thread Alexandre Rafalovitch
Thanks, I will look into autoCommit. I assume there are memory implications of not committing? Or is it just writing in a separate file and can theoretically do it indefinitely? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn:

Runtime.exec() not working on Tomcat

2012-08-22 Thread 122jxgcn
I have following code on my Apache Tika Maven project. This code works when I test locally, but fails when it's attached as external jar in Apache Solr (container is Tomcat). String cmd; contains command string that will convert file with input as ./convert.bin input.custom output.xml I

Re: Runtime.exec() not working on Tomcat

2012-08-22 Thread Alexandre Rafalovitch
Could it be different 'current' working directories? What happens if you hardcode the full path into the command and input/output files? ./convert.bin -> /Dev/Solr/bin/convert.bin, etc. Also, you may want to use some file system observation tools to figure out exactly what file is touched where.

Re: Solr - case-insensitive search do not work

2012-08-22 Thread Ravish Bhagdev
OK. Try without quotes like myfield:cloud+university and see if it has any effect. Also, try both queries with debugging turned on and post the output of the same ( http://wiki.apache.org/solr/CommonQueryParameters#Debugging ) It must be some field configuration issue or that double quotes are

Re: Solr - case-insensitive search do not work

2012-08-22 Thread Ravish Bhagdev
Also, try comparing your field configuration to Solr's default text field and see if you can spot any differences. Ravish On Wed, Aug 22, 2012 at 1:09 PM, Ravish Bhagdev ravish.bhag...@gmail.com wrote: OK. Try without quotes like myfield:cloud+university and see if it has any effect. Also,

Edismax parser weird behavior

2012-08-22 Thread amitesh116
Hi I am experiencing 2 strange behavior in edismax: edismax is configured to behave default OR (using mm=0) Total there are 700 results 1. Search for *auto* = *50 results* Search for *NOT auto* it gives *651 results*. Mathematically, it should give only 650 results for *NOT auto*. 2. Search

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tirthankar Chatterjee
You can collapse in each shard as a separate query. Lance Norskog goks...@gmail.com wrote: How do you separate the documents among the shards? Can you set up the shards such that one collapse group is only on a single shard, so that you never have to do distributed grouping? On Tue, Aug 21,

Re: SpellCheck Component does not work for certain words

2012-08-22 Thread mechravi25
Hi, Just few things to add up, I found that when we search for less than or equal to 3 letters I'm not able to get any suggestions and also when I search for finding, I dont get any suggestions related to it even though i have search results regarding the same. But when i Search for findingg i

Re: display SOLR Query in web page

2012-08-22 Thread Michael Della Bitta
Ouch, not to mention the potential for XSS. I'll see if I can get in touch with someone. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at

Re: display SOLR Query in web page

2012-08-22 Thread Michael Della Bitta
Actually, I'm having a little trouble coming up with a proof-of-concept exploit for this... it doesn't seem like Solr is exposed directly, and it does seem like it's escaping submitted content before redisplaying it on the page. I'm not crazy about leaking the raw query string into the HTML, but

Re: Solr - case-insensitive search do not work

2012-08-22 Thread meghana
Hi Ravish, the definition for text_en_splitting in the Solr default schema and in mine are the same. Still it's not working. Any idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002645.html Sent from the Solr - User

Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-22 Thread Claudio Ranieri
Hi, I tried to start solr-4.0.0-BETA with tomcat-6.0.20 but it does not work. I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I copied the directory apache-solr-4.0.0-BETA\example\solr to C:\home\solr-4.0-beta and adjusted the file

Re: Solr - case-insensitive search do not work

2012-08-22 Thread Ravish Bhagdev
Did you see my message about debugging parameters? Try that and see what's happening behind the scenes. I can confirm that by default the queries are NOT case sensitive. Ravish On Wed, Aug 22, 2012 at 2:45 PM, meghana meghana.rav...@amultek.com wrote: Hi Ravish, the definition for

Re: display SOLR Query in web page

2012-08-22 Thread Bernd Fehling
I haven't spent time trying anything; I just entered a query and noticed that it showed up in the page source view. If they really escape everything, is it not that dangerous? Actually I don't want to try anything with their page, they might not have any humor ;-) Bernd On 22.08.2012 15:41,

search is slow for URL fields of type String.

2012-08-22 Thread srinalluri
This is the string fieldType: <fieldType name="string" class="solr.StrField" sortMissingLast="true" /> These are the fields using the 'string' fieldType: <field name="image_url" type="string" indexed="true" stored="true" multiValued="true" /> <field name="url" type="string" indexed="true" stored="true" multiValued="true" />

Solr memory: CATALINA_OPTS in setenv.sh ?

2012-08-22 Thread Bruno Mannina
Dear users, I am trying to find out whether my addition to the setenv.sh file (which I had to create because it didn't exist) has taken effect, but when I click on the Java Properties link on the Solr admin web page I can't see the variable CATALINA_OPTS. In fact, I would like to know if my line added in the file

Re: display SOLR Query in web page

2012-08-22 Thread Michael Della Bitta
It's not great to leak internal implementation details of your application out like this, and it may be that someone more skilled at exploiting things like this could find one. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New

Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-22 Thread Mou
Hi, I think that this totally depends on your requirements and thus applicable for a user scenario. Score does not have any absolute meaning, it is always relative to the query. If you want to watch some particular queries and want to show results with score above previously set threshold, you can

RE: Co-existing solr cloud installations

2012-08-22 Thread Buttler, David
This is really nice. Thanks for pointing it out. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, August 21, 2012 8:23 PM To: solr-user@lucene.apache.org Subject: Re: Co-existing solr cloud installations You can use a connect string of

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Hi Lance, I don't understand enough of how the field collapsing is implemented, but I thought it worked with distributed search. Are you saying it only works if everything that needs collapsing is on the same shard? Tom On Wed, Aug 22, 2012 at 2:41 AM, Lance Norskog goks...@gmail.com wrote:

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Hi Tirthankar, Can you give me a quick summary of what won't work and why? I couldn't figure it out from looking at your thread. You seem to have a different issue, but maybe I'm missing something here. Tom On Tue, Aug 21, 2012 at 7:10 PM, Tirthankar Chatterjee tchatter...@commvault.com

Re: Solr memory: CATALINA_OPTS in setenv.sh ?

2012-08-22 Thread Bruno Mannina
On 22/08/2012 16:57, Bruno Mannina wrote: Dear users, I am trying to find out whether my addition to the setenv.sh file (which I had to create because it didn't exist) has taken effect, but when I click on the Java Properties link on the Solr admin web page I can't see the variable CATALINA_OPTS. In fact, I would

Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-22 Thread Ravish Bhagdev
Commercial solutions often have %age that is meant to signify the quality of match. Solr has relative score and you cannot tell by just looking at this value if a result is relevant enough to be in first page or not. Score depends on what else is in the index so not easy to normalize in the way

Query-side Join work in distributed Solr?

2012-08-22 Thread Timothy Potter
Just to clarify: query-side joins ( e.g. {!join from=id to=parent_signal_id_s}id:foo ) do not work in distributed mode yet? I saw LUCENE-3759 as unresolved but also some Twitter traffic saying there was a patch available. Cheers, Tim

Re: Solr memory: CATALINA_OPTS in setenv.sh ?

2012-08-22 Thread Michael Della Bitta
Check your cores' status page and see if you're running the MMapDirectory (you probably are.) In that case, you probably want to devote even less RAM to Tomcat's heap because the index files are being read out of memory-mapped pages that don't reside on the heap, so you'd be devoting more memory

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-22 Thread david3s
Hello Chris, thanks a lot for your reply. But is there an alternative solution? Because I see adding has_body as data duplication. Imagine that in a relational DB you had to create extra columns because you couldn't do something like where body is not null. If there's no other alternative I'll

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-22 Thread Michael Della Bitta
The name of the game for performance and functionality in Solr is quite often *denormalization*, which might run against your RDBMS instincts, but once you embrace it, you'll find that things go a lot more smoothly. Michael Della Bitta Appinions | 18

Index version generation for Solr 3.5

2012-08-22 Thread Xin Li
Hi, I ran into an issue lately with index version generation for Solr 3.5. In Solr 1.4, the index version of the slave service increments upon each replication. However, I noticed that's not the case for Solr 3.5; the index version would increase by 20 or 30 after replication. Does anyone know why and

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-22 Thread Jack Krupansky
You could also add a bodySize numeric (trie) field, which you can check for 0 for empty/missing bodies. And don't forget to check and see whether the [* TO *] range query might be faster. -- Jack Krupansky -Original Message- From: david3s Sent: Wednesday, August 22, 2012 12:37 PM
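A sketch of what the suggested numeric field could look like in schema.xml. The field name bodySize comes from the suggestion above; the tint type definition mirrors the Solr 3.x example schema and is an assumption about this particular setup.

```xml
<!-- schema.xml (fragment). The type definition is assumed to match the stock example schema. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<!-- length of the body in characters; 0 (or absent) means there is no body -->
<field name="bodySize" type="tint" indexed="true" stored="false"/>
```

Documents with a non-empty body could then be selected with a filter such as fq=bodySize:[1 TO *] instead of body:*. The indexing client would need to compute and send the value, since this field definition alone does not populate it.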

Re: Edismax parser weird behavior

2012-08-22 Thread Jack Krupansky
Don't have an immediate answer for you on #1, but for #2, mm does not override explicit operators - and - it only applies to terms that are not the immediate operand of an explicit operator. Note that by default lower-case operators are enabled in edismax - and is treated as AND - you can set

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-22 Thread david3s
Jack, sorry I forgot to answer you. We tried [* TO *] and the response times are the same as doing plain * -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002708.html Sent from the Solr - User

Re: Which directories are required in Solr?

2012-08-22 Thread Erick Erickson
Why do you care? I suspect that the example directory can be removed assuming you're distributing the war file. But disk space is really cheap, I suspect that tidying up the directories for aesthetic reasons isn't worth the risk of removing something that you might need later... Best Erick On

Re: Which directories are required in Solr?

2012-08-22 Thread Geek Gamer
Hi, check out: https://github.com/geek4377/jetty-solr You can remove exampledocs from the list to get only the required dirs for running Solr. On Wed, Aug 22, 2012 at 1:02 PM, Alexander Cougarman acoug...@bwc.org wrote: Hi. Which folders/files can be deleted from the default Solr package

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-22 Thread Erick Erickson
I'm reaching a bit here, haven't implemented one myself, but... It seems like you're just dealing with some shared memory. So say your filter recorded all the stuff you want to put into the DB. When you put stuff in to the shared memory, you probably have to figure out when you should commit the

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Hi Lance and Tirthankar, We are currently using Solr 3.6. I tried a search across our current 12 shards grouping by book id (record_no in our schema) and it seems to work fine (the query with the actual urls for the shards changed is appended below.) I then searched for the record_no of the

Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Tom Burton-West
Hello, Usually in the example/solr directory in Solr distributions there is a populated conf directory. However, in the distribution I downloaded of solr 4.0.0-BETA, there is no /conf directory. Has this been moved somewhere? Tom ls -l apache-solr-4.0.0-BETA/example/solr total 107 drwxr-sr-x 2 tburtonw

RE: Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Markus Jelsma
Hi - The example has been moved to collection1/ -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Wed 22-Aug-2012 20:59 To: solr-user@lucene.apache.org Subject: Solr 4.0 Beta missing example/conf files? Hello, Usually in the example/solr file in Solr

Cloud assigning incorrect port to shards

2012-08-22 Thread Buttler, David
Hi, I have set up a Solr 4 beta cloud cluster. I have uploaded a config directory, and linked it with a configuration name. I have started two Solr instances on two computers and added a couple of shards using the Core Admin function on the admin page. When I go to the admin cloud view, the shards all

Full Text Indexing for DOCX files

2012-08-22 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Has anyone been able to index DOCX files? I get this error message when using office 2007 documents (Location of error unknown)org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents We're

Re: Does DIH commit during large import?

2012-08-22 Thread Erick Erickson
solrconfig.xml has a setting ramBufferSizeMB that can be set to limit the memory consumed during indexing. When this limit is reached, the buffers are flushed to the current segment. NOTE: the segment is NOT closed, there is no implied commit here, and the data will not be searchable until a
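A minimal sketch of where this setting lives. The value 64 is an arbitrary example, and the enclosing element is an assumption: newer configurations use indexConfig, while older solrconfig.xml files put the same setting under indexDefaults.

```xml
<!-- solrconfig.xml (fragment). 64 is an arbitrary example value. -->
<indexConfig>
  <!-- flush buffered documents to the current segment once roughly this much RAM is used -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>
```

This only bounds memory during indexing; as noted above, a flush is not a commit, so visibility of new documents still depends on commits.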

Re: Full Text Indexing for DOCX files

2012-08-22 Thread Jack Krupansky
I've indexed Office 2007 .docx using Solr 3.6. It sounds as if Solr 1.3 had an old release of Tika/POI. No big surprise there. -- Jack Krupansky -Original Message- From: Nguyen, Vincent (CDC/OD/OADS) (CTR) Sent: Wednesday, August 22, 2012 3:57 PM To: solr-user@lucene.apache.org

RE: Full Text Indexing for DOCX files

2012-08-22 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Thanks Jack, I'll give that version of SOLR a try. Vincent Vu Nguyen Web Applications Developer Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-0384 v...@cdc.gov Century Bldg 2400 Atlanta, GA 30329

Re: Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Tom Burton-West
Thanks Markus! Should the README.txt file in solr/example be updated to reflect this? Is that something I need to enter a JIRA issue for? Tom On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - The example has been moved to collection1/ -Original

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-22 Thread david3s
Ok, I'll take your suggestion, but I would still be really happy if the wildcard searches behaved a little more intelligently (body:* not looking for everything in the body). More like when you do q=*:* it doesn't really search for everything in every field. Thanks -- View this message in

RE: Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Markus Jelsma
Hi, I would think so. Perhaps something for: https://issues.apache.org/jira/browse/SOLR-3288 -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Wed 22-Aug-2012 22:35 To: solr-user@lucene.apache.org Subject: Re: Solr 4.0 Beta missing example/conf files? Thanks

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Thanks Tirthankar, So the issue is memory use for sorting. I'm not sure I understand how sorting of grouping fields is involved with the defaults and field collapsing, since the default sorts by relevance, not grouping field. On the other hand I don't know much about how field collapsing is

Re: Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Mark Miller
Yeah - we want to fix that for sure. Sent from my iPhone On Aug 22, 2012, at 6:34 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, I would think so. Perhaps something for: https://issues.apache.org/jira/browse/SOLR-3288 -Original message- From:Tom Burton-West

Re: Cloud assigning incorrect port to shards

2012-08-22 Thread Mark Miller
What container are you using? Sent from my iPhone On Aug 22, 2012, at 3:14 PM, Buttler, David buttl...@llnl.gov wrote: Hi, I have set up a Solr 4 beta cloud cluster. I have uploaded a config directory, and linked it with a configuration name. I have started two solr on two computers and

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-22 Thread ksu wildcats
Thanks Erick. I tried to do it all at the filter but the problem i am running into doing it at the filter is intercepting the final commit calls or in other words I am unable to figure out when the final commit should happen such that I don't miss out any data. One option I tried is to increase

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Lance Norskog
Yes, distributed grouping works, but grouping takes a lot of resources. If you can avoid it in distributed mode, so much the better. On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West tburt...@umich.edu wrote: Thanks Tirthankar, So the issue is memory use for sorting. I'm not sure I understand how

Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-22 Thread ksu wildcats
Thanks for the reply, Mikhail. For our needs speed is more important than flexibility, and we have huge text files (e.g. blogs / articles, ~2 MB in size) that need to be read from our filesystem and then stored in the index. We have our app creating a separate core per client (dynamically) and

Re: Weighted Search Results / Multi-Value Value's Not Aggregating Weight

2012-08-22 Thread David Radunz
Hey, Please disregard this, I worked out what the actual problem was. I am going to post another query with something else I discovered. Thanks :) David On 22/08/2012 7:24 PM, David Radunz wrote: Hey, I have been having some problems getting good search results when using