AW: Will Solr fit our needs?

2010-03-18 Thread Moritz Maedler
Hi guys! Thanks a lot for your suggestions and help - I really appreciate that! As we need e.g. the price for sorting I think it must be in the index? Thus, I'm not sure that a key-value store is the thing we are looking for as we need a good search engine. Currently we are using several

Re: Will Solr fit our needs?

2010-03-18 Thread Lukáš Vlček
On Thu, Mar 18, 2010 at 8:45 AM, Moritz Maedler m...@moritz-maedler.de wrote: Hi guys! Thanks a lot for your suggestions and help - I really appreciate that! As we need e.g. the price for sorting I think it must be in the index? Thus, I'm not sure that a key-value store is the thing we

HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249)

2010-03-18 Thread Marc Des Garets
Hi, I am doing a really simple query on my index (it's running in tomcat): http://host:8080/solr_er_07_09/select/?q=hash_id:123456 I am getting the following exception: HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249) at

solrj sends duplicate documents

2010-03-18 Thread Tim Terlegård
I'm using StreamingUpdateSolrServer to index a document. StreamingUpdateSolrServer server = new StreamingUpdateSolrServer("http://localhost:8983/solr/core0", 20, 4); server.setRequestWriter(new BinaryRequestWriter()); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", 12121212);

Re: solrj sends duplicate documents

2010-03-18 Thread Erik Hatcher
The StreamingUpdateSolrServer does not support binary format, unfortunately. Erik On Mar 18, 2010, at 8:15 AM, Tim Terlegård wrote: I'm using StreamingUpdateSolrServer to index a document. StreamingUpdateSolrServer server = new

Return all Facets?

2010-03-18 Thread homerlex
I'm starting to play with Solr. I am looking at the API and see that there is an addFacetField on the SolrQuery object that is required to specify which facet fields you want returned. Is there any way to specify that we want all facet fields without explicitly having to add them all via

Re: Term Highlighting without store text in index

2010-03-18 Thread Alexey Serba
Hey Dominique, See http://www.lucidimagination.com/search/document/5ea8054ed8348e6f/highlight_arbitrary_text#3799814845ebf002 Although it might not be a good solution for huge texts or wildcard/phrase queries. http://issues.apache.org/jira/browse/SOLR-1397 On Mon, Mar 15, 2010 at 4:09 PM, dbejean

excluder filters and multivalued fields

2010-03-18 Thread Marc Sturlese
I don't think there's a way to do what has come to my mind but want to be sure. Let's say I have a doc with 2 fields, one is multiValued doc1: name-john year-2009;year-2010;year-2011 And I query for: q=john&fq=-year:2010 Doc1 won't be in the matching results. Is there a way to make it appear

RE: XPath Processing Applied to Clob

2010-03-18 Thread Craig Christman
You could also do the xpath processing on the oracle end using the extract or extractValue functions. Here's a good reference: http://www.psoug.org/reference/xml_functions.html -Original Message- From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com] Sent: Wednesday, March 17,

Re: solrj sends duplicate documents

2010-03-18 Thread Tim Terlegård
It would be nice if the documentation mentioned this. :) /Tim 2010/3/18 Erik Hatcher erik.hatc...@gmail.com: The StreamingUpdateSolrServer does not support binary format, unfortunately. Erik On Mar 18, 2010, at 8:15 AM, Tim Terlegård wrote: I'm using StreamingUpdateSolrServer to

where can i get an synonym.txt and spellcheck.txt ?

2010-03-18 Thread stocki
Hello. I am looking for a synonym.txt and a spellcheck.txt. Where can I find them on the large internet? Or how did you fill these two files with good names? -- View this message in context: http://old.nabble.com/where-can-i-get-an-synonym.txt-and-spellcheck.txt---tp27946812p27946812.html Sent from

Re: Return all Facets?

2010-03-18 Thread Erik Hatcher
No, there isn't. How would one know what all the facet fields are, though? One trick, use the luke request handler to get the list of fields, then use that list to construct the facet fields request parameters. Erik On Mar 18, 2010, at 8:40 AM, homerlex wrote: I'm starting to
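The trick described in this reply can be sketched roughly as follows. This is a minimal Python illustration rather than SolrJ. It assumes the Luke request handler returns a JSON response with a "fields" object keyed by field name; the sample response and the field names in it are made up for the example, and no request is actually sent.

```python
from urllib.parse import urlencode


def facet_params_from_luke(luke_response, query="*:*"):
    """Build select parameters that facet on every field listed by Luke."""
    # Assumption: the response carries a "fields" mapping of name -> info.
    field_names = sorted(luke_response.get("fields", {}))
    params = [("q", query), ("facet", "true")]
    # One facet.field parameter per field name.
    params += [("facet.field", name) for name in field_names]
    return urlencode(params)


# Hypothetical, heavily trimmed Luke response used only for illustration.
sample_luke_response = {
    "fields": {
        "category": {"type": "string"},
        "brand": {"type": "string"},
    }
}

query_string = facet_params_from_luke(sample_luke_response)
print(query_string)
```

In practice one would probably filter the list first, since faceting on every field in the index (including large text fields) is rarely what is wanted.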

some snynonym clarifications

2010-03-18 Thread Mark Fletcher
Hi, Just needed some help to understand the following synonym mappings:- 1. aaa = does it mean:- if the user queries for aaa it is replaced with and documents matching are searched for or does it mean if the user queries for aaa, documents with aaa as well

Re: some snynonym clarifications

2010-03-18 Thread Mark Fletcher
Hi, Thanks for the mail. I had tried the WIKI. My remaining doubts were mainly:- 1. If we have synonyms specified and they replace your search keyword with the ones specified, wouldn't we face a risk of our original keyword being missed out? What I meant is if I have a keyword for search say

Recommended OS

2010-03-18 Thread blargy
Does anyone have any recommendations on which OS to use when setting up Solr search server? Any memory/disk space recommendations? Thanks -- View this message in context: http://old.nabble.com/Recommended-OS-tp27948306p27948306.html Sent from the Solr - User mailing list archive at

Opinions on Facet+Fulltext behavior?

2010-03-18 Thread Mark Bennett
Most sites allow you to search for some text, and then click on Facets (or Tags or Taxonomy branches) to drill down into your search. Most sites also show the search box in these search results, with the text previously entered, so that you can edit it and resubmit. Perhaps you want to add a

Re: Recommended OS

2010-03-18 Thread K Wong
http://wiki.apache.org/solr/FAQ#What_are_the_Requirements_for_running_a_Solr_server.3F I have Solr running on CentOS 5.4. It runs fine on the OpenJDK 1.6.0 and Tomcat 5. If I were to do it again, I'd probably just stick with Jetty. You really will need to read the docs to get the settings right

Re: where can i get an synonym.txt and spellcheck.txt ?

2010-03-18 Thread stocki
aha, okay thx. And how do you get your spellcheck words from your product names? We sometimes have very looong names. How is it possible to use the spellchecker function or autosuggestion in the right way? Erick Erickson wrote: You probably won't find a good synonyms file. The problem

Re: Recommended OS

2010-03-18 Thread Jean-Sebastien Vachon
On 2010-03-18, at 1:03 PM, K Wong wrote: http://wiki.apache.org/solr/FAQ#What_are_the_Requirements_for_running_a_Solr_server.3F I have Solr running on CentOS 5.4. It runs fine on the OpenJDK 1.6.0 and Tomcat 5. If I were to do it again, I'd probably just stick with Jetty. Would you mind

Re: Recommended OS

2010-03-18 Thread blargy
Beat me to the punch with that question. KWong, did you happen to install the Apache APR? Wondering if it is even worth the trouble. I am thinking about going with RedHat Enterprise 5 unless anyone has any objections? Jean-Sebastien Vachon wrote: On 2010-03-18, at 1:03 PM, K Wong wrote:

Re: Recommended OS

2010-03-18 Thread K Wong
We're running Solr to provide search services to a Drupal 6 installation. The site is very low traffic (35 uniques a day) and search doesn't get used very often. I was thinking that I could get away with running it on the Jetty that comes with Solr. It's just one less thing that has to be looked

Re: Solr query parser doesn't invoke analyzer for simple term query?

2010-03-18 Thread Chris Hostetter
: It seems that Solr's query parser doesn't pass a single term query no ... the query parser always uses the analyzer for text regardless of whether it's a single term or not (it doesn't even know if it's a single term until the Analyzer tells it) cases where the analyzer isn't used are things

Re: Solr query parser doesn't invoke analyzer for simple term query?

2010-03-18 Thread Chris Hostetter
: : Thank you, Marco. I see the debug output that looks like: : <str name="rawquerystring">title_jpn:2001年</str> : <str name="querystring">title_jpn:2001年</str> : <str name="parsedquery">PhraseQuery(title_jpn:"2001 年")</str> : <str name="parsedquery_toString">title_jpn:"2001 年"</str> ... : Does this mean the

Re: dynamic categorization transactional data

2010-03-18 Thread caman
1) Took care of the first one by Transformer. 2) Any input on 2 please? I need to store # of views and popularity with each document and that can change pretty often. Is it recommended to use a database, or can this be updated in Solr directly? My issue with DB is that with every Solr search hit, will

Re: dynamic categorization transactional data

2010-03-18 Thread Smiley, David W.
You'll probably want to influence your relevancy on this popularity number that is changing often. ExternalFileField looks like a possibility though I haven't used it. Another would be using an in-memory cache which stores all popularity numbers for any data that has its popularity updated
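As a rough illustration of the ExternalFileField route mentioned here, the sketch below writes popularity numbers in the key=value line format that ExternalFileField reads from a file named external_<fieldname> in the index data directory. The field name "popularity", the document ids, and the temporary directory standing in for the data directory are all assumptions made for the example; how and when Solr picks the file up depends on the setup.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, values):
    """Write doc_id=value lines for an external file field, atomically."""
    path = os.path.join(data_dir, f"external_{field_name}")
    # Write to a differently named temp file first so a reader never sees
    # a half-written external_<field> file.
    tmp_path = os.path.join(data_dir, f".{field_name}.tmp")
    with open(tmp_path, "w", encoding="utf-8") as handle:
        for doc_id, value in sorted(values.items()):
            handle.write(f"{doc_id}={value}\n")
    os.replace(tmp_path, path)
    return path


# Hypothetical popularity numbers keyed by document id.
with tempfile.TemporaryDirectory() as data_dir:
    path = write_external_file(data_dir, "popularity", {"doc1": 12.5, "doc2": 3})
    with open(path, encoding="utf-8") as handle:
        written = handle.read()

print(written)
```

The point of this design is that frequently changing numbers live outside the index, so they can be refreshed without reindexing the documents.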

Re: dynamic categorization transactional data

2010-03-18 Thread caman
David, Much appreciated. This gives me enough to work with. I missed one important point. Our data changes pretty frequently, which means we may be running deltas every 5-10 minutes. In-memory should work, thanks. David Smiley @MITRE.org wrote: You'll probably want to influence your

Re: Return all Facets?

2010-03-18 Thread homerlex
Thanks for the reply. Can someone point me to a sample on how to use the luke request handler to get this info? Erik Hatcher-4 wrote: No, there isn't. How would one know what all the facet fields are, though? One trick, use the luke request handler to get the list of fields, then

Issue with exact matching

2010-03-18 Thread Alex Thurlow
I'm trying to give a super boost to fields that match exactly, but it doesn't appear to be working. I have this: <field name="artist_tight" type="string_lower" indexed="true" stored="true"/> <field name="title_tight" type="string_lower" indexed="true" stored="true"/> <copyField source="title" dest="title_tight"/>

Re: dynamic categorization transactional data

2010-03-18 Thread Grant Ingersoll
On Mar 18, 2010, at 2:44 PM, caman wrote: 1) Took care of the first one by Transformer. This is often also something done by a classifier that is trained to deal with all the statistical variations in your text. Tools like Weka, Mahout, OpenNLP, etc. can be applied here. 2) Any input on

Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-18 Thread brad anderson
Tried following their tutorial for plugging zoie into solr: http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Server It appears it only allows you to search on documents after you do a commit? Am I missing something here, or does the plugin not do anything? Their tutorial tells you to do a

trimfilterfactory on string fieldtype?

2010-03-18 Thread Tommy Chheng
Can the trim filter factory work on string fieldtypes? When I define a trim filter factory on a string fieldtype, I get an exception: org.apache.solr.common.SolrException: Unknown fieldtype 'string' specified on field id at

good spell dictionary

2010-03-18 Thread michaelnazaruk
Can anyone tell me where I can buy or download a free spelling dictionary for Solr? I don't need a simple dictionary! I need a very good American-English spelling dictionary (or American only)! -- View this message in context: http://old.nabble.com/good-spell-dictionary-tp27950854p27950854.html Sent from

DIH questions

2010-03-18 Thread Shawn Heisey
Below is my data-config.xml file, which I am using to build an index for my first shard. I have a couple of questions. Can Solr include the hostname (short version) it's running on in the query? Alternatively, is there a way to override the query with a URL parameter before or when doing

Re: DIH questions

2010-03-18 Thread Lukas Kahwe Smith
On 18.03.2010, at 23:12, Shawn Heisey wrote: Below is my data-config.xml file, which I am using to build an index for my first shard. I have a couple of questions. Can Solr include the hostname (short version) it's running on in the query? Alternatively, is there a way to override the

Re: Boundary match as part of query language?

2010-03-18 Thread Chris Hostetter
: Now, I know how to work-around this, by appending some unique character : sequence at each end of the field and then include this in my search in : the front end. However, I wonder if any of you have been planning a : patch to add a native boundary match feature to Solr that would :

Re: Facet pagination

2010-03-18 Thread Chris Hostetter
: Is there a way to get *total count of facets* per field? sorry, no. you can skip ahead, but the only way to know when you're done is when you stop getting constraints back for that field. -Hoss
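The approach described in this reply (skip ahead with an offset and stop when nothing comes back) could look roughly like the Python sketch below. facet.offset and facet.limit are the Solr parameters for paging facet constraints; fetch_page is a hypothetical callable standing in for whatever actually sends the request, and the sample data is invented.

```python
def iter_facet_values(fetch_page, field, page_size=100):
    """Yield (value, count) pairs for one facet field, page by page."""
    offset = 0
    while True:
        # fetch_page is assumed to pass facet.field, facet.offset and
        # facet.limit to Solr and return a list of (value, count) pairs.
        page = fetch_page(field, offset, page_size)
        if not page:
            # No constraints returned: this is the only end-of-list signal.
            break
        yield from page
        offset += len(page)


# Stand-in for a real Solr call, slicing an in-memory list instead.
sample = [("red", 30), ("blue", 12), ("green", 5)]


def fake_fetch(field, offset, limit):
    return sample[offset:offset + limit]


values = list(iter_facet_values(fake_fetch, "color", page_size=2))
total = len(values)
print(total, values)
```

Counting the yielded pairs is then the way to get a total, at the cost of walking every page.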

Re: Generating a sitemap

2010-03-18 Thread Chris Hostetter
: Been testing nutch to crawl for solr and I was wondering if anyone had : already worked on a system for getting the urls out of solr and generating : an XML sitemap for Google. it's pretty easy to just paginate through all docs in solr, so you could do that -- but I'd be really surprised if
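A rough Python sketch of the "paginate through all docs" idea follows. It assumes each document has a stored field named "url" and that fetch_docs is a hypothetical callable wrapping a select request with the start and rows parameters; neither name comes from the thread. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def iter_doc_urls(fetch_docs, rows=1000):
    """Page through all documents and yield the value of their url field."""
    start = 0
    while True:
        # fetch_docs(start, rows) is assumed to return a list of dicts.
        docs = fetch_docs(start, rows)
        if not docs:
            break
        for doc in docs:
            yield doc["url"]
        start += len(docs)


def build_sitemap(urls):
    """Serialize the URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Stand-in for real Solr results.
all_docs = [{"url": f"http://example.com/page{i}"} for i in range(3)]


def fake_fetch_docs(start, rows):
    return all_docs[start:start + rows]


sitemap = build_sitemap(iter_doc_urls(fake_fetch_docs, rows=2))
print(sitemap)
```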

Re: Multi valued fields

2010-03-18 Thread Chris Hostetter
: Can I build a query such as : : : -field: A : : which will return all documents that do not have exclusive A in : their field's values. By exclusive I mean that I don't want documents : that only have A in their list of values. In my sample case, the query : would return doc A

Re: Filtering search results

2010-03-18 Thread Chris Hostetter
: For example, in dice.com, the visitor can search by some keyword and filter : further by Skill, Country, Province, City, Telecommute, Travel Required : (shown on the left pane on dice.com). We were wondering if there is some : built-in feature/functionality that can be used from Solr to

Re: Issue with exact matching

2010-03-18 Thread Erick Erickson
I only have time for a quick glance, but what jumps out is that this part: title:rude boy^100 probably isn't matching boy against your title field, it's matching rude against title, but boy against your default field and boosting the boy part. Try parenthesizing (at least that works in
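To make the parsing issue concrete, here is a small Python helper that builds the clause so that every word stays attached to the field. It offers both the parenthesized form suggested in this reply and a quoted-phrase form; the phrase form is an alternative added for illustration, not something proposed in the thread. The helper name and the field names are illustrative only.

```python
def fielded_boost(field, text, boost, phrase=True):
    """Return a boosted clause in which all words are bound to the field."""
    if phrase:
        # Escape backslashes and quotes so the phrase stays well-formed.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        body = f'"{escaped}"'
    else:
        # Grouping applies the field to each word inside the parentheses.
        body = f"({text})"
    return f"{field}:{body}^{boost}"


phrase_clause = fielded_boost("title_tight", "rude boy", 100)
grouped_clause = fielded_boost("title", "rude boy", 100, phrase=False)
print(phrase_clause)
print(grouped_clause)
```

Without quotes or parentheses, only the first word is searched in the named field and the rest falls through to the default field, which is the behavior described above.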

Re: DIH questions

2010-03-18 Thread Shawn Heisey
That looks very useful. So does this mean that this will work? URL text: ?command=full-import&numShards=6&modValue=0&minDid=229615984 XML: query=SELECT * FROM [table] WHERE (did % ${dataimporter.request.numShards}) = ${dataimporter.request.modValue} AND ${dataimporter.request.minDid} = did
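The request side of this idea can be sketched in Python as below. The parameter names numShards, modValue and minDid are the ones from the message and are read by the config through ${dataimporter.request.*}; the base URL and the helper names are assumptions for the example. The second function only mirrors the modulo test from the SQL to show which rows a given shard would take.

```python
from urllib.parse import urlencode


def full_import_url(base_url, num_shards, mod_value, min_did):
    """Build a DIH full-import URL carrying the custom request parameters."""
    params = {
        "command": "full-import",
        "numShards": num_shards,
        "modValue": mod_value,
        "minDid": min_did,
    }
    return f"{base_url}?{urlencode(params)}"


def belongs_to_shard(did, num_shards, mod_value):
    """Mirror of the SQL condition (did % numShards) = modValue."""
    return did % num_shards == mod_value


# Hypothetical handler location; adjust to the actual core and handler path.
url = full_import_url("http://localhost:8983/solr/dataimport", 6, 0, 229615984)
print(url)
```

With six shards, each shard would be called with the same numShards and a different modValue from 0 to 5.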

stream.url Contention

2010-03-18 Thread Giovanni Fernandez-Kincade
I recently switched from posting a file (PDFs in this case) to the Extract handler, to using the Stream.URL parameter. I've noticed a huge amount of contention around opening URL connections: http-8080-Processor36 [BLOCKED] CPU time: 0:47 sun.net.www.protocol.file.Handler.openConnection(URL)

Re: good spell dictionary

2010-03-18 Thread Erick Erickson
Spellcheck is generally more useful if it's derived from words already *in* your index. It's of little use to a user to have spellcheck/autosuggest show terms that aren't in the index... HTH Erick On Thu, Mar 18, 2010 at 6:00 PM, michaelnazaruk michaelnaza...@gmail.com wrote: Can anyone tell

Re: Generating a sitemap

2010-03-18 Thread Jon Baer
It's also possible to try to use the Velocity contrib response writer and page through it with the sitemap elements. BTW generating a sitemap was a big reason for the switch we made from GSA to Solr because (for some reason) the map took way too long to generate (even simple requests). If you page

Re: DIH questions

2010-03-18 Thread Shawn Heisey
I gave this config idea a try, looks like it works perfectly. I thought at first that it wasn't working, but as is usual with such things, my XML was faulty. Many many thanks! Shawn On 3/18/2010 5:19 PM, Shawn Heisey wrote: That looks very useful. So does this mean that this will work?

[POLL] Users of abortOnConfigurationError ?

2010-03-18 Thread Chris Hostetter
Due to some issues with the (lack of) functionality behind the abortOnConfigurationError option in solrconfig.xml, I'd like to take a quick poll of the solr-user community... * If you have never heard of the abortOnConfigurationError option prior to this message, please ignore this

Re: Return all Facets?

2010-03-18 Thread Erik Hatcher
David - sounds kinda like this one: http://issues.apache.org/jira/browse/SOLR-1280 :) Maybe you'd be up for rounding this issue out with your enhancements and get this committable? Erik On Mar 18, 2010, at 4:06 PM, Smiley, David W. wrote: Coincidentally I'm working on something

Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-18 Thread Erik Hatcher
When I don't do the commit, I cannot search the documents I've indexed. - that's exactly how Solr without Zoie works, and it's how Lucene itself works. Gotta commit to see the documents indexed. Erik On Mar 18, 2010, at 5:41 PM, brad anderson wrote: Tried following their

How many facet values are too many?

2010-03-18 Thread Andy
My understanding is that too many facet values will decrease performance. How many is too many? Are there any rules of thumb for this? 2 related questions: - I expect a facet field to have many values (values are user generated), anything I can do to minimize the performance impact? - Any way

Re: Weired behaviour for certain search terms

2010-03-18 Thread Akash Sahu
I tried adding hl.maxAnalyzedChars=-1 to my search query but it didn't help. Just wanted to know if there are limitations on certain search terms. It's a bit strange that Solr is not behaving properly for certain terms (especially returning the excerpts in the highlighting dictionary). The terms

Re: PDFBox/Tika Performance Issues

2010-03-18 Thread Mattmann, Chris A (388J)
Hi Giovanni, Let's try and isolate the problem. Can you try parsing the PDF file with tika-app as a standalone? Take your tika-app jar file then run java -jar tika-app-0.7-SNAPSHOT.jar -m /path/to/pdf/file That should give you something like: Content-Type: application/pdf created: Thu Sep 06