CachedSqlEntityProcessor And Delta Imports

2010-02-26 Thread KirstyS
Hi, I am on the 1.4 Nightly build from September (still need to upgrade). I am using CachedSqlEntityProcessor for my main queries but was hoping to use it for my delta-imports as well. Is this possible? I have a main entity called 'Article' and then entity name=body pk=CmsArticleBodyPageId

Spellcheck in mulitlanguage

2010-02-26 Thread Sudhakar_Thangavel
Hi, I need spell-check suggestions for user queries (in Italian). How can I get this in my Solr? -- View this message in context: http://old.nabble.com/Spellcheck-in-mulitlanguage-tp27717787p27717787.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr Cell RTF Woes

2010-02-26 Thread David.Dankwerth
Are you running on a Linux/Unix box that has no X ... Did you try the headless options? http://java.sun.com/developer/technicalArticles/J2SE/Desktop/headless/ Tika's RTF parser uses Swing and AWT to analyze the RTF; these in turn will attempt to use graphics libraries unless you run headless.

Content Extraction

2010-02-26 Thread Lee Smith
Hey all, hope someone can advise. I followed the example in the wiki on how to extract an HTML page, i.e. curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F myfi...@tutorial.html And it displayed an HTML page but with a 404 and did

Re: Index size

2010-02-26 Thread Jean-Sebastien Vachon
Hi, All the documents can be up to 10K. Most of it comes from a single field which is both indexed and stored. The data is uncompressed because compression would eat up too much CPU considering the volume we have. We have around 30 fields in all. We also need to compute some facets as well as collapse

Auto suggestion

2010-02-26 Thread Suram
Hi, AutoSuggestion is not found for newly indexed data. How can I configure that? Can anyone help me? Thanks in advance -- View this message in context: http://old.nabble.com/Auto-suggestion-tp27718858p27718858.html

Highest frequency

2010-02-26 Thread pcmanprogrammeur
Hello all (sorry if my English is bad, I'm French)! I have a Solr index with ads, each of which contains a title and a description. For example: ad 1: title = test / description = [empty]; ad 2: title = test on test / description = this is a test. And now, if I execute the request test in

Re: new/first searcher

2010-02-26 Thread Marc Sturlese
There's no problem with having the same warming in both cases. First queries are used to warm the index once you start the Solr instance. New queries warm the index once a commit is executed, for example. In first queries warming there was no previous IndexSearcher opened. In new queries there

Re: Highest frequency

2010-02-26 Thread Marc Sturlese
As far as I know it's not supported by default. I think you should implement your custom Lucene Similarity class and plug it into Solr via schema.xml pcmanprogrammeur wrote: Hello all (sorry if my English is bad, I'm French)! I have a Solr index with ads which contain a title and a

Re: Content Extraction

2010-02-26 Thread Erick Erickson
You really have to provide more details of (a) what you did and (b) what the results were. Have you looked at your index with the admin page and/or Luke? Have you tried querying in the admin page? Have you examined the logs to see what they report? Best Erick On Fri, Feb 26, 2010 at 7:54 AM, Lee

Re: Auto suggestion

2010-02-26 Thread Erick Erickson
Have you reopened the index after you added the data? Erick On Fri, Feb 26, 2010 at 9:23 AM, Suram reactive...@yahoo.com wrote: Hi, AutoSuggestion is not found for newly indexed data. How can I configure that? Can anyone help me? Thanks in advance -- View this message in context:

SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Pablo Mercado
Hello, Solr is raising the following exception when processing queries that sort on an integer attribute. The same queries and sorts have been running fine in production for almost a year now. If I run the query without the sort on the integer attribute, the query runs fine. If I run a query

replication. when the slave goes down...

2010-02-26 Thread Matthieu Labour
Hi, I have 2 Solr machines: 1 master and 1 slave replicating the index from the master. The machine on which the slave is running went down while the replication was running. I suppose the index must be corrupted. Can I safely remove the index on the slave and restart the slave and the slave will start

Re: Solr Cell RTF Woes

2010-02-26 Thread Bill Engle
Thanks. Headless put me in the right direction. I am running on a headless Mac OS X 10.6 Server. I added the below to my {CATALINA_HOME}/bin/setenv.sh file and now I am indexing RTF. export JAVA_OPTS="-d64 -server -Xmx1024m -XX:MaxPermSize=512m -Djava.awt.headless=true"

Re: SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Yonik Seeley
One of your field values isn't a valid integer; it's 104708. You're probably using the straight integer type in 1.3, which is meant for back compat with existing Lucene indexes and currently doesn't do validation on its input. For Solr 1.4, int is a new field type (example schema maps it to

Re: Highest frequency

2010-02-26 Thread Erick Erickson
The underlying Lucene automatically takes this into account: it uses the term frequency in relation to the length of the field rather than just a term count. So in your example doc 1 has a complete field match on title, so it scores higher. Also, depending upon how you set things up you may not be
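
The effect described above can be sketched with the classic Lucene scoring factors. This is only a rough illustration, assuming the default similarity of that era (tf = sqrt(freq), lengthNorm = 1/sqrt(number of terms)), no stop-word removal, and ignoring idf, boosts, and the lossy one-byte norm encoding. The helper name `title_score_factor` is hypothetical and not part of Lucene or Solr.

```python
import math


def title_score_factor(field_text: str, term: str) -> float:
    """Hypothetical helper: tf * lengthNorm for one term in one field."""
    tokens = field_text.lower().split()
    freq = tokens.count(term.lower())
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                        # term-frequency factor
    length_norm = 1.0 / math.sqrt(len(tokens))  # shorter fields weigh more
    return tf * length_norm


# The two ad titles from the original question.
ad1 = title_score_factor("test", "test")          # 1 occurrence in 1 token
ad2 = title_score_factor("test on test", "test")  # 2 occurrences in 3 tokens
print(ad1, ad2, ad1 > ad2)
```

Under these assumptions the single-word title wins on the title field even though the longer title contains the term twice, which matches the behaviour described in the reply.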

Re: Content Extraction

2010-02-26 Thread Lee Smith
Hi Erick, I posted more details yesterday but got no response. I have a screenshot of what it does: http://screencast.com/t/MGRiZTU5M After running it I ran a query that returned 0 results, and I checked how many docs are indexed, which is also 0. Hope you can shed some more

Re: SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Pablo Mercado
Thank you for taking the time to look at my issue and respond. Do you have any suggestions for purging the document with this field from the index? Would that even help? I do not know which document has the corrupt value, and searching for the document with something like pk_i:104708 does

Re: Solr Cell and Deduplication - Get ID of doc

2010-02-26 Thread Bill Engle
Any thoughts on this? I would like to get the id back in the request after indexing. My initial thoughts were to do a search to get the docid based on the attr_stream_name after indexing but now that I reread my message I mentioned the attr_stream_name (file_name) may be different so that is

Re: SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Mark Miller
You have to find the document with the bad value somehow. In the past I have used Luke to help with this. Then you need to delete the document. Finally, you have to get the deleted document out of the index through a merge (else the bad term will still be loaded by the FieldCache) - easiest

Re: Changing term frequency according to value of one of the fields

2010-02-26 Thread Joe Calderon
Extend the Similarity class, compile it against the jars in lib, put it in a path Solr can find, and set your schema to use it http://wiki.apache.org/solr/SolrPlugins#Similarity On 02/25/2010 10:09 PM, Pooja Verlani wrote: Hi, I want to modify Similarity class for my app like the following- Right

Solrsharp

2010-02-26 Thread Frederico Azeiteiro
Hi, I don't know if this list includes this kind of help, but I'm using Solrsharp with C# to operate Solr. Please advise if this is off-topic. I'm having a little trouble making a search with excluded terms using the query parameters. Does anyone use Solrsharp around here? Do

Re: SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Yonik Seeley
On Fri, Feb 26, 2010 at 10:59 AM, Mark Miller markrmil...@gmail.com wrote: You have to find the document with the bad value somehow. In the past I have used Luke to help with this. Then you need to delete the document. You can also find the document with a raw term query. q={!raw
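
A raw term query skips field-type analysis, so it can match an indexed term that is not a valid integer. The query in the message above is cut off, so the following is an illustration rather than the author's exact query: it assumes the `{!raw f=...}` local-params syntax of Solr's raw query parser, reuses the `pk_i` field name mentioned earlier in the thread, and points at a hypothetical local Solr instance.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed local instance


def raw_term_query_url(field: str, raw_value: str, rows: int = 10) -> str:
    """Build a select URL whose q parameter is a raw term query."""
    # {!raw f=<field>}<value> asks for the exact indexed term, unanalyzed.
    params = {"q": "{!raw f=%s}%s" % (field, raw_value), "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


url = raw_term_query_url("pk_i", "104708")
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(decoded_q)
```

The URL-encoded form is what would be sent to Solr; decoding it again is only a check that the local-params prefix survived encoding.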

Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-02-26 Thread Jay Hill
Yes, it will be recorded and available to view after the presentation. -Jay On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Yonik, can you please advise whether this event will be recorded and available for later download? (It starts 5am our time

Re: Solr 1.4 distributed search configuration

2010-02-26 Thread Jeffrey Zhao
Now I've got it; I just forgot to put qt=search in the query. By the way, in Solr 1.3 I used shards.txt under the conf directory and distributed=true in the query for distributed search. That way, in my Java application, I can hard-code the Solr query with distributed=true and control the use of distributed

RE: Extended stats via JMX

2010-02-26 Thread Dan Trainor
-Original Message- From: Matthew Runo [mailto:mr...@zappos.com] Sent: Thursday, February 25, 2010 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Extended stats via JMX https://issues.apache.org/jira/browse/SOLR-1750 might help you, since I don't think that all of stats.jsp is

Re: Solr 1.4 distributed search configuration

2010-02-26 Thread Joe Calderon
You can set a default shards parameter on the request handler doing distributed search. You can set up two different request handlers, one with a shards default and one without. On Thu, Feb 25, 2010 at 1:35 PM, Jeffrey Zhao jeffrey.z...@metalogic-inc.com wrote: Now I got it, just forgot put qt=search

Question on Facets and Multiple values (confusion from the Wiki)

2010-02-26 Thread Mark Bennett
Certainly lots of matches on Solr and facets. Contrived example: * Solr 1.4, etc. * Yellow pages, business listings. * Business listings have a zip code that I will use in Faceted search. * Companies with multiple stores/outlets/offices still only have one record, but all applicable zip codes are

replication issue

2010-02-26 Thread Matthieu Labour
Hi, I am still having issues with the replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and data/replication.properties file and restarted Solr. When the slave is pulling data from the master, I can see that the size

ConcurrentModificationException

2010-02-26 Thread Dan Hertz (Insight 49, LLC)
Hi guys, Solr 1.4 (final) and 1.5 nightly work fine on a Windows box, but on our CentOS 5 box, we're getting a ConcurrentModificationException when starting Tomcat 6. Any tips on how to solve this and/or troubleshoot? Made sure there are no duplicate libs in Tomcat and solr/lib, and tried

Re: ConcurrentModificationException

2010-02-26 Thread Yonik Seeley
Could you open a JIRA issue for this? After a quick look, it could be firstSearcher / newSearcher events that are being executed concurrently that change the list? Could you try commenting out firstSearcher/newSearcher events in solrconfig.xml and see if that fixes it? It could be that a lazy

Re: ConcurrentModificationException

2010-02-26 Thread Yonik Seeley
Yep, definitely a bug. It looks like resourceLoader.newInstance() is fundamentally not thread safe. -Yonik http://www.lucidimagination.com On Fri, Feb 26, 2010 at 2:48 PM, Yonik Seeley yo...@lucidimagination.com wrote: Could you open a JIRA issue for this? After a quick look, it could be

Re: replication issue

2010-02-26 Thread Shalin Shekhar Mangar
On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi I am still having issues with the replication and wonder if things are working properly So I have 1 master and 1 slave On the slave, I deleted the data/index directory and data/replication.properties

Re: HTTP ERROR: 404 missing core name in path after integrating nutch

2010-02-26 Thread Ian Evans
Just wanted to give an update on my efforts. I installed the Feb. 26 update this morning and was able to access /solr/admin. I copied over the Nutch schema.xml, restarted Solr, and was able to access /solr/admin. I edited solrconfig.xml to add the Nutch request handler snippet from lucidimagination.

Re: replication issue

2010-02-26 Thread Matthieu Labour
Shalin, thank you so much for your answer. This is the case here. How can I find out which temp directory Solr replication is using? Do you have a way to set up the source (the temp directory used by Solr?) and target directory via the Solr config file so that they live on the same partition? Thank you

Re: SEVERE: java.lang.NumberFormatException: For input string: 104708

2010-02-26 Thread Pablo Mercado
A big thanks to Yonik and Mark. Using the raw term query I was able to find the range(!) of documents that had bad integer field values. Deleting those documents, committing and optimizing cleared up the issue. Still not sure how the bad values were inserted in the first place, but that is

Re: ConcurrentModificationException

2010-02-26 Thread Dan Hertz (Insight 49, LLC)
On 2010-02-26 12:55 PM, Yonik Seeley wrote: Yep, definitely a bug. It looks like resourceLoader.newInstance() is fundamentally not thread safe. -Yonik On Fri, Feb 26, 2010 at 2:48 PM, Yonik Seeley yo...@lucidimagination.com wrote: Could you open a JIRA issue for this? Yonik, Do you

Re: Question on Facets and Multiple values (confusion from the Wiki)

2010-02-26 Thread Jan Høydahl / Cominvent
Hi Mark, If (a) is wanted behaviour, i.e. have a business show up in facets for all ZIPs, you should define a multi-valued ZIP field. Since a ZIP is a number, I don't see any reason for any analysis on it, a String or a lightly normalized field type would do the job both for search and facets.
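
The multi-valued approach can be illustrated with a small simulation. This is plain Python rather than Solr, with made-up business records and a hypothetical `facet_counts` helper; it only shows why one record with several ZIP values is counted under each of its ZIPs, not how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical records: one document per business, "zip" is multi-valued.
businesses = [
    {"name": "Acme Hardware", "zip": ["94301", "94040"]},
    {"name": "Bay Books", "zip": ["94301"]},
    {"name": "Corner Cafe", "zip": ["94040", "95014"]},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a multi-valued field."""
    counts = Counter()
    for doc in docs:
        # set() so a document counts at most once per distinct value
        counts.update(set(doc.get(field, [])))
    return counts


counts = facet_counts(businesses, "zip")
for zip_code, n in sorted(counts.items()):
    print(zip_code, n)
```

Each business remains a single record, yet it contributes to the count of every ZIP it lists, which is behaviour (a) from the question.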

Re: If you could have one feature in Solr...

2010-02-26 Thread Don Werve
Realtime search, hands down.

Re: If you could have one feature in Solr...

2010-02-26 Thread Stephen Weiss
+1 I have several projects backburnered in the hope realtime search will come to solr soon... [m] On Feb 26, 2010, at 8:37 PM, Don Werve d...@madwombat.com wrote: Realtime search, hands down.

RE: If you could have one feature in Solr...

2010-02-26 Thread Stuart Yeates
The indexer looking for an xml:lang attribute on text fields and using the value to pick the tokeniser, dictionaries, etc. automatically (and knowing to look for them in the standard places). cheers stuart

Re: Solr Cell and Deduplication - Get ID of doc

2010-02-26 Thread Lance Norskog
You could create your own unique ID and pass it in with the literal.field=value feature. http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters On Fri, Feb 26, 2010 at 7:56 AM, Bill Engle billengle...@gmail.com wrote: Any thoughts on this? I would like to get the id back in the
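
One way to follow this suggestion is to generate the ID on the client, so it is already known before the document is posted. The sketch below assumes the unique key field is named `id` (as in the wiki's `literal.id=doc1` example), a hypothetical local Solr URL, and a hypothetical helper name; it only builds the request URL and does not send anything.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlsplit

EXTRACT_URL = "http://localhost:8983/solr/update/extract"  # assumed local instance


def build_extract_request(doc_id=None):
    """Return (doc_id, url); a fresh UUID is generated when no ID is given."""
    doc_id = doc_id or str(uuid.uuid4())
    # literal.<field>=<value> sets a field on the extracted document.
    params = {"literal.id": doc_id, "commit": "true"}
    return doc_id, EXTRACT_URL + "?" + urlencode(params)


doc_id, url = build_extract_request()
print(doc_id)
print(url)
```

Because the caller chooses the ID, there is no need to search for the document afterwards to learn it; whether this interacts well with deduplication depends on the signature configuration, which the thread does not show.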

Re: If you could have one feature in Solr...

2010-02-26 Thread Dave Searle
To have a coffee waiting for me every morning when I wake up. Marriage material indeed.

solr for reporting purposes

2010-02-26 Thread adeelmahmood
we are trying to use solr for somewhat of a reporting system too (along with search) .. since it provides such amazing control over queries and basically over the data that user wants .. they might as well be able to dump that data in an excel file too if needed .. our data isnt too much close to

Re: solr for reporting purposes

2010-02-26 Thread adeelmahmood
I just want to clarify, if it's not obvious .. that the reason I am concerned about the performance of Solr is because for reporting requests I will probably have to request all result rows at the same time .. instead of 10 or 20 adeelmahmood wrote: we are trying to use solr for somewhat of a

indexing using inbuilt lucene in Solr

2010-02-26 Thread mamathahl
Hi, I'm new to Solr. I have a database which consists of latitude, longitude and relevant news. This file has been imported using dataimport. I think it has been indexed successfully by Solr. Now I have to move ahead and run a few queries as mentioned below. # hsin (great circle):
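
The message is cut off at this point. Assuming "hsin" refers to the haversine great-circle distance, the calculation itself can be sketched independently of Solr as follows. The function name and the mean Earth radius of 6371 km are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(quarter)
```

A helper like this is handy for checking a handful of results by hand before trusting distances computed at query time.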