Re: Require some advice

2010-08-20 Thread Tommaso Teofili
Hi Pavan, you may want to plug in UIMA as a particular UpdateRequestProcessor [1] while indexing data (I am working on such a use case). This way you could extract entities and add them either as dynamicFields or predefined (fixed) fields. 2010/8/12 Michael Griffiths > > While there are some decen

Re: solr

2010-08-20 Thread Sumit Arora
Please follow guidelines from : http://lucene.apache.org/solr/tutorial.html /Sumit On Sat, Aug 21, 2010 at 11:25 AM, ankita shinde wrote: > Hello, > I am new to solr. > Can anyone please guide me how to install and use solr? > Reply. > -Ankita Shinde

solr

2010-08-20 Thread ankita shinde
hello, How to index data in solr? How to index text files? -Ankita Shinde

solr

2010-08-20 Thread ankita shinde
Hello, I am new to solr. Can anyone please guide me how to install and use solr? Reply. -Ankita Shinde

RE: facets - id and display value

2010-08-20 Thread Jonathan Rochkind
"A common way is to make a facet string of categoryId-2_name_imageurl. Then in your UI display the categoryId part of the facet." I've been thinking about doing something like this for the same purposes. Will having an "extra long" facet string like that have any effect on faceting performance?

Re: Fun with Spatial (Haversine formula)

2010-08-20 Thread Yonik Seeley
On Fri, Aug 20, 2010 at 11:05 PM, Lance Norskog wrote: > are latitudes equidistant on the surface of the sphere? Yes - each degree of latitude is ~69 miles. There is also a slight variation due to the earth not being a perfect sphere. -Yonik http://lucenerevolution.org Lucene/Solr Conference, B
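The ~69 mile figure can be checked with a little arithmetic. This is only a sketch: it treats the earth as a perfect sphere, and the mean radius of 3958.8 miles is an assumed value, not something stated in the thread.

```python
import math

# Assumed mean Earth radius in miles; the real Earth is slightly flattened,
# which is the source of the "slight variation" mentioned above.
EARTH_RADIUS_MILES = 3958.8


def miles_per_degree_latitude(radius_miles: float = EARTH_RADIUS_MILES) -> float:
    """Arc length covered by one degree of latitude on a sphere."""
    return radius_miles * math.pi / 180.0


if __name__ == "__main__":
    print(round(miles_per_degree_latitude(), 1))
```

On a sphere this value is the same at every latitude, because every meridian is a great circle. Degrees of longitude, by contrast, shrink with the cosine of the latitude.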

Re: Deadlock in Server?

2010-08-20 Thread Lance Norskog
A major merge is an optimize. If you want to break up these major merges, there is an option called 'maxSegments'. This lets you say "optimize a little". At least it makes stoppages controllable. Merge time plotted against the number of documents is a fractal sawtooth. The degree of the fracta

Re: Fun with Spatial (Haversine formula)

2010-08-20 Thread Lance Norskog
I copied a different formulation out of the Wikipedia article on Haversine. It's the same idea as in DistanceUtils, but turned inside out with cosines instead of sines. It gives exactly the same results. This is not with source data, just using round numbers in the latitude/longitude space. I do
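To illustrate how a sine-based and a cosine-based great-circle formula can agree, here is a small sketch. The message does not say which cosine formulation was used; the spherical law of cosines below is an assumption, and neither function is taken from DistanceUtils.

```python
import math


def haversine(lat1, lon1, lat2, lon2, radius=1.0):
    """Great-circle distance via the haversine formula (inputs in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def law_of_cosines(lat1, lon1, lat2, lon2, radius=1.0):
    """Great-circle distance via the spherical law of cosines (inputs in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlmb)
    # Clamp to the valid acos domain for the same rounding reason.
    return radius * math.acos(max(-1.0, min(1.0, c)))


if __name__ == "__main__":
    args = (10.0, 20.0, 30.0, 40.0)  # round numbers, not real source data
    print(haversine(*args), law_of_cosines(*args))
```

For well-separated points the two agree to within floating-point rounding. For very short distances the cosine form tends to lose precision, which is the usual reason for preferring haversine.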

Re: spellcheck index blown away during rebuild

2010-08-20 Thread Lance Norskog
The first question is about your use cases. How many words are in the eventual 3GB spelling index? Do you really need that many? Spell-checking is a more controllable UI if you make it from a dictionary. What you're talking about is effectively promoting the spellcheck index to a first-class Solr

Re: multiple values

2010-08-20 Thread Lance Norskog
The most basic test is to run a direct search against Solr and look at the XML output for the values: http://localhost:8983/solr/select?q=*:* Perhaps an xpath without the 'Author' element /PublishedArticles/Article/AuthorList will give Authors as a multivalued field. It is also possible that the DIH cre

Re: facets - id and display value

2010-08-20 Thread Lance Norskog
Sort of. A common way is to make a facet string of categoryId-2_name_imageurl. Then in your UI display the categoryId part of the facet. On Thu, Aug 19, 2010 at 12:25 PM, Satish Kumar wrote: > Hi, > > Is it possible to associate properties to a facet? For example, facet on > categoryId (1, 2, 3
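A rough sketch of the client side of this approach: the UI splits each composite facet value back into its parts. The exact layout (id, name and image URL joined by underscores) and the sample value are assumptions for illustration; the id and name are assumed not to contain underscores.

```python
from typing import NamedTuple


class CategoryFacet(NamedTuple):
    category_id: str
    name: str
    image_url: str


def parse_facet_value(value: str) -> CategoryFacet:
    """Split a composite facet value of the assumed form id_name_imageurl.

    Only the first two underscores are treated as separators, so the
    image URL itself may still contain underscores.
    """
    parts = value.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected facet value: {value!r}")
    return CategoryFacet(*parts)


if __name__ == "__main__":
    # Hypothetical facet value as it might come back in a facet response.
    print(parse_facet_value("2_Books_http://example.com/img/books_small.png"))
```

The facet counts still group by the whole string, so this only works when the name and image URL are fixed for a given category id.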

Re: Confused about highlighting

2010-08-20 Thread Koji Sekiguchi
(10/08/21 9:04), Mark E. Haase wrote: I have highlighting working on my project (indexing content for a web app), but the idea of highlighting with tags doesn't make sense to me. It seems that it opens up the system to XSS attacks if you echo search result data (with highlights) into a web page

Re: SolrIndex / LuceneIndex

2010-08-20 Thread Lance Norskog
Yes, a LuceneIndex. Solr has absolutely nothing special added to the Lucene index files. On Thu, Aug 19, 2010 at 8:19 AM, stockii wrote: > > Hello. > > in > http://lucene.apache.org/solr/api/index.html?org/apache/solr/common/SolrDocument.html > > Is the talk about "SolrIndex" -->  "A concrete rep

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-20 Thread Lance Norskog
There are no unit tests for stream.file or stream.url. Tests in org.apache.solr.handler.TestCSVLoader.filename:loadLocal() intercept them and do their own thing, feeding a local file instead of the stream.file parameter. I see no proof that stream.file/stream.url should work in SolrJ or in EmbeddedSo

Re: Using postCommit event to swap cores

2010-08-20 Thread Mike
On 8/20/2010 9:52 PM, Lance Norskog wrote: Another way to do this is to set up the "live" core to do replication from the "standby" core. Replication should work this way, between different cores in the same Solr instance. This is cleaner than swapping the two cores. On Thu, Aug 19, 2010 at 7:2

Re: Fun with Spatial (Haversine formula)

2010-08-20 Thread Mattmann, Chris A (388J)
It might have something to do with the source data and its spatial reference system. For example, if the data is in WGS84 then the haversine (great circle) distance precision gets worse the farther away two cities are from each other or for particular regions (e.g. further away from equator). Chee

Re: Using postCommit event to swap cores

2010-08-20 Thread Lance Norskog
Another way to do this is to set up the "live" core to do replication from the "standby" core. Replication should work this way, between different cores in the same Solr instance. This is cleaner than swapping the two cores. On Thu, Aug 19, 2010 at 7:25 AM, simon wrote: > Hi there, > > I have sol

Re: Fun with Spatial (Haversine formula)

2010-08-20 Thread Yonik Seeley
Lance, have you figured out what the issue is? Anyone know if this is a haversine limitation, or a bug? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 On Wed, Aug 18, 2010 at 1:54 AM, Lance Norskog wrote: > The Haversine formula in o.a.s.s.f.d.DistanceUtils.java gives

Re: Solr for multiple websites

2010-08-20 Thread Lance Norskog
Some Solr terms: Lucene Index: actual Lucene index on the disk Solr core: Solr wrapper for a Lucene Index- core is generally used when talking about configuring it Solr shard: A Lucene index with a Solr wrapper, that is part of a distributed Solr index. 'shard' specifically means a part of the whol

Confused about highlighting

2010-08-20 Thread Mark E. Haase
I have highlighting working on my project (indexing content for a web app), but the idea of highlighting with tags doesn't make sense to me. It seems that it opens up the system to XSS attacks if you echo search result data (with highlights) into a web page. Example: Index the following string:

proper query handling for multiValued queries that are also polyFields?

2010-08-20 Thread Thomas Joiner
I am wondering...is there currently any way for queries to properly handle multiValued polyFields? For instance, if you have a And if you added two values to that field such as "1,2" and "3,4", that would match both "1,4", and "3,2" as well as "1,2" and "3,4". So I'm wondering if that is somet
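The cross-matching described here can be reproduced with a toy model. This is a simplified sketch, assuming that each component of the polyField ends up in its own multivalued subfield so that the pairing between components is lost; the function and key names are made up for illustration and are not Solr APIs.

```python
def index_points(points):
    """Model a multiValued 2-D polyField as two independent multivalued subfields."""
    return {
        "x": [x for x, _ in points],
        "y": [y for _, y in points],
    }


def matches(doc, x, y):
    """Each component is matched on its own subfield, with no notion of pairing."""
    return x in doc["x"] and y in doc["y"]


if __name__ == "__main__":
    doc = index_points([(1, 2), (3, 4)])
    for query in [(1, 2), (3, 4), (1, 4), (3, 2), (1, 5)]:
        print(query, matches(doc, *query))
```

In this model (1, 4) and (3, 2) match even though neither pair was ever indexed, which is the behaviour the question is about.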

Re: specifying the doc id in clustering component

2010-08-20 Thread Tommy Chheng
Yes, that's the approach I'm taking right now. I look up the doc ids in the result set to find the matching document. I can live with the manual lookup; I wanted to see if it would be possible to pick a custom field to represent the document in the docs array. Thanks for contributing the

spellcheck index blown away during rebuild

2010-08-20 Thread Shawn Heisey
I am just delving into the spellcheckcomponent on a test server running a 3.1 build from June 29th. I have noticed that when you ask for a rebuild of the spell check index, it deletes it before starting the rebuild. It takes about 39 minutes to build one (3GB), which is a long time to do wit

Re: Deadlock in Server?

2010-08-20 Thread Devin Foley
Thanks Yonik! I recognize your name from Googling for Solr issues. Your help is greatly appreciated. I will attempt to tune maxMergeDocs as a first step. On Fri, Aug 20, 2010 at 12:39 PM, Yonik Seeley wrote: > This sounds like perhaps a major merge was triggered. > > You could do a nightly opt

Re: Deadlock in Server?

2010-08-20 Thread Yonik Seeley
This sounds like perhaps a major merge was triggered. You could do a nightly optimize - which will take just as long, but you control when it happens. The other option is to prevent overly large segments from being created (at the expense of search speed) with options such as maxMergeDocs. -Yonik http:/

Deadlock in Server?

2010-08-20 Thread Devin Foley
Hi All, I'm currently using Solr to run a search engine that is pulling in new data 24 hours a day. Currently, I'm indexing about 5 million documents per day, each document being around 2k in size. Every few days, the server locks up, and calls to /update stop working. During this lock, CPU, me

Re: Logging in Embedded SolrServer - What a nightmare.

2010-08-20 Thread Ahmet Arslan
> So, Embedded Solr Server keeps logging queries and other > stuff in my stdout. I came across the same problem. While looking for a solution I read your post. I was able to find a solution by chance, so I wanted to share. When I ran my program with this parameter, the logs disappeared. java -Djava.

Re: Doing Shingle but also keep special single word

2010-08-20 Thread Ahmet Arslan
> I am building index with Shingle > filter. We know it's minimum 2-gram but I also want keep > some special single word, e.g. IBM, Microsoft, etc. i.e. I > want to do a minimum 2-gram but also want to have these > single word in my index, Is it possible? outputUnigrams="true" parameter does not w

Re: Autosuggest on PART of cityname

2010-08-20 Thread PeterKerk
@Markus: thanks, will try to work with that. @Gijs: I've looked at the site and the search function on your homepage is EXACTLY what I need! Do you have some Solr code samples for me to study perhaps? (I just need the relevant fields in the schema.xml and the query url) It would help me a lot! :)

Tokenising on Each Letter

2010-08-20 Thread Scottie
Just getting ready to launch Solr on one of our websites. Unfortunately, we can't work out one little issue: how do I configure Solr such that it can search our model numbers easily? For example: ADS12P2 If somebody searched for ADS it would match, because currently it's split into tokens when i

Re: Autosuggest on PART of cityname

2010-08-20 Thread gwk
On 8/19/2010 4:45 PM, PeterKerk wrote: I want to have a Google-like autosuggest function on citynames. So when user types some characters I want to show cities that match those characters but ALSO the amount of locations that are in that city. Now with Solr I now have the parameter: "&fq=title:

Re: Proper Escaping of Ampersands

2010-08-20 Thread Yonik Seeley
On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn wrote: > Hi, > > I have a problem with, for example, company names like "AT&S". > A Job is sending data to the solr 1.4 (also tested it with 1.4.1) index > via python in XML, everything is escaped properly ("&" becomes "&amp;"). > > When I search fo

Re: Doing Shingle but also keep special single word

2010-08-20 Thread scott chu
Hi Brendan, Thanks for the reply. The real case is that I can't predict when there will be a new important special word that users are interested in, because I am indexing daily news article data. Therefore, I don't know when and which single words should be included in that new field. I've ever thought ab

Re: Proper Escaping of Ampersands

2010-08-20 Thread Nikolas Tautenhahn
Hi all, just some further information: https://issues.apache.org/jira/browse/SOLR-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel seems to be the same problem - but searching the archives yielded nothing I could use. Any hints on this? best regards, Nikolas Tautenhahn Am

Wild card based filter queries

2010-08-20 Thread Nemani, Raj
Hello all, I was wondering if you all can help me with the following. Is it possible to filter search queries using wildcards? Here is what I am thinking: From Solr admin's full search interface I can enter the following into the "Filter Query" text box to filter the results Ex: que

Re: Doing Shingle but also keep special single word

2010-08-20 Thread Brendan Grainger
Hi Scott, Is there a reason why you wouldn't just index these special words into another field and then search over both fields? That would also have the nice property of being able to boost on the special word field if you wanted. HTH Brendan On Aug 20, 2010, at 6:19 AM, scott chu (朱炎詹) wrote

RE: Basic conceptual questions about solr

2010-08-20 Thread Shaun McArthur
Very useful - thanks very much. I'll have a look at DIH too. Best, Shaun -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Thursday, August 19, 2010 8:02 PM To: solr-user@lucene.apache.org Subject: Re: Basic conceptual questions about solr Hi, You

Re: How to get most indexed keyword from SOLR

2010-08-20 Thread Jan Høydahl / Cominvent
Check out the luke request handler: http://localhost:8983/solr/admin/luke?fl=my_ad_field&numTerms=100 - you'll find topTerms for the fields specified -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 20. aug. 2010, at 11.39,
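For scripting this check, the request URL can be assembled as below. It is only a sketch: the base URL and field name are the ones from the example above, and `luke_url` is a hypothetical helper that builds the URL without sending any request.

```python
from urllib.parse import urlencode


def luke_url(base: str, field: str, num_terms: int) -> str:
    """Build a Luke request handler URL asking for the top terms of one field."""
    query = urlencode({"fl": field, "numTerms": num_terms})
    return f"{base}/admin/luke?{query}"


if __name__ == "__main__":
    print(luke_url("http://localhost:8983/solr", "my_ad_field", 100))
```

Using `urlencode` rather than string concatenation keeps field names with special characters safe in the query string.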

RE: sort order of "missing" items

2010-08-20 Thread Brad Dewar
Just to close this thread: Missing values are sorted as though equal to each other, as you would expect, and ties are broken only after all explicit sort criteria are evaluated. In my specific case, the problem was that the application was querying field "a", but was in fact sorting by a copyFi

RE: Autosuggest on PART of cityname

2010-08-20 Thread Markus Jelsma
You can't, it's analyzed. And if you facet on a non-analyzed field, you cannot distinguish between upper- and lowercase tokens. If you want that, you must create a new field with an EdgeNGramTokenizer, search on it and then you can facet on a non-analyzed field. Your query will be a bit differen
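To make the edge n-gram idea concrete, here is a small sketch of the prefixes such a tokenizer would produce for one value. It is a plain Python illustration, not Solr code; the gram sizes are arbitrary, and the lowercasing step is an assumption (in Solr that would normally be handled by a separate filter).

```python
def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Return the leading-edge n-grams (prefixes) of a single input string."""
    text = text.lower()  # assumed normalisation so that "utr" matches "Utrecht"
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(edge_ngrams("Utrecht", min_gram=2, max_gram=4))
```

Searching the n-gram field for a typed prefix then matches any city whose indexed prefixes contain it, while the facet counts can still come from the untouched, non-analyzed city field.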

RE: Autosuggest on PART of cityname

2010-08-20 Thread PeterKerk
Ok, I now do this (searching for "utr" in cityname): http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=utr In the DB there's 1 location with cityname 'Utrecht' and the other 1 is with 'Utrecht Overvecht' So in my dropdown I would like:

Write Solr Books - Packt Publishing.

2010-08-20 Thread Kshipra Singh
Hi All, I represent Packt Publishing, the publishers of computer related books. After publishing our first book on Solr in August 2009, we are planning to advance our line of books in this domain. Solr is one of the hot topics for new books at Packt this month. Currently we are inviting book

Doing Shingle but also keep special single word

2010-08-20 Thread 朱炎詹
I am building an index with the Shingle filter. We know its minimum is 2-grams, but I also want to keep some special single words, e.g. IBM, Microsoft, etc. That is, I want a minimum of 2-grams but also want to have these single words in my index. Is it possible? Scott

How to get most indexed keyword from SOLR

2010-08-20 Thread Pawan Darira
Hi, I have about 1000 classified ads posted to my SOLR index daily. There is an ad description field where the details of each ad are stored. I want to know the list of keywords that are used the maximum number of times across the 1000 ads posted in a day. -- Thanks, Pawan Darira