Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
I will also try to define the problem. In the library world there are general and specialized thesauri, which reveal the relations between concepts. The relations have types, as Lee described: Preferred Term (PT), Broader Terms (BT), Narrower Terms (NT), Related Terms (RT) and others. Some of these

Re: dismax: limiting term match to one field

2010-12-10 Thread Jan Kurella
On 09.12.2010 21:26, ext Chris Hostetter wrote: : doc1 is name=A B category=B : doc2 is name=A category=B : : when searching for the terms A and B I want doc2 to get a higher score. : to be more specific, I don't want the term B to influence doc1's score in : both name and category, only in one

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Hi Chris, It's all a bit early in the morning for this mind :-) The question asked, in good faith, was whether Solr supports or extends to implementing a thesaurus. It looks like it does not, which is fine. It does support synonyms and synonym rings, which is again fine. The ski example was an

Re: SOLR Thesaurus

2010-12-10 Thread Peter Sturge
Hi Lee, Perhaps Solr's clustering component might be helpful for your use case? http://wiki.apache.org/solr/ClusteringComponent On Fri, Dec 10, 2010 at 9:17 AM, lee carroll lee.a.carr...@googlemail.com wrote: Hi Chris, It's all a bit early in the morning for this mind :-) The question

Working Chef Cookbook for Solr

2010-12-10 Thread György Frivolt
Hi, I tried to set up Solr with Chef and so far found only the Opscode cookbook, but that one sets up only the group and the user for Solr, not the Solr engine itself. Does anyone know of a maintained Solr Chef cookbook? Thanks for any suggestions! Georg

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Hi Peter, That's way too clever for me :-) Discovering thesaurus relationships would be fantastic, but it's not clear what heuristics you would need to use to discover broader, narrower, related documents etc. Although I might be doing the clustering down, I'm sceptical about the accuracy. cheers Lee

Re: Multicore and Replication (scripts vs. java, spellchecker)

2010-12-10 Thread Martin Grotzke
Hi, that there's no feedback indicates that our plans/preferences are fine. Otherwise it's now a good opportunity to feed back :-) Cheers, Martin On Wed, Dec 8, 2010 at 2:48 PM, Martin Grotzke martin.grot...@googlemail.com wrote: Hi, we're just planning to move from our replicated single

Re: Working Chef Cookbook for Solr

2010-12-10 Thread Upayavira
I will likely need to create one in the next week or two. Depends upon how soon you need one. The one you've found is probably designed to work with rails apps. It assumes you have solr installed already, and adds another instance/index. I certainly need one that can do something that'll create

Indexing documents with SOLR

2010-12-10 Thread pankaj bhatt
Hi All, I am a newbie to SOLR and trying to integrate TIKA + SOLR. Can anyone please guide me on how to achieve this? My requirement is: I have a directory containing a lot of PDF and DOC files and I need to search within the documents. I am using the SOLR web application. I just need

Re: Indexing documents with SOLR

2010-12-10 Thread Tommaso Teofili
Hi Pankaj, you can find the needed documentation right here [1]. Hope this helps, Tommaso [1] : http://wiki.apache.org/solr/ExtractingRequestHandler 2010/12/10 pankaj bhatt panbh...@gmail.com Hi All, I am a newbie to SOLR and trying to integrate TIKA + SOLR. Can anyone please guide me,

Erratic Behaviour From Filters

2010-12-10 Thread Lohrenz, Steven
Hi, I have implemented a QueryParser that queries another solr core and returns a list of values (resourceIds) that are the primary solr key on the main core. I then query the main core using the resourceId to retrieve the Lucene docId. I build up an array of ints of these doc ids. I put this

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
Hi Lee, according to my vision the user could decide which relationship types he would like to attach to his search, and the application would call his attention to other possibilities. So there would be no heuristic method applied, because e.g. broader terms would cause lots of misleading

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Two Peters (or rather a stupid English bloke who can't work out how to type fancy accents :-) Sorry Péter (took me 10 minutes to work out I could cut and paste), my reply was to the clustering post by Peter Sturge. Clustering sounds great but being able to define a thesaurus scheme exactly would

Re: Working Chef Cookbook for Solr

2010-12-10 Thread György Frivolt
Although I access Solr from Rails via Sunspot, the Rails server runs on Heroku, so on a different machine. I prefer to have Solr as a standalone server and want to tell Sunspot where it can find the running Solr. I am quite new to Chef, but if I could help with writing a cookbook I would. If

Re: SolJSON

2010-12-10 Thread alessandro.ri...@virgilio.it
Hi Lee, Thank you very much for your quick answer! It works fine! Ciao, Alessandro solr-user@lucene.apache.org -Original Message- From: lee carroll [mailto:lee.a.carr...@googlemail.com] Sent: 09 December 2010 18:46 To: solr-user@lucene.apache.org; alessandro.ri...@virgilio.it

Re: Indexing documents with SOLR

2010-12-10 Thread Adam Estrada
Nutch is also a great option if you want a crawler. I have found that you will need to use the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT to something really large so that you won't exceed your heap size. Adam On Fri, Dec 10, 2010 at 6:27

[Multiple] RSS Feeds at a time...

2010-12-10 Thread Adam Estrada
All, Right now I am using the default DIH config that comes with the Solr examples. I update my index using the dataimport handler here http://localhost:8983/solr/admin/dataimport.jsp?handler=/dataimport This works fine but I want to be able to index more than just one feed at a time and more

SOLR geospatial

2010-12-10 Thread George Anthony
In looking at some of the docs on support for geospatial search, I see this functionality is mostly scheduled for the upcoming release 4.0 (with some playing around with backported code). I note the support for the bounding box filter, but will bounding box be one of the supported *data* types for

Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-12-10 Thread briankous
Hi there, We are trying to replace opentext (V7.6) autonomy with solr so that we can index other contents, too. Due to lack of manpower and time, the management wants to buy the adapter if available. Do you know of any vendor who sells the adapter or professional service? Thank you. Brian Ko

OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread John Russell
I have been load testing solr 1.4.1 and have been running into OOM errors. Not out of heap but with the GC overhead limit exceeded message meaning that it didn't actually run out of heap space but just spent too much CPU time trying to make room and gave up. I got a heap dump and sent it through

Re: SOLR Thesaurus

2010-12-10 Thread Chris Hostetter
: My imaginative use case: : - the user enters a term and maybe he turns on a flag to get not just : the term, but all terms, which related somehow with this (usually the : synonyms and narrower terms). : - Solr first find the queried term(s) in the thesaurus, then finds the : related terms,

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread Tom Hill
Hi John, WeakReferences allow things to get GC'd, if there are no other references to the object referred to. My understanding is that WeakHashMaps use weak references for the Keys in the HashMap. What this means is that the keys in HashMap can be GC'd, once there are no other references to the
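The weak-key behaviour Tom describes can be illustrated with a small sketch. This is an analogy, not Solr or Java code: it uses Python's `weakref.WeakKeyDictionary` in place of Java's `WeakHashMap`, and the `Key` class and the cached value are hypothetical. The immediate cleanup shown relies on CPython's reference counting; a JVM collects on its own schedule.

```python
import gc
import weakref


class Key:
    """Hypothetical cache key, standing in for whatever object keys the map."""

    def __init__(self, name):
        self.name = name


def demo():
    # Entries live only as long as something else still references the key.
    cache = weakref.WeakKeyDictionary()
    key = Key("reader-1")
    cache[key] = "large cached value"
    before = len(cache)

    # Drop the last strong reference to the key; the entry becomes collectable.
    del key
    gc.collect()
    after = len(cache)
    return before, after
```

The point of the sketch is the converse case: while any other strong reference to the key remains, the entry (and its value) stays in the map, no matter how much memory pressure there is.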

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread John Russell
Thanks a lot for the response. Unfortunately I can't check the statistics page. For some reason the solr webapp itself is only returning a directory listing. This is sometimes fixed when I restart but if I do that I'll lose the state I have now. I can get at the JMX interface. Can I check my

Separate Lines Like Google

2010-12-10 Thread Alejandro Delgadillo
Hi everybody, I'm having some troubles trying to figure out how to separate lines in a paragraph from a search result, I'm indexing PDFs but when I search the highlight terms I can not know when the first line ends and the next one begins, Is there a way to put a [...] like google or a

Separate Lines Like Google

2010-12-10 Thread Alejandro Delgadillo
Hi everybody, I'm having some troubles trying to figure out how to separate lines in a paragraph from a search result, I'm indexing PDFs but when I search the highlight terms I can not know when the first line ends and the next one begins, Is there a way to put a [...] like google or a

Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-10 Thread bbarani
Hi, I have a multivalued field for which some of the records have null or empty data in it. Since it's difficult to parse and match empty XML tags in SOLR output, I thought I would assign a default value for those empty data as below. field name=related_uid type=string indexed=true stored=true

Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year' , ala google news?

2010-12-10 Thread Will Milspec
hi all, We wish to implement date faceting with a 'sliding date range', 'last 24 hours, last week, last month, last year' . Google News currently implements such faceting when you search for a topic. As Solr's standard date faceting does not appear to meet this need, we will need to use

Re: SOLR Thesaurus

2010-12-10 Thread Chris Hostetter
: The question asked, in good faith, was does solr support or extend to : implementing a thesaurus. It looks like it does not which is fine. It does Well, my point was that thesaurus is not a feature description. it's a data structure, and depending on your goals, the existing SynonymFilter

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király
Hi Chris, thanks for your description. I should think about this a little bit more, then I will ask about some details. The main problem is that synonyms are one kind of relation, and a thesaurus may contain 6-10 kinds of relations. And it depends on the user, which types of relations he would like

Re: Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year' , ala google news?

2010-12-10 Thread Chris Hostetter
: As Solr's standard date faceting does not appear to meet this need, we will : need to use faceting on arbitrary queries, i.e. by passing multiple values : for facet.query correct, facet.date is really just a convenience feature over using facet.query when you want lots of consistently sized
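A minimal sketch of the multiple-`facet.query` approach discussed here, in Python. The field name `timestamp` and the choice of range boundaries are assumptions for illustration; `facet`, `facet.query` and the `NOW-...` date math are standard Solr request syntax.

```python
from urllib.parse import urlencode

# Hypothetical date field; substitute the date field from your own schema.
DATE_FIELD = "timestamp"

# One range per facet bucket, all ending at NOW so the buckets are staggered.
RANGE_STARTS = ["NOW-1DAY", "NOW-7DAYS", "NOW-1MONTH", "NOW-1YEAR"]


def staggered_facet_params(query, field=DATE_FIELD):
    """Return request parameters with one facet.query per sliding range."""
    params = [("q", query), ("facet", "true")]
    for start in RANGE_STARTS:
        params.append(("facet.query", f"{field}:[{start} TO NOW]"))
    return params


def staggered_facet_query_string(query, field=DATE_FIELD):
    """URL-encode the parameters so they can be appended to a select URL."""
    return urlencode(staggered_facet_params(query, field))
```

Each `facet.query` comes back as its own count in the response, so the client can label them "last 24 hours", "last week" and so on.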

Re: Multicore and Replication (scripts vs. java, spellchecker)

2010-12-10 Thread Chris Hostetter
: #SOLR-433 MultiCore and SpellChecker replication [1]. Based on the : status of this feature request I'd assume that the normal procedure of : keeping the spellchecker index up2date would be running a cron job on : each node/slave that updates the spellchecker. : Is that right? i'm not 100%

Re: Query performance very slow even after autowarming

2010-12-10 Thread Chris Hostetter
: I made the field that is indexed with EdgeNGramFilterFactory as default : search field. All my query responses are very slow, some of them taking more : than 10seconds to respond. based on the info you've given, there's dozens of possible reasons why you might see slow queries -- it's hard

Re: search problem after using EdgeNGramFilter

2010-12-10 Thread Chris Hostetter
: I thought that I have to use NGramFilter for wildcard search. : But It was the wrong idea. : Thanks, iorixxx your confusion may be because using EdgeNGramFilter is a way to make prefix queries faster by precomputing the prefixes at index time instead of at query time. (trading disk space
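To make the "precomputing the prefixes at index time" idea concrete, here is a small sketch of what edge n-gram expansion does to a single token. It is plain Python, not Solr's implementation; the function and its parameters `min_gram` and `max_gram` are hypothetical stand-ins for the filter's minimum and maximum gram sizes, and the default values are arbitrary.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading-edge prefixes of a token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]
```

With every prefix stored as its own term, a prefix search for `sol` becomes an exact term lookup, which is where the speed comes from and also why the index grows.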

Re: Using synonyms in combination with facets

2010-12-10 Thread Chris Hostetter
: I have a field that I use for faceting. I do not tokenize this field. It : has entries like: : : AWB artikel 2, lid 1 : AWB artikel 8:75 : Algemene Wet Bestuursrecht artikel 8:75 I assume those are names of laws, followed by page/paragraph numbers in various formats? (and evidently lid is

Viewing query debug explanation with dismax and multicore

2010-12-10 Thread sara motahari
Hi All, I am trying to debug my queries and see how scoring is done. I have 6 cores and send the query to 6 shards and it's dismax handler (with search on various fields with different boostings). I enable debug, and view source but I'm unable to see the explanations. I'm returning ID and

Re: [Multiple] RSS Feeds at a time...

2010-12-10 Thread Lance Norskog
There is I believe no way to do this without separate copies of your script. Each 'handler=/dataimport' has to refer to a separate config file. You can make several copies and name them config1.xml, config2.xml etc. You'll have to call each one manually, so you have to manage your own thread
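The "call each one manually" step could be scripted roughly as follows. This is only a sketch: the handler names `/dataimport1` and `/dataimport2` are hypothetical and would each have to be registered in solrconfig.xml with its own config file, as described above. It passes `clean=false` on the assumption that a full-import otherwise clears the index first, which would let a later import wipe the documents of an earlier one.

```python
from urllib.parse import urlencode


def import_urls(base_url, handler_names, command="full-import", clean="false"):
    """Build one DataImportHandler URL per configured handler."""
    query = urlencode({"command": command, "clean": clean})
    base = base_url.rstrip("/")
    return [f"{base}/{name.lstrip('/')}?{query}" for name in handler_names]
```

The caller still has to decide whether to request these URLs one after another or in parallel, which is the thread management mentioned in the reply.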

Re: Search based on images

2010-12-10 Thread Lance Norskog
Searching for an image with a painted query! Wow. On Wed, Dec 8, 2010 at 11:14 PM, Maciej Lisiewski c2h...@poczta.fm wrote: There is imgSeek ( http://www.imgseek.net/isk-daemon ), which while being far from perfect (can't handle rotated images) is quite simple and has already been added to

command line parameters for solr

2010-12-10 Thread Jack O
Hello, For starting solr, where do I find the list of command line parameters? java -jar start.jar blahblah... I am especially looking for how to specify my own jetty config file. I want to allow access to solr from localhost only. I would really appreciate all your help. /J

singular/plurals

2010-12-10 Thread Jack O
Hello, Need one more help: What do I have to do so that search will work for singulars and plurals ? I would really appreciate all your help. /J

Re: Search based on images

2010-12-10 Thread Dennis Gearon
There is actually some image recognition search engine software somewhere I heard about. Take a picture of something, say a poster, upload it, and it will adjust for some lighting/angle/distortion, and try to find it on the web somewhere. You hear about crazy stuff like this at dev camps.

Re: singular/plurals

2010-12-10 Thread Tom Hill
Check out this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Look, in particular, for stemming. On Fri, Dec 10, 2010 at 7:58 PM, Jack O jack_...@yahoo.com wrote: Hello, Need one more help: What do I have to do so that search will work for singulars and plurals ? I

Re: command line parameters for solr

2010-12-10 Thread Tom Hill
java -jar start.jar --help More docs here http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism Personally, I usually limit access to localhost by using whatever firewall the machine uses. Tom On Fri, Dec 10, 2010 at 7:55 PM, Jack O jack_...@yahoo.com wrote: Hello, For

Re: command line parameters for solr

2010-12-10 Thread Jack O
thanks Tom, really appreciate. From: Tom Hill solr-l...@worldware.com To: solr-user@lucene.apache.org Sent: Fri, December 10, 2010 9:43:08 PM Subject: Re: command line parameters for solr java -jar start.jar --help More docs here

Re: Search based on images

2010-12-10 Thread Maciej Lisiewski
On 2010-12-11 06:24, Dennis Gearon wrote: There is actually some image recognition search engine software somewhere I heard about. Take a picture of something, say a poster, upload it, and it will adjust for some lighting/angle/distortion, and try to find it on the web somewhere.

best way to configure DIH for multiple DBS

2010-12-10 Thread Geek Gamer
hi group, I have multiple document types indexed on a single core solr instance and each comes from a different DB. What is the best way to configure DIH to read each document type from the corresponding DB? As far as I could find, DIH does not honour multiple document tags inside the data config.

Re: command line parameters for solr

2010-12-10 Thread Jack O
Tom, I would like to reach out to you directly. What's your email address? /j From: Tom Hill solr-l...@worldware.com To: solr-user@lucene.apache.org Sent: Fri, December 10, 2010 9:43:08 PM Subject: Re: command line parameters for solr java -jar start.jar --help

Re: Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-10 Thread bbarani
Hi, Thanks a lot for your reply.. I am using database import handler to get the data (DIH) from DB. When I get a null data in single valued attribute the 'default' attribute seems to work perfectly fine. But seems like I need to validate the Null value (like using case when else statement) in

Re: best way to configure DIH for multiple DBS

2010-12-10 Thread bbarani
I am not sure whether I understand your question properly. If you are trying to get data from different database and dumping it to same index file then you need to specify a way to retrieve a particular data back from that XML (which actually contains the consolidated data from all Db's). For

Re: best way to configure DIH for multiple DBS

2010-12-10 Thread bbarani
Just to give you some more clarification.. you can create multiple database config file (separate) to extract the data from different sources and add the hardcoded identifier in SOLR select query corresponding to each source. So you will have multiple data import handler committing the data in

Re: Concurrent DIH calls

2010-12-10 Thread bbarani
Hi, As far as I know there is no queuing mechanism in SOLR for concurrent indexing requests. It would simply ignore the concurrent request (first come first serve basis).. Solr experts, please correct me if I am wrong.. To achieve concurrency, we have implemented a queue using JMS and we send

Re: Delete by query or Id very slow

2010-12-10 Thread bbarani
Hi, As Tom suggested removing optimize and passing the ids as list (instead of for loop) will surely increase the speed of deletion. We have a program which fetches complete list of ID from back end (around 10 million) and compares it with the complete list of id's present in SOLR document and
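Passing the ids as one list rather than looping can be sketched like this: a single `<delete>` update message carrying many `<id>` elements, built in Python. How the message is sent (for example, posted to the update handler, followed by one commit) is left out, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_delete_message(doc_ids):
    """Build one <delete> message containing an <id> element per document."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")
```

For very large id lists it would probably make sense to split them into batches of a few thousand per message; the batch size is a tuning choice, not something stated in the thread.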

Re: SOLR Config issue

2010-12-10 Thread bbarani
I am not sure if I understand your question correctly.. Are you saying that you are not able to start Jetty server in linux box? or SOLR application is not starting up even after server has started? Thanks, Barani -- View this message in context:

Re: Search based on images

2010-12-10 Thread Dennis Gearon
Tried one, of Perry Mason's secretary when she was young (and HOOOT), Barbara Hale. http://www.skylighters.org/ggparade/index8.html Didn't find it. 1.8 billion images indexed is probably a DROP in the bucket of what's out there. Dennis Gearon Signature Warning It is

Shards + dismax - scoring process?

2010-12-10 Thread bbarani
Hi, We are using 4 cores (starting from core 0 to core 4) for parallel indexing process. We use shards to do distributed indexing and we also use dismax request handler when doing search. I have configured core0 as Shards master core. When I issue a search query (with dismax request handler)