Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-02 Thread R. Tan
I think this is what I'm looking for. What is the status of this patch? On Thu, Sep 3, 2009 at 12:00 PM, R. Tan wrote: > Hi Solrers, > I would like to get your opinion on how to best approach a search > requirement that I have. The scenario is I have a set of business listings > that may be grou

Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-09-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
Unfortunately, DIH is not yet integrated with ExtractingRequestHandler. See this issue: https://issues.apache.org/jira/browse/SOLR-1358 On Thu, Sep 3, 2009 at 5:34 AM, Khai Doan wrote: > Hi all, > > My name is Khai. I have a table in a relational database. I have > successfully used DataImportHandler

Schema for group/child entity setup

2009-09-02 Thread R. Tan
Hi Solrers, I would like to get your opinion on how best to approach a search requirement that I have. The scenario is that I have a set of business listings that may be grouped into one parent business (such as 7-eleven having several locations). On the results page, I only want 7-eleven to show up once
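The thread subject ("Field Collapsing") names the feature being asked for. Until such a patch is available, one rough workaround is to collapse a fetched result page on the client. The sketch below assumes each listing carries a hypothetical `parentId` (for example, "7-eleven") and that results arrive sorted by descending score. Because it only collapses the rows it was given, total counts and paging will not reflect the grouping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listing type: each location carries the ID of its parent business.
record Listing(String id, String parentId, double score) {}

class ListingCollapser {
    // Keep only the first (best-ranked) listing for each parent business.
    // Assumes the input is already sorted by descending score, as a result page is.
    static List<Listing> collapseByParent(List<Listing> ranked) {
        Map<String, Listing> firstPerParent = new LinkedHashMap<>();
        for (Listing listing : ranked) {
            // putIfAbsent keeps the earlier (higher-ranked) entry and its position.
            firstPerParent.putIfAbsent(listing.parentId(), listing);
        }
        return new ArrayList<>(firstPerParent.values());
    }
}
```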

Return 2 fields per facet.. name and id, for example? / facet value search

2009-09-02 Thread R. Tan
Sorry if this is a duplicate post. Can anyone share their experience of holding facet heading/value IDs in Solr? On Fri, Aug 28, 2009 at 3:27 AM, Rihaed Tan wrote: > Hi, > > I have a similar requirement to Matthew (from his post 2 years ago). Is > this still the way to go in storing both the

Re: Problem querying for a value with a "space"

2009-09-02 Thread Avlesh Singh
Spaces in a term query should be escaped before searching. myField:quick\ brown\ fox is the correct way to search for quick brown fox in myField using TermQuery. You can always add &debugQuery=on to your search URL. The response will contain a lot of helpful information on how the incomi
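Avlesh's escaping rule can be applied mechanically before the query string is built. The sketch below is a minimal illustration, and `escapeSpaces` and `termQuery` are hypothetical helper names. It escapes whitespace only. For the other query-syntax characters, SolrJ's `ClientUtils.escapeQueryChars` may be a better fit, though you should check its behaviour in your Solr version.

```java
class QueryEscaping {
    // Backslash-escape every whitespace character so the query parser keeps
    // the value as a single term instead of splitting it into several clauses.
    static String escapeSpaces(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "field:escaped\ value" for a single-term query.
    static String termQuery(String field, String value) {
        return field + ":" + escapeSpaces(value);
    }
}
```

For example, `termQuery("myField", "quick brown fox")` yields `myField:quick\ brown\ fox`, matching the form given above.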

Re: Quickly view index files?

2009-09-02 Thread Koji Sekiguchi
rajan chandi wrote: Use Apache Luke. If you're using the new Lucene, you might need to add the Lucene 2.9 jar files to Luke and build it. Just an FYI: Luke can be launched by ant from the Solr install directory: $ ant luke Koji Cheers Rajan On Wed, Sep 2, 2009 at 2:02 PM, Jason Rutherglen

Re: Problem querying for a value with a "space"

2009-09-02 Thread Adam Allgaier
I think I understand what happened. The query "+specific_LIST_s:For Sale" is processed and broken into "For" and "Sale". The specific_LIST_s field is a "string", so it is not tokenized, but remains indexed as "For Sale", which matches neither "For" nor "Sale". Hence, no results. This que
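Adam's diagnosis can be reduced to a toy model of exact term matching. This is not Solr's query parser. An in-memory set stands in for the index of the untokenized "string" field, and a plain whitespace split stands in for how the unquoted query is broken up. In the real parser, the unprefixed "Sale" would also be sent to the default field rather than to specific_LIST_s.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class StringFieldMatch {
    // A "string" field is not tokenized: the whole value is indexed as one term.
    static final Set<String> INDEXED_TERMS = Set.of("For Sale");

    // Toy stand-in for the parser: an unescaped, unquoted value is split on whitespace.
    static List<String> naiveQueryTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // True if any query term exactly equals an indexed term (TermQuery-style match).
    static boolean matches(List<String> queryTerms) {
        return queryTerms.stream().anyMatch(INDEXED_TERMS::contains);
    }
}
```

Neither "For" nor "Sale" equals the single indexed term "For Sale", so the split query finds nothing. Keeping the value whole, by escaping the space or quoting it as specific_LIST_s:"For Sale", gives the parser one term that can match.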

How to use DataImportHandler with ExtractingRequestHandler?

2009-09-02 Thread Khai Doan
Hi all, My name is Khai. I have a table in a relational database. I have successfully used DataImportHandler to import this data into Apache Solr. However, one of the columns stores the location of a PDF file. How can I configure DataImportHandler to use ExtractingRequestHandler to extract the conte

Re: questions about solr

2009-09-02 Thread Jason Rutherglen
For HDFS, failover, sharding you may want to use Solr with Katta. There's an issue open at: http://issues.apache.org/jira/browse/SOLR-1301 Near realtime search needs to be added incrementally to Solr. Today I wouldn't recommend it. On Wed, Sep 2, 2009 at 10:14 AM, Zhenyu Zhong wrote: > Dear all,

Re: Problem querying for a value with a "space"

2009-09-02 Thread Dan A. Dickey
On Wednesday 02 September 2009 15:15:42 Adam Allgaier wrote: > Touch gently with the Solr newbie. I've searched trying to find an answer > to this problem with no success. I'm sure it's something small and easy. > > I'm using Solr 1.3 with Solrj client > > omitNorms="true"/> > ... > > >

score = sum of boosts

2009-09-02 Thread Joe Calderon
hello *, what would be the best approach to return the sum of boosts as the score? ex: a dismax handler boosts matches to field1^100 and field2^50, a query matches both fields hence the score for that row would be 150 is this something i could do with a function query or do i need to hack up Di
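The arithmetic in Joe's question is a plain sum over the fields that matched. Dismax does not combine field scores that way. Lucene's DisjunctionMaxQuery scores a document as the maximum sub-score plus a tie-breaker multiplier times the remaining sub-scores, which is exposed in Solr as the `tie` parameter. A tie of 1.0 therefore turns the combination into a sum. Real per-field scores also include tf, idf and norms, so boosts alone will not produce exactly 150. The sketch below only shows the two combination rules, and `BoostSum` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

class BoostSum {
    // The score Joe describes: the plain sum of the boosts of every matched field.
    static double sumOfBoosts(Map<String, Double> boosts, Set<String> matchedFields) {
        double total = 0.0;
        for (String field : matchedFields) {
            total += boosts.getOrDefault(field, 0.0);
        }
        return total;
    }

    // DisjunctionMaxQuery-style combination: max + tie * (sum of the other scores).
    // Assumes non-negative per-field scores.
    static double disMaxCombine(double[] fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }
}
```

With per-field scores of 100 and 50, a tie of 0.0 gives 100 (pure max) and a tie of 1.0 gives 150 (the sum).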

Re: Does the default operator affect phrase searching?

2009-09-02 Thread Dan A. Dickey
On Wednesday 02 September 2009 16:37:03 Gérard Dupont wrote: > > > > Yes, it does - thanks! > > Back to translating legacy search queries into Solr search queries. :) > > -Dan > > > > Just curious : what legacy system is it ? Sorry, but at the moment - I don't think I'm at liberty to say

Re: Does the default operator affect phrase searching?

2009-09-02 Thread Gérard Dupont
> > Yes, it does - thanks! > Back to translating legacy search queries into Solr search queries. :) > -Dan > Just curious : what legacy system is it ?

Re: Does the default operator affect phrase searching?

2009-09-02 Thread Dan A. Dickey
On Wednesday 02 September 2009 16:00:55 Gérard Dupont wrote: > Hi Dan, > > Phrase search (ie using quote) in Lucene does exact match of your expression > so if you type ["david pdf"] (brackets are there to limit the query in my > mail only) the system searches for a document that contains the term 'd

Re: Does the default operator affect phrase searching?

2009-09-02 Thread Walter Underwood
Is "pdf" inside the file or part of the file name? What legacy system? I've helped write a couple of them. Some systems, like Ultraseek, add parts of the filename as searchable text. wunder On Sep 2, 2009, at 1:49 PM, Dan A. Dickey wrote: I'm having a problem with doing a phrase search of

Re: Does the default operator affect phrase searching?

2009-09-02 Thread Gérard Dupont
Hi Dan, Phrase search (i.e. using quotes) in Lucene does an exact match of your expression, so if you type ["david pdf"] (brackets are there to delimit the query in my mail only) the system searches for a document that contains the term 'david' and the term 'pdf' separated by a space (well in the classic case
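Gérard's point separates two matching rules. A phrase (slop 0) needs the terms to be adjacent and in order, while AND only needs each term to appear somewhere in the document. The toy comparison below uses a naive lowercase, split-on-non-word tokenizer as a stand-in for a real analyzer, so actual Solr behaviour depends on the field's analysis chain.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseVsAnd {
    // Naive tokenizer: lowercase, split on runs of non-word characters.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().split("\\W+"));
    }

    // Phrase semantics (slop 0): the query terms appear consecutively and in order.
    static boolean phraseMatch(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // AND semantics: every query term appears somewhere in the document.
    static boolean andMatch(List<String> doc, List<String> terms) {
        return doc.containsAll(terms);
    }
}
```

A document like "Notes from David about the PDF export" satisfies `david AND pdf` but not the phrase "david pdf". That is one way a phrase query can return far fewer hits than a legacy system that does not treat quotes as a strict phrase.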

Does the default operator affect phrase searching?

2009-09-02 Thread Dan A. Dickey
I'm having a problem with doing a phrase search of "david pdf". When I search for just "david", I get 7 hits. When I search for "pdf" I get 73 hits. On a legacy system, searching for "david pdf" I get 78 hits. And on Solr (1.4 - one of the nightly builds) - when searching for "david pdf" I get 0

Problem querying for a value with a "space"

2009-09-02 Thread Adam Allgaier
Touch gently with the Solr newbie. I've searched trying to find an answer to this problem with no success. I'm sure it's something small and easy. I'm using Solr 1.3 with Solrj client ... I am indexing the "specific_LIST_s" with the value "For Sale". The document indexes just fine. A qu

Re: Viewing xml in Safari

2009-09-02 Thread Lucas F. A. Teixeira
http://www.entropy.ch/software/MacOSX/xmlviewplugin/ Lucas Frare Teixeira .·. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex On Wed, Sep 2, 2009 at 3:28 PM, Paul Tomblin wrote: > Slightly off topic, but I'm getting tired of hitting the 'view source' > keyboard shortcut every t

Viewing xml in Safari

2009-09-02 Thread Paul Tomblin
Slightly off topic, but I'm getting tired of hitting the 'view source' keyboard shortcut every time I do a solr query.  Is there a way to make Safari display xml as-is? -- Sent from my Palm Prē

Re: Quickly view index files?

2009-09-02 Thread rajan chandi
I would recommend using the IndexReader class. That could be the fastest possible :) Cheers Rajan On Wed, Sep 2, 2009 at 2:22 PM, Jason Rutherglen wrote: > I needed to mention through the web UI. Solr Luke takes ages to load. > > On Wed, Sep 2, 2009 at 11:05 AM, rajan chandi > wrote: > > Use A

Re: Quickly view index files?

2009-09-02 Thread Jason Rutherglen
I should have mentioned: through the web UI. Solr Luke takes ages to load. On Wed, Sep 2, 2009 at 11:05 AM, rajan chandi wrote: > Use Apache Luke. > > If you're using the new Lucene, you might need to add the Lucene 2.9 jar files to > Luke and build it. > > Cheers > Rajan > > > On Wed, Sep 2, 2009 at 2:02

Re: Quickly view index files?

2009-09-02 Thread rajan chandi
Use Apache Luke. If you're using the new Lucene, you might need to add the Lucene 2.9 jar files to Luke and build it. Cheers Rajan On Wed, Sep 2, 2009 at 2:02 PM, Jason Rutherglen wrote: > Is there a quick way to view index files? >

Quickly view index files?

2009-09-02 Thread Jason Rutherglen
Is there a quick way to view index files?

Re: A very complex search problem.

2009-09-02 Thread rajan chandi
Great, thanks Aakash for your input! We'll try to do some research and possibly benchmarks before we move forward. Regards Rajan On Wed, Sep 2, 2009 at 1:27 PM, Aakash Dharmadhikari wrote: > hi Rajan, > > More knowledgeable people might be able to provide better insight into > the performance

Re: Using SolrJ with Tika

2009-09-02 Thread Grant Ingersoll
Hi Angel, I'm looking into it. Might need a new SolrRequest, but still playing around and will let you know... -Grant On Sep 2, 2009, at 4:56 AM, Angel Ice wrote: Hi everybody. I hope this is the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc.) in Solr/Lucene.

Re: A very complex search problem.

2009-09-02 Thread Aakash Dharmadhikari
hi Rajan, More knowledgeable people might be able to provide better insight into the performance issues, but I have a doubt about this ORing business. The best option I see is storing all my friends' IDs in my documents as a multi-valued field. This, in contrast to OR queries, would make queryin
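Aakash's alternative moves the work from query time to index time. Each document stores, in a multi-valued field, the IDs of everyone allowed to see it, so a search only needs one term per user no matter how many contacts that user has. The in-memory sketch below models that idea. `Post` and `visibleTo` are hypothetical names. The trade-off is the one Birger raises: when a relationship changes, every affected document has to be reindexed.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical document: "visibleTo" plays the role of a multi-valued field
// holding the IDs of every user allowed to see the post.
record Post(String id, Set<String> visibleTo) {}

class VisibilityFilter {
    // Equivalent of filtering on visibleTo:<userId>; a single term per query,
    // however many contacts the user has.
    static List<String> visiblePosts(List<Post> posts, String userId) {
        return posts.stream()
                .filter(p -> p.visibleTo().contains(userId))
                .map(Post::id)
                .collect(Collectors.toList());
    }
}
```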

Re: Polish Stemmer

2009-09-02 Thread David Espinosa
Thanks very much! I suppose I’m still very much a Solr beginner; I was supposing I could do it directly. I did what you said and it seems to work perfectly! public class PolishStemFilterFactory extends BaseTokenFilterFactory { public StempelFilter create(TokenStream in) {

Re: date field type problem

2009-09-02 Thread Chris Hostetter
: solr.DateField compatible format. I wrote a new : definition inside the solrconfig.xml, which creates : eg. 1991-01-01T00:00:01Z from the input '[c1991.]' string. is only supported when the class of the is TextField ... it would be nice if it worked with any other field type (i think it wou

questions about solr

2009-09-02 Thread Zhenyu Zhong
Dear all, I am very interested in Solr and would like to deploy Solr for distributed indexing and searching. I hope you are the right Solr expert who can help me out. However, I have concerns about the scalability and management overhead of Solr. I am wondering if anyone could give me some guidanc

RE: SOLR vs SQL

2009-09-02 Thread Fuad Efendi
> Just execute 20 SQL queries with filters > Same with SOLR vs. Lucene, standard Lucene queries "filter1:value1 AND > facet2:value2" ... "filter1:value1 AND facet2:value99" are functionally the > same as SOLR faceting (99 docset intersections in RAM) and (sooner or later) > implementation
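Fuad's "docset intersections in RAM" can be pictured with bitsets. Each facet value maps to the set of document IDs containing it, and its count for a filtered query is the size of the intersection with the filter's docset. The toy below uses `java.util.BitSet` to show that set arithmetic. It is not Solr's faceting code, which works on cached DocSets and its own data structures.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCounts {
    // For each facet value: count = |filterDocs AND valueDocs|.
    static Map<String, Integer> count(BitSet filterDocs, Map<String, BitSet> valueDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : valueDocs.entrySet()) {
            // Clone so the per-value docset is not modified by the intersection.
            BitSet intersection = (BitSet) e.getValue().clone();
            intersection.and(filterDocs);
            counts.put(e.getKey(), intersection.cardinality());
        }
        return counts;
    }
}
```

Each entry corresponds to one of the "filter1:value1 AND facet2:valueN" queries in the quoted message, computed as one in-memory intersection instead of a separate query.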

RE: SOLR vs SQL

2009-09-02 Thread Fuad Efendi
> This article explains in-depth why calculating facets is not practical in > pure SQL: http://www.kimbly.com/blog/000239.html -> "The problem is that SQL isn't really capable of expressing set intersections." But this article is not applicable to described use case: we are _faceting_on_filtered_

RE: SOLR vs SQL

2009-09-02 Thread Fuad Efendi
Hi gwk, Thanks for the reply! Yes, SOLR gives out of the box: indexes; implicit data normalization; fault tolerance, replication and scalability; performance (so that we can save _huge_ money & time). But from just an engineering viewpoint, forgetting cost & time, SELECT COUNT(*) ... WHERE ... seems

Re: Re : Using SolrJ with Tika

2009-09-02 Thread rajan chandi
I have not used these APIs, but actually you don't need curl to POST the document to Solr. You can execute an HTTP POST using only Java: http://www.jguru.com/faq/view.jsp?EID=62798 You might want to look at SolrInputDocument. No matter what mechanism you use to post the document, the point i
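Rajan's "HTTP POST using only Java" can be sketched with `HttpURLConnection`, sending bytes the webapp already holds. The sketch assumes Solr 1.4 with ExtractingRequestHandler mapped at `/update/extract` as in the example solrconfig.xml, a server at the default `http://localhost:8983/solr`, and the handler's `literal.id` parameter to set the unique key. `ExtractPost`, `extractUrl` and `post` are hypothetical names. Verify the handler path and parameters against your own configuration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ExtractPost {
    // Build the extract URL; assumes ExtractingRequestHandler is mapped at /update/extract.
    static String extractUrl(String solrBase, String docId) {
        return solrBase + "/update/extract?literal.id="
                + URLEncoder.encode(docId, StandardCharsets.UTF_8) + "&commit=true";
    }

    // POST the raw document bytes (already in memory in the webapp) and return the HTTP status.
    static int post(String url, byte[] body, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

A call would look like `ExtractPost.post(ExtractPost.extractUrl("http://localhost:8983/solr", "doc1"), pdfBytes, "application/pdf")`, where `pdfBytes` is whatever the webapp received from the user's upload.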

Re: Polish Stemmer

2009-09-02 Thread Shalin Shekhar Mangar
On Wed, Sep 2, 2009 at 8:10 PM, David Espinosa wrote: > My problem appears when I try to create a Polish stemmed index. There isn’t > a Snowball implementation for Polish, but I found a lucene one: > > http://www.getopt.org/stempel/index.html#distrib > > I included the jar into Solr lib folder a

Re : Using SolrJ with Tika

2009-09-02 Thread Angel Ice
Hi Rajan. As mentioned in my message, I don't want to use curl to post documents and can't use an HTTP POST (the document has already been posted to my JEE webapp for other purposes). All I can use is plain Java. In fact, I'd like the user to post the document to my webapp with an HTML POST (it

Polish Stemmer

2009-09-02 Thread David Espinosa
Hi, I’m developing a multi-language Solr index, with one core per language. I use SnowballPorterFilterFactory for German, French and Italian with excellent results. My problem appears when I try to create a Polish stemmed index. There isn’t a Snowball implementation for Po

Re: Using SolrJ with Tika

2009-09-02 Thread rajan chandi
Laurent, check out Solr 1.4. You can download the trunk and build it on your box. Solr 1.4 does this out of the box, with no configuration required. You can use an HTTP POST to post the document with a Linux utility like curl, and the PDF/Word/RTF/PPT/XLS etc. will be indexed. We tested this las

Re: date field type problem

2009-09-02 Thread Peter Kiraly
Hi, the exception I received: SEVERE: org.apache.solr.common.SolrException: Error while creating field 'date_df{type=trickyDate,properties=indexed,stored,omitNorms,omitTf,multiValued,sortMissingLast}' from value 'c1991.' at org.apache.solr.schema.FieldType.createField(FieldType.java:19

Re: date field type problem

2009-09-02 Thread Grant Ingersoll
What's the exception? On Sep 2, 2009, at 3:00 AM, Peter Kiraly wrote: Hi Solr users, I have a lot of dates from a library catalog in a format that is not solr.DateField compatible. I wrote a new definition inside the solrconfig.xml, which creates e.g. 1991-01-01T00:00:01Z from the input '[c1991.]' stri

Re: SOLR vs SQL

2009-09-02 Thread Mauricio Scheffer
This article explains in-depth why calculating facets is not practical in pure SQL: http://www.kimbly.com/blog/000239.html Cheers, Mauricio On Wed, Sep 2, 2009 at 5:30 AM, gwk wrote: > Fuad Efendi wrote: > >> "No results found for 'surface area 377', displaying all properties." >> - why do we n

Using SolrJ with Tika

2009-09-02 Thread Angel Ice
Hi everybody. I hope this is the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc.) in Solr/Lucene. I have seen a few examples explaining how to use Tika to solve this. But most of these examples are using curl to send documents to Solr or an HTML POST wi

Re: A very complex search problem.

2009-09-02 Thread rajan chandi
Thank you Birger for the pointer to HBase. HBase sounds interesting. We will consider this for - "people you may know". We are trying to address a different problem of searching from a well defined list of contacts. A huge ORed query sounds good at this point as a solution. Thanks and regards R

Re: A very complex search problem.

2009-09-02 Thread rajan chandi
Hi Gwk, Thanks for the pointers. The only concern will be relevance. Lucene has the best relevance capability so far. CouchDB sounds interesting, though. Maybe we'll try to find some benchmarks on the relevance scoring of CouchDB. Thanks and Regards Rajan Chandi On Wed, Sep 2, 2009 at 4:04

RE: A very complex search problem.

2009-09-02 Thread Lie, Birger
Hi, I may have been unclear about what I mean. Usually people have friends in common, so 1) create and store a relationship between users x and y, and give it an id; 2) if x knows z, then there is a probability that y knows z as well. If that is the case, then add z to the relation and you d

Re: A very complex search problem.

2009-09-02 Thread gwk
Hello Rajan, I might be mistaken, but isn't CouchDB or a similar map/reduce database ideal for situations like this? Regards, gwk rajan chandi wrote: Hi All, We are dealing with a very complex problem of person specific search. We're building a social network where people will post stuff

Re: A very complex search problem.

2009-09-02 Thread rajan chandi
Gerald and Birger, Thank you for your quick responses. In our situation, users will tend to upload more than find new friends. We are currently considering doing the ORing of the contacts on the fly as part of the search query. Please correct me if I am wrong, but here is what I understand fr

RE: A very complex search problem.

2009-09-02 Thread Lie, Birger
Hi, If you store all mutual relations in a database, a lot of the relations will overlap. This is easily handled using DISTINCT clauses in SQL. Use the overlapped values as tags on documents. That way you gain tremendous performance at search time. Obviously updating documents is a performance los

date field type problem

2009-09-02 Thread Peter Kiraly
Hi Solr users, I have a lot of dates from a library catalog in a format that is not solr.DateField compatible. I wrote a new definition inside the solrconfig.xml, which creates e.g. 1991-01-01T00:00:01Z from the input '[c1991.]' string. It works fine when I tried it with the typical values in the http://l
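Another option is to normalize the catalog value in the indexing client before the document is sent, so Solr only ever receives a DateField-compatible string. The sketch below assumes the year is the first four-digit run in values like '[c1991.]' and keeps Peter's convention of one second past midnight on January 1st, UTC. `CatalogDate` and `toSolrDate` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatalogDate {
    // Assumption: the year is the first run of four digits, e.g. "[c1991.]" -> 1991.
    private static final Pattern YEAR = Pattern.compile("(\\d{4})");

    // Returns a DateField-style timestamp, or null when no year can be found.
    static String toSolrDate(String raw) {
        Matcher m = YEAR.matcher(raw);
        if (!m.find()) {
            return null;
        }
        // Same convention as in the post: January 1st, 00:00:01, UTC ("Z").
        return m.group(1) + "-01-01T00:00:01Z";
    }
}
```

Values with no recognizable year return null, so the client can skip the field instead of sending a string that would fail to parse, as 'c1991.' did in the exception quoted in this thread.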

Re: A very complex search problem.

2009-09-02 Thread Gérard Dupont
Hi, The big OR query should be the easiest way and it may work up to ~1000 users (i.e. by default you can specify 1024 boolean clauses, so up to N users in the OR, where N = 1024 - (boolean clauses in your query)). You can increase this limit on boolean clauses in the configuration but I guess too much
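Gérard's rule of thumb can be enforced when the contact query is built. The sketch below joins contact IDs into one OR clause list and refuses to exceed the default limit of 1024, which is configurable as `maxBooleanClauses` in solrconfig.xml. `ContactQuery`, `authorFilter` and the `author` field are hypothetical. The sketch also assumes plain alphanumeric IDs that need no escaping. Sending the result as a filter query (`fq`) rather than in `q` keeps it out of relevance scoring.

```java
import java.util.List;

class ContactQuery {
    // Lucene's default clause limit per BooleanQuery (maxBooleanClauses).
    static final int MAX_BOOLEAN_CLAUSES = 1024;

    // Build "field:(id1 OR id2 ...)". Following Gérard's arithmetic, the number of
    // contacts plus the clauses already in the query must stay within the limit.
    static String authorFilter(String field, List<String> contactIds, int otherClauses) {
        int total = contactIds.size() + otherClauses;
        if (total > MAX_BOOLEAN_CLAUSES) {
            throw new IllegalArgumentException("too many boolean clauses: " + total);
        }
        return field + ":(" + String.join(" OR ", contactIds) + ")";
    }
}
```

With 150 contacts, as in Rajan's example, this stays well under the limit. Much larger contact lists point toward Aakash's index-time approach.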

A very complex search problem.

2009-09-02 Thread rajan chandi
Hi All, We are dealing with a very complex problem of person specific search. We're building a social network where people will post stuff and other users should be able to see the content only from their contacts. e.g. There are 10,000 users in the system and there are only 150 users in my netw

Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?

2009-09-02 Thread Erwin
On Wed, Sep 2, 2009 at 12:44 AM, Chris Hostetter wrote: > : The wiki says "As of Solr 1.3, the DisMaxRequestHandler is simply the > : standard request handler with the default query parser set to the > : DisMax Query Parser (defType=dismax).". I just made a checkout of svn > : and dismax doesn't se

Logging solr requests

2009-09-02 Thread Licinio Fernández Maurelo
Hi there, I need to log Solr requests on the fly, filter and transform them, and finally put them into an index. Any advice on the best way to implement this behaviour? Key points: - I think that the use of log files is discouraged, but I don't know if I can modify Solr settings to log to a serv

Re: SOLR vs SQL

2009-09-02 Thread gwk
Fuad Efendi wrote: "No results found for 'surface area 377', displaying all properties." - why do we need SOLR then... Hi Fuad, The search box is only used for geographical search, i.e. country/region/city searches. The watermark on the homepage indicates this but the "search again" box

Re: Date Faceting and Double Counting

2009-09-02 Thread gwk
Chris Hostetter wrote: : When I added numerical faceting to my checkout of solr (solr-1240) I basically : copied date faceting and modified it to work with numbers instead of dates. : With numbers I got a lot of double-counted values as well. So to fix my : problem I added an extra parameter to n
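The double counting discussed here comes from range endpoints. A range query written as [lo TO hi] includes both ends, so when adjacent facet ranges share an endpoint, a value exactly on that boundary falls into both. Half-open ranges avoid this: include the lower bound and exclude the upper. The toy below contrasts the two rules on integer buckets. `RangeBuckets` is a hypothetical name, and the sketch does not reproduce the parameter gwk added, whose name is cut off above. It assumes a positive gap.

```java
class RangeBuckets {
    // Half-open buckets [start + i*gap, start + (i+1)*gap): lower bound inclusive,
    // upper bound exclusive, so a value on a shared boundary is counted once.
    static int[] countHalfOpen(int[] values, int start, int gap, int buckets) {
        int[] counts = new int[buckets];
        for (int v : values) {
            if (v < start) {
                continue;
            }
            int i = (v - start) / gap;
            if (i < buckets) {
                counts[i]++;
            }
        }
        return counts;
    }

    // Closed buckets [lo, hi], like [lo TO hi] range queries: a value equal to a
    // shared boundary matches two adjacent buckets and is counted twice.
    static int[] countInclusive(int[] values, int start, int gap, int buckets) {
        int[] counts = new int[buckets];
        for (int v : values) {
            for (int i = 0; i < buckets; i++) {
                int lo = start + i * gap;
                int hi = lo + gap;
                if (v >= lo && v <= hi) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```

For the values 0, 10, 15 and 20 with start 0, gap 10 and two buckets, the closed rule reports 2 and 3 (the value 10 is counted twice), while the half-open rule reports 1 and 2.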