Re: Full-search index for the database

2011-09-11 Thread Jamie Johnson
You should create separate fields in your solr schema for each field in your database that you want recognized separately. You can use a query parser like edismax to do a weighted query across all of your fields and then provide highlighting on the specific field which matched. 2011/9/10 Eugeny B
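As a rough illustration of the suggestion above, the request parameters for a weighted edismax query with highlighting could be assembled as follows. This is only a sketch: the field names (`title`, `author`, `body`) and the weights are hypothetical and stand in for whatever columns the database actually exposes.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_text, field_weights, rows=10):
    """Build request parameters for a weighted edismax query with highlighting.

    field_weights maps a (hypothetical) schema field name to its boost.
    """
    # qf takes space-separated "field^boost" entries.
    qf = " ".join(f"{name}^{boost}" for name, boost in field_weights.items())
    return {
        "q": user_text,
        "defType": "edismax",
        "qf": qf,
        "hl": "true",
        # Highlight the same fields that are searched.
        "hl.fl": ",".join(field_weights),
        "rows": str(rows),
    }


# Hypothetical fields and weights, for illustration only.
params = edismax_params("solr index", {"title": 3, "author": 2, "body": 1})
query_string = urlencode(params)
```

The resulting `query_string` would be appended to the select handler URL of the Solr instance in question.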

Re: indexing data from rich documents - Tika with solr3.1

2011-09-11 Thread scorpking
Oh, that works for me. Thanks very much, Erik Hatcher-4. I have managed to index over https. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3326971.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Using multivalued field in map function

2011-09-11 Thread Erick Erickson
Hmmm, would it be simpler to do something like append a clause like this? BloggerId:12304^10 OR CoBloggerId:123404^5? Best Erick On Fri, Sep 9, 2011 at 2:14 AM, tkamphuis wrote: > Well, I'd like to do the following: > > I've got a website full of blogposts and every blogpost has an owner, this >
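A minimal sketch of appending such a boost clause to an existing query string. The helper name and the choice to make the main query required (`+(...)`) with the boost clause optional are assumptions, not something stated in the thread; only the `BloggerId` and `CoBloggerId` field names and the boosts come from the message.

```python
def with_owner_boost(base_query, blogger_id, owner_boost=10, co_boost=5):
    """Append an optional boost clause for posts owned or co-owned by a blogger.

    Assumption: the base query stays mandatory and the boost clause only
    influences ranking.
    """
    boost_clause = (
        f"BloggerId:{blogger_id}^{owner_boost} "
        f"OR CoBloggerId:{blogger_id}^{co_boost}"
    )
    return f"+({base_query}) ({boost_clause})"


# "text:solr" is a hypothetical base query.
boosted = with_owner_boost("text:solr", 12304)
```

Because boosts only adjust scores, a very strong text match on another post can still outrank the boosted ones.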

Re: NRT and commit behavior

2011-09-11 Thread Erick Erickson
Hmm, OK. You might want to look at the non-cached filter query stuff; it's quite recent. The point here is that it is a filter that is applied only after all of the less expensive filter queries are run. One of its uses is exactly ACL calculations. Rather than calculate the ACL for the entire doc s

Solr messing up the UK GBP (pound) symbol in response, even though the Java environment variable for file encoding is set to UTF-8

2011-09-11 Thread Ravish Bhagdev
Any idea why solr is unable to return the pound sign as-is? I tried typing in £ 1 million in Solr admin GUI and got following response. 0 5 on 0 £ 1 million 10 2.2 Here is my Java Properties I got also from admin interface: java.runtime.name = Java(TM) SE Runtime Environment sun.boot.li

Re: Running solr on small amounts of RAM

2011-09-11 Thread Erick Erickson
Well, this answer isn't much more satisfactory than "get more memory", but about all I can say is "try it and see". Sure, make your caches very small and monitor memory and test it out. You'll get a sense of how fast (or slow) the queries are pretty quickly. Or you can get a ballpark estimate of

Re: solr equivalent of "select distinct"

2011-09-11 Thread Erick Erickson
This smells like an XY problem, can you back up and give a higher-level reason *why* you want this behavior? Because given your problem description, this seems like you are getting correct behavior no matter how you define the problem. You're essentially saying that you have two records with ident

Re: searching for terms containing embedded spaces

2011-09-11 Thread Erick Erickson
Try escaping it for a start. But why do you want to? If it's a phrase query, enclose it in double quotes. You really have to provide more details, because there are too many possibilities to answer. For instance: If you're entering field:a b then 'b' will be searched against your default text fie

Re: How to write this query?

2011-09-11 Thread Erick Erickson
So are you still having a problem, and if so what? Best Erick On Sat, Sep 10, 2011 at 5:48 AM, crisfromnova wrote: > Hi, > > key:value1^8 key:value2^4 key:value3^2 is correct. > > Sorry for bad query written. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-wri
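For reference, a query of the shape confirmed above (`key:value1^8 key:value2^4 key:value3^2`) can be generated from an ordered list of values. This is a sketch only; the helper name and the halving rule for successive boosts are assumptions inferred from the 8/4/2 pattern in the example.

```python
def boosted_values(field, values, top_boost=8):
    """Join field:value clauses, halving the boost for each later value."""
    clauses = []
    boost = top_boost
    for value in values:
        clauses.append(f"{field}:{value}^{boost}")
        # Never let the boost drop below 1.
        boost = max(boost // 2, 1)
    return " ".join(clauses)


query = boosted_values("key", ["value1", "value2", "value3"])
```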

Re: Nested documents

2011-09-11 Thread Erick Erickson
Does this JIRA apply? https://issues.apache.org/jira/browse/LUCENE-3171 Best Erick On Sat, Sep 10, 2011 at 8:32 PM, Andy wrote: > Hi, > > Does Solr support nested documents? If not is there any plan to add such a > feature? > > Thanks.

Re: solr equivalent of "select distinct"

2011-09-11 Thread Mark juszczec
Erick Thanks very much for the reply. I typed this late Friday after work and tried to simplify the problem description. I got something wrong. Hopefully this restatement is better: My PK is FLD1, FLD2 and FLD3 concatenated together. In some cases FLD1 and FLD2 can be the same. The ONLY diff

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec
Erick My field contains "a b" (without ") We are trying to assemble the query as a String by appending the various values. I think that is a large part of the problem and our lives would be easier if we let the Solr api do this work. We've experimented with our "query assembler" producing fiel

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley
On Sun, Sep 11, 2011 at 12:56 PM, Mark juszczec wrote: > We've also tried making it create > > field:a\ b > > The first case just does not work and I'm unsure why. > > The second case ends up url encoding the \ and I'm unsure if that will cause > it to be used in the query or not. URL encoding is

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec
> > But as Erick says, it's not clear that's really what you want (to > search on a single term with a space in it). If it's a normal text > field, each word will be indexed separately, so you really want a > phrase query or a boolean query: > > field:"a b" > or > field:(a b) > > I am looking for
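The query forms discussed in this thread can be compared side by side with a small sketch. The helper names are hypothetical. The last lines illustrate that URL encoding is a transport-level step that is undone before the query text is parsed, so an encoded backslash arrives as a backslash.

```python
from urllib.parse import quote, unquote


def phrase_query(field, text):
    """field:"a b" - a phrase query; embedded quotes and backslashes are escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def any_word_query(field, text):
    """field:(a b) - a boolean query over the individual words."""
    return f"{field}:({text})"


def escaped_term_query(field, text):
    """field:a\\ b - a single term with the space escaped by a backslash."""
    return f"{field}:" + text.replace("\\", "\\\\").replace(" ", "\\ ")


q = escaped_term_query("field", "a b")
encoded = quote(q, safe="")   # what travels in the URL
decoded = unquote(encoded)    # what the server sees after decoding
```

Which form matches depends on how the field is analyzed: on a tokenized text field each word is indexed separately, while an untokenized field holds the whole value as one term.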

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley
On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec wrote: > I am looking for a text string with a single, embedded space.  For the > purposes of this example, it is "a b" and it's stored in the index in a field > called field. > > Am I incorrect in assuming the query field:"a b" will match the stri

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec
That's what I thought. The problem is, it's not, and I am unsure what is wrong. On Sun, Sep 11, 2011 at 1:35 PM, Yonik Seeley wrote: > On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec > wrote: > > I am looking for a text string with a single, embedded space. For the > > purposes of this example,

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley
On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec wrote: > That's what I thought.  The problem is, its not and I am unsure what is > wrong. What is the fieldType definition for that field? Did you change it without re-indexing? -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec
The field's properties are: <field name="CUSTOMER_TYPE_NM" type="string" indexed="true" stored="true" required="true" default="CUSTOMER_TYPE_NM_MISSING"/> There have been no changes since I last completely rebuilt the index. Is re-indexing done when an index is completely rebuilt with a dataimpor

Re: solr equivalent of "select distinct"

2011-09-11 Thread Erick Erickson
Hmmm, there's no good way I can think of off the top of my head to do this. Whenever people find themselves thinking in terms of RDBMSs, I have to ask whether the problem is really appropriate for a search engine. And/or what the problem you're trying to solve with this approach is from a higher le

Re: searching for terms containing embedded spaces

2011-09-11 Thread Erick Erickson
OK, there are several issues here: q= *:* AND CUSTOMER_TYPE_NM:Network Advertiser AND ACTIVE_IND:1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID asc&start=0 the *:* is doing you no good, I'd just remove it. defType=edismax probably isn't doing what you expect, you're not specifying any field
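One way to avoid hand-assembling the query string is to build the parameters as a mapping and let a library do the URL encoding. The sketch below is an assumption about how the query could be restated, not the fix given in the thread: it quotes the customer type so that both words stay attached to `CUSTOMER_TYPE_NM`, drops the `*:*` clause, and leaves out `defType` entirely. The function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def customer_query_params(customer_type, active=1, rows=500, start=0):
    """Build select parameters for an exact match on CUSTOMER_TYPE_NM."""
    # Quote the value so the text after the space is not searched
    # against the default field.
    escaped = customer_type.replace("\\", "\\\\").replace('"', '\\"')
    q = f'CUSTOMER_TYPE_NM:"{escaped}" AND ACTIVE_IND:{active}'
    return {
        "q": q,
        "rows": str(rows),
        "start": str(start),
        "sort": "ACCOUNT_CUSTOMER_ID asc",
    }


params = customer_query_params("Network Advertiser")
query_string = urlencode(params)  # spaces, quotes and colons are encoded here
```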

Re: solr equivalent of "select distinct"

2011-09-11 Thread Michael Sokolov
You can get what you want - unique lists of values from docs matching your query - for a single field (using facets), but not for the co-occurrence of two field values. So you could combine the two fields together, if you know what they are going to be "in advance." Facets also give you count
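A sketch of both ideas, under stated assumptions: `FLD1` and `FLD2` are taken from the earlier message in this thread, while the separator and the helper names are hypothetical. The first helper asks for the distinct values of one field via faceting; the second shows how two values might be joined at index time into a combined field so that their co-occurrence can be faceted on as a single value.

```python
from urllib.parse import urlencode, parse_qs


def distinct_values_params(query, facet_field):
    """Request only facet counts (no documents) for one field."""
    return {
        "q": query,
        "rows": "0",            # the documents themselves are not needed
        "facet": "true",
        "facet.field": facet_field,
        "facet.mincount": "1",  # skip values that match no document
        "facet.limit": "-1",    # return all values, not just the top ones
    }


def combined_value(first, second, sep="|"):
    """Join two field values into one token for a combined facet field."""
    return f"{first}{sep}{second}"


params = distinct_values_params("*:*", "FLD1")
query_string = urlencode(params)
token = combined_value("A", "B")
```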

Re: SolrCloud Feedback

2011-09-11 Thread Mark Miller
On Sep 9, 2011, at 1:09 PM, Pulkit Singhal wrote: > I think I understand it a bit better now but wouldn't mind some validation. > > 1) solr.xml does not become part of ZooKeeper Right - currently it does not. Info is put there to tell Solr how to connect to zookeeper and register the cores. >

Solr and DateTimes - bug?

2011-09-11 Thread Nicklas Overgaard
Hi everybody, I just started playing around with Solr, but I'm facing some trouble. The test data I'm indexing with Solr contains, among other things, dates and times. By the way, I'm using Mono and I'm talking to Solr through the SolrNet library. The issue I'm facing: Some of t

Re: Example Solr Config on EC2

2011-09-11 Thread Pulkit Singhal
Just to clarify, that link doesn't do anything to promote an already running slave into a master. One would have to bounce the Solr node which has that slave and then make the shift. It is not something that happens at runtime live. On Wed, Aug 10, 2011 at 4:04 PM, Akshay wrote: > Yes you can p

Re: Stemming and other tokenizers

2011-09-11 Thread Jan Høydahl
Hi, You'll not be able to detect language and change stemmer on the same field in one go. You need to create one fieldType in your schema per language you want to use, and then use LanguageIdentification (SOLR-1979) to do the magic of detecting language and renaming the field. If you set langid

Re: Running solr on small amounts of RAM

2011-09-11 Thread Jan Høydahl
Hi, Beware that Solr4.0 branch has multiple RAM conserving optimizations which may cause your index to take considerably less space, so try it out. Also, of course, prune your schema to turn off everything you don't need, and also your OS to stop services you don't use. Consider disallowing cert

Re: Full-search index for the database

2011-09-11 Thread Eugeny Balakhonov
Hello, Thanks for the answer! I have created separate fields in my Solr schema for each field in the database (more than 500!). How do I ask the parser to search across all these fields? By default the Solr schema should contain an explicit declaration of the default search field, like the following: <defaultSearchField>TEXT</defaultSearchField> I tried to use follow

Re: Solr and DateTimes - bug?

2011-09-11 Thread Jan Høydahl
Hi, Can you try to make a plain HTTP query from the admin GUI on your index and tell us what the XML response is for that date field? http://localhost:8983/solr/select?q=*:* If that date output is wrong as well, there may be a bug with Solr. If it is correct, you have a problem in SolrNet. Btw,

Re: Nested documents

2011-09-11 Thread Michael McCandless
Even if it applies, this is for Lucene. I don't think we've added Solr support for this yet... we should! Mike McCandless http://blog.mikemccandless.com On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson wrote: > Does this JIRA apply? > > https://issues.apache.org/jira/browse/LUCENE-3171 > > Bes

Re: Solr and DateTimes - bug?

2011-09-11 Thread Nicklas Overgaard
Hi, The XML output when performing a query via the solr interface is like this: 1-01-01T00:00:00Z It's solr 3.3.0 on an ArchLinux desktop machine with "OpenJDK 6.b22_1.10.3-1" as my java runtime environment. /Nicklas On 2011-09-12 00:26, Jan Høydahl wrote: Hi, Can you try to make a plain H
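Regardless of which side is at fault, a client can be made tolerant of a year that is not zero-padded. The sketch below is in Python rather than .NET and is not SolrNet code; it only illustrates the idea of padding the year to four digits before handing the string to a strict date parser. The function name is hypothetical, and negative years are not handled.

```python
from datetime import datetime


def parse_solr_date(text):
    """Parse a UTC timestamp such as 1-01-01T00:00:00Z, padding a short year."""
    date_part, time_part = text.split("T", 1)
    year, rest = date_part.split("-", 1)
    # Pad the year to four digits so a strict %Y parse accepts it.
    padded = f"{year.zfill(4)}-{rest}T{time_part}"
    # The trailing Z marks UTC; the returned datetime is naive.
    return datetime.strptime(padded, "%Y-%m-%dT%H:%M:%SZ")


earliest = parse_solr_date("1-01-01T00:00:00Z")
recent = parse_solr_date("2011-09-11T12:30:00Z")
```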

Re: Full-search index for the database

2011-09-11 Thread Eugeny Balakhonov
My task is very simple: I have a big database with a lot of tables and fields. This database has a dynamic structure and can be extended or changed at any time. I need a tool that can search across all fields in all tables of my database. The input of this tool is some text to search for. On th

Re: Full-search index for the database

2011-09-11 Thread Erick Erickson
How much search-specific stuff are we talking here? Do you want to do stemming? Plurals? Or are you talking exact match? Phrases? multi-word queries? If exact match on individual terms is all you want, you could hack something together like this: index each term into a catch-all field with the fie

select query does not find indexed pdf document

2011-09-11 Thread Michael Dockery
I am new to solr.   I tried to upload a pdf file via curl to my solr webapp (on tomcat) curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdf&commit=true"; 0860 but http://www/SearchApp/select/?q=vpn does not find the docu

Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread dpt9876
Hi all, I am wondering if Solr will do the following for a project I am working on. I want to create a search engine with facets for potentially hundreds of websites. Similar to say crawling amazon + buy.com + ebay and someone can search these 3 sites from my 1 website. (I realise there are better

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Erick Erickson
Nope, there's nothing in Solr that crawls anything, you have to feed documents in yourself from the websites. Or, look at the Nutch project, see: http://nutch.apache.org/about.html which is designed for this kind of problem. Best Erick On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 wrote: > Hi all,

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread dpt9876
Hi thanks for the reply. How does nutch/solr handle the scenario where 1 website calls price, "price" and another website calls it "cost". Same thing different name, yet I would want the facet to handle that and not create a different facet. Is this combo of nutch and Solr that intelligent and or

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Ken Krugler
On Sep 11, 2011, at 7:04pm, dpt9876 wrote: > Hi thanks for the reply. > > How does nutch/solr handle the scenario where 1 website calls price, "price" > and another website calls it "cost". Same thing different name, yet I would > want the facet to handle that and not create a different facet. >

Parameter not working for master/slave

2011-09-11 Thread William Bell
I am using 3.3 SOLR. I tried passing in -Denable.master=true and -Denable.slave=true on the Slave machine. Then I changed solrconfig.xml to reference each as per: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node But this is not working. The enable para

Re: Solr and DateTimes - bug?

2011-09-11 Thread Chris Hostetter
: The XML output when performing a query via the solr interface is like this: : 1-01-01T00:00:00Z i think you mean: 1-01-01T00:00:00Z : > > So my question is: Is this a bug in the solr output engine, or should mono : > > be able to parse the date as given from solr? I have not yet tried it out :

Re: Using multivalued field in map function

2011-09-11 Thread Chris Hostetter
: Hmmm, would it be simpler to do something like append : a clause like this? : BloggerId:12304^10 OR CoBloggerId:123404^5? Definitely, but that won't guarantee you a strict ordering if there is a particularly good relevancy match. There's a bunch of ways to go about something like this, but tryi

Re: Adding Query Filter custom implementation to Solr's pipeline

2011-09-11 Thread Chris Hostetter
: When I was using Lucene directly I used a custom implementation of query : filter to enforce entitlements of search results. Now, that I'm : switching my infrastructure from custom host to Solr, what is the best : way to configure Solr to use my custom query filter for every request? It depe

Re: Stemming and other tokenizers

2011-09-11 Thread Patrick Sauts
I can't create one field per language; that is the problem, but I'll dig into it following your suggestions. I'll let you know what I come up with. Patrick. 2011/9/11 Jan Høydahl > Hi, > > You'll not be able to detect language and change stemmer on the same field > in one go. You need to cre