Re: Displaying highlights in formatted HTML document

2011-06-08 Thread Ahmet Arslan
--- On Thu, 6/9/11, Bryan Loofbourrow wrote: > From: Bryan Loofbourrow > Subject: Displaying highlights in formatted HTML document > To: solr-user@lucene.apache.org > Date: Thursday, June 9, 2011, 2:14 AM > Here is my use case: > I have a large number of HTML documents, sizes in the

Code for getting distinct facet counts across shards(Distributed Process).

2011-06-08 Thread rajini maski
In Solr 1.4.1, the piece of code added to get the "distinct facet terms count" across shards (distributed process) is as follows: Class: FacetComponent.java Function: finishStage(ResponseBuilder rb) for (DistribFieldFacet dff : fi.

Re: Query regarding Solr-2242 patch for getting distinct facet counts.

2011-06-08 Thread rajini maski
In Solr 1.4.1, the piece of code added to get the "distinct facet terms count" across shards (distributed process) is as follows: Class: FacetComponent.java Function: finishStage(ResponseBuilder rb) for (DistribFieldFacet dff : fi.

Re: Tokenising based on known words?

2011-06-08 Thread Gora Mohanty
On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel wrote: > Not sure if this is possible, but figured I would ask the question. > > Basically, we have some users who do some pretty ridiculous things ;o) > > Rather than writing "red jacket", they write "redjacket", which obviously > returns no results. [...]

Multiple Values not getting Indexed

2011-06-08 Thread Pawan Darira
Hi, I am trying to index 2 fields with multiple values, but it only stores 1 value for each and ignores the rest of the values after the comma (,). I am fetching the data with a query through DIH. It works fine if I have only 1 value for each of the 2 fields. E.g. Field1 - 150,178,461,151,310,306,305,179,137,162 & Field2 -

Re: tika integration exception and other related queries

2011-06-08 Thread Naveen Gupta
Hi Gary, it started working. I did not test it with Zip files, but for RAR files it is working fine. The only thing I wanted to do is to index the metadata (text mapped to content), not store the data. Also, in the search results, I want to filter the stuff ... and it started working fine .

Displaying highlights in formatted HTML document

2011-06-08 Thread Bryan Loofbourrow
Here is my use case: I have a large number of HTML documents, sizes in the 0.5K-50M range, most around, say, 10M. I want to be able to present the user with the formatted HTML document, with the hits tagged, so that he may iterate through them, and see them in the context of the document, wit

Re: FilterQuery and Ors

2011-06-08 Thread Erick Erickson
try fq=age:[1 TO 10] OR age:[10 TO 20] I'm pretty sure fq=age:([1 TO 10] OR [10 TO 20]) will work too. But you're right, multiple fq clauses are intersections, so specifying more than one fq clause on the SAME field results in what you're seeing. Best Erick On Wed, Jun 8, 2011 at 5:34 PM, Jam
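
A minimal sketch of the single-fq approach described above, in Python. It only builds the request parameters; the helper name or_ranges_fq and the match-all query *:* are assumptions made for illustration, and the grouped form age:(... OR ...) is the variant Erick is "pretty sure" about rather than one confirmed in the thread.

```python
from urllib.parse import urlencode, parse_qs

def or_ranges_fq(field, ranges):
    """Build ONE filter query that ORs several ranges on the same field.

    Separate fq parameters are intersected, so the alternatives
    have to live inside a single fq clause.
    """
    clauses = " OR ".join(f"[{low} TO {high}]" for low, high in ranges)
    return f"{field}:({clauses})"

# Hypothetical request: match everything, then filter by either age range.
fq = or_ranges_fq("age", [(1, 10), (10, 20)])
params = urlencode({"q": "*:*", "fq": fq})

# Decoding the encoded string gives back the same single fq clause.
decoded = parse_qs(params)
print(decoded["fq"][0])
```

Keeping the OR inside one fq parameter is the point of the exercise: sending the two ranges as two fq parameters would return only documents that satisfy both.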

Tokenising based on known words?

2011-06-08 Thread Mark Mandel
Not sure if this is possible, but figured I would ask the question. Basically, we have some users who do some pretty ridiculous things ;o) Rather than writing "red jacket", they write "redjacket", which obviously returns no results. Is there any way, with Solr, to go hunting for known words (maybe
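
The idea of "hunting for known words" inside a run-together term can be sketched outside Solr as a dictionary-based split. This is only an illustration of the technique in Python, not how Solr does it; the function name split_known_words and the tiny vocabulary are assumptions.

```python
def split_known_words(text, vocabulary):
    """Split text into a sequence of known words, longest prefix first.

    Returns the list of words, or None if the text cannot be covered
    completely by words from the vocabulary.
    """
    if not text:
        return []
    # Try the longest candidate prefix first, then back off to shorter ones.
    for end in range(len(text), 0, -1):
        head = text[:end]
        if head in vocabulary:
            rest = split_known_words(text[end:], vocabulary)
            if rest is not None:
                return [head] + rest
    return None

known = {"red", "jacket", "blue"}        # hypothetical list of known words
print(split_known_words("redjacket", known))
print(split_known_words("greenjacket", known))  # "green" is unknown
```

A term that cannot be fully covered is left alone (None here), which avoids mangling product codes or names that merely contain a known word.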

FilterQuery and Ors

2011-06-08 Thread Jamie Johnson
I'm looking for a way to do a filter query and Ors. I've done a bit of googling and found an open jira but nothing indicating this is possible. I'm looking to do something like the search at http://www.lucidimagination.com/search/?q=test where you can do multi selects for the facets. I've read ab

RE: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-08 Thread Burton-West, Tom
Hi Erick, Thanks for asking, yes we have termVectors=true set: I guess I should also mention that highlighting works fine using the fastVectorHighLighter as long as we don't do a MultiTerm query. For example see the query and results appended below (using the same hl parameters listed in t

Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-08 Thread Erick Erickson
Just to check, does the field have termVectors="true" set? I think it's required for FVH to work. Best Erick On Wed, Jun 8, 2011 at 3:24 PM, Burton-West, Tom wrote: > We are trying to implement highlighting for wildcard (MultiTerm) queries. > This seems to work fine with the regular highlighter

Re: wildcard search

2011-06-08 Thread Ahmet Arslan
> > I don't use it myself (but I will soon), so I may be wrong, but did you try > > to use the ComplexPhraseQueryParser : > > > > ComplexPhraseQueryParser > > QueryParser which permits complex phrase query syntax eg "(john > > jon jonathan~) peters*". > > > > It seems that you coul

Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-08 Thread Burton-West, Tom
We are trying to implement highlighting for wildcard (MultiTerm) queries. This seems to work fine with the regular highlighter, but when we try to use the fastVectorHighlighter we don't see any results in the highlighting section of the response. Appended below are the parameters we are using.

RE: huge shards (300GB each) and load balancing

2011-06-08 Thread Burton-West, Tom
Hi Dmitry, I am assuming you are splitting one very large index over multiple shards rather than replicating an index multiple times. Just for a point of comparison, I thought I would describe our experience with large shards. At HathiTrust, we run a 6 terabyte index over 12 shards. This is

Re: wildcard search

2011-06-08 Thread Thomas Fischer
Hi Ludovic, > I don't use it myself (but I will soon), so I may be wrong, but did you try > to use the ComplexPhraseQueryParser : > > ComplexPhraseQueryParser > QueryParser which permits complex phrase query syntax eg "(john > jon jonathan~) peters*". > > It seems that you could do su

Re: solr 3.1 java.lang.NoClassDEfFoundError org/carrot2/core/ControllerFactory

2011-06-08 Thread Stanislaw Osinski
Hi Bryan, You'll also need to make sure that your ${solr.dir}/contrib/clustering/lib directory is in the classpath; that directory contains the Carrot2 JARs that provide the classes you're missing. I think the example solrconfig.xml has the relevant declarations. Cheers, S. On Tue, Jun 7, 2011

Re: huge shards (300GB each) and load balancing

2011-06-08 Thread Dmitry Kan
Hi, Bill. Thanks, always nice to have options! Dmitry On Wed, Jun 8, 2011 at 4:47 PM, Bill Bell wrote: > Re Amazon elb. > > This is not exactly true. The ELB does load balance internal IPs. But the > ELB IP address must be external. Still a major issue unless you use > authentication. Nginx

Re: Sorting on solr.TextField

2011-06-08 Thread Yonik Seeley
On Wed, Jun 8, 2011 at 1:21 PM, Jamie Johnson wrote: > Thanks exactly what I was looking for. > > With this new field used just for sorting is there a way to have it be case > insensitive? From the example schema: -Yonik http://www.lucidimaginati
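
The schema fragment Yonik pointed to did not survive in this archive, so the following is only a sketch of the general idea behind a case-insensitive sort field, written in Python rather than as Solr configuration: keep each value as one untokenized key and compare lowercased copies. The function name and sample titles are assumptions.

```python
def sort_case_insensitive(values):
    # Treat each value as a single sort key (no tokenizing) and compare
    # lowercased copies; the original strings are returned unchanged.
    return sorted(values, key=str.lower)

titles = ["banana", "Apple", "cherry", "apple pie"]  # hypothetical field values
print(sort_case_insensitive(titles))
```

Sorting on a tokenized text field tends to give surprising orderings because the sort key is no longer the whole value, which is why a separate sort-only field is the usual suggestion.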

Re: Sorting on solr.TextField

2011-06-08 Thread Jamie Johnson
Thanks exactly what I was looking for. With this new field used just for sorting is there a way to have it be case insensitive? On Wed, Jun 8, 2011 at 12:50 PM, Ahmet Arslan wrote: > > Is there any documentation which > > details sorting behaviors on the different > > types of solr fields? My

Re: Sorting on solr.TextField

2011-06-08 Thread Ahmet Arslan
> Is there any documentation which > details sorting behaviors on the different > types of solr fields?  My question is specifically > about solr.TextField but > I'd just like to know in general at this point.  http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

Sorting on solr.TextField

2011-06-08 Thread Jamie Johnson
Is there any documentation which details sorting behaviors on the different types of solr fields? My question is specifically about solr.TextField but I'd just like to know in general at this point. Currently when executing a query and I say to sort on a text field I am getting results as follows

Re: KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi Erik. Yes something like what you describe would do the trick. I did find this: http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html I might try the pattern replace filter with stopwords, even though that feels kinda clunky. Matt On Wed, Jun 8, 2011 at 11:04

Re: solr index losing entries

2011-06-08 Thread Marius Hanganu
We have an API built in 2007 which at the lowest level submits requests with . We haven't changed anything to the API, and it worked well until the beginning of this year. Unique key is solr_id with this definition: The number of documents is determined using this HTTP request: http://server/app

Re: solr index losing entries

2011-06-08 Thread Tomás Fernández Löbbe
That's rare. How do you add documents to Solr? what do you have as primary key? How do you determine the number of documents in the index? The value of "maxDoc" of the stats page considers deleted documents too, which are eliminated at merging. On Wed, Jun 8, 2011 at 12:18 PM, Marius Hanganu wro

Re: wildcard search

2011-06-08 Thread lboutros
Hi Thomas, I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser : ComplexPhraseQueryParser QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*". It seems that you could do such type of querie

solr index losing entries

2011-06-08 Thread Marius Hanganu
Hello, We've been using Solr 1.4 for 1.5 years now for one of the indexes in our application, with a special configuration with maxDocs=1 and maxTime=1. The number of documents is 10.000, with index size around 10MB. For a few months now, SOLR has this strange behavior. Our code did not change, ho

Re: KeywordTokenizerFactory and stopwords

2011-06-08 Thread Erik Hatcher
This seems like it deserves some kind of "collecting" TokenFilter(Factory) that will slurp up all incoming tokens and glue them together with a space (and allow separator to be configurable). Hmmm surprised one of those doesn't already exist. With something like that you could have a stan
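
The "collecting" filter Erik describes can be sketched in a few lines of Python to show the intended behavior: tokenize normally, drop stopwords, then glue the surviving tokens back into one token with a configurable separator. This is an illustration only, not an existing Solr filter; the function name and stopword set are assumptions.

```python
def strip_stopwords_and_join(tokens, stopwords, separator=" "):
    """Drop stopwords from a token stream, then emit ONE joined token."""
    kept = [token for token in tokens if token.lower() not in stopwords]
    return separator.join(kept)

stopwords = {"the", "of", "a"}                    # hypothetical stopword list
tokens = "The Art of War".split()                 # stand-in for a standard tokenizer
print(strip_stopwords_and_join(tokens, stopwords))
```

The result is a single token again, so it behaves like KeywordTokenizerFactory output for autocomplete purposes while still letting the stopword step see individual words.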

KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi, I have an "autocomplete" fieldType that works really well, but because the KeywordTokenizerFactory (if I understand correctly) is emitting a single token, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did tr

Re: wildcard search

2011-06-08 Thread Erick Erickson
Hmmm, have you tried EdgeNGrams? This works for me (at the expense of a somewhat larger index, of course)... and a field of type "edge" named "thomasfield" Now searches like thomasfield:"GOK IA 3" (include quotes
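
Erick's field definition was lost from this archive, so here is a small Python sketch of what edge n-grams are, assuming the whole field value is treated as one token. The parameter names min_gram and max_gram are hypothetical stand-ins, not the Solr attribute names.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Return every prefix of value whose length is in [min_gram, max_gram]."""
    longest = min(len(value), max_gram)
    return [value[:size] for size in range(min_gram, longest + 1)]

# Indexing all prefixes lets a truncated search such as "IA 3" be answered
# by an exact match on one of the stored grams.
grams = edge_ngrams("IA 300")
print(grams)
print("IA 3" in grams)
```

Every stored value is expanded into all of its prefixes at index time, which accounts for the "somewhat larger index" mentioned above.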

Re: Problem with boosting function

2011-06-08 Thread Denis Kuzmenok
try: q=title:Unicamp&defType=dismax&bf=question_count^5.0 "title:Unicamp" in any search handler will search only in requested field > The queries I am trying to do are > q=title:Unicamp > and > q=title:Unicamp&bf=question_count^5.0 > The boosting factor (5.0) is just to verify if it was really

Re: Problem with boosting function

2011-06-08 Thread Alex Grilo
The queries I am trying to do are q=title:Unicamp and q=title:Unicamp&bf=question_count^5.0 The boosting factor (5.0) is just to verify if it was really used. Thanks Alex On Wed, Jun 8, 2011 at 10:25 AM, Denis Kuzmenok wrote: > Show your full request to solr (all params) > > > Hi, > > I'm t

Re: huge shards (300GB each) and load balancing

2011-06-08 Thread Bill Bell
Re Amazon elb. This is not exactly true. The ELB does load balance internal IPs. But the ELB IP address must be external. Still a major issue unless you use authentication. Nginx and others can also do load balancing. Bill Bell Sent from mobile On Jun 8, 2011, at 3:32 AM, "Upayavira" wrot

Re: Problem with boosting function

2011-06-08 Thread Yonik Seeley
The boost qparser should do the trick if you want a multiplicative boost. http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html -Yonik http://www.lucidimagination.com On Wed, Jun 8, 2011 at 9:22 AM, Alex Grilo wrote: > Hi, > I'm trying to use bf parameter in solr que
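
As a sketch of what a request using the boost qparser might look like for the query from this thread, the Python below wraps title:Unicamp in the {!boost b=...} local-params prefix. The helper name is an assumption, and using the raw question_count field as the boost function is taken from Alex's bf example rather than from Yonik's reply.

```python
from urllib.parse import urlencode

def boosted_query(base_query, boost_function):
    """Wrap a query so its score is multiplied by boost_function."""
    return "{!boost b=" + boost_function + "}" + base_query

q = boosted_query("title:Unicamp", "question_count")
print(q)

# URL-encoded form, ready to append to a select URL.
print(urlencode({"q": q}))
```

Unlike bf, which adds the function value to the score under dismax, this form multiplies the score, so the boost scales with how well the document matched in the first place.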

Re: Solr Cloud and Range Facets

2011-06-08 Thread Jamie Johnson
One last piece of information: regular range queries seem to work fine, it's only date ranges which seem to be intermittent. On Wed, Jun 8, 2011 at 9:03 AM, Jamie Johnson wrote: > Some more information > > I am currently doing the following: > > SolrQuery query = new SolrQuery(); >

Re: wildcard search

2011-06-08 Thread Thomas Fischer
Hi Erick, I have a multivalued field "GOK" (local classification scheme) with separate entries of the sort IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits. I want to be able to perform a truncated search on that field: either just the string before the space, or

Re: huge shards (300GB each) and load balancing

2011-06-08 Thread Dmitry Kan
Hi Upayavira, Thanks for sharing insights and experience on this. As we have 6 shards at the moment, it is pretty hard (=almost impossible) to keep them on a single box, so that's why we decided to shard. On the other hand, we have never tried multicore architecture, so that's a good point, thank

Re: Problem with boosting function

2011-06-08 Thread Denis Kuzmenok
Show your full request to solr (all params) > Hi, > I'm trying to use bf parameter in solr queries but I'm having some problems. > The context is: I have some topics and an integer weight of popularity > (number of users that follow the topic). I'd like to boost the documents > according to this w

Problem with boosting function

2011-06-08 Thread Alex Grilo
Hi, I'm trying to use bf parameter in solr queries but I'm having some problems. The context is: I have some topics and an integer weight of popularity (number of users that follow the topic). I'd like to boost the documents according to this weight field, and it changes (users may start following

Re: how to Index and Search non-Eglish Text in solr

2011-06-08 Thread Erick Erickson
This page is a handy reference for individual languages... http://wiki.apache.org/solr/LanguageAnalysis But the usual approach, especially for Chinese/Japanese/Korean (CJK) is to index the content in different fields with language-specific analyzers then spread your search across the language-spec

Re: Solr Cloud and Range Facets

2011-06-08 Thread Jamie Johnson
Some more information I am currently doing the following: SolrQuery query = new SolrQuery(); query.setQuery("test"); query.setParam("distrib", true); query.setFacet(true); query.setParam(FacetParams.FACET_RANGE, "dateTime"); query.setParam("f

Re: Getting a query on an "fl" parameter value ?

2011-06-08 Thread Erick Erickson
Hmmm, fl is the list of fields to return, it has nothing to do with what's searched. Are you looking for something like q=keyword AND id:(12 OR 45 OR 32)&version. Best Erick On Tue, Jun 7, 2011 at 11:14 AM, duddy67 wrote: > Hi all, > > I'd like to know if it's possible to get a query on an "

how to Index and Search non-Eglish Text in solr

2011-06-08 Thread Mohammad Shariq
Hi, I had set up Solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in English, but my requirements extend to indexing news in other languages too. This is how my schema looks : And the "text" Field in schema.xml looks like :

Re: solr speed issues..

2011-06-08 Thread Mohammad Shariq
How frequently do you optimize your Solr index? Optimization also helps in reducing search latency. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-speed-issues-tp2254823p3038794.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boosting result on query.

2011-06-08 Thread Koji Sekiguchi
(11/06/08 16:20), Denis Kuzmenok wrote: If you could move to 3.x and your "linked item" boosts could be calculated offline in batch periodically you could use an external file field to store the doc boost. a few If's though I have 3.2 and external file field doesn't work without solr resta

Re: Re: Can I update a specific field in solr?

2011-06-08 Thread Mohammad Shariq
Solr doesn't support partial updates. On 8 June 2011 16:04, ZiLi wrote: > > Thanks very much, I'll re-index a whole document : ) > > > > > From: Chandan Tamrakar > Sent: 2011-06-08 18:25:37 > To: solr-user > Cc: > Subject: Re: Can I update a specific field in solr? > > I think you can do that but you

Re: Re: Can I update a specific field in solr?

2011-06-08 Thread ZiLi
Thanks very much, I'll re-index a whole document : ) From: Chandan Tamrakar Sent: 2011-06-08 18:25:37 To: solr-user Cc: Subject: Re: Can I update a specific field in solr? I think you can do that but you need to re-index a whole document again. Note that there is nothing like "update" ,

Re: Can I update a specific field in solr?

2011-06-08 Thread Chandan Tamrakar
I think you can do that, but you need to re-index the whole document again. Note that there is nothing like "update"; it's usually delete and then add. thanks On Wed, Jun 8, 2011 at 4:00 PM, ZiLi wrote: > Hi, I try to update a specific field in solr , but I didn't find anyway to > implement this
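
The "delete and then add" behavior described here can be modeled with a toy in-memory index. This Python sketch is not Solr code; the class TinyIndex and the helper update_field are hypothetical and only show why changing one field means resubmitting the complete document.

```python
class TinyIndex:
    """Toy stand-in for an index keyed by a unique id field."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Adding a document with an existing id replaces it:
        # the old version is deleted, then the new one is stored whole.
        self.docs.pop(doc["id"], None)
        self.docs[doc["id"]] = dict(doc)


def update_field(index, doc_id, field, value):
    """Change one field by rebuilding and re-adding the entire document."""
    doc = dict(index.docs[doc_id])   # the caller needs ALL current field values
    doc[field] = value
    index.add(doc)


index = TinyIndex()
index.add({"id": "1", "title": "old title", "body": "unchanged text"})
update_field(index, "1", "title", "new title")
print(index.docs["1"])
```

In practice this means the application has to be able to reproduce every field of the document, either from stored fields or from the original data source, before it re-adds it.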

AW: How to deal with many files using solr external file field

2011-06-08 Thread Bohnsack, Sven
Hi, I could not provide a stack trace and IMHO it won't provide any useful information. But we've made good progress in the analysis. We took a deeper look at what happens when an "external-file-field" request is sent to SOLR: * SOLR looks if there is a file for the requested query, e.g.

Can I update a specific field in solr?

2011-06-08 Thread ZiLi
Hi, I am trying to update a specific field in Solr, but I didn't find any way to implement this. Does anyone know how to? Any suggestions will be appreciated : ) 2011-06-08 ZiLi

Re: Question about tokenizing, searching and retrieving results.

2011-06-08 Thread Luis Cappa Banda
Hello again! Thank you very much for answering. The problem was the defaultOperator, which was set to AND. Damn, I was blind :-/ Thank you again.

Re: huge shards (300GB each) and load balancing

2011-06-08 Thread Upayavira
On Wed, 08 Jun 2011 10:42 +0300, "Dmitry Kan" wrote: > Hello list, > > Thanks for attending to my previous questions so far, have learnt a lot. > Here is another one, I hope it will be interesting to answer. > > > > We run our SOLR shards and front end SOLR on the Amazon high-end > machines.

Re: 400 MB Fields

2011-06-08 Thread Alexander Kanarsky
Otis, Not sure about Solr, but with Lucene it was certainly doable. I saw fields way bigger than 400 MB indexed, sometimes having a large set of unique terms as well (think something like a log file with lots of alphanumeric tokens, a couple of gigs in size). While indexing and querying of such thi

Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor
Naveen, For indexing Zip files with Tika, take a look at the following thread : http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On 08/

huge shards (300GB each) and load balancing

2011-06-08 Thread Dmitry Kan
Hello list, Thanks for attending to my previous questions so far, have learnt a lot. Here is another one, I hope it will be interesting to answer. We run our SOLR shards and front end SOLR on the Amazon high-end machines. Currently we have 6 shards with around 200GB in each. Currently we have o

Re: Getting query fields in a custom SearchHandler

2011-06-08 Thread Marc SCHNEIDER
Hi, I reply to myself :-) The solution is to use this utility class : org.apache.solr.search.QueryParsing. Then you can do: Query luceneQuery = QueryParsing.parseQuery(req.getParams().get("q"), req.getSchema()); Then with luceneQuery you can use the extractTerms method. Marc. On Fri, Jun 3, 20

Re: Boosting result on query.

2011-06-08 Thread Denis Kuzmenok
> If you could move to 3.x and your "linked item" boosts could be > calculated offline in batch periodically you could use an external > file field to store the doc boost. > a few If's though I have 3.2 and external file field doesn't work without solr restart (on multicore instance).

Re: Getting a query on an "fl" parameter value ?

2011-06-08 Thread lee carroll
try http://wiki.apache.org/solr/CommonQueryParameters#fq On 7 June 2011 16:14, duddy67 wrote: > Hi all, > > I'd like to know if it's possible to get a query on an "fl" value. > For now my url query looks like that: > > /solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title

Re: Boosting result on query.

2011-06-08 Thread lee carroll
If you could move to 3.x and your "linked item" boosts could be calculated offline in batch periodically you could use an external file field to store the doc boost. a few If's though On 8 June 2011 03:23, Jeff Boul wrote: > Hi, > > I am trying to figure out options for the following problem.

Re: solr 3.1 java.lang.NoClassDEfFoundError org/carrot2/core/ControllerFactory

2011-06-08 Thread Stanislaw Osinski
Hi Bryan, You'll also need to make sure that your ${solr.home}/contrib/clustering/lib directory is in the classpath; that directory contains the Carrot2 JARs that provide the classes you're missing. I think the example solrconfig.xml has the relevant declarations. Cheers, S. On Tue, Jun 7, 2011

Getting a query on an "fl" parameter value ?

2011-06-08 Thread duddy67
Hi all, I'd like to know if it's possible to run a query against an "fl" value. For now my URL query looks like this: /solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title It works, but I also need to query on an "fl" parameter value. I'd like to add to my initial query a kind o