Re: NRT or similar for Solr 3.5?

2011-12-12 Thread vikram kamath
@Steven .. try some alternate email address(besides google/yahoo) and check your spam R

Generic RemoveDuplicatesTokenFilter

2011-12-12 Thread pravesh
Hi All, Currently, Solr's existing RemoveDuplicatesTokenFilter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory) filters duplicate tokens that have the same text and are at the same logical position. In my case, if the same term appears duplicat

RE: Trim and copy a solr field

2011-12-12 Thread Swapna Vuppala
Hi Juan, Thanks for the reply. I tried using this, but I don't see any effect of the analyzer/filter. I tried copying my Solr field to another field of the type defined below. Then I indexed a couple of documents with the new schema, but I see that both fields have got the same value. Am looking

Re: Reducing heap space consumption for large dictionaries?

2011-12-12 Thread Chris Male
Hi, It's good to hear some feedback on using the Hunspell dictionaries. Lucene's support is pretty new so we're obviously looking to improve it. Could you open a JIRA issue so we can explore whether there are some ways to reduce memory consumption? On Tue, Dec 13, 2011 at 5:37 PM, Maciej Lisiewsk

Re: Reducing heap space consumption for large dictionaries?

2011-12-12 Thread Maciej Lisiewski
Hi, in my index schema I have defined a DictionaryCompoundWordTokenFilterFactory and a HunspellStemFilterFactory. Each FilterFactory has a dictionary with about 100k entries. To avoid an out of memory error I have to set the heap space to 128m for 1 index. Is there a way to reduce the memory con

Re: sub query parsing bug???

2011-12-12 Thread Erick Erickson
Well, your query below becomes ref_expertise:(nonlinear OR soliton) AND default_search:"optical lattice:" The regular Solr/Lucene query should handle pretty much anything you can throw at it. But do be aware that Solr/Lucene syntax is not true boolean logic, you have to think in terms of SHOULD, M

Re: server down caused by complex query

2011-12-12 Thread Jason
Hello Hoss, We're using ComplexPhraseQueryParser and the maxBooleanClauses setting is 100. I know maxBooleanClauses is very big, but we are an expert search organization and our queries are very complex and include wildcards, so we need it. Our application receives queries like ((A* OR B* OR C*,...

highlighting questions

2011-12-12 Thread Bent Jensen
I am trying to figure out how to display search query fields highlighted in html. I can enable the highlighting in the query, and I think I get the correct response back (See below: I search using 'Contents' and the highlighting is shown with and . However, I can't figure out what to add to

Solr-3.5.0/Nutch-1.4 - SolrDeleteDuplicates fails

2011-12-12 Thread Patrick Durusau
Greetings! On the Nutch Tutorial: I can run the following commands with Solr-3.5.0/Nutch-1.4: bin/nutch crawl urls -dir crawl -depth 3 -topN 5 then: bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/* successfully. But, if I run: bin/nutc

FTP mount crash when crawling with solrj

2011-12-12 Thread hadi
I have a lot of files in my FTP account, and I use curlftpfs to mount them to a folder and then index them with the SolrJ API, but after a few minutes something strange happens: the mounted folder becomes inaccessible and crashes. I also cannot unmount it, and the message "device is in use" appear

Re: server down caused by complex query

2011-12-12 Thread Chris Hostetter
: Because our user send very long and complex queries with asterisk and near : operator. : Sometimes near operator exceeds 1,000 and keywords almost include asterisk. : If such query is sent to server, jvm memory is full. (our jvm memory "near" operator isn't something I know of as a built in fea

Re: Images for the DataImportHandler page

2011-12-12 Thread Chris Hostetter
: There is some very useful information on the : http://wiki.apache.org/solr/DataImportHandler page about indexing : database contents, but the page contains three images whose links are : broken. The descriptions of those images sound like it would be quite : handy to see them in the page. Co

Re: MySQL data import

2011-12-12 Thread Erick Erickson
Here's a quick demo I wrote at one point. I haven't run it in a while, but you should be able to get the idea. package jdbc; import org.apache.solr.client.solrj.SolrServerException; import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer; import org.apache.solr.client.solrj.impl.XMLR

RE: Removing whitespace

2011-12-12 Thread Devon Baumgarten
Thanks Alireza, Steven and Koji for the quick responses! I'll read up on those and give it a shot. Devon Baumgarten

RE: Removing whitespace

2011-12-12 Thread Devon Baumgarten
Thanks Alireza, Steven and Koji for the quick responses! I'll read up on those and give it a shot. Devon Baumgarten -Original Message- From: Alireza Salimi [mailto:alireza.sal...@gmail.com] Sent: Monday, December 12, 2011 4:08 PM To: solr-user@lucene.apache.org Subject: Re: Removing whi

Re: Removing whitespace

2011-12-12 Thread Koji Sekiguchi
(11/12/13 6:51), Devon Baumgarten wrote: Hello, I am having trouble finding how to remove/ignore whitespace when indexing. The only answer I have found suggested that it is necessary to write my own tokenizer. Is this true? I want to remove whitespace and special characters from the phrase an

RE: Removing whitespace

2011-12-12 Thread Steven A Rowe
Hi Devon, Something like this should work for you (untested!): Steve > -Original Message- > From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] > Sent: Monday, December 12, 2011 4:52 PM > To: 'solr-user@lucene.apache.org' > Subject: Removing whitespace > > Hel

Re: Removing whitespace

2011-12-12 Thread Alireza Salimi
That sounds like a strange requirement, but I think you can use CharFilters instead of implementing your own Tokenizer. Take a look at this section, maybe it helps. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories The On Mon, Dec 12, 2011 at 4:51 PM, Devon Baumgarten < d

Removing whitespace

2011-12-12 Thread Devon Baumgarten
Hello, I am having trouble finding how to remove/ignore whitespace when indexing. The only answer I have found suggested that it is necessary to write my own tokenizer. Is this true? I want to remove whitespace and special characters from the phrase and create N-grams from the result. Ultimate
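
The goal described in this post (strip whitespace and special characters, then build N-grams from what is left) can be sketched in plain Java. This is only an illustration of the intended transformation, not a Solr analyzer and not the configuration suggested in the replies; the class and method names are hypothetical, and the choice to keep only letters and digits is an assumption about what "special characters" means here.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Assumption: "special characters" means anything that is not a letter or digit,
    // so whitespace and punctuation are both dropped.
    static String normalize(String phrase) {
        return phrase.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Slide a window of length n over the text and collect each substring.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        String cleaned = normalize("A & B Co.");
        System.out.println(cleaned);            // prints ABCo
        System.out.println(ngrams(cleaned, 2)); // prints [AB, BC, Co]
    }
}
```

In Solr itself the equivalent steps would normally live in the field type's analysis chain, as the replies in this thread suggest.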

Re: NRT or similar for Solr 3.5?

2011-12-12 Thread Steven Ou
Yeah, running Chrome on OSX and doesn't do anything. Just switched to Firefox and it works. *But*, also don't seem to be receiving confirmation email. -- Steven Ou | 歐偉凡 *ravn.com* | Chief Technology Officer steve...@gmail.com | +1 909-569-9880 2011/12/12 vikram kamath > The Onclick handler d

Re: sub query parsing bug???

2011-12-12 Thread Steve Fuchs
Thanks for the reply! I do believe I have set (or have tried setting) all of those options for the default query and none of them seem to help. Anytime an OR appears inside the query the default for that query becomes OR. At least thats the anecdotal evidence I've encountered. Also in this case

Re: Possible to configure the fq caching settings on the server?

2011-12-12 Thread Chris Hostetter
: Is it possible to configure solr such that the filter query cache : settings is set to fq={!cache=false} by default? well, you could always disable the filterCache -- but i get the impression you want *most* "fq" filters to not be cached, but sometimes you'll specify some that you *do* want

Re: MySQL data import

2011-12-12 Thread Brian Lamb
Thanks all. Erick, is there documentation on doing things with SolrJ and a JDBC connection? On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson wrote: > You might want to consider just doing the whole > thing in SolrJ with a JDBC connection. When things > get complex, it's sometimes more straightforw

Re: Facet on same date field multiple times

2011-12-12 Thread Chris Hostetter
: Eventually the goal is to do different ranges on the same field. Month by : day. Day by hour. Year by week. Something to that effect. But I thought : I'd start simple to see if I could get the syntax right and what I have : above doesn't seem to work. ... : So it doesn't seem intere

Facet on same date field multiple times

2011-12-12 Thread dbashford
I've Googled around a bit and seen this referenced a few times, but cannot seem to get it to work I have a query that looks like this: facet=true &facet.date={!key=foo}date &f.foo.facet.date.start=2010-12-12T00:00:00Z &f.foo.facet.date.end=2011-12-12T00:00:00Z &f.foo.facet.date.gap=%2B1DAY Event
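
The gap in the query above is written as %2B1DAY because a literal + in a URL query string is decoded as a space. A minimal sketch of producing that encoding in plain Java follows; the class and method names are hypothetical, and the sketch only shows the encoding step. It makes no claim about whether the f.foo.* per-key parameters are honored by Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetGapEncoding {
    // Percent-encode a single query-parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints f.foo.facet.date.gap=%2B1DAY
        System.out.println("f.foo.facet.date.gap=" + encode("+1DAY"));
    }
}
```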

Re: SmartChineseAnalyzer

2011-12-12 Thread Chris Hostetter
: Subject: SmartChineseAnalyzer : References: : : : : In-Reply-To: : http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even i

Re: performance of json vs xml?

2011-12-12 Thread Mark Miller
On Sun, Dec 11, 2011 at 3:16 PM, Jason Toy wrote: > I'm thinking about modifying my index process to use json because all my > docs are originally in json anyway . Are there any performance issues if I > insert json docs instead of xml docs? A colleague recommended to me to > stay with xml becau

Re: Solr Load Testing

2011-12-12 Thread Otis Gospodnetic
Hi, 1000 *concurrent* *queries* is a lot.  If your index is small relatively to hw specs, sure.  If not, then tuning may be needed, including maybe Tomcat and JVM level tuning.  The error below is from Tomcat, not really tied to Solr... Otis Sematext :: http://sematext.com/ :: Solr - Lucen

Re: MySQL data import

2011-12-12 Thread Erick Erickson
You might want to consider just doing the whole thing in SolrJ with a JDBC connection. When things get complex, it's sometimes more straightforward. Best Erick... P.S. Yes, it's pretty standard to have a single field be the destination for several copyField directives. On Mon, Dec 12, 2011 at 12

Re: Trim and copy a solr field

2011-12-12 Thread Juan Grande
Hi Swapna, You could try using a copyField to a field that uses PatternReplaceFilterFactory: The regular expression may not be exactly what you want, but it will give you an idea of how to do it. I'm pretty sure there must be some other ways of doing thi

Re: MySQL data import

2011-12-12 Thread Gora Mohanty
On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb wrote: > Hi all, > > I have a few questions about how the MySQL data import works. It seems it > creates a separate connection for each entity I create. Is there any way to > avoid this? Not sure, but I do not think that it is possible. However, from yo

Possible to configure the fq caching settings on the server?

2011-12-12 Thread Andrew Lundgren
Is it possible to configure solr such that the filter query cache settings is set to fq={!cache=false} by default? -- Andrew Lundgren lundg...@familysearch.org NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information.

URLDataSource delta import

2011-12-12 Thread Brian Lamb
Hi all, According to http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource a delta-import is not "currently" implemented for URLDataSource. I say "currently" because I've noticed that such documentation is out of date in many places. I wanted to see if this feature had

Re: MySQL data import

2011-12-12 Thread Brian Lamb
Hi all, Any tips on this one? Thanks, Brian Lamb On Sun, Dec 11, 2011 at 3:54 PM, Brian Lamb wrote: > Hi all, > > I have a few questions about how the MySQL data import works. It seems it > creates a separate connection for each entity I create. Is there any way to > avoid this? > > By nature

Re: Virtual Memory very high

2011-12-12 Thread Yury Kats
On 12/11/2011 4:57 AM, Rohit wrote: > What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html

Solr Load Testing

2011-12-12 Thread Kissue Kissue
Hi, I ran some jmeter load testing on my solr instance version 3.5.0 running on tomcat 6.6.29 using 1000 concurrent users and the error below is thrown after a certain number of requests. My solr configuration is basically the default configuration at this time. Has anybody done something similar?

Re: limiting the content of content field in search results

2011-12-12 Thread Juan Grande
Hi, It sounds like highlighting might be the solution for you. See http://wiki.apache.org/solr/HighlightingParameters *Juan* On Mon, Dec 12, 2011 at 4:42 AM, ayyappan wrote: > I am developing n application which indexes whole pdfs and other documents > to solr. I have completed a working ver

Re: RegexQuery performance

2011-12-12 Thread Jay Luker
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson wrote: > My off-the-top-of-my-head notion is you implement a > Filter whose job is to emit some "special" tokens when > you find strings like this that allow you to search without > regexes. For instance, in the example you give, you could > index so

Re: cache monitoring tools?

2011-12-12 Thread Justin Caratzas
Dmitry, The only added stress that munin puts on each box is the 1 request per stat per 5 minutes to our admin stats handler. Given that we get 25 requests per second, this doesn't make much of a difference. We don't have a sharded index (yet) as our index is only 2-3 GB, but we do have slave s

manipulate the results coming back from SOLR? (was: possible to do arithmetic on returned values?)

2011-12-12 Thread Gabriel Cooper
I'm hoping I just got lost in the shuffle due to posting on a Friday night. Is there a way to change a field's data via some function, e.g. add, subtract, product, etc.? On 12/9/11 4:17 PM, Gabriel Cooper wrote: Is there a way to manipulate the results coming back from SOLR? I have a SOLR 3.

Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Martijn v Groningen
I'd not make a subtask under SOLR-236 b/c it is related to a completely different implementation which was never committed. SOLR-2205 is related to general result grouping and I think it should be closed. I'd make a new issue for improving the performance of group.ngroups=true when there are a lot of un

Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Michael Jakl
Hi! On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen wrote: > As far as I know currently there isn't another way. Unfortunately the > performance degrades badly when having a lot of unique groups. > I think an issue should be opened to investigate how we can improve this... > > Question: Does Solr

Re: performance of json vs xml?

2011-12-12 Thread Erick Erickson
How are you getting your documents into Solr? Because if you're using SolrJ it's a moot point because a binary format is used. I haven't done any specific comparisons, but I'd be surprised if JSON took longer. And removing a whole operation from your update chain that had to be kept fed and water

ExtractingRequestHandler and HTML

2011-12-12 Thread Michael Kelleher
I am submitting HTML document to Solr using the ERH. Is it possible to store the contents of the document (including all markup) into a field? Using fmap.content (I am assuming this comes from Tika) stores the extracted text of the document in a field, but not the markup. I want the whole un

Re: Setting group.ngroups=true considerable slows down queries

2011-12-12 Thread Martijn v Groningen
Hi! As far as I know currently there isn't another way. Unfortunately the performance degrades badly when having a lot of unique groups. I think an issue should be opened to investigate how we can improve this... Question: Does Solr have a decent chunk of heap space (-Xmx)? Because grouping requires

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Robert, thank you for creating the issue in JIRA. However, I need ngrams on that field – is there an alternative to the EdgeNGramFilterFactory ? Thanks! On Mon, Dec 12, 2011 at 1:25 PM, Robert Muir wrote: > On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > >> It seems like there is some weird stuf

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > It seems like there is some weird stuff going on when folding the > string, it can be seen in the analysis view, too: > > http://i.imgur.com/6B2Uh.png > I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642 Thanks for the screensho

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > The end offset remains 11 even after folding and transforming "æ" to > "ae", which seems wrong to me. End offsets refer to the *original text* so this is correct. What is wrong, is EdgeNGramsFilter. See how it turns that 11 to a 12? > > I also stum
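
The point about offsets can be illustrated with a small sketch in plain Java. The Token class below is hypothetical and is not Lucene's API; it only shows that folding may lengthen the token text while the start and end offsets keep pointing into the original, unfolded input.

```java
public class OffsetSketch {
    // Hypothetical token: the text plus character offsets into the ORIGINAL input.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Fold the ligature to "ae" but leave the offsets untouched, because they
    // describe where the token sits in the original text.
    static Token fold(Token t) {
        return new Token(t.text.replace("\u00e6", "ae"), t.start, t.end);
    }

    public static void main(String[] args) {
        Token folded = fold(new Token("k\u00e6mpe", 0, 5));
        // The folded text has 6 characters, yet the end offset is still 5.
        System.out.println(folded.text + " [" + folded.start + "," + folded.end + "]");
    }
}
```

A highlighter uses those offsets to slice the stored original text, which is why a filter that shifts them past the end of the original can produce an offsets exception.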

Ask about the question of solr cache

2011-12-12 Thread JiaoyanChen
When I delete or add data from an application through SolrJ, or import an index with the command nutch solrindex, the Solr cache does not change unless I restart Solr. Could anyone tell me how I can update the Solr cache without restarting, using a shell command? When I recreate the index by nutc

InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Max
Hi there, when highlighting a field with this definition:

Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Justin, in terms of the overhead, have you noticed if Munin puts much of it when used in production? In terms of the solr farm: how big is a shard's index (given you have sharded architecture). Dmitry On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas wrote: > At my work, we use Munin and Nagio fo

Re: Solr 3.4 problem with words separated by coma without space

2011-12-12 Thread elisabeth benoit
Thanks for the answer. yes in fact when I look at debugQuery output, I notice that name and number are never treated as single entries. I have (((text:name text:number)) (text:ru) (text:tain) (text:paris))) so name and number are in same parenthesis, but not exactly treated as a phrase, as far

limiting the content of content field in search results

2011-12-12 Thread ayyappan
I am developing an application which indexes whole pdfs and other documents to solr. I have completed a working version of my application. But there are some problems. The main one is that when I do a search the indexed whole document is shown. I have used solrj and need some help to reduce this con

Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Paul, have you checked solrmeter and zabbix? Dmitry On Fri, Dec 9, 2011 at 11:16 PM, Paul Libbrecht wrote: > Allow me to chime in and ask a generic question about monitoring tools for > people close to developers: are any of the tools mentioned in this thread > actually able to show graphs of lo

Re: cache monitoring tools?

2011-12-12 Thread Dmitry Kan
Hoss, I can't see why Network IO is the issue as the shards and the front end SOLR resided on the same server. I said "resided", because I got rid of the front end (which according to my measurements, was taking at least as much time for merging as it took to find the actual data in the shards) and

Re: NRT or similar for Solr 3.5?

2011-12-12 Thread vikram kamath
The Onclick handler does not seem to be called on google chrome (Ubuntu). Also, I don't seem to receive the email with the confirmation link on registering (I have checked my spam) Regards Vikram Kamath 2011/12/12 Nagendra Nagarajayya > Steven: > > There is an onclick handler that allows