RE: Nodes cannot recover and become unavailable

2012-09-20 Thread Markus Jelsma
having troubles understanding the reason for that NPE. First you could try removing the line #102 in HttpClientUtility so that logging does not prevent creation of the http client in SyncStrategy. -- Sami Siren On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma markus.jel

RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
Hi, Why do you want to back up? With enough machines and a decent replication factor (3 or higher) there is usually little need to back it up. If you have the space it's better to launch a second cluster in another DC. You can also choose to increase the number of maxCommitsToKeep but it'll

RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
If reindexing from raw XML files is feasible (less than 30 minutes) it would be the easiest option. The problem with recovering with old snapshots is that you have to remove bad indices from all cores and possibly stale (or recoveries in progress) indices and replace them with your snapshot and

RE: Nodes cannot recover and become unavailable

2012-09-24 Thread Markus Jelsma
, Sep 19, 2012 at 5:29 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Since the 2012-09-17 11:10:41 build shards start to have trouble coming back online. When i restart one node the slices on the other nodes are throwing exceptions and cannot be queried. I'm not sure how

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Markus Jelsma
-Original message- From:Daisy omnia.za...@gmail.com Sent: Mon 24-Sep-2012 15:09 To: solr-user@lucene.apache.org Subject: RE: Solr - Remove specific punctuation marks Yes I am trying to index Arabic document. There is a problem that the regex couldn't be understood in the

RE: Indexing in Solr: invalid UTF-8

2012-09-25 Thread Markus Jelsma
Hi - you need to get rid of all non-character code points. http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:] -Original message- From:Patrick Oliver Glauner patrick.oliver.glau...@cern.ch Sent: Tue 25-Sep-2012 18:47 To: solr-user@lucene.apache.org
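A minimal sketch of the clean-up suggested above, assuming the documents can be pre-processed as Python strings before they are posted to Solr. The helper names `is_noncharacter` and `strip_noncharacters` are hypothetical; the only fact relied on is the Unicode definition of the 66 noncharacter code points (U+FDD0..U+FDEF plus the last two code points of every plane).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 Unicode noncharacter code points."""
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_noncharacters(text: str) -> str:
    """Drop every noncharacter code point and keep everything else."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))
```

This only covers noncharacters; whether other code points (control characters, for instance) also need filtering for a given input is not settled by the snippet above.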

RE: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Markus Jelsma
Hi - on Debian systems there's a /etc/default/tomcat properties file you can use to set your flags. -Original message- From:Benjamin, Roy rbenja...@ebay.com Sent: Thu 27-Sep-2012 19:57 To: solr-user@lucene.apache.org Subject: How to run Solr Cloud using Tomcat? I've gone through

RE: Indexed without position data - strange exception in eDisMax (Solr 4.0beta)

2012-10-01 Thread Markus Jelsma
-Original message- From:Alexandre Rafalovitch arafa...@gmail.com Sent: Mon 01-Oct-2012 17:58 To: solr-user@lucene.apache.org Subject: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta) I am getting a very strange exception when I use edismax

RE: Zookeeper setup for solr cloud

2012-10-01 Thread Markus Jelsma
Hi Varun, Running many Zookeeper instances improves read time but has a negative impact on writing states to Zookeeper. Having a node only talk to the local Zookeeper instance limits availability, your Zookeeper daemon will die at some point and that will cut off your Solr node from the entire

RE: Problem with spellchecker

2012-10-02 Thread Markus Jelsma
The problem is your stray double quote: str name=queryAnalyzerFieldTypetext_general_fr/str I'd think this would throw an exception somewhere. -Original message- From:Jose Aguilar jagui...@searchtechnologies.com Sent: Tue 02-Oct-2012 01:40 To: solr-user@lucene.apache.org Subject:

RE: how to specify default sort fields in solr schema?

2012-10-03 Thread Markus Jelsma
No, not in the schema but in the solrconfig. Look for your request handler such as the default: <requestHandler name="/select" class="solr.SearchHandler"> you can add a default value for the sort parameter <str name="sort">field desc</str> -Original message- From:A Geek dw...@live.com

RE: Search in body

2012-10-09 Thread Markus Jelsma
Hi - You should stick to Nutch's schema.xml and not manually add text or body fields that aren't going to be populated anyway. Nutch sends data, by default, to the content field. -Original message- From:Rafał Kuć r@solr.pl Sent: Tue 09-Oct-2012 14:32 To:

RE: Wild card searching - well sort of

2012-10-10 Thread Markus Jelsma
Hi - The WordDelimiterFilter can help you get *-BAAN-* for A100-BAAN-C20 but only because BAAN is surrounded with characters the filter splits and combines upon. -Original message- From:Kissue Kissue kissue...@gmail.com Sent: Wed 10-Oct-2012 14:20 To: solr-user@lucene.apache.org

RE: SLOR And OpenNlp integration

2012-10-11 Thread Markus Jelsma
Hi - the wiki page will get you up and running quickly: http://wiki.apache.org/solr/OpenNLP -Original message- From:ahmed ahmed.missaoui...@gmail.com Sent: Thu 11-Oct-2012 13:32 To: solr-user@lucene.apache.org Subject: SLOR And OpenNlp integration Hello, I am a new user of

RE: Regional indexing/retrieval

2012-10-18 Thread Markus Jelsma
Hi - combining two stemmers in one filter chain will lead to unexpected results. It's best to define two different text_ fields even though you'd like to avoid setting this up. It's not very hard. You can even use the LangID update processor to send Spanish text to a Spanish field and there

RE: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Markus Jelsma
Hi - Regex is not available in Solr 3.6: https://issues.apache.org/jira/browse/LUCENE-2604 -Original message- From:Daisy omnia.za...@gmail.com Sent: Tue 23-Oct-2012 14:13 To: solr-user@lucene.apache.org Subject: Solr - Use Regex in the Query Phrase Hi; I am working with

Failure to open existing log file (non fatal)

2012-10-23 Thread Markus Jelsma
Hi, We're testing a 10 node cluster running trunk and write a few million documents to it from Hadoop. We just saw a node die for no apparent reason. Tomcat was completely dead before it was automatically restarted again. Indexing failed when it received the typical Internal Server Error. The

RE: Failure to open existing log file (non fatal)

2012-10-23 Thread Markus Jelsma
Hi, I checked the logs and it confirms the error is not fatal, it was logged just a few seconds before it was restarted. The node runs fine after it was restarted but logged this non fatal error and replayed the log twice. This leaves the question why it died, there is no log of it dying anywhere.

RE: Failure to open existing log file (non fatal)

2012-10-24 Thread Markus Jelsma
crash (on the next startup). Yes, the node recovered itself in the minutes after the exception and continued to run fine. At least other users can now google for the error and ignore it for now. Thanks! - Mark On Tue, Oct 23, 2012 at 6:48 PM, Markus Jelsma markus.jel...@openindex.io wrote

RE: how solr to boost term value at the start of ther field?

2012-10-26 Thread Markus Jelsma
Hi, One trick is to index a special token at the beginning of the content and do a phrase query for your terms and the special token with little or no slop. You can also use Lucene's SpanFirstQuery but it's not yet exposed in Solr. There's a patch for trunk exposing the SpanFirstQuery in

Remove entries from search result, custom collector

2012-10-29 Thread Markus Jelsma
Hi, We want to remove some results from the result set based on the result of some algorithms on some fields in adjacent documents. For example, if doc2 resembles doc1 we want to remove it. We cannot do this in a search component because of problems with paging, maintaining rows=N results

RE: Having problem runing apache-solr on my linux server

2012-10-29 Thread Markus Jelsma
Hi - Detach it from the terminal: java -jar start.jar -Original message- From:zakari mohammed metzaka...@yahoo.com Sent: Mon 29-Oct-2012 15:22 To: solr-user@lucene.apache.org Subject: Having problem runing apache-solr on my linux server hello dear, I try running

Unable to build trunk

2012-10-30 Thread Markus Jelsma
Hi, Since yesterday we're unable to build trunk and also a clean check out from trunk. We can compile the sources but not the example or dist. It hangs on resolve and after a while prints the following: resolve: [ivy:retrieve] [ivy:retrieve] :: problems summary :: [ivy:retrieve]

RE: Unable to build trunk

2012-10-30 Thread Markus Jelsma
will download a bunch of jars... FWIW, Erick On Tue, Oct 30, 2012 at 5:38 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Since yesterday we're unable to build trunk and also a clean check out from trunk. We can compile the sources but not the example or dist. It hangs

trunk is unable to replicate between nodes ( Unable to download ... completely)

2012-10-30 Thread Markus Jelsma
Hi, We're testing again with today's trunk and using the new Lucene 4.1 format by default. When nodes are not restarted things are kind of stable but restarting nodes leads to a lot of mayhem. It seems we can get the cluster back up and running by clearing ZK and restarting everything (another

RE: trunk is unable to replicate between nodes ( Unable to download ... completely)

2012-10-30 Thread Markus Jelsma
Ah, we're also seeing Solr lookup an unexisting directory: 2012-10-30 16:32:26,578 ERROR [handler.admin.CoreAdminHandler] - [http-8080-exec-2] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory

RE: Unable to build trunk

2012-10-31 Thread Markus Jelsma
again? Of course your next build will download a bunch of jars... FWIW, Erick On Tue, Oct 30, 2012 at 5:38 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Since yesterday we're unable to build trunk and also a clean check out from trunk. We can compile the sources

No lockType configured for NRTCachingDirectory

2012-10-31 Thread Markus Jelsma
Hi, Besides replication issues (see other thread) we're also seeing these warnings in the logs on all 10 nodes and for all cores using today's or yesterday's trunk. 2012-10-31 11:01:03,328 WARN [solr.core.CachingDirectoryFactory] - [main] - : No lockType configured for

RE: No lockType configured for NRTCachingDirectory

2012-10-31 Thread Markus Jelsma
That's 5, the actual trunk/ -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wed 31-Oct-2012 16:29 To: solr-user@lucene.apache.org Subject: Re: No lockType configured for NRTCachingDirectory By trunk do you mean 4X or 5X? On Wed, Oct 31, 2012 at 7:47 AM, Markus

RE: trunk is unable to replicate between nodes ( Unable to download ... completely)

2012-11-01 Thread Markus Jelsma
on the leader is corrupt, and that should not happen even on power loss. Any hints? Thanks Markus -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Wed 31-Oct-2012 14:14 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: RE: trunk

Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException

2012-11-01 Thread Markus Jelsma
Hi, Using this week's trunk we sometimes see nodes entering some funky state where they continuously report exceptions. Replication and query handling is still possible but there is an increase in CPU time: 2012-11-01 09:24:28,337 INFO [solr.core.SolrCore] - [http-8080-exec-4] - :

trouble instantiating CloudSolrServer

2012-11-02 Thread Markus Jelsma
Hi, We use trunk but got SolrJ 4.0 from Maven. Creating an instance of CloudSolrServer fails because its constructor calls a nonexistent LBServer constructor, it attempts to create an instance by only passing a HttpClient. How is LBHttpSolrServer supposed to work without passing a SolrServer

RE: Difference Between Indexing and Reindexing

2013-04-04 Thread Markus Jelsma
I assume you're using Nutch 2.x? Nutch 1.x does not have such an option and i find it strange to hear 2.x does. It really makes no sense to have a -reindex option and it should be removed. I'd recommend to stick to plain indexing. -Original message- From:Jack Krupansky

RE: Listing Priority

2013-04-14 Thread Markus Jelsma
You can use boost queries to boost documents that match some query e.g. suffix:co.uk but you'll need to have URL suffixes indexed. Nutch knows about URL suffixes but does not index them. You would need to add a custom indexing filter or patch an existing filter to add a suffix field. URLUtil

RE: EdgeGram filter

2013-04-23 Thread Markus Jelsma
Always check the javadocs. There's a lot of info to be found there: http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html -Original message- From:alx...@aim.com alx...@aim.com Sent: Tue 23-Apr-2013 21:06

RE: DF is not updated when a document is marked for deletion note

2013-05-02 Thread Markus Jelsma
DF uses maxDoc which is updated when segments merge so DF is almost never accurate in a dynamic index. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Thu 02-May-2013 14:05 To: solr-user@lucene.apache.org Subject: DF is not updated when a document is marked for

RE: Pros and Cons of Using Deduplication of Solr at Huge Data Indexing

2013-05-02 Thread Markus Jelsma
Distributed deduplication does not work right now: https://issues.apache.org/jira/browse/SOLR-3473 We've chosen not to use update processors for deduplication anymore and rely on several custom mapreduce jobs in Nutch and some custom collectors in Solr to do some on-demand online deduplication.

RE: How to get Term Vector Information on Distributed Search

2013-05-07 Thread Markus Jelsma
hi - this is a known issue: https://issues.apache.org/jira/browse/SOLR-4479 -Original message- From:meghana meghana.rav...@amultek.com Sent: Tue 07-May-2013 14:28 To: solr-user@lucene.apache.org Subject: How to get Term Vector Information on Distributed Search Hi, I am using

RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?

2013-05-07 Thread Markus Jelsma
Hi - they are indexed by default but can be omitted since 3.4: http://wiki.apache.org/solr/SchemaXml#Common_field_options -Original message- From:KnightRider ksu.wildc...@gmail.com Sent: Tue 07-May-2013 14:41 To: solr-user@lucene.apache.org Subject: Solr 1.4 - Proximity Search -

RE: CJK question

2013-05-13 Thread Markus Jelsma
Hi, It uses the StandardAnalyzer which does split on IDEOGRAPHIC SPACE. Cheers, Markus -Original message- From:Bernd Fehling bernd.fehl...@uni-bielefeld.de Sent: Mon 13-May-2013 13:36 To: solr-user@lucene.apache.org Subject: CJK question A question about CJK, how will U+3000

Thai language and removed position filter

2013-05-21 Thread Markus Jelsma
Hi, The wiki mentions to use the removed position filter at query time for the Thai language. How is it supposed to work now without the position filter? http://wiki.apache.org/solr/LanguageAnalysis#Thai Thanks, Markus

RE: Overlapping onDeckSearchers=2

2013-05-27 Thread Markus Jelsma
forceMerge is very useful if you delete a significant portion of an index. It can take a very long time before any merge policy decides to finally merge them all away, especially for a static or infrequently changing index. Also, having a lot of deleted docs in the index can be an issue if your

RE: Note on The Book

2013-05-29 Thread Markus Jelsma
Jack, I'd prefer tons of information instead of a meager 300 page book that leaves a lot of questions. I'm looking forward to a paperback or hardcover book and price doesn't really matter, it is going to be worth it anyway. Thanks, Markus -Original message- From:Jack Krupansky

RE: PostingsHighlighter and analysis

2013-06-17 Thread Markus Jelsma
Hi, Any intelligent suggestions for this issue? Thanks, Markus -Original message- From:Trey Hyde th...@centraldesktop.com Sent: Mon 11-Mar-2013 21:44 To: solr-user@lucene.apache.org Subject: PostingsHighlighter and analysis debug=timing has told me for a very long time that 99%

RE: Solr File System Search

2013-06-24 Thread Markus Jelsma
You can use Apache Nutch to crawl local file systems as well and index to Solr as one would otherwise do. Cheers -Original message- From:Sourabh107 sourabh.jain@gmail.com Sent: Sunday 23rd June 2013 17:12 To: solr-user@lucene.apache.org Subject: Solr File System Search I

RE: undefined field http:// while searchi query

2013-07-02 Thread Markus Jelsma
colons need to be escaped cheers -Original message- From:aniljayanti aniljaya...@yahoo.co.in Sent: Tuesday 2nd July 2013 12:35 To: solr-user@lucene.apache.org Subject: undefined field http:// while searchi query Hi, I am using solr 3.3 version. After indexing I am querying
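A minimal sketch of the escaping advice above, assuming the query string is assembled client-side in Python. `escape_query_chars` and `SPECIAL_CHARS` are hypothetical names, and the character set is an assumption modelled on the special characters documented for the Lucene query parser; apply it to the value only, never to the `field:` prefix.

```python
# Characters treated as query syntax by the Lucene/Solr query parser
# (assumed set; '/' is harmless to escape on versions where it is not special).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(value: str) -> str:
    """Prefix every special character in a query value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)
```

For example, a query on a hypothetical `url` field could be built as `"url:" + escape_query_chars("http://example.com/")`, so the colon inside the value is no longer read as a field separator.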

RE: simple date query

2013-07-10 Thread Markus Jelsma
hi - check the examples for range queries and date math: http://wiki.apache.org/solr/SolrQuerySyntax http://lucene.apache.org/solr/4_3_1/solr-core/org/apache/solr/util/DateMathParser.html -Original message- From:Marcos Mendez mar...@aimrecyclinggroup.com Sent: Wednesday 10th July

RE: Why Sort Doesn't Work?

2013-07-17 Thread Markus Jelsma
Remove the WDF from the analysis chain, it's not going to work with multiple tokens. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Wednesday 17th July 2013 11:55 To: solr-user@lucene.apache.org Subject: Why "Sort" Doesn't Work? I run a query at my

RE: Why Sort Doesn't Work?

2013-07-17 Thread Markus Jelsma
Work? Hi Markus; This is default schema at Nutch. Do you mean there is a bug with schema? 2013/7/17 Markus Jelsma markus.jel...@openindex.io Remove the WDF from the analysis chain, it's not going to work with multiple tokens. -Original message- From:Furkan KAMACI

RE: Why Sort Doesn't Work?

2013-07-17 Thread Markus Jelsma
? It is not listed at schema. Is it document boost? 2013/7/17 Markus Jelsma markus.jel...@openindex.io No, there is no bug in the schema, it is just an example and provides the most common usage only; sort by score. -Original message- From:Furkan KAMACI furkankam...@gmail.com

RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma
You must implement a SpanFirst query yourself. These are not implemented in any Solr query parser. You can easily expand the (e)dismax parsers and add support for it. -Original message- From:Anatoli Matuskova anatoli.matusk...@gmail.com Sent: Thursday 18th July 2013 11:54 To:

RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma
You'll need to import the org.apache.lucene.search.spans package (and org.apache.lucene.index.Term) in Solr's ExtendedDismaxQParserPlugin and add SpanFirstQuery clauses to the main query. Something like: query.add(new SpanFirstQuery(new SpanTermQuery(new Term(field, clause)), distance), BooleanClause.Occur.SHOULD); -Original message-

RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
Not your updateHandler, that only shows numbers about what it's doing and it can be restarted. Check your cores: host:port/solr/admin/cores -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Thursday 18th July 2013 15:46 To: solr-user@lucene.apache.org Subject: Re:

RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
count of how many documents indexed and how many documents updated? Hi Markus; It doesn't give me how many documents updated from last commit. 2013/7/18 Markus Jelsma markus.jel...@openindex.io Not your updateHandler, that only shows numbers about what it's doing and it can

RE: IDNA Support For Solr

2013-07-19 Thread Markus Jelsma
Hi - What kind of support would you expect Solr to provide? IDN is only about conversion between Unicode in your address bar and ASCII in the DNS. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Friday 19th July 2013 11:09 To: solr-user@lucene.apache.org Subject:

RE: IDNA Support For Solr

2013-07-19 Thread Markus Jelsma
includes that word: *çorba.* 2013/7/19 Markus Jelsma markus.jel...@openindex.io Hi - What kind of support would you expect Solr to provide? IDN is only about conversion between Unicode in your address bar and ASCII in the DNS. -Original message- From:Furkan KAMACI furkankam

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
It is possible: https://issues.apache.org/jira/browse/SOLR-4260 I rarely see it and i cannot reliably reproduce it but it just sometimes happens. Nodes will not bring each other back in sync. -Original message- From:Neil Prosser neil.pros...@gmail.com Sent: Monday 22nd July 2013

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
You should increase your ZK time out, this may be the issue in your case. You may also want to try the G1GC collector to keep STW under ZK time out. -Original message- From:Neil Prosser neil.pros...@gmail.com Sent: Monday 22nd July 2013 14:38 To: solr-user@lucene.apache.org Subject:

RE: facet.maxcount ?

2013-07-23 Thread Markus Jelsma
Hi - No but there are two unresolved issues about this topic: https://issues.apache.org/jira/browse/SOLR-4411 https://issues.apache.org/jira/browse/SOLR-4411 Cheers -Original message- From:Jérôme Étévé jerome.et...@gmail.com Sent: Tuesday 23rd July 2013 12:58 To:

RE: facet.maxcount ?

2013-07-23 Thread Markus Jelsma
Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712 -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Tuesday 23rd July 2013 13:18 To: solr-user@lucene.apache.org Subject: RE: facet.maxcount ? Hi - No but there are two unresolved

RE: Usage Of Real Time Get Handler Of Solr

2013-07-24 Thread Markus Jelsma
Because it's a get and not a search handler. It takes the id parameter and returns the latest stored fields of document with the specified ID. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Wednesday 24th July 2013 11:07 To: solr-user@lucene.apache.org Subject:

RE: How to Make That Domains Should Be First?

2013-07-27 Thread Markus Jelsma
Hi - To make this work you'll need a homepage flag and some specific hostname analysis and function query boosting. I assume you're still using Nutch so detecting homepages is easy using NUTCH-1325. To actually get the homepage flag in Solr you need to modify the indexer to ingest the

RE: Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Markus Jelsma
This file is malformed: *SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='* * at [row,col {unknown-source}]: [20281,18]* Check row 20281 column 18 -Original message- From:Vineet Mishra clearmido...@gmail.com Sent: Wednesday 31st July 2013

RE: Measuring SOLR performance

2013-07-31 Thread Markus Jelsma
Did you also test indexing speed? With default G1GC settings we're seeing a slightly higher latency for queries than CMS. However, G1GC allows for much higher throughput than CMS when indexing. I haven't got the raw numbers here but it is roughly 45 minutes against 60 in favour of G1GC! Load

Large config files in SolrCloud

2013-08-02 Thread Markus Jelsma
Hi, I have a few very large configuration files but it doesn't work in cloud mode due to the KeeperException$ConnectionLossException. All 10 Solr nodes run trunk and have jute.maxbuffer set to 5242880 (5MB). I can confirm it is set properly by looking at the args in the Solr GUI. All

RE: Large config files in SolrCloud

2013-08-02 Thread Markus Jelsma
wondering if your setup would work with, say, 2M configs as a check that it's something else rather than just the 1M limit. FWIW, Erick On Fri, Aug 2, 2013 at 8:18 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, I have a few very large configuration files but it doesn't work

RE: Large config files in SolrCloud

2013-08-02 Thread Markus Jelsma
of 1M for ZK files, and I'm wondering if your setup would work with, say, 2M configs as a check that it's something else rather than just the 1M limit. FWIW, Erick On Fri, Aug 2, 2013 at 8:18 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, I have a few very

RE: SOLR matching keywords with / without whitespace

2013-08-03 Thread Markus Jelsma
Perhaps it's not the correct tool here but decompounding using a simple dictionary decompounder token filter will fix this problem. -Original message- From:Erick Erickson erickerick...@gmail.com Sent: Saturday 3rd August 2013 13:33 To: solr-user@lucene.apache.org Subject: Re:

RE: entity classification solr

2013-08-07 Thread Markus Jelsma
Yes, you can copyField the source's contents to another field, use the KeepWordTokenFilter to keep only those words you really care about. Using (e)dismax you can then apply a heavy boost on the field. All special words in that field will show up higher if queried for. -Original

RE: Large config files in SolrCloud

2013-08-07 Thread Markus Jelsma
limit. FWIW, Erick On Fri, Aug 2, 2013 at 8:18 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, I have a few very large configuration files but it doesn't work in cloud mode due to the KeeperException$ConnectionLossException. All 10 Solr

RE: get term frequency, just only keywords search

2013-08-14 Thread Markus Jelsma
Try the TermsComponent. It will return one or more terms and their counts for a given field only. -Original message- From:danielitos85 danydany@gmail.com Sent: Wednesday 14th August 2013 11:30 To: solr-user@lucene.apache.org Subject: get term frequency, just only keywords

RE: get term frequency, just only keywords search

2013-08-14 Thread Markus Jelsma
Why? Using terms.limit or a ^term$ regex should limit the response to the exact term right? -Original message- From:danielitos85 danydany@gmail.com Sent: Wednesday 14th August 2013 12:20 To: solr-user@lucene.apache.org Subject: RE: get term frequency, just only keywords search

RE: Autosuggest on very large index

2013-08-20 Thread Markus Jelsma
I am not entirely sure but the Suggester's FST uses prefixes so you may be able to prefix the value you otherwise use for the filter query when you build the suggester. -Original message- From:Greg Preston gpres...@marinsoftware.com Sent: Tuesday 20th August 2013 20:00 To:

RE: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Markus Jelsma
Hi Tom, Don't set it as attributes but as lists as Solr uses everywhere: <similarity class="solr.SchemaSimilarityFactory"> <bool name="discountOverlaps">true</bool> </similarity> For BM25 you can also set k1 and b which is very convenient! Cheers -Original message- From:Tom Burton-West

RE: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-23 Thread Markus Jelsma
Yes, discountOverlaps is used in computeNorm which is used at index time. You should see a change after reindexing. Cheers, Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 22nd August 2013 23:32 To: solr-user@lucene.apache.org Subject: Re: How to

RE: Concat 2 fields in another field

2013-08-27 Thread Markus Jelsma
You may be more interested in the ConcatFieldUpdateProcessorFactory: http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html -Original message- From:Alok Bhandari alokomprakashbhand...@gmail.com Sent: Tuesday 27th August

RE: SOLR 4.2.1 - High Resident Memory Usage

2013-08-27 Thread Markus Jelsma
Hi -Original message- From:Shawn Heisey s...@elyograg.org Sent: Wednesday 28th August 2013 0:50 To: solr-user@lucene.apache.org Subject: Re: SOLR 4.2.1 - High Resident Memory Usage On 8/27/2013 4:17 PM, Erick Erickson wrote: Ok, this whole topic usually gives me heartburn. So

RE: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Markus Jelsma
Hi - You're going to miss unstored but indexed fields. We stop any indexing process, kill the servlets on the down DC and copy over the files using scp, then remove the lock file and start it up again. Always works but it's a manual process at this point but should be easy to automate using

RE: SOLR 4.2.1 - High Resident Memory Usage

2013-08-28 Thread Markus Jelsma
Hi - it's certainly not a rule of thumb but usually RES always grows higher than Xmx so keep an eye on it. -Original message- From:vsilgalis vsilga...@gmail.com Sent: Wednesday 28th August 2013 2:53 To: solr-user@lucene.apache.org Subject: Re: SOLR 4.2.1 - High Resident Memory Usage

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Markus Jelsma
Hi Mark, Got an issue to watch? Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for

RE: Dynamic analizer settings change

2013-09-11 Thread Markus Jelsma
-Original message- From:maephisto my_sky...@yahoo.com Sent: Wednesday 11th September 2013 14:34 To: solr-user@lucene.apache.org Subject: Re: Dynamic analizer settings change Thanks, Erik! I might have missed mentioning something relevant. When querying Solr, I wouldn't

RE: Near Duplicate Document Detection at Solr

2013-09-22 Thread Markus Jelsma
-Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Sunday 22nd September 2013 21:15 To: solr-user@lucene.apache.org Subject: Re: Near Duplicate Document Detection at Solr I've also know that there is another mechanism at Solr:

RE: Exact Word Match Search comes in first come In Solr4.3

2013-09-26 Thread Markus Jelsma
That won't boost order but Lucene's SpanFirstQuery does. You do have to make a custom query parser plugin for it but that's trivial. -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Thursday 26th September 2013 13:24 To: solr-user@lucene.apache.org Subject:

RE: Prevent public access to Solr Admin Page

2013-09-26 Thread Markus Jelsma
As Shawn said, do not expose your Solr server to the internet. Do your internet users access the server directly or via some frontend application? Almost all web based applications connect to Solr via some frontend. Usually Solr is hidden from the internet just as some DBMS is. Do not expose

RE: Effect of multiple white space at WhiteSpaceTokenizer

2013-10-08 Thread Markus Jelsma
Result is the same and performance difference should be negligible, unless you're uploading megabytes of white space. Consecutive white space should be collapsed outside of Solr/Lucene anyway because it'll end up in your stored field. Index size will be slightly bigger but not much due to
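A minimal sketch of collapsing consecutive white space before a document is sent to Solr, assuming the field values are available as Python strings on the client side. `collapse_whitespace` is a hypothetical helper name, not part of Solr or Lucene.

```python
import re

# One or more white space characters (spaces, tabs, newlines, and other
# Unicode white space when matching str patterns in Python 3).
_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of white space with one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()
```

Doing this client-side keeps the stored field small; the tokens produced by the whitespace tokenizer should be unaffected, as the reply above notes.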

RE: EdgeNGramFilterFactory and Faceting

2013-10-08 Thread Markus Jelsma
Facets do not return the stored constraints, it's usually a bad idea to tokenize or do some heavy analysis on facet fields. You need to copy your field instead. -Original message- From:Tyler Foster tfos...@cloudera.com Sent: Tuesday 8th October 2013 19:28 To: solr-user@lucene.apache.org

RE: New query-time multi-word synonym expander

2013-10-23 Thread Markus Jelsma
Nice, but now we got three multi-word synonym parsers? Didn't the LUCENE-4499 or SOLR-4381 patches work? I know the latter has had a reasonable amount of users and committers on github, but it was never brought back to ASF it seems. -Original message- From:Otis Gospodnetic

RE: Replacing Google Mini Search Appliance with Solr?

2013-10-30 Thread Markus Jelsma
Hi Eric, We have also helped some government institution to replace their expensive GSA with open source software. In our case we use Apache Nutch 1.7 to crawl the websites and index to Apache Solr. It is very effective, robust and scales easily with Hadoop if you have to. Nutch may not be the

RE: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Markus Jelsma
Hi - Use the domain-urlfilter for host, domain and TLD filtering. Also, please ask questions on the Nutch list, you're on Solr now :) -Original message- From:Reyes, Mark mark.re...@bpiedu.com Sent: Friday 1st November 2013 17:24 To: solr-user@lucene.apache.org Subject: Exclude

RE: how can i disable coord?

2013-11-04 Thread Markus Jelsma
You cannot disable coordination factor at query time at this moment so you need to change your Similarity in the schema. Easiest to do this is to set the SchemaSimilarityFactory. It defaults to TFIDF but without queryNorm and coord or use another similarity implementation. -Original

RE: 2 replicas with different num of documents

2013-11-04 Thread Markus Jelsma
Hi - we've seen that issue as well (SOLR-4260) and it happened many times with older versions. The good thing is that we haven't seen it for a very long time now so i silently assumed other fixes already solved the problem. We don't know how to reproduce the problem but in older versions it

RE: eDisMax, multiple language support and stopwords

2013-11-07 Thread Markus Jelsma
This is an ancient problem. The issue here is your mm-parameter, it gets confused because for separate fields different amount of tokens are filtered/emitted so it is never going to work just like this. The easiest option is not to use the stopfilter.

RE: Nutch 1.7 solrdedup error

2013-11-18 Thread Markus Jelsma
You got a 404 for that URL http://localhost:8983/solr/rockies/. Your Solr core is not there. Caused by: org.apache.solr.common.SolrException: Not Found Not Found request: http://localhost:8983/solr/rockies/select?q=*:*&fl=id&rows=1&wt=javabin&version=2 at

RE: How To Use Multivalued Field Payload at Boosting?

2013-11-25 Thread Markus Jelsma
Solr has no query parsers that support payloads. You would have to make your own query parser and also create a custom similarity implementing scorePayload for it to work. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Sunday 24th November 2013 19:07 To:

RE: Multiple data/index.YYYYMMDD.... dirs == bug?

2013-11-25 Thread Markus Jelsma
-Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Wednesday 20th November 2013 16:40 To: solr-user@lucene.apache.org Subject: Multiple data/index.MMDD dirs == bug? Hi, When full index replication is happening via SnapPuller, a temporary

RE: Client-side proxy for Solr 4.5.0

2013-11-26 Thread Markus Jelsma
I don't think you mean client-side proxy. You need a server side layer such as a normal web application or good proxy. We use Nginx, it is very fast and very feature rich. Its config scripting is usually enough to restrict access and limit input parameters. We also use Nginx's embedded Perl and

RE: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4260 Join the club Tim! Can you upgrade to trunk or incorporate the latest patches of related issues? You can fix it by trashing the bad node's data, although without multiple clusters it may be difficult to decide which node is bad. We use the latest

RE: SolrCloud 4.6.0 - leader election issue

2013-12-09 Thread Markus Jelsma
I can confirm i've seen this issue as well on trunk, a very recent build. -Original message- From:Elodie Sannier elodie.sann...@kelkoo.fr Sent: Monday 9th December 2013 16:43 To: solr-user@lucene.apache.org Cc: search5t...@lists.kelkoo.com Subject: SolrCloud 4.6.0 - leader

RE: Branch/Java questions re: contributing code

2014-01-06 Thread Markus Jelsma
Trunk (5.x) requires Java 1.7, 4.x still works with 1.6. Check the CHANGES.txt, you'll see it near the top. -Original message- From:Ryan Cutter ryancut...@gmail.com Sent: Monday 6th January 2014 16:27 To: solr-user@lucene.apache.org Subject: Branch/Java questions re:

Re:Indexing URLs from websites

2014-01-07 Thread Markus Jelsma
You need to use the invertlinks command to build a database with docs with inlinks and anchors. Then use the index-anchor plugin when indexing. Then you will have a multivalued field with anchors pointing to your document. Teague James teag...@insystechinc.com wrote: I am trying to index a
