Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me answer in line, to get more info : 2015-06-10 10:59 GMT+01:00 Midas A test.mi...@gmail.com: Hi Alessandro, Please find the answers inline and help me out to figure out this problem. 1) Solr version : *4.2.1* 2) Solr architecture :* Master -slave/ Replication with requestHandler*

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Upayavira
It depends a lot on what the documents are. Some document formats have metadata that stores a title. Perhaps you can just extract that. If not, once you've extracted the content, perhaps you could just have a special field that is the first n words (followed by an ellipsis). If you use a
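
As a rough illustration of the "first n words followed by an ellipsis" idea above, here is a minimal Python sketch. The function name and the default of 8 words are assumptions made for the example, not anything Solr provides; the result would be sent to Solr as an ordinary field value at index time.

```python
def title_from_content(content: str, n_words: int = 8) -> str:
    """Build a fallback title from the first n_words of the extracted text."""
    words = content.split()
    if len(words) <= n_words:
        # Nothing was cut off, so no ellipsis is needed.
        return " ".join(words)
    return " ".join(words[:n_words]) + "..."
```

The ellipsis is only appended when the text was actually truncated, so short documents keep their full text as the title.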

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Wow, Upaya, I didn't know that clean was default=true in the delta import as well! I did know it was the default in the full import, but I agree with you that defaulting to true for delta import is very dangerous! But assuming the user was using the delta import so far, if cleaning every time,

Re: Indexing issue - index get deleted

2015-06-10 Thread Midas A
Hi Alessandro, Please find the answers inline and help me out to figure out this problem. 1) Solr version : *4.2.1* 2) Solr architecture :* Master -slave/ Replication with requestHandler* 3) Kind of data source indexed : *Mysql * 4) What happened to the datasource ? any change in there ? : *No

Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
Note the clean= parameter to the DIH. It defaults to true. It will wipe your index before it runs. Perhaps it succeeded at wiping, but failed to connect to your database. Hence an empty DB? clean=true is, IMO, a very dangerous default option. Upayavira On Wed, Jun 10, 2015, at 10:59 AM, Midas A
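
One way to avoid depending on the default is to always pass clean explicitly. The Python sketch below only builds the request URL; the host, core name and the "/dataimport" handler path are assumptions for illustration (the path is whatever your solrconfig.xml registers for the DIH).

```python
from urllib.parse import urlencode


def dih_url(base_url: str, core: str, command: str, clean: bool) -> str:
    """Build a DIH request URL that states clean explicitly."""
    params = {
        "command": command,            # e.g. "full-import" or "delta-import"
        "clean": str(clean).lower(),   # never rely on the default
        "commit": "true",
    }
    # "/dataimport" is an assumed handler path; adjust to your config.
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


print(dih_url("http://localhost:8983/solr", "mycore", "delta-import", False))
```

With clean=false in the URL, a failed database connection leaves the existing index in place instead of an empty one.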

Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
I was only speaking about full import regarding the default of clean=true. However, looking at the source code, it doesn't seem to differentiate especially between a full and a delta in relation to the default of clean=true, which would be pretty crappy. However, I'd need to try it. Upayavira On

Re: Date Format Conversion Function Query

2015-06-10 Thread Upayavira
Another technology that might make more sense is a Doc Transformer. You also specify them in the fl parameter. I would imagine you could specify fl=id,[persian f=gregorian_Date] See here for more cases: https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents This does

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Zheng Lin Edwin Yeo
The main objective here is actually to assign a title to the documents as they are being indexed. We actually found that the cluster labels provide good information on the key points of the documents, but I'm not sure if we can get good cluster labels with a single document. Besides

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me try to help you. First of all, I would like to encourage people to post more information about their scenario than "This is my log, index deleted, help me" :) This kind of info can be really useful: 1) Solr version 2) Solr architecture (Solr Cloud? Solr Cloud configuration? Manual

Re: Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
I've tried to use solr.HMMChineseTokenizerFactory with the following configurations: <fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.HMMChineseTokenizerFactory"/> <filter class="solr.StopFilterFactory"

Solr date variable resolver is not working with MySql

2015-06-10 Thread abhijit bashetti
I have used Solr 3.3 version as Data Import Handler(DIH) with Oracle.Its working fine for me.nbsp; Now I am trying the same with Mysql.With the change in database, I have changed the query used in data-config.xml for MySql. The query has variables which are passed url in http.The same thing

Re: Velocity UI and hyperlink

2015-06-10 Thread Erik Hatcher
In cloud mode, configurations live in ZooKeeper. By doing the -Dvelocity.template.base.dir=/full/path/to/example/files/conf/velocity/ trick (or baking that into your solrconfig setup for the VelocityResponseWriter) you can have the templates on the file system instead though. — Erik Hatcher,

Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Alexandre Rafalovitch
For some reason, your email is completely unreadable, with a lot of nbsp instead of spaces. Maybe it is trying to send as broken HTML? You may want to try to reformat the message and resend. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Velocity UI and hyperlink

2015-06-10 Thread Sznajder ForMailingList
Hi Erik, When running Solr in simple mode on my laptop, I found the *vm files under server/solr/COLLECTION_NAME/conf. However, when running on my server in cloud mode (with only one node), I do not find this conf/ directory under server. Does it sit in another place? Thanks! On Tue, Jun

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
Hi Edwin, let's do this step by step. Clustering is a problem solved by unsupervised machine learning algorithms. The scope of clustering is to group a corpus of documents by similarity, trying to produce groups that are meaningful to a human being. Solr currently provides different approaches for *Query

Re: Date Format Conversion Function Query

2015-06-10 Thread Alessandro Benedetti
Erick will correct me if I am wrong, but I don't think this function query exists. It could be a nice contribution, though. It should take as input a date format and a field, and return the newly formatted date. Then it would be simple to use:

Re: AngularJS

2015-06-10 Thread Upayavira
On Wed, Jun 10, 2015, at 05:52 AM, William Bell wrote: Finding DIH issue with the new AngularJS DIH section, while indexing... 1,22613/s ? Last Update: 22:50:50 *Indexing since 0:1:38.204* Requests: 1, Fetched: 1,22613/s, Skipped: 0, Processed: 1,22613/s Started: 3 minutes ago Ahh,

Re: TZ rounding

2015-06-10 Thread jon kerling
Thank you for your reply. So my question is: can I get offset of time if I use NOW/MINUTE and not NOW/DAY rounding? You said: "TZ affects what timezone is used when defining the concept of a day for the purposes of rounding by day." I understand from this answer that a query like I mentioned
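
To make the quoted statement concrete, here is a plain Python model of rounding down in a given time zone. It is not Solr's date math implementation, only a sketch of the arithmetic; the fixed UTC-7 offset and the sample timestamp are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def round_down(instant: datetime, unit: str, tz: timezone) -> datetime:
    """Round instant down to the start of the unit as seen in tz; return UTC."""
    local = instant.astimezone(tz)
    if unit == "DAY":
        local = local.replace(hour=0, minute=0, second=0, microsecond=0)
    elif unit == "MINUTE":
        local = local.replace(second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return local.astimezone(timezone.utc)


now = datetime(2015, 6, 10, 12, 34, 56, tzinfo=timezone.utc)
utc_minus_7 = timezone(timedelta(hours=-7))  # fixed-offset stand-in for a real zone

# Day boundaries differ between zones, so the rounded instants differ.
print(round_down(now, "DAY", timezone.utc))   # 2015-06-10 00:00:00+00:00
print(round_down(now, "DAY", utc_minus_7))    # 2015-06-10 07:00:00+00:00

# Minute boundaries coincide for whole-minute offsets, so TZ changes nothing.
print(round_down(now, "MINUTE", timezone.utc) == round_down(now, "MINUTE", utc_minus_7))  # True
```

In this model the time zone only changes the result when rounding to a unit whose boundaries depend on local midnight, which matches the quoted explanation about rounding by day.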

Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Shawn Heisey
On 6/10/2015 6:43 AM, abhijit bashetti wrote: snip >= to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/ HH24:MI:SS') snip Anyone knows where is the problem? Why is the variable resolver not working as expected? Note : to_date is function written by us in MySql. I have checked out the

Re: Date Format Conversion Function Query

2015-06-10 Thread Ali Nazemian
Dear Erick, Hi, Actually I want to convert the date format from the Gregorian calendar (Solr default) to the Persian calendar. You may ask why I do not do that on the client side. Here is why: I want to provide a way to extract data from Solr in CSV format. I know that Solr has a CSV ResponseWriter that could

Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
Hi, I'm trying to index rich-text documents that are in Chinese. Currently, there's no problem with indexing, but there's a problem with searching. Does anyone know what the best Tokenizer and Filter Factory to use is? I'm now using the solr.StandardTokenizerFactory which I heard that it's

Re: Indexing documents in Chinese

2015-06-10 Thread Alexandre Rafalovitch
You may find the series of articles on CJK analysis/search helpful: http://discovery-grindstone.blogspot.com.au/ It's a little out of date, but should be a very solid intro. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 10

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
I agree with Upayavira, title extraction is an activity independent from Solr. Furthermore, I would say it's easy to extract the title before the Solr indexing stage. When the content arrives at the Solr update processors it is already a String. If you want to do some clever title extraction,

Re: How to assign shard to specific node?

2015-06-10 Thread Erick Erickson
Take a look at the collections API CREATE command in more detail here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1 Admittedly this is 5.2 but you didn't mention what version of Solr you're using. In particular the createNodeSet and createNodeSet.shuffle
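
As a hedged sketch of what such a CREATE call could look like, the Python below only assembles the request URL. The parameter names (action, name, router.name, shards, createNodeSet, createNodeSet.shuffle) are from the Collections API documentation linked above; the host, collection name, shard names and node names are made-up values, and whether shuffle=false gives the exact placement you want should be checked against your Solr version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, shards, nodes, shuffle=False):
    """Build a Collections API CREATE URL with an explicit node list."""
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",        # named shards, as in the question
        "shards": ",".join(shards),
        "createNodeSet": ",".join(nodes),
        # false keeps the node list in the given order instead of shuffling it
        "createNodeSet.shuffle": str(shuffle).lower(),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical node names in the host:port_context form used by SolrCloud.
print(create_collection_url(
    "http://localhost:8983/solr",
    "mycollection",
    ["shard1", "shard2", "shard3"],
    ["node1:8983_solr", "node2:8983_solr", "node3:8983_solr"],
))
```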

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Just taking a look at the code: if (requestParams.containsKey("clean")) { clean = StrUtils.parseBool( (String) requestParams.get("clean"), true); } else if (DataImporter.DELTA_IMPORT_CMD.equals(command) || DataImporter.IMPORT_CMD.equals(command)) { clean = false; } else { clean = debug ?
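
For readers who prefer to trace the branches, here is a Python paraphrase of the visible part of that snippet. It is a sketch, not the Solr source: parse_bool is a simplified stand-in for StrUtils.parseBool, the command strings "delta-import" and "import" are assumed values of the two constants, and the last branch is cut off in the quoted code, so it is passed in as a caller-supplied fallback rather than guessed.

```python
def parse_bool(value, default):
    """Simplified stand-in for StrUtils.parseBool(String, boolean)."""
    if isinstance(value, str):
        if value.startswith(("true", "on", "yes")):
            return True
        if value.startswith(("false", "off", "no")):
            return False
    return default


def resolve_clean(request_params, command, fallback):
    """Mirror the visible branches; fallback models the truncated else."""
    if "clean" in request_params:
        # An explicit clean= always wins; unparseable values fall back to True.
        return parse_bool(request_params["clean"], True)
    if command in ("delta-import", "import"):  # assumed constant values
        return False
    return fallback


print(resolve_clean({}, "delta-import", fallback=True))   # False
```

Read this way, a delta-import request without an explicit clean= does not clean the index, while an explicit clean=true does regardless of the command.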

How to assign shard to specific node?

2015-06-10 Thread MOIS Martin (MORPHO)
Hello, I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create a new collection with 3 shards using `implicit` routing:

SolrCloud No Active Slice

2015-06-10 Thread James Webster
I'm having a config issue, I'm posting the error from Solrj which also includes the cluster state JSON: org.apache.solr.common.SolrException: No active slice servicing hash code 2ee4d125 in DocCollection(rfp365)={ shards:{shard1:{ range:-, state:active,

Re: The best way to exclude seen results from search queries

2015-06-10 Thread Mikhail Khludnev
start with negating and bypassing caches by https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser eg fq=-{!terms f=p_id cache=false}1,3,5,already,seen note: Elastic can even store such filters via
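
A small Python sketch of building that filter from a per-user list of seen IDs follows. The field name p_id comes from the example above; the function name is an assumption, and IDs that themselves contain commas would need a different separator, which this sketch does not handle.

```python
def exclude_seen_fq(field, seen_ids):
    """Build a negated, uncached terms filter; None if there is nothing to exclude."""
    ids = [str(i) for i in seen_ids]
    if not ids:
        return None
    return "-{!terms f=%s cache=false}%s" % (field, ",".join(ids))


print(exclude_seen_fq("p_id", [1, 3, 5]))   # -{!terms f=p_id cache=false}1,3,5
```

The returned string is meant to be passed as an fq parameter alongside the user's main query.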

File paths in Zookeeper managed config files

2015-06-10 Thread Peter Scholze
Hi all, I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When uploading a config file containing the following, I get an Invalid Path String error. <filter class="solr.StopFilterFactory" words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt" ignoreCase="true"/> leads obviously

Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Mikhail Khludnev
Hello, The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader()); you hardly ever need to do this, at least because Solr already does it. DocValues need to be accessed per segment, leaf/atomic/reader/context provided to collector. eg look at DocTermsIndexDocValues.strVal(int)

Adding applicative cache to SolrSearcher

2015-06-10 Thread adfel70
I am using RankQuery to implement my applicative scorer that returns a score based on the value of a specific field (let's call it 'score_field') that is stored for every document. The RankQuery creates a collector, and for every collected docId I retrieve the value of score_field, calculate the

Re: How to tell when Collector finishes collect loop?

2015-06-10 Thread adfel70
I need to execute close() because the scorer is being opened in a context of a query and caches some data in that scope - of the specific query. The way to clear this cache, which is only relevant for that query, is to call close(). I think this API is not so good, but I assume that the scorer's

Re: Date Format Conversion Function Query

2015-06-10 Thread Ali Nazemian
Thank you very much. It seems that document transformer is the perfect extension point for this conversion. I will try to implement that. Best regards. On Wed, Jun 10, 2015 at 3:54 PM, Upayavira u...@odoko.co.uk wrote: Another technology that might make more sense is a Doc Transformer. You

The best way to exclude seen results from search queries

2015-06-10 Thread amid
Hi, We have a Solr index with ~1M documents. We want to give our users the ability to filter results out of queries, meaning they will not be shown again for any query of this specific user (we currently have 10K users). You can think of a scenario like a recommendation engine which you don't

Re: File paths in Zookeeper managed config files

2015-06-10 Thread Shawn Heisey
On 6/10/2015 2:47 PM, Peter Scholze wrote: I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When uploading a config file containing the following, I get an Invalid Path String error. <filter class="solr.StopFilterFactory" words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"

RE: The best way to exclude seen results from search queries

2015-06-10 Thread Reitzel, Charles
I don't see any way around storing which recommendations have been delivered to each user. Sounds like a separate collection with the unique ID created from the combination of the user ID and the recommendation ID (with the IDs also available as separate, searchable and returnable fields).
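
A minimal Python sketch of such a document follows. The field names (id, user_id, rec_id) and the underscore separator are assumptions for illustration only; any separator works as long as it cannot appear inside the individual IDs.

```python
def delivered_doc(user_id: str, rec_id: str) -> dict:
    """Build one 'delivered recommendation' document with a combined unique ID."""
    return {
        "id": f"{user_id}_{rec_id}",   # unique key: one doc per (user, recommendation)
        "user_id": user_id,            # kept separately so it can be searched
        "rec_id": rec_id,              # kept separately so it can be returned
    }


print(delivered_doc("u42", "r1001"))
```

Because the unique key is derived from both IDs, re-sending the same pair overwrites the existing document instead of creating a duplicate.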

Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Chris Hostetter
: : The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader()); : you hardly ever need to do this, at least because Solr already does it. Specifically you should just use... searcher.getLeafReader().getSortedSetDocValues(your_field_name) ...instead of doing all this

Re: TZ rounding

2015-06-10 Thread Chris Hostetter
: So my question is: can I get offset of time if I use NOW/MINUTE and not NOW/DAY rounding? i'm sorry, but your question is still too terse, vague, and ambiguous for me to really make much sense of it; and the example queries you provided really don't have enough context for me to understand

Re: Indexing issue - index get deleted

2015-06-10 Thread Chris Hostetter
: The guy was using delta import anyway, so maybe the problem is : different and not related to the clean. that's not what the logs say. Here's what i see... Log begins with server startup @ Jun 10, 2015 11:14:56 AM The DeletionPolicy for the shopclue_prod core is initialized at Jun 10,

Re: Index optimize runs in background.

2015-06-10 Thread Erick Erickson
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones sent out to each replica) should be sent in parallel and then each thread should wait for completion from the replicas. There is no real check for optimize, I believe that the return from the call is considered sufficient. If we can

Re: Index optimize runs in background.

2015-06-10 Thread Modassar Ather
Hi, There are 5 cores and a separate server for indexing on this SolrCloud. Can you please share your suggestions on: How can the indexer know that the optimize has completed, even if the commit/optimize runs in the background, without going to the Solr servers, maybe by using SolrJ or another API? I

Show all fields in Solr highlighting output

2015-06-10 Thread Zheng Lin Edwin Yeo
Hi, Is it possible to list all the fields in the highlighting portion of the output? Currently, even when I set <str name="hl.fl">*</str>, it only shows fields where highlighting is possible, and fields for which highlighting is not possible are not shown. I would like to have the output where all the fields,