Re:***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
of character/words (without changing index) Hi All, Is there any way I can restrict Solr search query to look for specified number of characters/words (for only searching purposes not for highlighting) *For example:* *Indexed content:* *I am a man of my words I am a lazy man

***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-26 Thread Muhammad Zahid Iqbal
Hi All, Is there any way I can restrict Solr search query to look for specified number of characters/words (for only searching purposes not for highlighting) *For example:* *Indexed content:* *I am a man of my words I am a lazy man...* Search to consider only below mentioned (words=7

Question about upgrading a Solr index to version 6

2018-01-22 Thread None
the Apache Solr Index Upgrader tool, but I'm not sure if I should use it. The program I have uses a SQL database for referencing and I'm trying to minimize any potential corruption in the process. Could someone better explain the interaction between Solr and Lucene and whether it would be possible to use

Re: Custom Sort option to apply at SOLR index

2018-01-09 Thread padmanabhan
Thank you Erick, it worked. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: delete solr data and index older than 3 days

2018-01-02 Thread Alexandre Rafalovitch
t the syntax or some examples for deleting solr data > and index older than 3 days based on datetime field in solr collection. > > I have field data_Start_time which stores the date value in and is of type > 'date' in my solr collection. I want to delete the index/data older than 3

Re: delete solr data and index older than 3 days

2017-12-28 Thread Alessandro Hoss
5:02 PM ppeddi <phani.pe...@gmail.com> wrote: > >> hi, >> Can anyone please post the syntax or some examples for deleting solr data >> and index older than 3 days based on datetime field in solr collection. >> >> I have field data_Start_time which store

Re: delete solr data and index older than 3 days

2017-12-28 Thread Alessandro Hoss
i.pe...@gmail.com> wrote: > hi, > Can anyone please post the syntax or some examples for deleting solr data > and index older than 3 days based on datetime field in solr collection. > > I have field data_Start_time which stores the date value in and is of type > 'date' in my so

Re: delete solr data and index older than 3 days

2017-12-28 Thread Erick Erickson
First off, please don't optimize unless you're willing to do it every time, there's a long discussion of why here: https://issues.apache.org/jira/browse/LUCENE-7976. It's almost always a bad idea to optimize unless you're willing to optimize every time you update your index. But second

delete solr data and index older than 3 days

2017-12-28 Thread ppeddi
hi, Can anyone please post the syntax or some examples for deleting solr data and index older than 3 days based on datetime field in solr collection. I have field data_Start_time which stores the date value in and is of type 'date' in my solr collection. I want to delete the index/data older
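One way to do this, not spelled out in the thread, is a delete-by-query that uses Solr date math on the data_Start_time field. A minimal sketch, assuming the collection is named mycollection and is reachable on localhost:8983:

  curl 'http://localhost:8983/solr/mycollection/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>data_Start_time:[* TO NOW-3DAYS]</query></delete>'

NOW-3DAYS is evaluated by Solr at request time, so the same command can be scheduled daily; rounding with NOW/DAY-3DAYS keeps the cutoff stable within a day.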

Re: howto sum of terms of specific field in index

2017-12-21 Thread Emir Arnautović
ec 2017, at 09:23, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> >>> wrote: >>> >>> Hi list, >>> >>> actually a simple question, but somehow i can't figure out how to get >>> the total number of terms in a field in the index, example:

Re: howto sum of terms of specific field in index

2017-12-21 Thread Bernd Fehling
t.com/ > > > >> On 21 Dec 2017, at 09:23, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> >> wrote: >> >> Hi list, >> >> actually a simple question, but somehow i can't figure out how to get >> the total number of terms in a field in th

Re: howto sum of terms of specific field in index

2017-12-21 Thread Emir Arnautović
n Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 21 Dec 2017, at 09:23, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> > wrote: > > Hi list, > > actually a simple question, but somehow i can't figure out how to get > the total number of t

howto sum of terms of specific field in index

2017-12-21 Thread Bernd Fehling
Hi list, actually a simple question, but somehow i can't figure out how to get the total number of terms in a field in the index, example: record_1: fruit: apple, banana, cherry record_2: fruit: apple, pineapple, cherry record_3: fruit: kiwi, pineapple record_4: fruit: - a search for fruit

Solr 7 - Issue with index replication to PULL replicas after schema/config change

2017-12-19 Thread Samuel Tatipamula
Hi everyone, We are running a SolrCloud (Solr 7.1.0) with 2 NRT, 2 TLOG, and 20 PULL replicas. Recently we have observed that, after a schema push to the Zookeeper (3.4.6), and upon hitting the collection RELOAD api, index has stopped replicating to all the PULL replicas, and they went out

Solr 7 - Issue with index replication to PULL replicas after schema/config change

2017-12-19 Thread Samuel Tatipamula
Hi everyone, We are running a SolrCloud (Solr 7.1.0) with 2 NRT, 2 TLOG, and 20 PULL replicas. Recently we have observed that, after a schema push to the Zookeeper (3.4.6), and upon hitting the collection RELOAD api, index has stopped replicating to all the PULL replicas, and they went out

Re: index fields with custom gaps between terms

2017-12-19 Thread Shawn Heisey
or not then it will be single value. If you have multiValued set to false and try to add more than one value to a field when you index, the indexing will fail with an error message in the log. Multiple values is different than multiple terms in a solr.TextField type. When a field

Re: index fields with custom gaps between terms

2017-12-19 Thread Amin Raeiszadeh
:29 AM, Amin Raeiszadeh wrote: >>>> thanks too much Erick and mikhail. >>>> i change SloppyPhraseScorer class for my custom behavior with some fields. >>>> so i need to index some fields with customized gap between terms of fields. >>>> i'm not profession wi

Re: index fields with custom gaps between terms

2017-12-18 Thread Amin Raeiszadeh
10. > thanks, > Amin > > On Tue, Dec 19, 2017 at 3:25 AM, Shawn Heisey <apa...@elyograg.org> wrote: >> On 12/18/2017 12:29 AM, Amin Raeiszadeh wrote: >>> thanks too much Erick and mikhail. >>> i change SloppyPhraseScorer class for my custom behavio

Re: index fields with custom gaps between terms

2017-12-18 Thread Amin Raeiszadeh
too much Erick and mikhail. >> i change SloppyPhraseScorer class for my custom behavior with some fields. >> so i need to index some fields with customized gap between terms of fields. >> i'm not profession with solr and i think with schema.xml only i can >> set fixed gap incr

Re: index fields with custom gaps between terms

2017-12-18 Thread Shawn Heisey
On 12/18/2017 12:29 AM, Amin Raeiszadeh wrote: > thanks too much Erick and mikhail. > i change SloppyPhraseScorer class for my custom behavior with some fields. > so i need to index some fields with customized gap between terms of fields. > i'm not profession with solr and i think wit

Re: index fields with custom gaps between terms

2017-12-18 Thread Erick Erickson
PM, Amin Raeiszadeh <amin24march1...@gmail.com> wrote: > thanks too much Erick and mikhail. > i change SloppyPhraseScorer class for my custom behavior with some fields. > so i need to index some fields with customized gap between terms of fields. > i'm not profession wit

Re: index fields with custom gaps between terms

2017-12-17 Thread Amin Raeiszadeh
thanks too much Erick and mikhail. i change SloppyPhraseScorer class for my custom behavior with some fields. so i need to index some fields with customized gap between terms of fields. i'm not profession with solr and i think with schema.xml only i can set fixed gap increment between terms

Re: index fields with custom gaps between terms

2017-12-17 Thread Mikhail Khludnev
; >> how is it possible to add custom gaps between terms of > >> TextFiled in solr documents. for example between term > >> number 1 and 2 we need 200 gap and between term number > >> 2 and 3 we need 1000 gap. > >> i can only send key-value pair as String t

Re: index fields with custom gaps between terms

2017-12-17 Thread Erick Erickson
;amin24march1...@gmail.com> >> wrote: >> >>> how is it possible to add custom gaps between terms of >>> TextFiled in solr documents. for example between term >>> number 1 and 2 we need 200 gap and between term number >>> 2 and 3 we need 1000 gap.

Re: index fields with custom gaps between terms

2017-12-17 Thread Amin Raeiszadeh
h <amin24march1...@gmail.com> > wrote: > >> how is it possible to add custom gaps between terms of >> TextFiled in solr documents. for example between term >> number 1 and 2 we need 200 gap and between term number >> 2 and 3 we need 1000 gap. >> i can only send key-valu

Re: index fields with custom gaps between terms

2017-12-16 Thread Mikhail Khludnev
<amin24march1...@gmail.com> wrote: > how is it possible to add custom gaps between terms of > TextFiled in solr documents. for example between term > number 1 and 2 we need 200 gap and between term number > 2 and 3 we need 1000 gap. > i can only send key-value pair as String to solr(the

index fields with custom gaps between terms

2017-12-16 Thread Amin Raeiszadeh
how is it possible to add custom gaps between terms of TextFiled in solr documents. for example between term number 1 and 2 we need 200 gap and between term number 2 and 3 we need 1000 gap. i can only send key-value pair as String to solr(then solr index fields in different types according
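As the replies above note, schema.xml only offers a fixed gap: the positionIncrementGap attribute on a field type, which is inserted between the values of a multiValued field; variable per-term gaps would need a custom token filter that adjusts the position increment. A minimal sketch of the fixed-gap case, with hypothetical type and field names:

  <fieldType name="text_gap" class="solr.TextField" positionIncrementGap="200">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="my_text" type="text_gap" indexed="true" stored="true" multiValued="true"/>

With this type, a phrase query will not match across two different values of my_text unless its slop exceeds 200.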

Re: adding data to existing index

2017-12-13 Thread Andrew Kelly
Hi Erick, I think it's json input I'm after. Here, some background might help, and sorry for being unclear in my query. To date my index is made up of about 2 dozen entities, collected with a pull from a remote mysql db. This db is the persistence layer of a CMS we use. One of our departments has

Re: reindex a lucene index with solr

2017-12-13 Thread Erick Erickson
Then you'll have to re-index probably after you set up your new collections. If you have stored _all_ your original fields you could query from your 4x and index to your SolrCloud, but it'd be best if you could just reindex from the original source. Best, Erick On Tue, Dec 12, 2017 at 10:02 PM

Re: adding data to existing index

2017-12-13 Thread Erick Erickson
So you need to index json docs? Or you want to process json input and get it to Solr via SolrJ? Or??? https://lucidworks.com/2012/02/14/indexing-with-solrj/ will give you a skeleton how to connect to Solr and index Solr docs. You'd have to parse the json and construct the SolrInputDocument
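A minimal SolrJ sketch along the lines of the skeleton Erick links to; the URL, collection and field names are placeholders, and the JSON parsing itself is left out:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class JsonToSolr {
      public static void main(String[] args) throws Exception {
          // Hypothetical base URL and collection name
          SolrClient client = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/mycollection").build();
          // Field values would come from the parsed JSON payload
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "dept-json-1");
          doc.addField("title", "value taken from the JSON");
          client.add(doc);   // send the document
          client.commit();   // make it searchable
          client.close();
      }
  }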

adding data to existing index

2017-12-13 Thread Andrew Kelly
I have an existing solr installation which uses the mysql jdbc driver to access a remote database and index some complex data structures. This has been serving me very well for a good long time, but I now need to expand it a bit. I need to add some additional data from another source via json

Re: reindex a lucene index with solr

2017-12-12 Thread Amin Raeiszadeh
you need to shard or not. But > there's no difference between the index built for stand-alone Solr and > SolrCloud. > > So just create your SolrCloud instance, probably single shard, 1 > replica with the appropriate configset. Then shut that down and copy > the index

Re: reindex a lucene index with solr

2017-12-12 Thread Erick Erickson
What you haven't told us is whether you need to shard or not. But there's no difference between the index built for stand-alone Solr and SolrCloud. So just create your SolrCloud instance, probably single shard, 1 replica with the appropriate configset. Then shut that down and copy the index from
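A rough shell sketch of that sequence; the port, paths and the replica directory name are examples only and depend on the install:

  # stop the node hosting the single-shard, single-replica collection
  bin/solr stop -p 8983
  # copy the existing Lucene index into the replica's data directory
  cp -r /path/to/old/index/* /var/solr/data/mycollection_shard1_replica1/data/index/
  # start the node again in cloud mode
  bin/solr start -c -p 8983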

solr slave do not delete old index files

2017-12-12 Thread Muke
The problem i am having is the old index files are not being deleted on the slave. After each replication, I can see the old files still hanging around This causes the data directory size to increase by the index size every replication until the disk fills up. master: -rw-r- 1

reindex a lucene index with solr

2017-12-12 Thread Amin Raeiszadeh
i have a lucene index that some fields of docs are indexed with custom incremental gaps and all fields are stored too(not only indexed). i need to import this docs to solr cloud. is there any way to automatically rebuild this docs for importing in solr with costum gaps by some thing likes

Re: Index size optimization between 4.5.1 and 4.10.4 Solr

2017-12-07 Thread Natarajan, Rajeswari
ded solr from 4.5.1 to 4.10.4 and we see index size reduction. Trying to see if any optimization was done to decrease the index sizes, couldn’t locate. If anyone knows why please share. Here's a history where you can see a summary of the changes in Lucene's index format

Re: Index size optimization between 4.5.1 and 4.10.4 Solr

2017-12-07 Thread Shawn Heisey
On 12/7/2017 1:27 PM, Natarajan, Rajeswari wrote: > We have upgraded solr from 4.5.1 to 4.10.4 and we see index size reduction. > Trying to see if any optimization done to decrease the index sizes , couldn’t > locate. If anyone knows why please share. Here's a history where yo

Index size optimization between 4.5.1 and 4.10.4 Solr

2017-12-07 Thread Natarajan, Rajeswari
Hi, We have upgraded solr from 4.5.1 to 4.10.4 and we see index size reduction. Trying to see if any optimization done to decrease the index sizes , couldn’t locate. If anyone knows why please share. Thank you, Rajeswari

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Doug Turnbull
ield dependent). > This is because docCount(field dependent) represents a state in time > associated to the current index while maxDocs represents an historical > consideration. > A corpus of documents can change in time, and how much a term is rare can > drastically change (

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread alessandro.benedetti
than maxDocs(field dependent). This is because docCount(field dependent) represents a state in time associated to the current index while maxDocs represents an historical consideration. A corpus of documents can change in time, and how much a term is rare can drastically change ( let's pick an highly

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Doug Turnbull
e I don't understand what > > I'm talking about, but that is the best I can come up with. " > > > > Thanks Shawn, yes, that is correct and I was aware of it. > > I was curious of another difference : > > I think we confirmed that docCount is local to the field ( than

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Yonik Seeley
ents. Maybe I don't understand what > I'm talking about, but that is the best I can come up with. " > > Thanks Shawn, yes, that is correct and I was aware of it. > I was curious of another difference : > I think we confirmed that docCount is local to the field ( thanks Yonik fo

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread alessandro.benedetti
anks Shawn, yes, that is correct and I was aware of it. I was curious of another difference : I think we confirmed that docCount is local to the field ( thanks Yonik for that) so : docCount(index,field1)= # of documents in the index that currently have value(s) for field1 My question is : max

Re: Index Content Removing the HTML Tags.

2017-12-04 Thread Erick Erickson
Have you tried: HtmlStripCharFilterFactory? On Mon, Dec 4, 2017 at 12:37 PM, Fiz Newyorker <fiznewy...@gmail.com> wrote: > Hello Solr Group, > > Good Morning ! > > I am working on Solr 6.5 version and I am trying to Index from Mongo DB > 3.2.5. > > I have conte
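For reference, the char filter Erick mentions is configured in the analysis chain of the field type in the schema. A minimal sketch, with hypothetical type and field names:

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="body" type="text_html" indexed="true" stored="true"/>

Note that char filters only affect the indexed tokens; the stored value of body still contains the original HTML.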

Index Content Removing the HTML Tags.

2017-12-04 Thread Fiz Newyorker
Hello Solr Group, Good Morning! I am working on Solr 6.5 version and I am trying to Index from Mongo DB 3.2.5. I have a content collection in mongodb where there is a body column which has html tags in it. I want to index the body column without html tags. *Please see the below body column data

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread Yonik Seeley
On Mon, Dec 4, 2017 at 1:35 PM, Shawn Heisey wrote: > I'm pretty sure that the difference between docCount and maxDoc is deleted > documents. docCount (not the best name) here is the number of documents with the field being searched. docFreq (df) is the number of documents
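For context (not quoted from the thread): to the best of my knowledge, Lucene's BM25Similarity computes the per-field idf as

  idf(t) = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))

so using docCount (the number of documents with a value in the searched field) rather than the index-wide maxDoc keeps terms that are common within one language's field from getting an inflated idf in a multi-lingual index.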

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread Shawn Heisey
On 12/4/2017 7:21 AM, alessandro.benedetti wrote: the reason docCount was improving things is because it was using a docCount relative to a specific field while maxDoc is global all over the index ? Lucene/Solr doesn't actually delete documents when you delete them, it just marks them

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread alessandro.benedetti
Furthermore, taking a look to the code for BM25 similarity, it seems to me it is currently working right : - docCount is used per field if != -1 /** * Computes a score factor for a simple term and returns an explanation * for that score factor. * * * The default implementation

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread alessandro.benedetti
guess. e.g. text_en -> 1 docs text_fr -> 1000 docs text_it -> 500 docs the reason docCount was improving things is because it was using a docCount relative to a specific field while maxDoc is global all over the index ? - --- Alessandro Benedetti Search Con

Solr index size statistics

2017-12-02 Thread John Davis
Hello, Is there a way to get index size statistics for a given solr instance? For eg broken by each field stored or indexed. The only things I know of is running du on the index data files and getting counts per field indexed/stored, however each field can be quite different wrt size. Thanks John

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
; To: solr-user@lucene.apache.org >> Subject: Re: Skewed IDF in multi lingual index, again >> >> I’ve occasionally considered using Unicode language tags (U+E001 and >> friends) on each term. That would make a term specific to a language, so we >> would get [en]LaserJe

RE: Skewed IDF in multi lingual index, again

2017-11-30 Thread Markus Jelsma
, hence the deboost is not too low. Thanks, Markus -Original message- > From:Walter Underwood <wun...@wunderwood.org> > Sent: Thursday 30th November 2017 17:29 > To: solr-user@lucene.apache.org > Subject: Re: Skewed IDF in multi lingual index, again > > I’ve occas

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
ed higher for some terms. > > It was solved back then by using docCount instead of maxDoc when calculating > idf, it worked really well! But, probably due to index changes, the problem > is back for some terms, mostly proper nouns, well, just like five years ago. >

Skewed IDF in multi lingual index, again

2017-11-30 Thread Markus Jelsma
Hello, We already discussed this problem five years ago [1]. In short: documents in foreign languages are scored higher for some terms. It was solved back then by using docCount instead of maxDoc when calculating idf, it worked really well! But, probably due to index changes, the problem

Inverted Index positions vs Term Vector positions

2017-11-27 Thread alessandro.benedetti
Hi all, it may sounds a silly question, but is there any reason that the term positions in the inverted index are using 1 based numbering while the Term Vector positions are using a 0 based numbering[1] ? This may affect different areas in Solr and cause problems which are quite tricky to spot

Re: Merging of index in Solr

2017-11-27 Thread Zheng Lin Edwin Yeo
Solr 7.1.0, and re-index the data. > > But as Solr 7.1.0 is still not ready to index EML files yet due to this > JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make > use with our current Solr 6.5.1 first, which was already created without > sharding from the star

Re: Merging of index in Solr

2017-11-23 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for the info. We will most likely be doing sharding when we migrate to Solr 7.1.0, and re-index the data. But as Solr 7.1.0 is still not ready to index EML files yet due to this JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make use with our current Solr

Re: Merging of index in Solr

2017-11-22 Thread Shawn Heisey
in Lucene. As I said earlier in this thread, a merge is **NOT** just a copy. Lucene must completely rebuild the data structures of the index to incorporate all of the segments of the source indexes into a single segment in the target index, while simultaneously *excluding* information from

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
Hi Erick, Yes, we are planning to do sharding when we upgrade to the newer Solr 7.1.0, and probably will re-index everything. But currently we are waiting for certain issues on indexing the EML files to Solr 7.1.0 to be addressed first, like for this JIRA, https://issues.apache.org/jira/browse

Re: Merging of index in Solr

2017-11-22 Thread Erick Erickson
everything. As a straw-man recommendation, I'd put no more than 200G on each shard in terms of index size. Best, Erick On Wed, Nov 22, 2017 at 5:19 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > I'm doing the merging on the SSD drive, the speed should be ok? > > We need

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
3.5TB. I claim that an index of that size is very difficult to work > with effectively. _Why_ do you want to do this? Do you have any > evidence that you'll be able to effectively use it? > > And Shawn tells you that the result will be one large segment. If you > replace documents

Re: Merging of index in Solr

2017-11-22 Thread Erick Erickson
Really, let's back up here though. This sure seems like an XY problem. You're merging indexes that will eventually be something on the order of 3.5TB. I claim that an index of that size is very difficult to work with effectively. _Why_ do you want to do this? Do you have any evidence that you'll

Re: Merging of index in Solr

2017-11-22 Thread Shawn Heisey
On 11/21/2017 9:10 AM, Zheng Lin Edwin Yeo wrote: > I am using the IndexMergeTool from Solr, from the command below: > > java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar > org.apache.lucene.misc.IndexMergeTool > > The heap size is 32GB. There are more than 20 million documents in the
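The command in the quote is cut off before its arguments; IndexMergeTool takes the target (merged) index directory first, followed by the source index directories. A sketch with placeholder Windows paths, matching the semicolon classpath separator used above:

  java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool C:\merged\index C:\source1\index C:\source2\index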

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
upport Training - http://sematext.com/ > > > > > On 22 Nov 2017, at 02:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com> > wrote: > > > > Hi, > > > > I have encountered this error during the merging of the 3.5TB of index. > > What could be the cause that l

Re: Merging of index in Solr

2017-11-22 Thread Emir Arnautović
& Elasticsearch Consulting Support Training - http://sematext.com/ > On 22 Nov 2017, at 02:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > > Hi, > > I have encountered this error during the merging of the 3.5TB of index. > What could be the cause that

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi, I have encountered this error during the merging of the 3.5TB of index. What could be the cause that led to this? Exception in thread "main" Exception in thread "Lucene Merge Thread #8" java.io.IOException: background merge hit exception: _6f(6.5.1):C7256757 _6e(6

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
e data rates I've seen to 25, then that would indicate > that an optimize on a 3.5TB is going to take about 39 hours, and might take > as long as 48 hours. And if you're running SolrCloud with multiple > replicas, multiply that by the number of copies of the 3.5TB index. An > optimiz

Re: Merging of index in Solr

2017-11-21 Thread Shawn Heisey
ours.  And if you're running SolrCloud with multiple replicas, multiply that by the number of copies of the 3.5TB index.  An optimize on a SolrCloud collection handles one shard replica at a time and works its way through the entire collection. If you are merging different indexes *together*, w

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
these 3.5TB. > The merging has already written the additional 3.5TB to another segment. > However, it is still not a single segment, and the size of the folder where > the merged index is supposed to be is now 4.6TB, This excludes the original > 3.5TB, meaning it is already using up 8.1TB

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi Emir, Thanks for your reply. There are only 1 host, 1 nodes and 1 shard for these 3.5TB. The merging has already written the additional 3.5TB to another segment. However, it is still not a single segment, and the size of the folder where the merged index is supposed to be is now 4.6TB

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
- is index static/updates free? Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 20 Nov 2017, at 17:35, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > > Hi, > &

Merging of index in Solr

2017-11-20 Thread Zheng Lin Edwin Yeo
Hi, Does anyone knows how long usually the merging in Solr will take? I am currently merging about 3.5TB of data, and it has been running for more than 28 hours and it is not completed yet. The merging is running on SSD disk. I am using Solr 6.5.1. Regards, Edwin

Re: Index Message-ID from EML file to Solr

2017-11-16 Thread Zheng Lin Edwin Yeo
Hi, Just to check, is this feature available in Solr 6.5.1? Or is it only available in Solr 7? Regards, Edwin On 10 November 2017 at 19:45, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi, > > Can we index the Message-ID that is from the EML file into Solr? > Tika doe

Re: Index time boosting

2017-11-14 Thread Erick Erickson
Do not use index time boosting, please. When something is deprecated, the usual process is that that functionality is supported for one major version after deprecation, then the devs are free to remove it. Index time boosting is not supported in 7.0 even though it is in 6x, from CHANGES.txt

Re: Index time boosting

2017-11-14 Thread Venkateswarlu Bommineni
Thanks for the reply Amit. I have Solr 6.6 source code and I can still see the code which sets the index level boost value. If the class name is handy for you , could you please tell me where we will calculate the score of a document. so that i can just go through the code. Thanks, Venkat

Re: Index time boosting

2017-11-14 Thread Amrit Sarkar
Hi Venkat, FYI: Index time boosting has been deprecated from latest versions of Solr: https://issues.apache.org/jira/browse/LUCENE-6819. Not sure which version you are on, but best consider the comments on the JIRA before using it. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269

Index time boosting

2017-11-14 Thread Venkateswarlu Bommineni
Hello Guys, I would like to understand how index time boosting works in Solr. and how it is relates to ommitNorms property in schema.xml. and i am trying to understand how it works internally , if you have any documentation please provide. Thanks, Venkat.

Index Message-ID from EML file to Solr

2017-11-10 Thread Zheng Lin Edwin Yeo
Hi, Can we index the Message-ID that is from the EML file into Solr? Tika does have the Message-ID in the MIME data, but is Solr able to read it and index it? I'm using Solr 6.5.1. Regards, Edwin

AW: LatLonPointSpatialField, sorting : sort param could not be parsed as a query, and is not a field that exists in the index

2017-11-02 Thread Clemens Wyss DEV
ne.apache.org' <solr-user@lucene.apache.org> Subject: LatLonPointSpatialField, sorting : sort param could not be parsed as a query, and is not a field that exists in the index Context: solr 6.6.0 I'm switching my schemas from the deprecated solr.LatLonType to solr.LatLonPointSpatialField. No

Using bits in multi tenant document routing index in single shard

2017-11-01 Thread Ketan Thanki
Hi, I have 4 shard and 4 replica and I do Composite document routing for my unique field 'Id' as mentions below. e.g : projectId:158380 modelId:3606 where tenants bits use as projectId/Numbits!modelId/Numbits! prefix with Id NumBits distributed as mention below 3 bits would spread the tenant

LatLonPointSpatialField, sorting : sort param could not be parsed as a query, and is not a field that exists in the index

2017-11-01 Thread Clemens Wyss DEV
query, and is not a field that exists in the index: geodist(b4_location__geo_si,47.36667,8.55)" Invoking sort by sfield=b4_location__geo_si&pt=47.36667,8.55&sort=geodist() asc works as expected though... Why does "sort=geodist(fld,lat,ln)" no longer work? Thx for any hints/advices Clemens

Re: Incomplete Index

2017-10-31 Thread Rick Leir
ts instead of writing new. > >In any case, I would suggest you change your approach in case you have >enough disk space to keep two copies of indices: >1. use alias to read data from index instead of index name >2. index data into new index >3. after verification (e.g. quick check wo

Re: Incomplete Index

2017-10-31 Thread Emir Arnautović
index instead of index name 2. index data into new index 3. after verification (e.g. quick check would be number of docs) switch alias to new index 4. keep old index available in case you need to switch back. 5. before indexing next index, delete one from previous day to free up space. In case you
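The alias switch in step 3 of that list maps to the Collections API CREATEALIAS action; the host, alias and collection names below are placeholders:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20171101'

Re-running CREATEALIAS with the next day's collection repoints the alias, so clients keep querying "products" while each new index is built and verified.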

Incomplete Index

2017-10-31 Thread o1webdawg
I have an index with about a million documents. It is the backend for a shopping cart system. Sometimes the inventory gets out of sync with solr and the storefront contains out of stock items. So I setup a scheduled task on the server to run at 12am every morning to delete the entire solr index

Re: Measuring time spent in analysis and writing to index

2017-10-20 Thread Zisis T.
Another thing you can do - and which has helped me in the past quite a few times - is to just run JVisualVM, attach to Solr's Java process and enable the CPU sampler under the Sampler tab. As you run indexing the methods that most time is spent on will appear near the top. -- Sent from:

Re: Measuring time spent in analysis and writing to index

2017-10-19 Thread Zisis T.
I've worked in the past for a Solr 5.x custom plugin using AspectJ to track the # of calls as well as the time spent inside /incrementToken()/ of all Tokenizers and Filters used during indexing. I could get stats per Solr indexing thread, not per indexing request though. In any case you could spot

Measuring time spent in analysis and writing to index

2017-10-19 Thread Nawab Zada Asad Iqbal
Hi, I want to analyze the time spent in different stages during an add/update document request. E.g., I want to compare time spent in analysis vs writing to the Lucene index. Does Solr provide any such thing? I have looked at [core/admin/mbeans?stats=true&wt=json&indent=true] which provides overall stats but I

Intermittent issue in solr index update

2017-10-18 Thread Bhaumik Joshi
Hi, I am facing "Cannot talk to ZooKeeper" issue intermittently in solr index update. While facing this issue strange thing is that there is no error in ZooKeeper logs and also all shards are showing active in solr admin panel. Please find below details logs and Solr server con

Re: is there a way to remove deleted documents from index without optimize

2017-10-16 Thread Shawn Heisey
On 10/12/2017 10:01 PM, Erick Erickson wrote: > You can use the IndexUpgradeTool that ships with each version of Solr > (well, actually Lucene) to, well, upgrade your index. So you can use > the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the > one that ships with 6

Re: is there a way to remove deleted documents from index without optimize

2017-10-13 Thread Harry Yoo
Tool that ships with each version of Solr > (well, actually Lucene) to, well, upgrade your index. So you can use > the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the > one that ships with 6x to upgrade from 5x. etc. > > That said, none of that is necessary _if_

Re: is there a way to remove deleted documents from index without optimize

2017-10-12 Thread Erick Erickson
You can use the IndexUpgradeTool that ships with each version of Solr (well, actually Lucene) to, well, upgrade your index. So you can use the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the one that ships with 6x to upgrade from 5x. etc. That said, none of that is necessary _if_
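The class behind that tool is org.apache.lucene.index.IndexUpgrader; a sketch of invoking it directly, with placeholder jar versions and index path (lucene-backward-codecs must be on the classpath to read the older format):

  java -cp lucene-core-6.6.0.jar:lucene-backward-codecs-6.6.0.jar org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/index

As the quoted reply notes, each tool only carries an index forward one major version at a time.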

Re: is there a way to remove deleted documents from index without optimize

2017-10-12 Thread Harry Yoo
ood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Sep 22, 2015, at 6:01 PM, CrazyDiamond <crazy_diam...@mail.ru> wrote: >> >> my index is updating frequently and i need to remove unused documents from >> index after

query Slower with Document Routing while Use on Heavy Index Size

2017-10-11 Thread Ketan Thanki
HI, I have issue as mentions below while use Document Routing. 1: Query is slower with heavy index for below detail. Config: 4 shard and 4 replica,with 8.5 GB Index Size(2GB Index Size for each shard). -With routing parameter: q=worksetid_l:2028446%20AND%20modelid_l:23718=1&_ro

Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Yonik Seeley
On Fri, Oct 6, 2017 at 6:50 AM, Toke Eskildsen wrote: > Letting the default use maxSizeMB would be better IMO. But I assume > that FastLRUCache is used for a reason, so that would have to be > extended to support that parameter first. FastLRUCache is the default on the filter cache

Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Toke Eskildsen
On Thu, 2017-10-05 at 21:56 -0700, S G wrote: > So for large indexes, there is a chance that filterCache of 128 can > cause bad GC. Large indexes measured in document count, yes. Or you could argue that a large index is likely to be served with a much larger heap and that it will

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread S G
So for large indexes, there is a chance that filterCache of 128 can cause bad GC. And for smaller indexes, it would really not matter that much because well, the index size is small and probably whole of it is in OS-cache anyways. So perhaps a default of 64 would be a much saner choice to get

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote: > On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > > It seems that the memory limit option maxSizeMB was added in Solr 5.2: > https://issues.apache.org/jira/browse/SOLR-7372 > I am not sure if it works with all caches in Solr, but

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 10:07 AM, Erick Erickson wrote: > The other thing I'd point out is that if your hit ratio is low, you > might as well disable it entirely. I'd normally recommend against turning it off entirely, except in *very* custom cases. Even if the user

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Erick Erickson
lue per entry, the default value of 128 >> values in will become 128x128mb = 16gb and would not be very good for >> a system running below 32 gb of memory. > > Sure. The default values are just that. For an index with 1M documents > and a lot of different filters, 128 would probably

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Toke Eskildsen
a big cache-value per entry,  the default value of 128 > values in will become 128x128mb = 16gb and would not be very good for > a system running below 32 gb of memory. Sure. The default values are just that. For an index with 1M documents and a lot of different filters, 128 would probably be to
