Re: Good practices on indexing larger amount of documents at once using SolrJ

2018-07-20 Thread Arunan Sugunakumar
Dear Erick, Thank you for your reply. I initialize the ArrayList variable with a new ArrayList after I add and commit the solrDocumentList into the solrClient. So I don't think I have the problem of an ever-increasing ArrayList. (I hope the add method in solrClient flushes the previous documents

Re: Exact Phrase search not returning results.

2018-07-20 Thread Tim Casey
Deepti, I am going to guess the analyzer part of the .net application is cutting off the last token. If you try the queries on the console of the running solr cluster, what do you get? If you dump that specific field for all the docs, can you find it with grep? tim On Fri, Jul 20, 2018 at

Re: Exact Phrase search not returning results.

2018-07-20 Thread Shawn Heisey
On 7/20/2018 8:33 AM, Krishnan, Deepti (NIH/OD) [C] wrote: > > We are working on a .net application using Solr. When we initially > launched the site we were using the 5.5.3 version and last sprint we > updated it to the 7.3.1 version. Everything is working fine as > expected except for one

Re: Exact Phrase search not returning results.

2018-07-20 Thread Steve Rowe
Hi Deepti, Your schema snippet didn’t make it to the list. Please repost as inline text rather than an image. -- Steve www.lucidworks.com > On Jul 20, 2018, at 10:33 AM, Krishnan, Deepti (NIH/OD) [C] > wrote: > > Hi, > > We are working on a .net application using Solr. When we initially

Re: Question regarding searching Chinese characters

2018-07-20 Thread Tomoko Uchida
Yes, while traditional-simplified transformation would be out of the scope of Unicode normalization, you would want to add ICUNormalizer2CharFilterFactory anyway :) Let me refine my example settings: Regards, Tomoko Sat, Jul 21, 2018, 2:54 Alexandre Rafalovitch : > Would

Exact Phrase search not returning results.

2018-07-20 Thread Krishnan, Deepti (NIH/OD) [C]
Hi, We are working on a .net application using Solr. When we initially launched the site we were using the 5.5.3 version and last sprint we updated it to the 7.3.1 version. Everything is working fine as expected except for one feature. The exact phrase search does not return any value for

What is the cause of the below error?

2018-07-20 Thread rgummadi
What is the cause of the below error? Is it a disconnect from the overseer node to the zookeeper node? We are running a cluster with Solr 4.6. org.apache.solr.handler.admin.CoreAdminHandler - :org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for

Re: Question regarding searching Chinese characters

2018-07-20 Thread Alexandre Rafalovitch
Would ICUNormalizer2CharFilterFactory do? Or at least serve as a template of what needs to be done. Regards, Alex. On 20 July 2018 at 12:40, Walter Underwood wrote: > Looks like we need a charfilter version of the ICU transforms. That could run > before the tokenizer. > > I’ve never built

Re: Good practices on indexing larger amount of documents at once using SolrJ

2018-07-20 Thread Erick Erickson
I do this all the time with batches of 1,000 and don't see this problem. One thing that sometimes bites people is failing to clear the doclist after every call to add, so you send ever-increasing batches to Solr. Assuming that by batch size you mean the size of the solrDocumentList,
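The pitfall Erick describes (not clearing the document list after each add) can be sketched without any Solr dependency. This is a minimal illustration, not SolrJ code: `index_in_batches` and `send_batch` are hypothetical names, and `send_batch` stands in for whatever call actually ships a batch to Solr.

```python
from typing import Callable, Iterable, List


def index_in_batches(
    docs: Iterable[object],
    send_batch: Callable[[List[object]], None],
    batch_size: int = 1000,
) -> int:
    """Send docs in fixed-size batches and return the number of batches sent.

    send_batch is a hypothetical stand-in for the real indexing call.
    """
    batch: List[object] = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)
            # Start a fresh list; reusing the old one without clearing it
            # would resend every earlier document with each new batch.
            batch = []
            sent += 1
    if batch:
        # Flush the final, partially filled batch.
        send_batch(batch)
        sent += 1
    return sent
```

With 2,500 documents and a batch size of 1,000, the callback is invoked three times with 1,000, 1,000 and 500 documents. If the list were never reset, the batches would instead keep growing.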

Re: Sorting issue while using collection parameter

2018-07-20 Thread Erick Erickson
Just tried this on master and can't reproduce. Didn't try 5.4. Any chance this is a multiValued field? That can sometimes confuse things. Best, Erick On Fri, Jul 20, 2018 at 2:50 AM, Vijay Tiwary wrote: > Hello Erick > > We are using string field and data is stored in lower case while

Re: Question regarding searching Chinese characters

2018-07-20 Thread Walter Underwood
Looks like we need a charfilter version of the ICU transforms. That could run before the tokenizer. I’ve never built a charfilter, but it seems like this would be a good first project for someone who wants to contribute. wunder Walter Underwood wun...@wunderwood.org

Good practices on indexing larger amount of documents at once using SolrJ

2018-07-20 Thread Arunan Sugunakumar
Hi, I have around 12 million objects in my PostgreSQL database to be indexed. I'm running a thread to fetch the rows from the database. The thread will also create the documents and put them in an indexing queue. While this is happening my main process will retrieve the documents from the queue

Re: Creating a collection in Solr standalone mode using solrj

2018-07-20 Thread Arunan Sugunakumar
Hi Jason and Shawn, As you mentioned, I've mixed up the concept of a collection and core. Thank you for clearing that up. Thank you, Arunan On 20 July 2018 at 20:31, Shawn Heisey wrote: > On 7/20/2018 12:09 AM, Arunan Sugunakumar wrote: > > I would like to know whether it is possible to create a

Re: SOLR 7.1 ClassicSimilarityFactory Problem

2018-07-20 Thread Erick Erickson
Why do you think you need to "fix" anything here? FieldNorm here is significantly different. On a quick scan (and you're right, trying to understand it all at a glance is daunting) your fieldNorm is lowering the score of the second doc. Basically the "two hits" are in a longer field so their

SOLR 7.1 ClassicSimilarityFactory Problem

2018-07-20 Thread Hodder, Rick
I am using SOLR 7.1 ClassicSimilarityFactory. I have data in my core with a field called CompanyName in an indexed field IDX_CompanyName. Here are a few of the 900,000 rows in the core: Cityview Citadel CivicVentures Clutch City Sports Clutch City Sports Entertainment Clutch City Sports

Time Routed Aliases & CDCR

2018-07-20 Thread Pavel Micka
Hello, We are planning to implement Time Routed Aliases in our solution. But one of our requirements is to be able to provide disaster recovery in case one of two Data Centers dies. We have a network between DCs, which is potentially unstable and has latencies in the hundreds of milliseconds. We were

Re: Question regarding searching Chinese characters

2018-07-20 Thread Tomoko Uchida
Exactly. More concretely, the starting point is: replacing your analyzer to and see if the results are as expected. Then research other filters if your requirements are not met. Just a reminder: HMMChineseTokenizerFactory does not handle traditional characters as I noted previously in

Re: Question regarding searching Chinese characters

2018-07-20 Thread Walter Underwood
I expect that this is the line that does the transformation: This mapping is a standard feature of ICU. More info on ICU transforms is in this doc, though not much detail on this particular transform. http://userguide.icu-project.org/transforms/general wunder Walter Underwood

Re: Creating a collection in Solr standalone mode using solrj

2018-07-20 Thread Shawn Heisey
On 7/20/2018 12:09 AM, Arunan Sugunakumar wrote: > I would like to know whether it is possible to create a collection in Solr > through SolrJ. I tried to create and it throws me an error saying that > "Solr instance is not running in SolrCloud mode. A "collection" is a SolrCloud concept. 

Re: Creating a collection in Solr standalone mode using solrj

2018-07-20 Thread Jason Gerlowski
Hi Arunan, Solr runs in one of two main modes: "Cloud" mode or "Standalone" mode. Collections can only be created in Cloud mode. Standalone mode doesn't allow creation of collections; it uses cores instead. From your error message above, it looks like the problem is that you're trying to create

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
I think so. I used it exactly as in GitHub On Fri, Jul 20, 2018 at 10:12 AM, Amanda Shuman wrote: > Thanks! That does indeed look promising... This can be added on top of > Smart Chinese, right? Or is it an alternative? > > > -- > Dr. Amanda

Re: Question regarding searching Chinese characters

2018-07-20 Thread Tomoko Uchida
Hi, There is ICUTransformFilter (included in the Solr distribution), which should also work for you. See the example settings: https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#icu-transform-filter Combine it with HMMChineseTokenizer.

Re: Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Thanks! That does indeed look promising... This can be added on top of Smart Chinese, right? Or is it an alternative? -- Dr. Amanda Shuman Post-doc researcher, University of Freiburg, The Maoist Legacy Project PhD, University of California, Santa

Re: Memory requirements for TLOGs (7.3.1)

2018-07-20 Thread Shawn Heisey
On 7/18/2018 6:33 PM, Ash Ramesh wrote: > Thanks for the quick responses Shawn & Erick! Just to clarify another few > points: > 1. Does having a larger heap size impact ingesting additional documents to > the index (all CRUD operations) onto a TLOG? It's extremely difficult, maybe even

Re: SOLR 7.2.1 on SLES 11?

2018-07-20 Thread Shawn Heisey
On 7/19/2018 2:52 PM, Lichte, Lucas R - DHS (Tek Systems) wrote: > Welp, that didn't go spectacularly. All the OpenSuSE SLES 11 downloads are > RPM, both source and compiled. Non-relocatable. I did attempt to rebuild, > but it choked on the following dependencies: > > audit-devel is needed by

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
I think CJKFoldingFilter will work for you. I put 舊小說 in the index and then each of A, B, C or D in the query, and they seem to match; CJKFF is transforming the 舊 to 旧 On Fri, Jul 20, 2018 at 9:08 AM, Susheel Kumar wrote: > Lack of my chinese language knowledge but if you want, I can do
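The folding idea in this message can be illustrated with a toy sketch: apply the same traditional-to-simplified mapping to both the indexed text and the query, so mixed-script input still matches. This is not how CJKFoldingFilter or the ICU transforms are implemented. The two-entry table and the names `TRAD_TO_SIMP`, `fold` and `matches` are assumptions made purely for illustration.

```python
# Hypothetical, deliberately tiny traditional-to-simplified table.
# A real filter (CJKFoldingFilter, ICU transforms) covers far more characters.
TRAD_TO_SIMP = {"舊": "旧", "說": "说"}


def fold(text: str) -> str:
    """Map each known traditional character to its simplified form."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def matches(indexed: str, query: str) -> bool:
    """Substring match after folding both sides the same way."""
    return fold(query) in fold(indexed)
```

Because both sides pass through `fold`, a document indexed as 舊小說 is found by a traditional, simplified, or mixed query. That is the behaviour the Analysis-tab test above reports.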

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
I lack Chinese language knowledge, but if you want, I can do a quick test for you in the Analysis tab if you can give me what to put in the index and query windows... On Fri, Jul 20, 2018 at 8:59 AM, Susheel Kumar wrote: > Have you tried to use CJKFoldingFilter https://github.com/sul-dlss/ >

Re: Question regarding searching Chinese characters

2018-07-20 Thread Susheel Kumar
Have you tried to use CJKFoldingFilter (https://github.com/sul-dlss/CJKFoldingFilter)? I am not sure if this would cover your use case but I am using this filter and so far no issues. Thanks On Fri, Jul 20, 2018 at 8:44 AM, Amanda Shuman wrote: > Thanks, Alex - I have seen a few of those links

Re: Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Thanks, Alex - I have seen a few of those links but never considered transliteration! We use lucene's Smart Chinese analyzer. The issue is basically what is laid out in the old blogspot post, namely this point: "Why approach CJK resource discovery differently? 2. Search results must be as

Re: Question regarding searching Chinese characters

2018-07-20 Thread Alexandre Rafalovitch
This is probably your start, if not read already: https://lucene.apache.org/solr/guide/7_4/language-analysis.html Otherwise, I think your answer would be somewhere around using ICU4J, IBM's library for dealing with Unicode: http://site.icu-project.org/ (mentioned on the same page above)

Re: Sorting issue while using collection parameter

2018-07-20 Thread Vijay Tiwary
Hello Erick, We are using a string field and data is stored in lower case while indexing. We have an alias set up to query multiple collections simultaneously. alias=collection1, collection2 If we are querying through the alias then sorting is broken. For example, results for descending sort are as follows.

Re: Need an advice for architecture.

2018-07-20 Thread servus01
Well, thanks a lot. Chris Hostetter-3 wrote > The first question i have is why you are using a version of Solr that's > almost 5 years old. *Well, Solr is part of another software and integrated with this version. With next update they will also update Solr to ver. 7...* Chris Hostetter-3

Question regarding searching Chinese characters

2018-07-20 Thread Amanda Shuman
Hi all, We have a problem. Some of our historical documents have mixed together simplified and traditional Chinese characters. There seems to be no problem when searching either traditional or simplified separately - that is, if a particular string/phrase is all in traditional or simplified, it finds it -

Creating a collection in Solr standalone mode using solrj

2018-07-20 Thread Arunan Sugunakumar
Hi, I would like to know whether it is possible to create a collection in Solr through SolrJ. I tried to create one and it throws an error saying that "Solr instance is not running in SolrCloud mode. " I am trying to upgrade a system that used the Lucene library in the past to use Solr. In lucene,