Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
Hi All, I'm trying to run a simple piece of code, to get SolrTestCaseJ4 to work. Here's my code: public class MyTest extends SolrTestCaseJ4 { @BeforeClass public static void init() throws Exception { initCore("solrconfig.xml", "schema.xml"); lrf =

Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
As an additional issue related to the one above, I sometimes also get this error (and it's pretty random, the times that I get it): *java.lang.AssertionError: fix your classpath to have tests-framework.jar before lucene-core.jar* at __randomizedtesting.SeedInfo.seed([50225DA1F52F32BB]:0) at

Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Nishant Kelkar
Seems like I've resolved these issues: 1. A text search for rs_A_count_gte300k.txt throughout my IntelliJ project revealed that a file by that name was being expected by my schema.xml (thank you, blind copy/pasting). After removing the conflicting fields and a few other fields for which I didn't

Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
I have two documents with ids aaa and bbb, and the titles of both documents are "a black fox jumps over a red flower". I imported both documents, along with several other testing documents, to a core test. I want solr to return documents similar to document aaa, so I submitted the following:

Terms vector for multiple documents

2014-11-27 Thread Norgorn
I'm working with social media data. We have blog posts in our index - text + authors_id. Now we need to cluster authors by their texts. We need to get a term vector not per document, but one vector per author (covering all of that author's documents). We can't get all documents and then unite 'em cause

Re: Terms vector for multiple documents

2014-11-27 Thread Mikhail Khludnev
Presumably requesting pivot facets returns what you are asking for. However, it takes time. Overall, the problem seems more suitable for Mahout, or (really sorry for mentioning it) Hadoop. On Thu, Nov 27, 2014 at 3:01 PM, Norgorn lsunnyd...@mail.ru wrote: I'm working with social media data.

Re: Terms vector for multiple documents

2014-11-27 Thread Norgorn
Thanks, I'll learn about facets. Actually, we want to use Mahout, but it needs term vectors, so we faced the problem of getting a term vector per author from a set of documents. Anyway, the main reason for my question was to learn whether I'm missing some simple solution. So, thank u

Re: TrieLongField not store large longs correctly

2014-11-27 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 10:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Looks like one of these: http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript Yeah, that's what Brendan pointed to earlier in this thread. In the UI code, we just seem to be
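For anyone following along, the rounding behind the linked StackOverflow question can be reproduced outside the browser. A minimal sketch, assuming the value is being held as a 64-bit double somewhere along the way (which is what a JavaScript number is); the class and method names here are made up for illustration and are not from Solr:

```java
import java.math.BigDecimal;

public class LongPrecisionDemo {
    // Above 2^53, not every integer has an exact 64-bit double representation.
    static final long TWO_POW_53 = 1L << 53;

    // True if converting the long to a double keeps its exact value.
    // The comparison goes through BigDecimal because casting the double back
    // to long can saturate and hide the loss for values near Long.MAX_VALUE.
    static boolean survivesDouble(long value) {
        return new BigDecimal(value).compareTo(new BigDecimal((double) value)) == 0;
    }

    public static void main(String[] args) {
        long small = 1234567890123L;
        long large = TWO_POW_53 + 1; // 9007199254740993

        System.out.println(small + " exact as double: " + survivesDouble(small));
        System.out.println(large + " exact as double: " + survivesDouble(large));
        // What the value turns into after passing through a double.
        System.out.println("after rounding: " + new BigDecimal((double) large).toPlainString());
    }
}
```

If this is indeed what is happening, the stored long would be fine and only the displayed number would be off.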

Re: SolrTestCaseJ4 Error: java.lang.RuntimeException: java.io.IOException: Can't find resource...

2014-11-27 Thread Erick Erickson
Thanks for closing this off _and_ providing info to others! Best, Erick On Thu, Nov 27, 2014 at 1:15 AM, Nishant Kelkar nishant@gmail.com wrote: Seems like I've resolved these issues: 1. A text search for rs_A_count_gte300k.txt throughout my IntelliJ project revealed that a file by that

confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
I inherited a set of some old 1.4x Solrs running under tomcat6/java6. While I will eventually upgrade them to a more recent solr/tomcat/java, I am unable to do so in the near term. One of my priority fixes, though, is to implement some sort of timeout for solr queries that exceed 1000ms (or so); i.e. if the

Re: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread Walter Underwood
How big is the index (document count, gigabytes)? How much RAM is on the servers? How big is your Java heap? How are the servers hosted? AWS? Long queries are often caused by long-tail queries fetched from disk. There are several ways to speed these up, but they all use RAM or SSD. wunder

Re: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
Millions of documents per shard, with a number of shards; ~40gb index folder size; 12gb of heap on a 16gb machine (this old Solr doesn't use O/S mem space like 4.x does); servers are hosted internally, and are powerful. Understood. As mentioned, we tuned the bulk of our queries to run very quickly

Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread Nishant Kelkar
Hey hhc, I am new to Solr, so pardon me if this throws you off. But I think the following piece of code is relevant to your problem from MoreLikeThisHandler#handleRequestBody(): // Find documents MoreLikeThis - either with a reader or a query //

RE: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread Toke Eskildsen
solr-user [solr-u...@hotmail.com] wrote: while we have optimized our queries for an average 50ms response time, we do occasionally see some that can run between 10 and 100 seconds. That sounds suspicious. Response times so far from your average indicate that there is special processing going

RE: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread solr-user
Yes, that solr queries continue to run on the solr server even after a connection is broken was my understanding and concern as well. I was hoping I had overlooked or missed something in the Solr or Tomcat documentation that might do the job. It is unfortunate. If anyone else can think of

Re: Exception in unit tests for distributed search component

2014-11-27 Thread Shalin Shekhar Mangar
Is that the complete stack trace? There are multiple indexDoc methods in that class. Some of them assert that the response from control collection and the default collection are the same. However, in this case, it seems that an AssertionError is being sent from the server itself as a

Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
Hi Nishant, Thank you for the reply. I believe that solr removes the first document from the mlt list because a document is most similar to itself and thus should be removed. In my case, aaa and bbb are two different documents. When searching for documents similar to aaa, the document aaa

Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-27 Thread Darin Amos
Hello, I am trying to implement a Rollup Search component on a version of SOLR that predates the parent/child additions, so I am trying to implement my own. The searches will be executed exclusively against the child documents, and I want to “rollup” those child documents into the

Re: Solr mlt doesn't return documents with exactly the same contents

2014-11-27 Thread hhc
After carefully reading the mlt parameters here https://wiki.apache.org/solr/MoreLikeThis I found that I can specify the following parameters to return bbb when searching for documents similar to aaa: mlt.mintf=1 mlt.mindf=2 Details: mlt.mintf: Minimum Term Frequency - the frequency below which
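A request with those two parameters could be put together roughly like this. This is only a sketch: the host, the core name test, the title field and the use of mlt=true on the /select handler are assumptions for illustration, and MltQueryBuilder is a made-up helper, not part of SolrJ. Only mlt, mlt.fl, mlt.mintf and mlt.mindf come from the wiki page above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class MltQueryBuilder {
    // Builds a MoreLikeThis request URL for one source document.
    // baseUrl and field are whatever your own setup uses (assumed here).
    static String buildMltUrl(String baseUrl, String docId, String field)
            throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:" + docId);  // the document to find neighbours for
        params.put("mlt", "true");       // switch the MoreLikeThis component on
        params.put("mlt.fl", field);     // field(s) used to measure similarity
        params.put("mlt.mintf", "1");    // keep terms that occur once in the source doc
        params.put("mlt.mindf", "2");    // keep terms found in at least two docs

        StringBuilder url = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(URLEncoder.encode(e.getKey(), "UTF-8"))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            first = false;
        }
        return url.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildMltUrl("http://localhost:8983/solr/test", "aaa", "title"));
    }
}
```

As far as I recall the defaults are stricter than these values, which would explain why a short title shared by only two documents produced no "interesting" terms until both were lowered; please check the wiki page for the exact defaults in your version.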