Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Mikhail Khludnev
Hello, Lucene rocks at calculating the scalar product (a score of whatever similarity) of sparse feature vectors. That's it. Note that 'feature' usually means a term, and 'feature vector' is a document, which might be the opposite of your problem definition. You can either expand the definition of your pr
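Here is a plain-Java sketch of what the "scalar product of sparse feature vectors" means. It is not Lucene code. It assumes each vector is a map from feature (for example a term) to weight; the class name `SparseDot` is hypothetical. Only features present in both vectors contribute, which is what makes sparse representations cheap to compare.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not Lucene's implementation): dot product of two
// sparse vectors stored as feature -> weight maps.
class SparseDot {
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look each feature up in the larger one;
        // features missing from either vector contribute nothing.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> doc = new HashMap<>();
        doc.put("solr", 2.0);
        doc.put("lucene", 1.0);
        doc.put("vector", 3.0);
        Map<String, Double> query = new HashMap<>();
        query.put("lucene", 0.5);
        query.put("vector", 2.0);
        query.put("mahout", 4.0);
        System.out.println(dot(doc, query)); // 1.0*0.5 + 3.0*2.0 = 6.5
    }
}
```

Lucene's scoring also iterates over the matching terms, but it applies its own weighting (tf, idf, norms). Its scores are therefore not a raw dot product of user-supplied weights.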

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Paul Libbrecht
Upayavira, on the Lucene list, two tools are sometimes mentioned that might do some of what you are looking for: - semanticvectors (https://code.google.com/p/semanticvectors) - word2vec https://github.com/kojisekig/word2vec-lucene/i Maybe it helps? I'm under the impression that you are ra

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Upayavira
Thanks Nicholas, there is a sense in which Solr isn't the right tool. However, we already have lots of business rules encapsulated into filter queries, and already have content ingestion pipelines for our content in place. TF-IDF similarity is pluggable (even just by sorting on function queries),
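Upayavira mentions sorting on function queries. One hypothetical way to express a query-vector dot product like that is to index each feature in its own numeric field and combine Solr's `sum` and `product` functions. The `f_` field prefix and the `FeatureSortBuilder` helper are assumptions for illustration, not an established Solr convention. The resulting string would still need URL-encoding when it is sent as a `sort` parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds sum(product(field,weight),...) from a query vector,
// assuming each feature is indexed in a numeric field named FIELD_PREFIX + feature.
class FeatureSortBuilder {
    static final String FIELD_PREFIX = "f_"; // assumed naming convention

    static String dotProductFunction(Map<String, Double> queryVector) {
        if (queryVector.isEmpty()) {
            throw new IllegalArgumentException("query vector must not be empty");
        }
        StringJoiner terms = new StringJoiner(",", "sum(", ")");
        for (Map.Entry<String, Double> e : queryVector.entrySet()) {
            terms.add("product(" + FIELD_PREFIX + e.getKey() + "," + e.getValue() + ")");
        }
        return terms.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> q = new LinkedHashMap<>(); // keeps insertion order
        q.put("color", 0.5);
        q.put("size", 2.0);
        System.out.println("sort=" + dotProductFunction(q) + " desc");
        // prints: sort=sum(product(f_color,0.5),product(f_size,2.0)) desc
    }
}
```

Because this approach puts scoring into a sort expression, the existing filter queries for business rules still apply unchanged. The feature names must be valid Solr field names.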

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Alexandre Rafalovitch
Looks like one of these: http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript In the UI code, we just seem to be using JSON object's native functions. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http:

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Thomas L. Redman
I was using the SOLR administrative interface to issue my queries. When I bypass the administrative interface and go directly to SOLR, the returned JSON shows the AID as it should be. The issue is in the presentation layer of the Solr Admin UI, which is good news. Thanks all, my bad. Shoul

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Yonik Seeley
Yeah, XML was fine, JSON outside admin was fine... it's definitely just the client (admin). Oh, you meant the JSON formatting code in the client - yeah. Hopefully there is a way to fix it w/o sacrificing our nice syntax highlighting. -Yonik http://heliosearch.org - native code faceting, facet func

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Alexandre Rafalovitch
Sounds like the JSON formatting code, then? What happens when the response format is XML? Also, what happens if the request is made with the browser debug panel open, so we can compare what is on the wire with what is in the browser? Regards, Alex.

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 7:57 PM, Brendan Humphreys wrote: > I'd wager this is a loss of precision caused by Javascript rounding in the > admin client. More details here: > > http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript Ah, indeed - I was testing direc

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 7:30 PM, Erick Erickson wrote: > Hmmm, this seems to be browser related > because if I use curl or Safari, the return and display > are fine. > > i.e. > curl http://localhost:8983/solr/collection1/query?q=*:* > > displays: > > "eoe_tl":20140716126615472, > "

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Brendan Humphreys
I'd wager this is a loss of precision caused by JavaScript rounding in the admin client. More details here: http://stackoverflow.com/questions/1379934/large-numbers-erroneously-rounded-in-javascript Cheers, -Brendan On 27 November 2014 at 11:45, Erick Erickson wrote: > Yep, see my second e-m
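Brendan's explanation can be checked without a browser. JavaScript numbers are IEEE 754 doubles, so only integers up to 2^53 - 1 (`Number.MAX_SAFE_INTEGER`) survive exactly. The sketch below does the same long-to-double conversion in Java, using a value from Erick's test. The class and method names are illustrative only.

```java
// Illustration: a JSON client that parses numbers as doubles (as JavaScript does)
// can show a different value than the one Solr stored for a long field.
class DoublePrecisionDemo {
    // All integers with magnitude <= 2^53 - 1 are exactly representable as doubles.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    static boolean isSafeInteger(long v) {
        return v >= -MAX_SAFE_INTEGER && v <= MAX_SAFE_INTEGER;
    }

    // The value a double-based parser would effectively hand back for v.
    static long roundTripThroughDouble(long v) {
        return (long) (double) v;
    }

    public static void main(String[] args) {
        long stored = 20140716126615474L; // from Erick's eoe_tl test document
        System.out.println(isSafeInteger(stored));          // false
        System.out.println(roundTripThroughDouble(stored)); // 20140716126615472
    }
}
```

Near 2 × 10^16, adjacent doubles are 4 apart. So 20140716126615474 rounds to 20140716126615472 and becomes indistinguishable from the neighbouring test value. This matches what the admin UI displayed, while curl and XML responses were correct.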

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Erick Erickson
Yep, see my second e-mail. I tried a unit test too and couldn't get it to fail. On Wed, Nov 26, 2014 at 4:34 PM, Yonik Seeley wrote: > On Wed, Nov 26, 2014 at 7:10 PM, Erick Erickson > wrote: >> This is very weird, someone want to check this out to insure that I'm >> not hallucinating? > > I ju

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 7:10 PM, Erick Erickson wrote: > This is very weird, someone want to check this out to insure that I'm > not hallucinating? I just tried the following in Heliosearch, since I had it open (based on 4.10.x): @Test public void testWeird() throws Exception { Client cl

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Erick Erickson
Hmmm, this seems to be browser related because if I use curl or Safari, the return and display are fine. i.e. curl http://localhost:8983/solr/collection1/query?q=*:* displays: "eoe_tl":20140716126615472, "eoe_s":"20140716126615472", "eoe_tl":20140716126615474, "e

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Erick Erickson
This is very weird, someone want to check this out to insure that I'm not hallucinating? Because it looks like a JIRA to me. I tried this with 4.8.0 (because I had it handy) and 5x, same results. Indexed three docs with eoe_tl and eoe_s pairs (eoe_tl is a tlong, eoe_s is a string): doc1 has eoe_tl=

Re: soft commit and deletions

2014-11-26 Thread Erick Erickson
Thanks for closing this off. It'd have been a pretty serious thing if soft commits weren't working. Erick On Wed, Nov 26, 2014 at 12:58 PM, Andreas Hubold wrote: > Thank you, Shawn and Erick! > > With your hint about the re-used searcher I was able to find my error. I > must wait for the newly o

Re: TrieLongField not store large longs correctly

2014-11-26 Thread Jack Krupansky
Your query has a space in it after the colon, which is not valid. Could you post the actual, full query request, as well as the full query response? -- Jack Krupansky -Original Message- From: Thomas L. Redman Sent: Wednesday, November 26, 2014 2:45 PM To: solr-user@lucene.apache.org S

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Nicholas Ding
I'm not sure if Solr is the right tool for this task. You probably need a machine learning library like Mahout or Weka. PS: Lucene doesn't really use cosine similarity; it uses a practical TF-IDF similarity. Nicholas Ding On Wed, Nov 26, 2014 at 3:05 PM, Upayavira wrote: > Hi, > > I've be
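This sketch shows what "practical TF-IDF" means, using the per-term factors of Lucene 4.x's `DefaultSimilarity`: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(number of terms in the field). It deliberately omits queryNorm, coord, boosts and the lossy byte encoding of norms, so it will not reproduce real Lucene scores. The class name is hypothetical.

```java
// Simplified, hypothetical sketch of classic Lucene TF-IDF term weighting.
// Omits queryNorm, coord, boosts and norm encoding; not a drop-in scorer.
class ClassicTfIdfSketch {
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // idf enters twice: once in the query-side weight, once in the document-side weight.
    static double termScore(int freq, long docFreq, long numDocs, int fieldLength) {
        double i = idf(docFreq, numDocs);
        return tf(freq) * i * i * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        System.out.println(tf(4));                   // 2.0
        System.out.println(idf(9, 10));              // 1.0, since ln(10 / 10) = 0
        System.out.println(termScore(4, 9, 10, 16)); // 2.0 * 1.0 * 1.0 * 0.25 = 0.5
    }
}
```

The length norm only approximates document-length normalization. It is not the document vector's Euclidean magnitude that a true cosine similarity would divide by, which is the distinction Nicholas is drawing.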

Re: soft commit and deletions

2014-11-26 Thread Andreas Hubold
Thank you, Shawn and Erick! With your hint about the re-used searcher I was able to find my error. I must wait for the newly opened searcher when calling the commit method: solrServer.commit(false, true /*waitSearcher*/, true /*softCommit*/); instead of solrServer.commit(false, false, true);

comparing feature vectors using Solr/Lucene

2014-11-26 Thread Upayavira
Hi, I've been asked how to use Solr as a component in a machine learning system, doing document comparison based upon feature vectors. If I have two vectors, one in the index (in some form) and one in the query (in some form), how can I do, for example, a vector multiplication of the two vectors

TrieLongField not store large longs correctly

2014-11-26 Thread Thomas L. Redman
I believe I have encountered a bug in SOLR. I have a data type defined as follows: I have not been able to reproduce this problem for smaller numbers, but for some of the very large numbers, the value that gets stored for this “aid” field is not the same as the number that gets indexed. For e

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-26 Thread Ramkumar R. Aiyengar
As Eric mentions, his change to have a state where indexing happens but querying doesn't surely helps in this case. But these are still boolean decisions of send vs don't send. In general, it would be nice to abstract the routing policy so that it is pluggable. You could then do stuff like have a
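Ramkumar's idea of a pluggable routing policy might look roughly like this. `ReplicaRoutingPolicy` and `LatencyAwarePolicy` are hypothetical names, not SolrCloud or SolrJ APIs. The example policy orders replicas by an exponentially weighted moving average (EWMA) of observed response times, so a slow "bad apple" is tried last rather than excluded outright.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction: decides the order in which replicas are tried for a query.
interface ReplicaRoutingPolicy {
    List<String> order(List<String> replicaUrls);
}

// Hypothetical policy: prefer replicas with a lower EWMA response time.
class LatencyAwarePolicy implements ReplicaRoutingPolicy {
    private final double alpha; // weight of the newest sample, 0 < alpha <= 1
    private final Map<String, Double> ewmaMillis = new ConcurrentHashMap<>();

    LatencyAwarePolicy(double alpha) {
        this.alpha = alpha;
    }

    // Record one observed response time for a replica.
    void record(String replicaUrl, double millis) {
        ewmaMillis.merge(replicaUrl, millis, (old, sample) -> alpha * sample + (1 - alpha) * old);
    }

    @Override
    public List<String> order(List<String> replicaUrls) {
        List<String> sorted = new ArrayList<>(replicaUrls);
        // Replicas with no samples yet score 0.0, so they are tried (and measured) first.
        sorted.sort(Comparator.comparingDouble(url -> ewmaMillis.getOrDefault(url, 0.0)));
        return sorted;
    }
}
```

Unlike the boolean send/don't-send state, this gives a graded preference. Other policies (for example, weighting by queue depth) could implement the same interface.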

Re: cross site scripting

2014-11-26 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 11:41 AM, Lee Carroll wrote: > Just out of interest, what is the use-case for a pseudo-field whose value > is a repeat of the field name? Not having to specify a field name for the function query: fl=add(x,y) comes back as (for example) "add(x,y)" : 14.2 And constants

Re: cross site scripting

2014-11-26 Thread Lee Carroll
OK. So, for the purposes of documenting the thread, the pseudo-field feature is described here: https://issues.apache.org/jira/browse/SOLR-2444 The solution is either to allow clients to generate queries that use pseudo-fields and ensure the client handles returned data with care (as if it were user input) or
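Treating data returned by Solr "as if it is user input" means escaping it before it is written into an HTML page. With pseudo-fields, both the field names and the values need this. Below is a minimal hand-rolled sketch; the `HtmlEscape` helper is hypothetical, and in practice a well-tested encoder library is preferable. It covers HTML element content and quoted attribute values only, not JavaScript or URL contexts.

```java
// Hypothetical helper: escape text from a Solr response before inserting it into HTML.
class HtmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<script>alert(1)</script>"));
        // prints: &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```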

Re: soft commit and deletions

2014-11-26 Thread Erick Erickson
As Shawn says, deletes should be visible after a soft commit. Let's see the code though. If you re-use a searcher that you had open before the commit, it'll still see the old snapshot of the index including the deleted documents. Or if you do open a new searcher and any autowarming hasn't complete
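Erick's point about re-used searchers can be illustrated with a toy model. This is not Solr's API: `ToyIndex`, `commit()` and `openSearcher()` are hypothetical, and real searchers are far more involved. The only idea modelled is that a searcher captures a point-in-time snapshot. A delete that a (soft) commit makes visible is therefore seen by searchers opened afterwards, not by one that was already open.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model (not Solr's API): searchers are immutable point-in-time snapshots.
class ToyIndex {
    private final Set<String> pending = new HashSet<>();    // uncommitted state
    private Set<String> committed = Collections.emptySet(); // last published snapshot

    void add(String id) { pending.add(id); }

    void delete(String id) { pending.remove(id); }

    // Publish the current state as a new snapshot, loosely like a soft commit.
    void commit() { committed = Collections.unmodifiableSet(new HashSet<>(pending)); }

    // A "searcher" sees whichever snapshot was current when it was opened.
    Set<String> openSearcher() { return committed; }
}
```

In SolrJ terms, this is why Andreas's fix of passing `waitSearcher=true` to `commit` mattered. The call then returns only after the new searcher, with the deletions applied, is available.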

Re: soft commit and deletions

2014-11-26 Thread Shawn Heisey
On 11/26/2014 8:18 AM, Andreas Hubold wrote: > But I'm still not totally sure. Does a soft commit also make deleted > documents invisible? > > In a test with an EmbeddedSolrServer I triggered a soft commit and was > still able to find a deleted document afterwards. Is this as expected? All change

Re: cross site scripting

2014-11-26 Thread Yonik Seeley
On Wed, Nov 26, 2014 at 10:47 AM, Lee Carroll wrote: > The applications using the data may write solr data to the dom. (I doubt > they do but they could now or in the future. They have an expectation of > trusting the data back from solr). > > As a straight forward attack you are right though. But

Re: cross site scripting

2014-11-26 Thread Lee Carroll
The applications using the data may write Solr data to the DOM. (I doubt they do, but they could now or in the future; they expect to be able to trust the data coming back from Solr.) As a straightforward attack you are right, though. But isn't it incorrect behavior? It should not produce bogus fields a

Re: cross site scripting

2014-11-26 Thread Yonik Seeley
It would have been helpful if you had pointed out exactly what you think the problem is. I still don't see an issue, since it doesn't look like any encapsulation has been broken. -Yonik On Wed, Nov 2

soft commit and deletions

2014-11-26 Thread Andreas Hubold
Hi, I've read about soft commits in Erick Erickson's excellent blog article [1]: > The thing to understand most about soft commits are that they will make documents visible But I'm still not totally sure. Does a soft commit also make deleted documents invisible? In a test with an EmbeddedS

Re: cross site scripting

2014-11-26 Thread Alexandre Rafalovitch
I think I saw some JIRAs on various items, but I'm not sure about this specific one. But are you exposing Solr directly to the web? Because that's a big no-no for multiple reasons. Regards, Alex.
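If Solr sits behind an application layer rather than being exposed directly, that layer can refuse parameters it does not expect. Below is a hypothetical sketch for `fl` only: `FieldListFilter` is not part of Solr. It assumes the parameter has already been URL-decoded, and it accepts a request only when every comma- or space-separated entry is a known field name. Quoted constants such as `'nasty value'` and function queries are therefore rejected. Other parameters (`q`, local params, `wt`, and so on) would need similar handling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical front-end check: only forward fl values made of whitelisted field names.
class FieldListFilter {
    private final Set<String> allowed;

    FieldListFilter(String... allowedFields) {
        this.allowed = new HashSet<>(Arrays.asList(allowedFields));
    }

    boolean isAllowed(String fl) {
        if (fl == null || fl.trim().isEmpty()) {
            return true; // no fl: Solr's default field list applies
        }
        // Solr accepts both commas and whitespace as fl separators.
        for (String f : fl.trim().split("[,\\s]+")) {
            if (!allowed.contains(f)) {
                return false;
            }
        }
        return true;
    }
}
```

Values such as `*` or `score` would have to be added to the whitelist explicitly if clients need them.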

cross site scripting

2014-11-26 Thread Lee Carroll
Hi All, In solr 4.7 this query /solr/coreName/select/?q=*:*&fl=%27nasty%20value%27&rows=1&wt=json returns {"responseHeader":{"status":0,"QTime":2},"response":{"numFound":189796,"start":0,"docs":[{"'nasty value'":"nasty value"}]}} This is naughty. Has this been seen before / fixed ?

Re: CoreContainer : create new cores reusing/sharing solrconfig.xml and schema.xml

2014-11-26 Thread Clemens Wyss DEV
To whom it may concern: I had (at least) two issues 1) setting CoreDescriptor property/ies Properties coreProps = new Properties(); coreProps.setProperty( CoreDescriptor.CORE_CONFIGSET, "configset" ); CoreDescriptor cd = new CoreDescriptor( container, "test", pathToSolrCores + "/test", coreProps )

Getting multiple Result for same document, doing a dateRange query on multiple date field

2014-11-26 Thread Sven Schönfeldt
Hi Solr-Users, I'd like to do a date range query on a multi-valued date field: "dateField_dts:[NOW TO NOW+7DAY]". If the query finds a document that has more than one date matching in that range, it would be nice to have the document appear multiple times in the result, with an identification of which date the re
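Solr returns each matching document once, however many of its values fall in the range. One possible workaround is client-side post-processing: fetch the multi-valued date field and emit one (document, date) row per value inside the range. `DateRangeExpander` and `Hit` are hypothetical names. The fixed `now` only makes the example reproducible; Solr's `NOW` is the request time. Indexing one document per date is another common way to get one result per date.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side expansion of a multi-valued date field into one row per matching date.
class DateRangeExpander {
    static final class Hit {
        final String docId;
        final Instant date;

        Hit(String docId, Instant date) {
            this.docId = docId;
            this.date = date;
        }
    }

    // Inclusive on both ends, mirroring the [from TO to] range syntax.
    static List<Hit> expand(String docId, List<Instant> dates, Instant from, Instant to) {
        List<Hit> hits = new ArrayList<>();
        for (Instant d : dates) {
            if (!d.isBefore(from) && !d.isAfter(to)) {
                hits.add(new Hit(docId, d));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-11-26T00:00:00Z"); // fixed for a reproducible example
        Instant in7Days = now.plus(Duration.ofDays(7));
        List<Instant> dates = Arrays.asList(
                Instant.parse("2014-11-27T10:00:00Z"),
                Instant.parse("2014-12-01T10:00:00Z"),
                Instant.parse("2014-12-10T10:00:00Z"));
        for (Hit h : expand("doc1", dates, now, in7Days)) {
            System.out.println(h.docId + " " + h.date);
        }
    }
}
```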

Move a shard from one disk to another

2014-11-26 Thread yriveiro
Hi, I need to move some data from one disk to another one. My question is: can I move the shard and create a symlink in the place where the shard was? Does this work? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Move-a-shard-from-one-disk-to-another-tp41710

Re: Missing value with Date Range

2014-11-26 Thread nabil Kouici
Any idea about this issue? Regards, Nabil From: nabil Kouici To: "solr-user@lucene.apache.org" Sent: Monday, 24 November 2014 15:40 Subject: Missing value with Date Range Hi All, I'm trying to get the missing count for a date range by adding facet.missing=true as a parameter but this n

Re: updateNumericDocValue in solr 4.6.1

2014-11-26 Thread Michael Sokolov
Yes - here's a working example we have in production (tested in 4.8.1 and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1 I'm pretty sure): https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/processor/UpdateDocValuesProcessor.

Re: updateNumericDocValue in solr 4.6.1

2014-11-26 Thread lboutros
Hello Suchi, I'm using this Lucene function with Solr 4.6.1 in a specific Update Processor and it's working well. How do you test the update? I'm using a ValueSourceRangeFilter with a LongFieldSource parameter. Ludovic. - Jouve France.