Get first value in a multivalued field

2021-03-04 Thread ufuk yılmaz
Hi, Is it possible in any way to get the first value in a multivalued field? Using function queries, streaming expressions or any other way without reindexing? (Stream decorators have array(), but no way to get a value at a specific index?) Another one, is it possible to match a regex to a

RE: Idle timeout expired and Early Client Disconnect errors

2021-03-02 Thread ufuk yılmaz
more shards in the query are idle > beyond the timeout threshold. This happens because lots of data is being > read from other shards. > > Breaking the query into small parts would be a good strategy. > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > >

RE: Default conjunction behaving differently after field type change

2021-03-02 Thread ufuk yılmaz
I changed the tokenizer class from KeywordTokenizerFactory to WhitespaceTokenizerFactory for the query analyzer using the Schema API, and it seems to have solved the problem. Sent from Mail for Windows 10 From: ufuk yılmaz Sent: 02 March 2021 20:47 To: solr-user@lucene.apache.org Subject: Default

RE: Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
;:"solr.KeywordTokenizerFactory" }}} }' http://localhost:8983/solr/gettingstarted/schema So, indexAnalyzer/queryAnalyzer, rather than array: https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type Hope this works, Alex. P.s. Also check whether you are u

Running Simple Streaming expressions in a loop through SolrJ stops with read timeout after a few iterations

2021-03-02 Thread ufuk yılmaz
I’m using the following example on Lucidworks to use streaming expressions from SolrJ: https://lucidworks.com/post/streaming-expressions-in-solrj/ Problem is, when I run it inside a for loop, even the simplest expression (echo) stops executing after about 5 iterations. I thought the underlying

Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
Hello, I’m trying to change a field’s query analysers. The following works but it replaces both index and query type analysers: { "replace-field-type": { "name": "string_ci", "class": "solr.TextField", "sortMissingLast": true, "omitNorms": true,
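The Schema API reference describes `indexAnalyzer` and `queryAnalyzer` as separate keys that can be used in place of a single `analyzer`. A minimal sketch of such a request body follows, written in Python only to build the JSON (it does not contact Solr). The lowercase filter and the tokenizer choices are assumptions for illustration, not the poster's actual configuration.

```python
import json

def build_replace_field_type(name, index_tokenizer, query_tokenizer):
    """Build a Schema API 'replace-field-type' command with separate
    index-time and query-time analyzers."""
    # Assumption: a lowercase filter on both sides, for a case-insensitive type.
    lowercase = {"class": "solr.LowerCaseFilterFactory"}
    return {
        "replace-field-type": {
            "name": name,
            "class": "solr.TextField",
            "sortMissingLast": True,
            "omitNorms": True,
            # Two separate keys instead of one shared "analyzer" key.
            "indexAnalyzer": {
                "tokenizer": {"class": index_tokenizer},
                "filters": [lowercase],
            },
            "queryAnalyzer": {
                "tokenizer": {"class": query_tokenizer},
                "filters": [lowercase],
            },
        }
    }

payload = build_replace_field_type(
    "string_ci",
    "solr.KeywordTokenizerFactory",
    "solr.KeywordTokenizerFactory",
)
body = json.dumps(payload, indent=2)
print(body)
```

The resulting body would be POSTed with a JSON content type to the collection's /schema endpoint. Changing only the second tokenizer argument changes query-time analysis while leaving index-time analysis as it was.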

Default conjunction behaving differently after field type change

2021-03-02 Thread ufuk yılmaz
Hello all, From the Solr 8.4 (my version) documentation: “The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. To search for documents that contain either "jakarta apache" or just "jakarta," use the

RE: Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread ufuk yılmaz
e that specifically suppresses these > errors without backporting the full Solr 9.0 changes which impact the > memory footprint of export. > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Mon, Mar 1, 2021 at 10:29 AM ufuk yılmaz > wrote: > >

Idle timeout expired and Early Client Disconnect errors

2021-03-01 Thread ufuk yılmaz
Hello all, I’m running a large streaming expression and feeding the result to an update expression: update(targetCollection, ...long running stream here..., I tried sending the exact same query multiple times; it sometimes works and indexes some results, then throws an exception, other times fails

RE: Select streaming expression, add a field to every tuple, replace or raw not working

2021-02-26 Thread ufuk yılmaz
"text": "abc" }, { "EOF": true, "RESPONSE_TIME": 70 } ] } } --ufuk yilmaz Sent from Mail for Windows 10 From: ufuk yılmaz Sent: 26 February 2021 16:38 To: solr-user@lucene.apache.org Subject: Select streaming expression, add

Select streaming expression, add a field to every tuple, replace or raw not working

2021-02-26 Thread ufuk yılmaz
Hello all, Solr version 8.4. I have a very simple select expression here. What I’m trying to do is to add a constant value to incoming tuples. My collection has only 1 document. Id_str is of type String. Other fields are Solr generated. { "_version_":1692761378187640832,

Multivalued text_general field returns lowercased value in "if" function query

2021-02-23 Thread ufuk yılmaz
I have a type="text_general" multivalued="true" field named fieldA. When I use a function query with fields like fields=if(true, fieldA, -1), fieldA, the response is: "response":{"numFound":1,"start":0,"maxScore":4.6553917,"docs":[ { "fieldA":["SomeMixedCaseValue"],

RE: Meaning of "Index" flag under properties and schema

2021-02-16 Thread ufuk yılmaz
properties and schema This list strips attachments so you'll have to figure out another way to show the difference, Cheers Charlie On 16/02/2021 15:16, ufuk yılmaz wrote: > > There’s a collection at our customer’s site giving weird exceptions > when a particular field is involved (ask

Meaning of "Index" flag under properties and schema

2021-02-16 Thread ufuk yılmaz
There’s a collection at our customer’s site giving weird exceptions when a particular field is involved (asked another question detailing that). When I inspected it, there’s only one difference between it and dozens of other properly working collections: a text_general field in all

Significant terms expression giving error "length needs to be >= 1"

2021-02-15 Thread ufuk yılmaz
We have a SolrCloud cluster, version 8.4. At the customer’s site there’s a collection with very few documents, around 12. We usually have collections with hundreds of millions of documents, so that collection is a bit of an exception. When I send a significantTerms streaming expression it

Why Solr questions on stackoverflow get very few views and answers, if at all?

2021-02-12 Thread ufuk yılmaz
Is it because the main place for Q&A is this mailing list, or somewhere else that I don’t know? Or is Solr not as ‘hot’ as some other topics? Sent from Mail for Windows 10

Process copyField only when field is absent in update

2021-02-11 Thread ufuk yılmaz
When I have a copyfield directive like,

RE: Clarification on term facet method dvhash

2021-02-05 Thread ufuk yılmaz
tion you describe (high-cardinality > field, known low-cardinality for the particular domain) sounds like a > perfect use-case for dvhash. > > Michael > > On Fri, Feb 5, 2021 at 11:56 AM ufuk yılmaz > wrote: > >> Hello, >> >> I’m using Solr 8.4. V

Clarification on term facet method dvhash

2021-02-05 Thread ufuk yılmaz
Hello, I’m using Solr 8.4. Very excited about performance improvements in 8.8: http://joelsolr.blogspot.com/2021/01/optimizations-coming-to-solr.html As I understand it, the main determinant of performance and RAM usage of a terms facet is the cardinality of the field in the whole collection, but not the

RE: Streaming expressions, what is the effect of collection name in the request url

2021-01-26 Thread ufuk yılmaz
be for one collection. This will be where the collection is compiled and run. It has no effect on what is actually being searched. That is specified in the expression themselves. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jan 20, 2021 at 1:34 PM ufuk yılmaz wrote: > Do collect

Steps to write a custom StreamingExpression

2021-01-26 Thread ufuk yılmaz
Should I create a Java project with a dependency on solrj or solr core, then implement the Expressible interface, then build my project as a jar and put it on the classpath of each SolrCloud node? Or should I take a completely different route? Many thanks ~ufuk Sent from Mail for Windows 10

RE: Solr Slack Workspace

2021-01-26 Thread ufuk yılmaz
It’s asking for a searchscale.com email address? Sent from Mail for Windows 10 From: Ishan Chattopadhyaya Sent: 26 January 2021 13:33 To: solr-user Subject: Re: Solr Slack Workspace There is a Slack backed by official IRC support. Please see

NullPointerException in Graph Traversal nodes streaming expression

2021-01-21 Thread ufuk yılmaz
Solr version 8.4. I’m getting an unexplained NullPointerException when executing a simple 2-level nodes stream. Do you have any idea what may cause this? I tried setting /stream?partialResults=true=true and shards.tolerant=true in nodes expressions, with no luck. I also tried reading source

RE: Parallel streaming expression java.lang.IndexOutOfBoundsException

2021-01-21 Thread ufuk yılmaz
I looked at the source code of the parallel stream, and it seems the number of shards must equal the workers count parameter. I thought I needed as many replicas, but it was shards. Maybe this helps someone. Sent from Mail for Windows 10 From: ufuk yılmaz Sent: 21 January 2021 11:16 To: solr-user

RE: Parallel streaming expression java.lang.IndexOutOfBoundsException

2021-01-21 Thread ufuk yılmaz
It only works when I set workers to 1, which defeats the point of parallel. Sent from Mail for Windows 10 From: ufuk yılmaz Sent: 21 January 2021 11:16 To: solr-user@lucene.apache.org Subject: Parallel streaming expression java.lang.IndexOutOfBoundsException Hello all, https

Parallel streaming expression java.lang.IndexOutOfBoundsException

2021-01-21 Thread ufuk yılmaz
Hello all, https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#parallel I’m sending the same query as in the docs (with just the collection names changed) to my Solr, but I always get the exception: { "result-set":{ "docs":[{

Streaming expressions, what is the effect of collection name in the request url

2021-01-20 Thread ufuk yılmaz
Do collection names in the request url affect how the query works in any way? A streaming expression is sent to http://mySolrHost/solr/col1,col2/stream (notice the multiple collections in the url) Col1 has 2 shards, each has 3 replicas. * Shard1 has replicas on nodes A, B, C * Shard2 has replicas on D,E,F
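As a rough illustration of how such a request is put together, the sketch below builds a /stream URL in Python. It assumes the /stream handler receives the expression in the `expr` parameter; the `search(...)` expression itself is a hypothetical example, and `mySolrHost` is the placeholder host from the post. Per the reply quoted in this thread, the collection in the URL path determines where the expression is run, while the collections that get searched are the ones named inside the expression.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def stream_url(base, collections, expr):
    """Build a /stream request URL. `collections` goes into the URL path;
    the expression separately names the collections it reads."""
    path = ",".join(collections)
    return f"{base}/solr/{path}/stream?{urlencode({'expr': expr})}"

# Hypothetical expression: it reads col1 whatever the URL path says.
expr = 'search(col1, q="*:*", fl="id", sort="id asc", qt="/export")'
url = stream_url("http://mySolrHost", ["col1", "col2"], expr)
print(url)
```

Keeping the path and the expression as separate arguments makes it easy to try the same expression against different URL paths and compare the behaviour.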

Effects of shards and replicas on performance

2021-01-19 Thread ufuk yılmaz
I’ve been trying to learn all I can about Solr for a year now and I still scratch my head when it comes to the effects of shards and replicas on performance. - info about my setup We have a SolrCloud setup with 6 nodes. Each collection has 2 shards and 2 replicas. 1 shard’s size is about 100GB.

SocketTimeoutException in long running streaming queries

2021-01-13 Thread ufuk yılmaz
When I perform a long running streaming expression, sometimes I get: { "error": { "metadata": [ "error-class", "org.apache.solr.common.SolrException", "root-error-class", "java.net.SocketTimeoutException" ], "msg":

RE: [Solr8.7] UI request reply empty after 8s

2021-01-13 Thread ufuk yılmaz
Hi, A while ago I asked the same thing here. Looking at the source javascript code of the frontend app, I saw a 10k millisecond timeout config in httpInterceptor inside app.js. I changed it to something much larger and results of long queries began to show. Hope it helps Sent from Mail for

What should I do when I see a collection "recovering" in SolrCloud?

2021-01-13 Thread ufuk yılmaz
Should I stop indexing new documents, or stop indexing and wait for collections to recover? Recently our disk got 100% full and Solr started to throw various errors. So I deleted some unnecessary documents and committed with expungeDeletes=true. It freed some space but many collections went

RE: Converting a collection name to an alias

2021-01-07 Thread ufuk yılmaz
-api.html#rename On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz wrote: > > Hi again, > > Lets say I have a collection named A. > I’m trying to rename it to A_1, then create an alias named A, which points to > the A_1 collection. > Is this possible without deleting and reindexin

Converting a collection name to an alias

2021-01-07 Thread ufuk yılmaz
Hi again, Let’s say I have a collection named A. I’m trying to rename it to A_1, then create an alias named A, which points to the A_1 collection. Is this possible without deleting and reindexing the collection from scratch? Regards, uyilmaz

Interpreting Solr indexing times

2021-01-07 Thread ufuk yılmaz
Hello all, I have been looking at our SolrCloud indexing performance statistics and trying to make sense of the numbers. We are using a custom Flume sink and sending updates to Solr (8.4) using SolrJ. I know this stuff depends on a lot of things but can you tell me if these statistics are

Monitoring Solr for currently running queries

2020-12-29 Thread ufuk yılmaz
Hello All, Is there a way to see currently executing queries in a SolrCloud? Or a general strategy to detect a query using an absurd amount of resources? We are using Solr not only for simple querying, but also for running complex streaming expressions, facets with large data etc. Sometimes, randomly, CPU

Range faceting on timestamp field

2020-12-24 Thread ufuk yılmaz
Hello all, I have a plong field in my schema representing a Unix timestamp. I’m doing a range facet over this field to find which event occurred on which day. I’m setting “start” to some date at 00:00, “end” to another, and setting gap to 86400 (total seconds in a day) ... "type":
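A small sketch of the arithmetic involved, assuming the field stores Unix time in seconds and that days are counted in UTC. The field name `event_time` is hypothetical, and the dictionary mirrors the JSON Facet API range facet keys (`type`, `field`, `start`, `end`, `gap`); nothing here talks to Solr.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def day_range_facet(field, start_day, end_day):
    """Build a range facet whose buckets are whole UTC days.
    start_day and end_day are (year, month, day) tuples."""
    start = int(datetime(*start_day, tzinfo=timezone.utc).timestamp())
    end = int(datetime(*end_day, tzinfo=timezone.utc).timestamp())
    return {
        "type": "range",
        "field": field,
        "start": start,
        "end": end,
        "gap": SECONDS_PER_DAY,
    }

def bucket_start(ts, start):
    """Return the lower bound of the bucket that timestamp ts falls into."""
    return start + ((ts - start) // SECONDS_PER_DAY) * SECONDS_PER_DAY

facet = day_range_facet("event_time", (2020, 12, 1), (2020, 12, 8))
print(facet)
print(bucket_start(facet["start"] + 90000, facet["start"]))
```

Two caveats on this sketch: if the field actually holds milliseconds, the gap would need to be 86400000, and if events should be grouped by local calendar day, `start` has to be aligned to local midnight, where days around daylight-saving changes are not exactly 86400 seconds long.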

Facet count issues with multi sharded collections, same issue with multi collection queries?

2020-12-16 Thread ufuk yılmaz
Hi everyone, Yesterday I was comparing term+range facet counts from two different collections having the exact same data and schema. The only difference is that one collection has 2 shards, the other 1. After searching about this I came upon an article: medium.com My results were like this: Counts from

Copyfields, will there be any difference between source and dest if they are switched?

2020-12-11 Thread ufuk yılmaz
Hello all, Documentation states “Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.” I have a field definition for a case insensitive string which I use for

RE: Use stream result like a query (alternative to innerJoin)

2020-11-24 Thread ufuk yılmaz
Fetch would work for my specific case (since I’m working with IDs, there’s no one-to-many), if I were able to restrict fetch’s target domain with a query. I would first get all possible deleted ids, then use fetch to the items collection. But then the current fetch implementation would find all

Use stream result like a query (alternative to innerJoin)

2020-11-22 Thread ufuk yılmaz
Hi all, I’m looking for a way to query two collections and find documents that exist in both. I know this can be done with the innerJoin streaming expression but I want to avoid it, since one of the collection streams can possibly have billions of results: Let’s say two collections are:
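For context, the cost concern can be seen in a plain merge-style intersection over two sorted id streams, which is the general shape of a sorted join. The sketch below is ordinary Python over in-memory iterables, an illustration of the technique and not Solr's implementation; the function name and sample ids are hypothetical, and ids are assumed to be unique within each input.

```python
def sorted_intersection(left_ids, right_ids):
    """Yield ids present in both inputs. Both inputs must be sorted
    ascending and contain no duplicates."""
    left, right = iter(left_ids), iter(right_ids)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)  # left is behind, advance it
        else:
            b = next(right, None)  # right is behind, advance it

common = list(sorted_intersection(["a1", "b2", "c3", "d4"], ["b2", "d4", "e5"]))
print(common)
```

Each input is consumed element by element until one side runs out, so the work grows with the combined length of the two streams. That is why a stream with billions of results is expensive even when the other side is small.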

How to use the "eval" streaming expression?

2020-11-18 Thread ufuk yılmaz
Hey, Can anyone give me an example of how eval https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#eval can be used? The docs say it allows running streaming expressions that are created on the fly, but I can’t wrap my head around how an expression can be created on the fly, maybe

RE: Using Multiple collections with streaming expressions

2020-11-12 Thread ufuk yılmaz
Many thanks for the info Joel --ufuk Sent from Mail for Windows 10 From: Joel Bernstein Sent: 12 November 2020 17:00 To: solr-user@lucene.apache.org Subject: Re: Using Multiple collections with streaming expressions T

RE: Using Multiple collections with streaming expressions

2020-11-10 Thread ufuk yılmaz
Thanks again Erick, that’s a good idea! Alternatively, I use an alias covering multiple collections in these situations, but there may be too many combinations of collections, so it’s not always suitable. Merged significantTerms streams will have meaningless scores in tuples I think, it would

Using Multiple collections with streaming expressions

2020-11-09 Thread ufuk yılmaz
For example the streaming expression significantTerms: https://lucene.apache.org/solr/guide/8_4/stream-source-reference.html#significantterms significantTerms(collection1, q="body:Solr", field="author", limit="50",