Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 20 Feb 2018, at 20:39, Roman Chyla wrote:
> >
> > Say there is a high load and I'd like to bring a new machine and let it
> > replicate the index, if 100gb and more can be shaved, i
well at least.
>
> On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla
> wrote:
>
> > Hello,
> >
> > We have a use case of a very large index (slave-master; for unrelated
> > reasons the search cannot work in the cloud mode) - one of the fields is
> a
>
Hello,
We have a use case of a very large index (slave-master; for unrelated
reasons the search cannot work in the cloud mode) - one of the fields is a
very large text, stored mostly for highlighting. To cut down the index size
(for purposes of replication/scaling) I thought I could try to save it
&& !(i < liveDocs.length() && liveDocs.get(i))) {
i++;
continue;
}
transformer.process(docBase, i);
i++;
}
}
}
}
On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla wrote:
> Joel, thanks, but which of them? I'v
values are available.
--roman
On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein wrote:
> You'll want to use org.apache.lucene.index.DocValues. The DocValues api has
> replaced the field cache.
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On
I need to read data from the index in order to build a special cache.
Previously, in SOLR4, this was accomplished with FieldCache or
DocTermOrds
Now, I'm struggling to see which API to use; there are many of them:
on lucene level:
UninvertingReader.getNumericDocValues (and others)
.getNumericValue
Hi,
I'm hoping someone has seen/encountered a similar problem. We have
solr instances with all Jetty threads in BLOCKED state. The
application does not respond to any http requests.
It is SOLR 4.9 running inside docker on Amazon EC2. Jetty is 8.1 and
there is an nginx proxy in front of it (with p
I've taken the route of extending solr, the repo checks out solr and builds
on top of that. The hard part was to figure out how to use solr test
classes and the default location for integration tests, but once there, it
is relatively easy. Google for montysolr, the repo is on github.
Roman
On Oct 1
Or you could also apply XSL to returned records:
https://wiki.apache.org/solr/XsltResponseWriter
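For example (a sketch; this assumes an example.xsl placed in the core's
conf/xslt/ directory, as described on that wiki page):

```
http://localhost:8983/solr/select?q=*:*&wt=xslt&tr=example.xsl
```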
On Thu, Oct 8, 2015 at 5:06 PM, Uwe Reh wrote:
> Hi,
>
> my suggestions are probably to simple, because they are not a real
> protection of privacy. But maybe one fits to your needs.
>
> Most simple:
I'd like to offer another option:
you say you want to match long query into a document - but maybe you
won't know whether to pick "Mad Max" or "Max is" (not mentioning the
performance hit of "*mad max*" search - or is it not the case
anymore?). Take a look at the NGram tokenizer (say size of 2; or
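The bigram idea can be sketched in plain Java - this is only an
illustration of what a size-2 NGram analysis emits, not Lucene's
NGramTokenizer itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Bigrams {
    // Produce all 2-character grams of the lowercased input, roughly
    // what an NGram tokenizer of size 2 emits over the text.
    public static List<String> bigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "Mad Max" yields 6 grams; a long query then matches a document
        // when enough of its grams overlap, without a "*mad max*" scan
        System.out.println(bigrams("Mad Max"));
    }
}
```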
Hi,
inStockSkusBitSet.get(currentChildDocNumber)
Is that child a lucene id? If yes, does it include offset? Every index
segment starts at a different point, but docs are numbered from zero. So to
check them against the full index bitset, I'd be doing
Bitset.exists(indexBase + docid)
Just one thin
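The arithmetic in question can be sketched with a plain java.util.BitSet
standing in for the index-wide docset (docBase here is the segment's
offset into the global doc id space, matching how Lucene numbers each
segment's docs from zero):

```java
import java.util.BitSet;

public class GlobalDocIds {
    // Translate a segment-local doc id to the index-wide id by adding
    // the segment's docBase, then test the index-wide bitset.
    public static boolean existsGlobally(BitSet indexWide, int docBase, int localDocId) {
        return indexWide.get(docBase + localDocId);
    }

    public static void main(String[] args) {
        BitSet inStock = new BitSet();
        inStock.set(105); // global doc 105 is "in stock"
        // a segment with docBase=100: its local doc 5 is global doc 105
        System.out.println(existsGlobally(inStock, 100, 5)); // true
    }
}
```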
It shouldn't matter. Btw, try a URL instead of a file path; I think the
underlying loading mechanism uses java File, so it could work.
On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" wrote:
> Would like to check, will this method of splitting the synonyms into
> multiple files use up a lot of memory?
ose to the solution.
> Any thoughts there?
>
> I appreciate your help on this matter.
>
> Thank you,
>
> Kaushik
>
>
>
> On Wed, Apr 29, 2015 at 5:48 PM, Roman Chyla
> wrote:
>
> > Hi Kaushik, I meant to compare tween 20 against "tween 20
> "parsedquery": "name:tweenx20",
> "parsedquery_toString": "name:tweenx20",
> "explain": {},
>
> Thank you,
>
> Kaushik
>
>
> On Wed, Apr 29, 2015 at 4:00 PM, Roman Chyla
> wrote:
>
> > Pls post o
TE 20 [MART.],SORBIMACROGOL LAURATE
> 300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20 [FCC],POLYSORBATE 20
> [WHO-DD],POLYSORBATE 20 [VANDF]
>
> *Autophrase.txt...*
>
> Has all the above phrases in one column
>
> *Indexed document....*
>
>
> 31
> Poly
I'm not sure I understand - the autophrasing filter will allow the
parser to see all the tokens, so that they can be parsed (and
multi-token synonyms identified). So if you are using the same
analyzer at query and index time, both should be able to see the same
stuff.
are you using multi-token syn
hanks,
Roman
On 30 Jan 2015 21:51, "Shawn Heisey" wrote:
> On 1/30/2015 1:07 PM, Roman Chyla wrote:
> > There exists a new open-source implementation of a search interface for
> > SOLR. It is written in Javascript (using Backbone), currently in version
> > v1.0.19 - bu
Hi everybody,
There exists a new open-source implementation of a search interface for
SOLR. It is written in Javascript (using Backbone), currently in version
v1.0.19 - but new features are constantly coming. Rather than describing it
in words, please see it in action for yourself at http://ui.ads
I think this makes sense too (i.e. the setup): since the search is getting 1K
documents each time (for textual analysis, i.e. they are probably large
docs) and uses Solr as storage (which is totally fine), the parallel
multiple-drive i/o shards speed things up. The index is probably large, so
it
,
but that was one year ago...
On Tue, Jan 6, 2015 at 5:20 PM, Vishal Swaroop wrote:
> Thanks Roman... I will check it... Maybe it's off topic but how about
> Angular...
> On Jan 6, 2015 5:17 PM, "Roman Chyla" wrote:
>
> > Hi Vishal, Alexandre,
> >
&
Hi Vishal, Alexandre,
Here is another one, using Backbone, just released v1.0.16
https://github.com/adsabs/bumblebee
you can see it in action: http://ui.adslabs.org/
While it primarily serves our own needs, I tried to architect it to be
extensible (within reasonable limits of code and manpower)
Hi Leonid,
I didn't look into solr qparser for a long time, but I think you should be
able to combine different query parsers in one query. Look at the
SolrQueryParser code, maybe now you can specify custom query parser for
every clause (?), something like:
foo AND {!lucene}bar
I don't know, but worth e
parser or parser plugin?
>
> I might not have followed you, this discussing challenges my understanding
> of Lucene and SOLR.
>
> Darin
>
>
>
> > On Dec 5, 2014, at 12:47 PM, Roman Chyla wrote:
> >
> > Hi Mikhail, I think you are right, it won't be pro
> onto segment keys, hence it exclude such leakage across different
> searchers.
>
> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla wrote:
>
> > +1, additionally (as it follows from your observation) the query can get
> > out of sync with the index, if eg it was saved for
+1, additionally (as it follows from your observation) the query can get
out of sync with the index, if eg it was saved for later use and ran
against newly opened searcher
Roman
On 4 Dec 2014 10:51, "Darin Amos" wrote:
> Hello All,
>
> I have been doing a lot of research in building some custom
Hi, What will replace spans, if spans are nuked ?
Roman
On 17 May 2014 09:15, "Ahmet Arslan" wrote:
> Hi,
>
>
> Payloads are used to store arbitrary data along with terms. You can
> influence score with these arbitrary data.
> See :
> http://sujitpal.blogspot.com.tr/2013/07/porting-payloads-to-so
perhaps useful, here is an open source implementation with near[digit]
support, incl. analysis of proximity tokens. When days become longer maybe
it will be packaged into a nice lib... :-)
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/grammars/ADS.g
On 25 Mar 2014 00:14, "Salman
Hi Tri,
Look at this:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3CCAEN8dyX_Am_v4f=5614eu35fnhb5h7dzkmkzdfwvrrm1xpq...@mail.gmail.com%3E
Roman
On 13 Feb 2014 03:39, "Tri Cao" wrote:
> Hi Joel,
>
> Thanks a lot for the suggestion.
>
> After thinking more about this, I t
Hi Rajeev,
You can take this:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3CCAEN8dyX_Am_v4f=5614eu35fnhb5h7dzkmkzdfwvrrm1xpq...@mail.gmail.com%3E
I haven't created the jira yet, but I have improved the plugin. Recently, I
have seen a use case of passing 90K identifiers /
And perhaps one other, but very pertinent, recommendation: allocate only
as little heap as necessary. By allocating more, you are working against
the OS caching. Knowing how much is enough is a bit tricky, though.
Best,
roman
On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey wrote:
> On 2/1
objects
with holding to some big object etc/. Btw if i study the graph, i see that
there *are* warning signs. That's the point of testing/measuring after all,
IMHO.
--roman
On 8 Feb 2014 13:51, "Shawn Heisey" wrote:
> On 2/8/2014 11:02 AM, Roman Chyla wrote:
> > I would be c
I would be curious what the cause is. Samarth says that it worked for over
a year (and supposedly docs were being added all the time). Did the index
grow considerably in the last period? Perhaps he could attach visualvm
while it is in the 'black hole' state to see what is actually going on. I
don't
Isaac, is there an easy way to recognize this problem? We also index
synonym tokens in the same position (like you do, and I'm sure that our
positions are set correctly). I could test whether the default similarity
factory in solrconfig.xml had any effect (before/after reindexing).
--roman
On Mo
nts are write-once. It's been
> a long standing design that deleted data will be
> reclaimed on segment merge, but not before. It's
> pretty expensive to change the terms loaded on the
> fly to respect deleted document's removed data.
>
> Best,
> Erick
>
>
Hi,
I'd like to check - there is something I don't understand about the cache,
and I don't know if it is a bug or a feature.
the following calls return a cache
FieldCache.DEFAULT.getTerms(reader, idField);
FieldCache.DEFAULT.getInts(reader, idField, false);
the resulting arrays *will* contain entrie
roman
On Mon, Nov 25, 2013 at 7:54 PM, Roman Chyla wrote:
>
>
>
> On Mon, Nov 25, 2013 at 12:54 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> Roman,
>>
>> I don't fully understand your question. After segment is flushed it
n't know if
they are in the middle of some regeneration or not, and they should not
keep a state (of previous index) - as they can be shared by threads that
build the cache
Best,
roman
>
>
> On Sat, Nov 23, 2013 at 9:40 AM, Roman Chyla
> wrote:
>
> > Hi,
> > doc
e different
> than it was in segment1 or 2.
>
> I think you're reading too much into LUCENE-2897. I'm pretty sure the
> segment in question is not available to you anyway before this rewrite is
> done,
> but freely admit I don't know much about it.
>
> Yo
ch seemed to
explain that behaviour.
>
> You're probably going to get into the whole PerSegment family of
> operations,
> which is something I'm not all that familiar with so I'll leave
> explanations
> to others.
>
Thank you, it is useful to get insights from various si
t;
> As long as a searcher is open, it's guaranteed that nothing is changing.
> Hard commits with openSearcher=false don't open new searchers, which
> is why changes aren't visible until a softCommit or a hard commit with
> openSearcher=true despite the fact that the segm
Hi,
docids are 'ephemeral', but i'd still like to build a search cache with
them (they allow for the fastest joins).
i'm seeing docids keep changing with updates (especially, in the last index
segment) - as per
https://issues.apache.org/jira/browse/LUCENE-2897
That would be fine, because i could
; 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions <https://twitter.com/Appinions> | g+:
> plus.google.com/appinions<
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com <http://www.appinions.com/&
Hello,
We have two solr searchers/instances (read-only). They read the same index,
but they did not return the same #hits for a particular query
Log is below, but to summarize: the first server always returns 576 hits;
the second server returns 440, 440, 576, 576...
These are just a few seconds apart.
Hi Antoine,
I'll permit myself to respond in English, cause my written French is
slower;-)
Your problem is a well-known one amongst Solr users: the query parser splits
tokens on whitespace, so the analyser never sees the input 'la redoutte' but
receives 'la' and 'redoutte'. You can of course enclose your se
Hi Parvesh,
I think you should check the following jira
https://issues.apache.org/jira/browse/SOLR-5379. You will find there links
to other possible solutions/problems:-)
Roman
On 28 Oct 2013 09:06, "Erick Erickson" wrote:
> Consider setting expand=true at index time. That
> puts all the tokens i
I just tested whether our 'beautiful' parser supports it, and funnily
enough, it does :-)
https://github.com/romanchyla/montysolr/commit/f88577345c6d3a2dbefc0161f6bb07a549bc6b15
but i've (kinda) given up hope that people need powerful query parsers in
the lucene world, the LUCENE-5014 is there s
David,
We have a similar query in astrophysics: a user can select an area of the
sky (many stars out there).
I am long overdue in creating a Jira issue, but here you have another
efficient mechanism for searching a large number of ids:
https://github.com/romanchyla/montysolr/blob/master/contrib
sting online at:
http://www.cfa.harvard.edu/hr/postings/13-32.html
Thank you,
Roman
--
Dr. Roman Chyla
ADS, Harvard-Smithsonian Center for Astrophysics
roman.ch...@gmail.com
You don't need to index fields several times; you can index it just into
one field and use different query analyzers to build the query.
We're doing this for authors, for example - if query language says
"=author:einstein", the query parser knows this field should be analyzed
differently
niversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
> at kg.apc.cmd.UniversalRunner.(UniversalRunner.java:55)
>
>
>
>
solr/statements/admin/system
> > > >
> > > > But I can access http://localhost:8983/solr/admin/cores, only when
> > with
> > > > adminPath="/admin/cores" (which suggests that this is the right value
> > to
> > > be
> > > > used for cores), a
son
>
> Regards,
>
> Dmitry
>
>
>
> On Wed, Aug 14, 2013 at 2:03 PM, Dmitry Kan wrote:
>
> > Hi Roman,
> >
> > This looks much better, thanks! The ordinary non-comarison mode works.
> > I'll post here, if there are other findings.
> >
> &
ges/simplejson/encoder.py", line 202,
> in default
> raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: <__main__.ForgivingValue object at 0x7fc6d4040fd0> is not JSON
> serializable
>
>
> Regards,
>
> D.
>
>
> On Tue, Aug 13, 2013 at
d'
>
Thanks for letting me know, that info is probably not available in this
situation - I've cooked something quick to fix it; please try the latest
commit (hope it doesn't do more harm, I should get some sleep... ;))
roman
>
> In case it matters: Python 2.7.3, ubuntu, so
On Fri, Aug 9, 2013 at 2:56 PM, Chris Hostetter wrote:
>
> : I'll look into this. Thanks for the concrete example as I don't even
> : know which classes to start to look at to implement such a feature.
>
> Either roman isn't understanding what you are asking for, or i'm not --
> but i don't think
On Fri, Aug 9, 2013 at 11:29 AM, Mark wrote:
> > *All* of the terms in the field must be matched by the query, not
> vice-versa.
>
> Exactly. This is why I was trying to explain it as a reverse search.
>
> I just realized I described it as a *large* list of known keywords when
> really it's small;
ly on the shard server in
> > background mode.
> >
> > my test run was:
> >
> > python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> > ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60
> -R
> > foo -t /solr/statements -e statements
34 PM, Shawn Heisey wrote:
>
> > On 8/6/2013 6:17 AM, Dmitry Kan wrote:
> > > Of three URLs you asked for, only the 3rd one gave response:
> >
> > > The rest report 404.
> > >
> > > On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla
> > wrote:
> &
o JSON object could be decoded: line 1
> column 0 (char 0)
>
>
> The README.md on the github is somehow outdated, it suggests using -q
> ./demo/queries/demo.queries, but there is no such path in the fresh
> checkout.
>
> Nice to have the -t param.
>
> Dmitry
>
>
>
x27; %
> (options.serverName, options.serverPort)
>
> jmx_options = []
> for k, v in options.__dict__.items():
>
>
>
> Dmitry
>
>
> On Thu, Aug 1, 2013 at 6:41 PM, Roman Chyla wrote:
>
> > Dmitry,
> > Can you post the entire invocation line?
On Thu, Aug 1, 2013 at 6:11 PM, Shawn Heisey wrote:
> On 8/1/2013 2:08 PM, Roman Chyla wrote:
>
>> Hi, here is a short post describing the results of the yesterday run with
>> added parameters as per Shawn's recommendation, have fun getting confused
>> ;)
>>
&
Hi, here is a short post describing the results of the yesterday run with
added parameters as per Shawn's recommendation, have fun getting confused ;)
http://29min.wordpress.com/2013/08/01/measuring-solr-performance-ii/
roman
On Wed, Jul 31, 2013 at 12:32 PM, Roman Chyla wrote:
> I
When you set your cache (solrconfig.xml) to size=0, you are not using a
cache, so you can debug more easily.
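For instance, in solrconfig.xml (a sketch using the Solr 4-era cache
classes; adjust to your version):

```xml
<!-- size=0 effectively disables these caches, so every run stays un-cached -->
<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
```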
roman
On Thu, Aug 1, 2013 at 1:12 PM, jimtronic wrote:
> I have a query that runs slow occasionally. I'm having trouble debugging it
> because once it's cached, it runs fast -- under 10
storting your measurements.
>
>
> Bernd
>
>
> Am 31.07.2013 05:01, schrieb Shawn Heisey:
> > On 7/30/2013 6:59 PM, Roman Chyla wrote:
> >> I have been wanting some tools for measuring performance of SOLR,
> similar
> >> to Mike McCandles' lucene benchmark.
ib/python2.7/contextlib.py", line 17, in __enter__
> return self.gen.next()
> File "solrjmeter.py", line 229, in changed_dir
> os.chdir(new)
> OSError: [Errno 20] Not a directory:
> '/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries'
>
y be random. So, yes, now I am sure what to
> > think of default G1 as 'bad', and that these G1 parameters, even if they
> > don't seem G1 specific, have real effect.
> > Thanks,
> >
> > roman
> >
> >
> > On Tue, Jul 30, 2013 at 1
o
think of default G1 as 'bad', and that these G1 parameters, even if they
don't seem G1 specific, have real effect.
Thanks,
roman
On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey wrote:
> On 7/30/2013 6:59 PM, Roman Chyla wrote:
> > I have been wanting some tools for meas
es(options)
>> File "solrjmeter.py", line 351, in check_prerequisities
>> error('Cannot contact: %s' % options.query_endpoint)
>> File "solrjmeter.py", line 66, in error
>> traceback.print_stack()
>> Cannot contact: http://localhost:8983
Hello,
I have been wanting some tools for measuring the performance of SOLR,
similar to Mike McCandless' lucene benchmark.
So yet another monitor was born; it is described here:
http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
I tested it on the problem of garbage collectors (see
Hi,
Yes, it can be done. If you search the mailing list for 'two solr instances
same datadir', you will find a post where I am describing our setup - it
works well even with automated deployments.
How do you measure performance? I am asking because one reason for us having
the same setup is sharing the O
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey wrote:
> On 7/27/2013 11:38 AM, Joe Zhang wrote:
> > I have a constantly growing index, so not updating the index can't be
> > practical...
> >
> > Going back to the beginning of this thread: when we use the vanilla
> > "*:*"+pagination approach, woul
.com/m-khl/solr-patches/compare/streaming#L15R115
>
> all other code purposed for distributed search.
>
>
>
> On Sat, Jul 27, 2013 at 4:44 PM, Roman Chyla
> wrote:
>
> > Mikhail,
> > If your solution gives lazy loading of solr docs /and thus streaming of
> > hu
Mikhail,
If your solution gives lazy loading of solr docs /and thus streaming of
huge result lists/ it should be big YES!
Roman
On 27 Jul 2013 07:55, "Mikhail Khludnev" wrote:
> Otis,
> You gave links to 'deep paging' when I asked about response streaming.
> Let me understand. From my POV, deep p
Dear list,
I've written a special processor exactly for this kind of operation:
https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch
This is how we use it
http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch
It is capable of
Hi,
I think you are pushing it too far - there is no 'string search' without an
index. And besides, these things are just better done by a few lines of
code - and if your array is too big, then you should create the index...
roman
On Thu, Jul 25, 2013 at 9:06 AM, Rohit Kumar wrote:
> Hi,
>
>
This paper contains an excellent algorithm for plagiarism detection, but
beware: the published version had a mistake in the algorithm - look for the
corrections. I can't find them now, but I know they have been published
(perhaps by one of the co-authors). You could do it with solr, to create an
index
_One_ idea would be to configure your java to dump core on the OOM error -
you can then load the dump into some analyzer, e.g. Eclipse, and that may
give you the desired answers (I unfortunately don't remember off the top of
my head how to activate the dump, but google will give you the answer)
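For the record, the stock HotSpot flags for this are (a sketch; the dump
path and start.jar are placeholders for your own setup):

```
# dump the heap on OutOfMemoryError; open the resulting .hprof in Eclipse MAT
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar start.jar
```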
r
performances acceptable (~ within minutes) ?
>
> Thanks,
> Matt
>
> On 7/23/13 6:57 PM, "Roman Chyla" wrote:
>
> >Hello Matt,
> >
> >You can consider writing a batch processing handler, which receives a
> >query
> >and instead of sending res
you disclosure how that streaming writer works? What does it stream
> docList or docSet?
>
> Thanks
>
>
> On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla
> wrote:
>
> > Hello Matt,
> >
> > You can consider writing a batch processing handler, which recei
Hello Matt,
You can consider writing a batch processing handler, which receives a query
and, instead of sending results back, writes them into a file which is
then available for streaming (it has its own UUID). I am dumping many GBs
of data from solr in a few minutes - your query + streaming write
the query, so in that sense, it is not different from
pre-computing the citation cache - but it happens for every query/request,
and so for 0.5M of edges it must take some time. But I guess I should
measure it. I haven't made notes, so now I am having a hard time
backtracking :)
roman
> It
Deepak,
I think your goal is to gain speed, but most likely the
function query will be slower than the query without score computation (the
filter query) - this stems from how the query is executed, though I
may, of course, be wrong. Would you mind sharing the measurements you make?
Look at the speed of reading the data - likely, it takes a long time to
assemble a big response, especially if there are many long fields - you may
want to
try SSD disks, if you have that option.
Also, to gain better understanding: Start your solr, start jvisualvm and
attach to your running solr. Start
gt; field in a Solr doc with the value 6 in it. I can then
> > form a query like
> > {!bitwise field=myfield op=AND source=2}
> > and it would match.
> >
> > You're talking about a much different operation as I
> > understand it.
> >
> > In which ca
Hi Dave,
On Wed, Jul 17, 2013 at 2:03 PM, dmarini wrote:
> Roman,
>
> As a developer, I understand where you are coming from. My issue is that I
> specialize in .NET, haven't done java dev in over 10 years. As an
> organization we're new to solr (coming from endeca) and we're looking to
> use
rch for query-time phrase
> synonyms, off-the-shelf, today, no patches required.)
>
>
> -- Jack Krupansky
>
> -Original Message- From: Roman Chyla
> Sent: Wednesday, July 17, 2013 11:44 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Searching w/expli
implementation, but again, this is all a
> longer-term future, not a "here and now". Maybe in the 5.0 timeframe?
>
> I don't want anyone to get the impression that there are off-the-shelf
> patches that completely solve the synonym phrase problem. Yes, progress is
> be
Hi all,
What I find very 'sad' is that Lucene/SOLR contain all the necessary
components for handling multi-token synonyms; the Finite State Automaton
works perfectly for matching these items; the biggest problem is IMO the
old query parser, which splits things on spaces and doesn't know to be
smarte
on of different fields is allowed). I'd like to spend
> some time on ANTLR and the new way of parsing you mentioned. I will let you
> know if it was useful for me. Thanks.
>
> Kind regards.
>
>
> On 16 July 2013 20:07, Roman Chyla wrote:
>
> > Well, I think this is
Well, I think this is slightly too categorical - a range query on a
substring can be thought of as a simple range query. So, for example the
following query:
"lucene 1*"
becomes behind the scenes: "lucene (10|11|12|13|14|1abcd)"
the issue there is that it is a string range, but it is a range que
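A pure-Java sketch of that expansion step (the term list here stands in for
the index's sorted term dictionary; the real rewrite happens inside
Lucene's multi-term query machinery):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixExpansion {
    // Expand a trailing-wildcard term into the matching terms of the
    // (sorted) term dictionary -- which is why "1*" behaves like a
    // string scan over the dictionary, not a numeric range.
    public static List<String> expand(List<String> termDict, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : termDict) {
            if (t.startsWith(prefix)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("10", "11", "12", "13", "14", "1abcd", "20");
        System.out.println(expand(dict, "1")); // [10, 11, 12, 13, 14, 1abcd]
    }
}
```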
JIRA? Somehow I missed it if it did, and this
> > would
> > be pretty cool
> >
> > Erick
> >
> > On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla
> > wrote:
> > > On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca
> > wrote:
> > >
> > >> He
On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca wrote:
> Hello Erick,
>
> > Join performance is most sensitive to the number of values
> > in the field being joined on. So if you have lots and lots of
> > distinct values in the corpus, join performance will be affected.
> Yep, we have a list of uni
Hi Mikhail,
I have commented on your blog, but it seems I have done st wrong, as the
comment is not there. Would it be possible to share the test setup (script)?
I have found out that the crucial thing with joins is the number of 'joins'
[hits returned] and it seems that the experiments I have see
On Wed, Jul 10, 2013 at 5:37 PM, Marcelo Elias Del Valle wrote:
> Hello,
>
> I have asked a question recently about solr limitations and some about
> joins. It comes that this question is about both at the same time.
> I am trying to figure how to denormalize my data so I will need just 1
Other than using futures and callables? Runnables ;-) Other than that, you
will need an async request (i.e. a client).
But in case somebody else is looking for an easy recipe for server-side async:
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
  if (isBusy()) {
    rsp.add("message", "Batch processing is already running");
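A guard like isBusy() can be backed by an AtomicBoolean so at most one
batch runs at a time (a self-contained sketch; the class and method names
are mine, not Solr's):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BatchGuard {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    // Returns true if the caller acquired the right to run the batch;
    // false means another batch is already in flight.
    public boolean tryStart() {
        return busy.compareAndSet(false, true);
    }

    // Called when the background batch finishes, releasing the guard.
    public void finish() {
        busy.set(false);
    }

    public static void main(String[] args) {
        BatchGuard g = new BatchGuard();
        System.out.println(g.tryStart()); // true  -> kick off the background thread
        System.out.println(g.tryStart()); // false -> respond "already running"
        g.finish();
        System.out.println(g.tryStart()); // true
    }
}
```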
"server-side named filters". It
> matches the feature described at
> http://www.elasticsearch.org/blog/terms-filter-lookup/
>
> Would be a cool addition, IMHO.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http
One of the approaches is to create a new field based on the stopwords
(i.e. accept only stopwords :)) - if the document contains them, you
index 1 - and use q=apple&fq=bad_apple:0
This has many limitations (in terms of flexibility), but it will be
superfast
roman
On Mon, Jul 8, 2013 a
Hello,
The joins are not the only idea, you may want to write your own function
(ValueSource) that can implement your logic. However, I think you should
not throw away the regex idea (as being slow), before trying it out -
because it can be faster than the joins. Your problem is that the number of
> would never have occurred to me. Thank you too!
>
> Best,
> Katie
>
>
> On Wed, Jul 3, 2013 at 11:35 PM, Roman Chyla
> wrote:
>
> > Hi Kathryn,
> > I wonder if you could index all your terms as separate documents and then
> > construct a new query