Let me answer inline, to get more info:
2015-06-10 10:59 GMT+01:00 Midas A test.mi...@gmail.com:
Hi Alessandro,
Please find the answers inline and help me figure out this problem.
1) Solr version: 4.2.1
2) Solr architecture: Master-slave / replication with requestHandler
It depends a lot on what the documents are. Some document formats have
metadata that stores a title. Perhaps you can just extract that.
If not, once you've extracted the content, perhaps you could just have a
special field that is the first n words (followed by an ellipsis).
If you use a
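The fallback-title idea above can be sketched in a few lines; this is a hypothetical helper (not Solr code) that would populate a title field before indexing:

```python
def make_title(content: str, n: int = 8) -> str:
    """Build a fallback title from the first n words of the document body."""
    words = content.split()
    title = " ".join(words[:n])
    # Append an ellipsis only when the content was actually truncated
    return title + "…" if len(words) > n else title
```

For example, make_title("a b c d e f g h i j", n=5) returns "a b c d e…", while shorter content is returned unchanged.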
Wow, Upaya, I didn't know that clean defaulted to true in the delta import
as well!
I did know it was the default in the full import, but I agree with you that
having it default to true for the delta import is very dangerous!
But assuming the user was using the delta import so far, if cleaning every
time,
Hi Alessandro,
Please find the answers inline and help me figure out this problem.
1) Solr version: 4.2.1
2) Solr architecture: Master-slave / replication with requestHandler
3) Kind of data source indexed: MySQL
4) What happened to the data source? Any change in there?: No
Note the clean= parameter to the DIH. It defaults to true. It will wipe
your index before it runs. Perhaps it succeeded at wiping, but failed to
connect to your database. Hence an empty index?
clean=true is, IMO, a very dangerous default option.
Upayavira
On Wed, Jun 10, 2015, at 10:59 AM, Midas A
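To avoid relying on the default, clean can be passed explicitly on the request; a sketch (the core name is illustrative, /dataimport is the usual handler path):

```
http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=false
```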
I was only speaking about full import regarding the default of
clean=true. However, looking at the source code, it doesn't seem to
differentiate between a full and a delta import in relation to the
default of clean=true, which would be pretty crappy. However, I'd need
to try it.
Upayavira
On
Another technology that might make more sense is a Doc Transformer.
You also specify them in the fl parameter. I would imagine you could
specify
fl=id,[persian f=gregorian_Date]
See here for more cases:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
This does
The main objective here is actually to assign a title to the documents as
they are being indexed.
We actually found that the cluster labels provide good information on
the key points of the documents, but I'm not sure if we can get good
cluster labels from a single document.
Besides
Let me try to help you. First of all, I would like to encourage people to
post more information about their scenario than "This is my log, index
deleted, help me" :)
This kind of info can be really useful:
1) Solr version
2) Solr architecture (SolrCloud? SolrCloud configuration? Manual
I've tried to use solr.HMMChineseTokenizerFactory with the following
configurations:
<fieldType name="text_chinese" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
working fine for me.
Now I am trying the same with MySQL. With the change in database, I have changed
the query used in data-config.xml for MySQL.
The query has variables which are passed via URL over HTTP. The same thing
In cloud mode, configurations live in ZooKeeper.
By doing the
-Dvelocity.template.base.dir=/full/path/to/example/files/conf/velocity/ trick
(or baking that into your solrconfig setup for the VelocityResponseWriter) you
can have the templates on the file system instead though.
—
Erik Hatcher,
For some reason, your email is completely unreadable, with a lot of nbsp
instead of spaces. Maybe your client is sending it as broken HTML?
You may want to try to reformat the message and resend.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
Hi Erik
When running Solr in standalone mode on my laptop, I found the *.vm files
under server/solr/COLLECTION_NAME/conf.
However, when running on my server in cloud mode (with only one node), I do
not find this conf/ directory under server/.
Does it sit in another place?
thanks!
On Tue, Jun
Hi Edwin,
let's do this step by step.
Clustering is a problem solved by unsupervised machine learning algorithms.
The scope of clustering is to group a corpus of documents by similarity,
trying to produce groups that are meaningful to a human being.
Solr currently provides different approaches for *Query
Erick will correct me if I am wrong, but I don't think this function query
exists.
But maybe it could be a nice contribution.
It should take as input a date format and a field, and return the
newly formatted date.
Then it would be simple to use:
On Wed, Jun 10, 2015, at 05:52 AM, William Bell wrote:
Finding DIH issue with the new AngularJS DIH section, while indexing...
1,22613/s ?
Last Update: 22:50:50
*Indexing since 0:1:38.204*
Requests: 1, Fetched: 1,22613/s, Skipped: 0, Processed: 1,22613/s
Started: 3 minutes ago
Ahh,
Thank you for your reply.
So my question is: can I get the time offset if I use NOW/MINUTE rounding
and not NOW/DAY rounding?
You said TZ affects what timezone is used when defining the concept of a
day for the purposes of rounding by day. I understand from this answer
that a query like I mentioned
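For what it's worth, since TZ only moves the day boundary, it should change the result of /DAY rounding but not /MINUTE rounding (current zone offsets are whole minutes). A sketch, with a hypothetical timestamp field:

```
fq=timestamp:[NOW/DAY TO NOW]&TZ=Asia/Tehran      day boundary follows TZ
fq=timestamp:[NOW/MINUTE-5MINUTES TO NOW]         minute rounding is TZ-independent
```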
On 6/10/2015 6:43 AM, abhijit bashetti wrote:
snip
>= to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/ HH24:MI:SS')
snip
Does anyone know where the problem is? Why is the variable resolver not
working as expected?
Note: to_date is a function written by us in MySQL.
I have checked out the
Dear Erick,
Hi,
Actually I want to convert the date format from the Gregorian calendar (the
Solr default) to the Persian calendar. You may ask why I don't do that on
the client side? Here is why:
I want to provide a way to extract data from Solr in CSV format. I know
that Solr has a CSV ResponseWriter that could
Hi,
I'm trying to index rich-text documents that are in Chinese. Currently,
there's no problem with indexing, but there's a problem with searching.
Does anyone know what the best Tokenizer and Filter Factory to use is? I'm
now using solr.StandardTokenizerFactory, which I heard that it's
You may find this series of articles on CJK analysis/search helpful:
http://discovery-grindstone.blogspot.com.au/
It's a little out of date, but should be a very solid intro.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 10
I agree with Upayavira,
Title extraction is an activity independent from Solr.
Furthermore, I would say it's easy to extract the title before the Solr
indexing stage.
When the content arrives at the Solr update processors, it is already a
String.
If you want to do some clever title extraction,
Take a look at the collections API CREATE command in more detail here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1
Admittedly this is 5.2 but you didn't mention what version of Solr
you're using.
In particular the createNodeSet and createNodeSet.shuffle
Just taking a look at the code:
if (requestParams.containsKey("clean")) {
    clean = StrUtils.parseBool((String) requestParams.get("clean"), true);
} else if (DataImporter.DELTA_IMPORT_CMD.equals(command) ||
           DataImporter.IMPORT_CMD.equals(command)) {
    clean = false;
} else {
    clean = debug ?
Hello,
I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create a
new collection with 3 shards using `implicit` routing:
I'm having a config issue, I'm posting the error from Solrj which also
includes the cluster state JSON:
org.apache.solr.common.SolrException: No active slice servicing hash code
2ee4d125 in DocCollection(rfp365)={
shards:{shard1:{
range:-,
state:active,
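For reference, a collection with implicit routing is typically created along these lines (names are illustrative); with router.name=implicit, documents are routed by shard name (via router.field or the _route_ parameter) rather than by hash range:

```
/admin/collections?action=CREATE&name=mycollection
    &router.name=implicit&shards=shard1,shard2,shard3
    &router.field=shard_label
```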
start with negating and bypassing caches by
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
eg
fq=-{!terms f=p_id cache=false}1,3,5,already,seen
note:
Elastic can even store such filters via
Hi all,
I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When uploading
a config file containing the following, I get an Invalid Path String
error.
<filter class="solr.StopFilterFactory"
        words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"
        ignoreCase="true"/>
leads obviously
Hello,
The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
you hardly ever need to do this, not least because Solr already does it.
DocValues need to be accessed per segment, via the leaf (atomic) reader
context provided to the collector.
e.g. look at DocTermsIndexDocValues.strVal(int)
I am using RankQuery to implement my applicative scorer, which returns a score
based on the value of a specific field (let's call it 'score_field') that is
stored for every document.
The RankQuery creates a collector, and for every collected docId I retrieve
the value of score_field, calculate the
I need to execute close() because the scorer is opened in the context of
a query and caches some data in that scope, i.e. for the specific query. The
way to clear this cache, which is only relevant for that query, is to call
close(). I think this API is not so good, but I assume that the scorer's
Thank you very much.
It seems that document transformer is the perfect extension point for this
conversion. I will try to implement that.
Best regards.
On Wed, Jun 10, 2015 at 3:54 PM, Upayavira u...@odoko.co.uk wrote:
Another technology that might make more sense is a Doc Transformer.
You
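The calendar arithmetic itself is independent of Solr; below is a self-contained sketch of the Gregorian-to-Jalali conversion such a transformer could wrap (the standard integer algorithm used by common Jalali conversion libraries; the function name is mine):

```python
def gregorian_to_jalali(gy: int, gm: int, gd: int) -> tuple:
    """Convert a Gregorian date to the Jalali (Persian) calendar."""
    # Cumulative days before each Gregorian month (non-leap year)
    g_d_m = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]
    gy2 = gy + 1 if gm > 2 else gy
    # Day count since the algorithm's epoch, with Gregorian leap-year terms
    days = (355666 + 365 * gy + (gy2 + 3) // 4 - (gy2 + 99) // 100
            + (gy2 + 399) // 400 + gd + g_d_m[gm - 1])
    jy = -1595 + 33 * (days // 12053)   # 12053 days per 33-year Jalali cycle
    days %= 12053
    jy += 4 * (days // 1461)            # 1461 days per 4-year sub-cycle
    days %= 1461
    if days > 365:
        jy += (days - 1) // 365
        days = (days - 1) % 365
    # First 6 Jalali months have 31 days, the next ones 30
    if days < 186:
        jm, jd = 1 + days // 31, 1 + days % 31
    else:
        jm, jd = 7 + (days - 186) // 30, 1 + (days - 186) % 30
    return jy, jm, jd
```

For example, gregorian_to_jalali(2015, 6, 10) gives (1394, 3, 20); the transformer would only need to format this tuple into the desired date string.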
Hi,
We have a Solr index with ~1M documents.
We want to give our users the ability to filter results out of queries,
meaning they will not be shown again for any query of this specific user (we
currently have 10K users).
You can think of a scenario like a recommendation engine which you don't
On 6/10/2015 2:47 PM, Peter Scholze wrote:
I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When
uploading a config file containing the following, I get an Invalid
Path String error.
<filter class="solr.StopFilterFactory"
        words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"
I don't see any way around storing which recommendations have been delivered to
each user. It sounds like a separate collection with the unique ID created from
the combination of the user ID and the recommendation ID (with the IDs also
available as separate, searchable and returnable fields).
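A minimal sketch of such a composite key (the "!" separator is just a convention borrowed from SolrCloud's composite-id router; any unambiguous separator works, and the function name is illustrative):

```python
def seen_doc_id(user_id: str, rec_id: str) -> str:
    """Unique ID for one (user, recommendation) delivery record."""
    return f"{user_id}!{rec_id}"
```

Each delivery would then be indexed once under this ID, with user_id and rec_id stored as separate searchable fields.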
:
: The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
: you hardly ever need to do this, at least because Solr already does it.
Specifically you should just use...
searcher.getLeafReader().getSortedSetDocValues(your_field_name)
...instead of doing all this
: So my question is: can I get offset of time if I use NOW/MINUTE and not
NOW/DAY rounding?
I'm sorry, but your question is still too terse, vague, and ambiguous for
me to really make much sense of it; and the example queries you provided
really don't have enough context for me to understand
: The guy was using delta import anyway, so maybe the problem is
: different and not related to the clean.
that's not what the logs say.
Here's what I see...
Log begins with server startup @ Jun 10, 2015 11:14:56 AM
The DeletionPolicy for the shopclue_prod core is initialized at Jun
10,
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent in parallel and then
each thread should wait for completion from the replicas. There
is no real check for optimize, I believe that the return from the
call is considered sufficient. If we can
Hi,
There are 5 cores and a separate server for indexing on this SolrCloud. Can
you please share your suggestions on:
How can the indexer know that the optimize has completed, even if the
commit/optimize runs in the background, without going to the Solr servers,
maybe by using SolrJ or another API?
I
Hi,
Is it possible to list all the fields in the highlighting portion of the
output?
Currently, even when I set <str name="hl.fl">*</str>, it only shows fields
where highlighting is possible; fields for which highlighting is not
possible are not shown.
I would like to have output where all the fields,