Hi,
I am having the same kind of issue. I am not able to search accented
Spanish characters, e.g. Según, próximos, etc.
I have a field called attr_content which holds the content of a PDF file
whose contents are in Spanish. I am using Apache Tika to index the contents
of the PDF file. I have
Hello,
I'm looking to use Solr for creating cross-linking in text.
For example: I'd like to be able to query a text field, an
article, in my blog, and have Solr use a script/method to parse
the text, find all matching category terms, and cap the results.
Do you have any
I have already checked this link. Could not find any hint about Mongolian
language. Is there any plugin available for that?
-Original Message-
From: bbarani [mailto:bbar...@gmail.com]
Sent: Thursday, May 30, 2013 2:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Support for Mongolian
Thanks for having analyzed the problem. But please let me note that I came to a
somewhat different conclusion.
Define for the moment title to be the primary unique key:
solr-4.3.0\example\example-DIH\solr\rss\conf\schema.xml
<uniqueKey>title</uniqueKey>
Hello,
I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many
files...)
_ I can do it using a LineEntityProcessor reading a list of files,
generated externally, but I would prefer to generate the list in SOLR
_ So to avoid to
Hi,
Indeed, with character # encoded the query works fine.
Thanks
--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, May 29, 2013 at 9:43 PM, bbarani wrote:
# has a separate meaning in URL.. You need to encode that..
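As a quick illustration (a sketch, not from the thread), Python's standard library shows what the encoding looks like; the sort value is just an example:

```python
from urllib.parse import quote

# '#' starts the fragment part of a URL, so it must be percent-encoded
# before being placed in a query string; spaces likewise become %20.
encoded_hash = quote('#')                        # '%23'
encoded_sort = quote('last_updated_date desc')   # 'last_updated_date%20desc'
print(encoded_hash, encoded_sort)
```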
Thanks Shalini...
It is solr 3.6.2
Instead of NOW, I can use today's date (I did not know about this cache
issue, thanks).
Later I realized it was my mistake that misled the asc and desc
ordering results.
After I get data from Solr, I do a MySQL query, where the order changes
again.
sort=last_updated_date desc
Maybe adding %20 will help:
sort=last_updated_date%20desc
--
View this message in context:
http://lucene.472066.n3.nabble.com/Sorting-results-by-last-update-date-tp4066692p4066986.html
Sent from the Solr - User mailing list archive at Nabble.com.
Did you declare that field name in the outer entity? Not just select it in
the query.
Regards,
Alex
On 30 May 2013 04:31, jerome.dup...@bnf.fr wrote:
Hello,
I want to use a index a huge list of xml file.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many
files...)
Do it outside of Solr, or look at update request processors, e.g. the UIMA
integration as an example.
Regards,
Alex
On 30 May 2013 02:52, It-forum it-fo...@meseo.fr wrote:
Hello,
I'm looking to use Solr for creating cross linking in text.
For exemple : I'll like to be able to request for a
Hi,
I get a timeout error when I try to split a collection with 15M documents.
The exception (Solr version 4.3):
542468 [catalina-exec-27] INFO org.apache.solr.servlet.SolrDispatchFilter
– [admin] webapp=null path=/admin/collections
Solr Join is _not_ an SQL subquery and won't work like one.
There's a reason it's called pseudo join in the JIRA issues.
My advice. Forget joins and try to write this in pure
Solr query language. The more you try to use Solr like
a database, the more you'll get into trouble. De-normalize
your data
Deep:
Have you looked through the rest of the thread and tried the
suggestions? If so, what were the results?
Best
Erick
On Thu, May 30, 2013 at 2:45 AM, Deep Lotia deeplo...@gmail.com wrote:
Hi,
I am having a same kind of issue. I am not able to search accented characters
of spanish. For
I have a Solr application with a multiValue field 'tags'. All fields
are indexed in this application. There exists a uniqueKey field 'id'
and a '_version_' field. This is running on Solr 4.x.
In order to add a tag, the application retrieves the full document,
creates a PHP array from the document
Just count the characters in the literal portions of the patterns and include
that many spaces in the replacement.
So, TextLine would become .
It gets trickier if names are variable length. But I'm sure you could come
up with patterns to replace one, two, three, etc. char names with
No, there is not.
-- Jack Krupansky
-Original Message-
From: Sagar Chaturvedi
Sent: Thursday, May 30, 2013 3:03 AM
To: solr-user@lucene.apache.org
Subject: RE: Support for Mongolian language
I have already checked this link. Could not find any hint about Mongolian
language. Is there
So having tried all combinations of LUCENE_40, 41 and 42 we're still having no
success in getting our indexes to load with Solr 4.2.1...
Any direction we can look into? In our system the underlying data is very slow
to re-index and would take an unreasonable amount of time at a customer site.
You can just use NOW/DAY for a filter that would only change once a day:
[NOW/DAY-60DAY TO NOW/DAY]
Oops... make that:
[NOW/DAY-60DAY TO NOW/DAY+1DAY]
Otherwise, it would miss dates after the start of today.
Even better, make it:
[NOW/DAY-60DAY TO *]
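Used as a filter query (the field name here is assumed for illustration), that would look like:

```
fq=last_updated_date:[NOW/DAY-60DAY TO *]
```

Since NOW/DAY only changes once a day, the same filter string recurs all day and the filter cache can keep reusing the cached result.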
-- Jack Krupansky
-Original
On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote:
It's impossible for us to give you hard numbers. You'll have to
experiment to know how fast you can reindex without killing your
servers. A basic tenet for such experimentation, and something you
hopefully already know:
On Wed, May 29, 2013 at 5:09 PM, Shawn Heisey s...@elyograg.org wrote:
I handle this in a very specific way with my sharded index. This won't
work for all designs, and the precise procedure won't work for SolrCloud.
There is a 'live' and a 'build' core for each of my shards. When I want
to
First, you cannot do any internal editing of a multi-valued list, other
than:
1. Replace the entire list.
2. Add values on to the end of the list.
But you can do both of those operations on a single multivalued field with
atomic update without reading and writing the entire document.
I wrote "Otherwise, it would miss dates after the start of today", but that
should be "Otherwise, it would miss documents with times after the start of
today if the current time is before noon".
But use * and you will be better off anyway.
-- Jack Krupansky
-Original Message-
From: Jack
Ah, I missed that part.
The problem that you have is because you have forEach="/feed/entry" but you
want to read /feed/link as a common field. You need to have forEach="/feed
| /feed/entry", which should let you have both /feed/link as well as
/feed/entry/link.
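A minimal sketch of the corresponding entity definition (the entity name, URL, and column names are illustrative, not from the original thread):

```xml
<entity name="feed"
        processor="XPathEntityProcessor"
        url="http://example.com/feed.xml"
        forEach="/feed | /feed/entry">
  <!-- commonField carries the value across the rows produced by forEach -->
  <field column="feedLink" xpath="/feed/link" commonField="true" />
  <field column="entryLink" xpath="/feed/entry/link" />
</entity>
```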
On Thu, May 30, 2013 at 1:25 PM,
On Thu, May 30, 2013 at 3:42 PM, Jack Krupansky j...@basetechnology.com wrote:
First, you cannot do any internal editing of a multi-valued list, other
than:
1. Replace the entire list.
2. Add values on to the end of the list.
Thank you. I meant that I am actually editing the entire
You gave an XML example, so I assumed you were working with XML!
In JSON...
[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]
and
[{"id": "doc-id", "tags": {"set": null}}]
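The JSON payloads above can be built and serialized programmatically; a minimal Python sketch (doc-id and the tag values are placeholders):

```python
import json

# Atomic update: append values to the multivalued 'tags' field.
add_tags = [{"id": "doc-id", "tags": {"add": ["a", "b"]}}]

# Atomic update: clear the field entirely (Python None serializes to JSON null).
clear_tags = [{"id": "doc-id", "tags": {"set": None}}]

payload = json.dumps(add_tags)
print(payload)  # ready to POST to /update/json
```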
BTW, this kind of stuff is covered in the book, separate chapters for XML
and JSON, each with dozens of examples like this.
-- Jack
I am trying to get Solr installed in Tomcat, and having trouble.
I am trying to use the instructions at
http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with
the example Solr from the Solr distro. Tried with both a
binary distro with an existing solr.war, and
Hi Jonathan,
Did you find
http://stackoverflow.com/questions/3016808/tomcat-startup-logs-severe-error-filterstart-how-to-get-a-stack-trace
?
Steve
On May 30, 2013, at 10:10 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
I am trying to get Solr installed in Tomcat, and having trouble.
I am
Usually tomcat errors with Solr 4.3 happen due to uncopied logging
libraries. I would check if installing Solr 4.2.1 works and/or copy
additional libraries in (search mailing list for this issue).
However, I am not entirely sure that's the case here. It feels that
perhaps the definition of the
-- Forwarded message --
From: Igor Littig igor.lit...@gmail.com
Date: 2013/5/30
Subject: indexing only selected fields
To: solr-user-...@lucene.apache.org
Hello everyone.
I'm quite new to Solr and need your advice... Does anybody know how to
index not all fields in an uploading
I am trying to get Solr installed in Tomcat, and having trouble.
When I start up tomcat, I get in the Tomcat log:
INFO: Deploying web application archive solr.war
May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
May 29, 2013 3:59:40 PM
How are you submitting your document? Some methods automatically
ignore unknown fields, others complain.
In any case, there is always a way to define an ignored field type.
The schema.xml in the main example shows how to do it. Search for
'ignored'. But beware that this will hide all spelling and
Alex
Thank you for the answer. I am submitting by the POST method via curl. For
example, when I want to submit a document I type on the command line:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @base.info -H 'Content-type:application/json'
where base.info is my file
I don't want to dissuade you from trying but I believe FileListEntityProcessor
has something special coded up into it to allow for its unique usage. Not sure
if your approach isn't do-able. I would imagine that fixing FLEP to handle a
row-at-a-time or page-at-a-time in memory wouldn't be
Update Request Processors to the rescue!
Example - Ignore input values for any undefined fields
Add to solrconfig:
<updateRequestProcessorChain name="ignore-undefined">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor
If you want to just remove anything that does not match, then the
'ignored' field type in the example schema would work. If you want to
ignore specific fields but complain about any unexpected things, you can
still declare those specific fields but with the ignored type.
Or you could use Update Request Processors like
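For reference, the catch-all setup in the example schema looks roughly like this (a sketch; check your version's schema.xml for the exact stock definition):

```xml
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true" />
<!-- route any otherwise-undeclared field to the ignored type -->
<dynamicField name="*" type="ignored" />
```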
Thanks! I guess I should have asked on-list BEFORE wasting 4 hours
fighting with it myself, but I was trying to be a good user and do my
homework! Oh well.
Off to the logging instructions, hope I can figure them out -- if you
could update the tomcat instructions with the simplest possible
Shard splitting is buggy in 4.3. I recommend that you wait for the next
release (4.3.1) before using this feature.
That being said, the split is executed by the Overseer and will continue to
happen even after the http request times out. There aren't enough hooks to
monitor the progress of the
I'm going to add a note to http://wiki.apache.org/solr/SolrLogging ,
with the Tomcat sample Error filterStart error, as an example of
something you might see if you have not set up logging.
Then at least in the future, googling solr tomcat error filterStart
might lead someone to the clue that
On 5/30/2013 9:26 AM, Jonathan Rochkind wrote:
Thanks! I guess I should have asked on-list BEFORE wasting 4 hours
fighting with it myself, but I was trying to be a good user and do my
homework! Oh well.
Off to the logging instructions, hope I can figure them out -- if you
could update the
Hi,
Thanks for your answer, it helped me move ahead.
The name of the entity was not right, not consistent with the schema.
Now the first entity works fine: the query is sent to the database and
returns the right result.
The problem is that the second entity, which is an XPathEntityProcessor
entity, doesn't
I've been trying to get into how distributed field facets do their work, but
I haven't been able to uncover how they deal with this issue.
Currently distrib pivot facets do a getTermCounts(first_field) to
populate a list at the level they're working on.
When putting together the data structure we
On Thu, May 30, 2013 at 11:44 AM, jerome.dup...@bnf.fr wrote:
<entity name="processorDocument"
        processor="XPathEntityProcessor"
        datasource="racineNoticeDatasource"
Ok, that is clear. Thanks for the answer.
2013/5/30 Alexandre Rafalovitch arafa...@gmail.com
If you want to just removing anything that does not match then
'ignored' field type in example schema would work. If you want to
ignore specific fields but complain on any unexpected things you can
Hi,
We recently had a production release to upgrade our Solr 3.5 to Solr 4.2.1 (no
schema change except some basics required for 4.2.1).
The nature of our documents is that we have huge multivalued fields; they can
go from 1,000 to 100K values in one single field.
# Documents : 300K
# Index size: 9GB
Hoss, thanks a lot for the explanation.
We override most of the methods of the query
component (prepare, handleResponses, finishStage, etc.) to incorporate custom
logic, and we set the _responseDocs values based on custom logic (after
filtering out a few documents) and then we call the parent (super)
How are you indexing the documents? Are you using an indexing program?
The below post discusses the same issue..
http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html
I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using
Nutch's solrindex to index documents into Solr.
When indexing documents, I hit an occasional document that does not match
the Solr schema. For example, a document which has two address fields when
my Solr schema.xml does
We have a Solr instance running on a 4-CPU box.
Sometimes, we send a query to our Solr server and it takes up 100% of one
CPU and 60% of memory. I assume that if we send another query request,
Solr should be able to use another idling CPU. However, it is not the
case. Using top, I only see
On 5/30/2013 11:03 AM, Iain Lopata wrote:
When indexing documents, I hit an occasional document that does not match
the Solr schema. For example, a document which has two address fields when
my Solr schema.xml does not specify address as being multi-valued (and I do
not want it to be).
On 5/30/2013 11:12 AM, Mingfeng Yang wrote:
We have a solr instance running on a 4 CPU box.
Sometimes, we send a query to our solr server and it take up 100% of one
CPU and 60% of memory. I assume that if we send another query request,
solr should be able to use another idling CPU. However,
I need to do a query where I need to find all people who have done 2 events
within a date range. I currently log one row per event.
Example:
Person,Date,ViewedUrl
1,2012May10,google.com
2,2012May10,yahoo.com
1,2012May13,yahoo.com
2,2012May13,google.com
Sample request would be wanting to find all
On Thu, May 30, 2013 at 1:03 PM, Iain Lopata ilopa...@hotmail.com wrote:
For example, a document which has two address fields when
my Solr schema.xml does not specify address as being multi-valued (and I do
not want it to be).
No help on the core topic, but a workaround for the specific
Hi,
We just use cURL from PHP code to submit indexing requests, like:
/update?commit=true...
This worked well in Solr 3.6.1. I saw the link you showed and really appreciate
it (if there is no other choice I will change the Java source code, but I hope
there is a better way).
Thanks very much for helps, Lisheng
I did more tests and got more info: the basic setting is that we created a core
from the PHP cURL API where we define:
schema
config
instanceDir=my_solr_home
dataDir=my_solr_home/data/new_collection_name
In Solr 3.6.1 we do not need to define schema/config because the conf folder is
not inside each
: My advice. Forget joins and try to write this in pure
: Solr query language. The more you try to use Solr like
: a database, the more you'll get into trouble. De-normalize
: your data and try again.
with that important caveat in mind, it is worth noting that what you are
essentially asking
: I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that
after finishing
: indexing
:
: write.lock
:
: is NOT removed. Later if I index again it still works OK. Only after I
shutdown Tomcat
: then write.lock is removed. This behavior caused some problem like I could
: I wanted to know if Solr has some functionality to group results based on
: the field that matched the query.
:
: So if I have id, name and manufacturer in my document structure, I want to
: know how many results are there because its manufacturer matched the q and
: how many results are there
Every time I try to do a reload using the collections API my entire cloud goes
down and I cannot search it. The solrconfig.xml and schema.xml are good
because when I just restart Tomcat everything works fine.
Here is the output of the collections api reload command:
59155087
Okay, sadly, I still can't get this to work.
Following the instructions at:
https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty
I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied
solr/example/resources/log4j.properties
https://issues.apache.org/jira/browse/SOLR-4805
- Mark
On May 30, 2013, at 3:09 PM, davers dboych...@improvementdirect.com wrote:
Everytime I try to do a reload using the collections API my entire cloud goes
down and I cannot search it. The solrconfig.xml and schema.xml are good
because when
Is it possible that this has something to do with it?
59157032 [Thread-2] INFO org.apache.solr.cloud.Overseer – Update state
numShards=null message={
numShards=null
On 5/30/2013 1:19 PM, Jonathan Rochkind wrote:
Okay, sadly, i still can't get this to work.
Following the instructions at:
https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty
I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and
I've set up a simple 10-node, 5-shard SolrCloud 4.3 cluster. I'm pushing just a
few thousand documents into it. What I'm doing is rather write-intensive,
100x more writes than reads. I've noticed that there seems to be an
unbounded use of resources. I'm seeing a steadily increasing number of
network
Okay, for posterity: I did manage to get it working. It WAS lack of the
logging files.
First, the only way I could manage to get Tomcat6 to log an actual
stacktrace for the Error filterStart was to _delete_ my
CATALINA_HOME/conf/logging.properties file. Apparently without this
file at all,
Good day everyone.
I recently faced another problem. I've got a bunch of documents to index.
The problem is that they are at the same time the database for another
application. These documents are stored in JSON format in the following schema:
{
  "id": 10,
  "name": "dad 177",
  "cat": [{
    "id": 254,
    "name": "124"
  }]
}
Jamey,
You will need a load balancer on the front end to direct traffic into one of
your SolrCore entry points. It doesn't matter, technically, which one though
you will find benefits to narrowing traffic to fewer (for purposes of better
cache management).
Internally SolrCloud will
Working to set up SolrCloud in Windows Azure. I have read over the SolrCloud
wiki, but am a little confused about some of the deployment options. I am
attaching an image of what I am thinking we want to do: 2 VMs that will have
2 shards spanning across them, 4 nodes total across the two
Hi,
Thanks very much for the explanation! Could we configure it to get the old
behavior?
I asked about this option because our app has many small cores, so we prefer
to create/close the writer on the fly (otherwise we may have memory issues
quickly). We also do not need NRT for now.
Thanks very much for helps,
Hi Eric,
Thanks very much for the help (I should have responded sooner):
1/ My problem in 3.6 turned out to be closely related to the fact that I did
not share the schema;
after using shareSchema, the start time was reduced by up to 80% (to my great
surprise,
previously I thought the burden was mostly in
I will look at these problems. Thanks for trying it out!
Lance Norskog
On 05/28/2013 10:08 PM, Patrick Mi wrote:
Hi there,
Checked out branch_4x and applied the latest patch,
LUCENE-2899-current.patch; however I ran into 2 problems.
Followed the wiki page instructions and set up a field with
I did more tests and it seems that this is still a bug (previous issue 3/):
1/ Create a core by cURL command with dataDir=some_folder; the core is created
OK and later indexing worked OK also.
2/ But in solr.xml, dataDir is not defined in the core element.
3/ After restarting Solr, dataDir
Hi All,
I am trying to understand what gets stored when I configure a field as
indexed and stored. For example I have this in my schema.xml:
<field name="articleBody" type="text_general" indexed="true" stored="true" />
and
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
On 5/30/2013 8:30 AM, Dotan Cohen wrote:
On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote:
It's impossible for us to give you hard numbers. You'll have to
experiment to know how fast you can reindex without killing your
servers. A basic tenet for such experimentation, and
What would be the steps if we want to use Mongolian or any other language that
is not supported?
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, May 30, 2013 5:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Support for Mongolian language
No,
Update Request Processors to the rescue again. Namely, the HTML Strip Field
Update processor:
Add to your solrconfig:
<updateRequestProcessorChain name="html-strip-features">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">features</str>
  </processor>
Well, you would need a tokenizer, probably a stemmer, and a list of
stop-words (to ignore). Is the original text in UTF-8 or is it in some
alternative encoding?
A quick search showed that there is an academic paper where they are
trying to work with Mongolian to get it into Lucene. It seems quite
Thanks Alexandre for the link. It was really helpful.
The original text will be in UTF-8.
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, May 31, 2013 8:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Support for Mongolian language
Well, you
Try using the text_general field type and see how reasonable or
unreasonable the standard tokenizer is at identifying reasonable word breaks
for some sample Mongolian text.
Use the Solr Admin UI Analyzer page to see what the various term analysis
filters output.
-- Jack Krupansky
Hi,
On the Solr admin UI, in a query I am trying to highlight some fields. I have
set hl=true and given the names of comma-separated fields in hl.fl, but the
fields are not getting highlighted. Any insights?
Regards,
Sagar
Sorry for the wrong subject. Corrected it.
-Original Message-
From: Sagar Chaturvedi [mailto:sagar.chaturv...@nectechnologies.in]
Sent: Friday, May 31, 2013 11:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Support for Mongolian language
Hi,
On solr admin UI, in a query I am trying to