I wonder if Boilerplate could be helpful here? Boilerplate is now integrated
into Tika.
Otis
Performance Monitoring for Solr / ElasticSearch / HBase -
http://sematext.com/spm
>
> From: "Mark , N"
>To: solr-user@lucene.apache.org
>Sent: Thursday, May 24,
Hi Dmitry ,
There is no out-of-memory exception in Solr.
Thanks and Regards,
S SYED ABDUL KATHER
On Thu, May 24, 2012 at 1:14 AM, Dmitry Kan [via Lucene] <
ml-node+s472066n3985762...@n3.nabble.com> wrote:
> do you also see out of memory exception in your tomcat logs? If s
Is it possible to filter certain repeated footer information from text
documents while indexing to solr ?
Are there any built-in filters similar to stop word filters ?
--
Thanks,
*Nipen Mark *
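There is no dedicated footer filter built into Solr, but one sketch of an approach: strip a known, repeated footer phrase with a PatternReplaceCharFilterFactory in the index-time analyzer, before tokenization. The field type name and the pattern below are illustrative assumptions, not a tested recipe:

```
<fieldType name="text_nofooter" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- illustrative pattern: remove everything from a known footer marker to end of input -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="Confidentiality Notice:.*" replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This only suppresses the footer text from the index; the stored field still contains it.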
Hi Manish,
The attachment seems to be missing. Would you mind sharing it again?
I am a search engineer based in Bangalore and would be interested in attending
the workshop.
Best Regards,
Dikchant Sahi
On Thu, May 24, 2012 at 10:22 AM, Manish Bafna wrote:
> Dear Friend,
> We are organizing a worksho
Dear Friend,
We are organizing a workshop on Big Data. Here are details regarding the
same.
Please forward it to your company HR and also to your friends, and let me know
if anyone is interested. We have an early-bird offer if registration is done
before 31st May 2012.
Big Data is one space that is buzz
Well, 12000 documents is probably too few for a representative sizing, but you can
try an optimize() and then calculate what the size will be for 80 million docs.
You definitely won't be able to cache the whole index in memory on one server,
but if you can live with that level of performance then it's OK.
I finally got it working... I was compiling my class against a different Solr
version (3.6), while the Solr version I was running was 3.5.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Dismax-boost-payload-boost-tp3432650p3985797.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Mark, Darren
Thanks very much for your help, Will try collection for each customer then.
Regards,
Yandong
2012/5/22 Mark Miller
> I think the key is this: you want to think of a SolrCore on a single node
> Solr installation as a collection on a multi node SolrCloud installation.
>
> So if
My interest in this is the desire to create one index per user of a system -
the issue here is privacy - data indexed for one user should not be visible
to other users.
For this purpose Solr will be hidden behind a proxy which steers
authenticated sessions to the appropriate core.
Does this seem
Hi Everyone,
Solr 3.6 does not seem to be honoring the field compress attribute.
After merging the indexes, the size of the index is very big.
Is there any other way to keep the compression functionality?
thanks,
--Pramila
--
View this message in context:
http://lucene.472066.n3.nabble.com/fie
do you also see out of memory exception in your tomcat logs? If so, try
setting the JVM's -Xmx to something reasonable.
-- Dmitry
On Wed, May 23, 2012 at 10:09 PM, in.abdul wrote:
> Sorry, I missed the point; I am already using METHOD.POST only, but I still
> could not execute it.
>
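If the Tomcat logs do show OutOfMemoryError, the heap cap can be set through CATALINA_OPTS before starting Tomcat. A minimal sketch; the 512m/2g figures are assumptions to be sized against your index and available RAM:

```shell
# Sketch: give the Tomcat JVM explicit heap bounds before starting Solr.
# 512m/2g are illustrative values, not recommendations.
export CATALINA_OPTS="-Xms512m -Xmx2g"
echo "$CATALINA_OPTS"
```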
Yeah, currently you have to create the core on each node...we are working on a
'collections' api that will make this a simple one call operation.
We should have this soon.
- Mark
On May 23, 2012, at 2:36 PM, Daniel Brügge wrote:
> Hi,
>
> i am creating several cores using the following script
Explicitly running an optimize on the index via the admin screens solved
this problem - the correct counts are now being returned.
On Tue, May 22, 2012 at 4:33 PM, Mike Hugo wrote:
> We're testing a snapshot of Solr4 and I'm looking at some of the responses
> from the Luke request handler. Ever
Sorry, I missed the point; I am already using METHOD.POST only, but I still
could not execute it.
Thanks and Regards,
S SYED ABDUL KATHER
On Thu, May 24, 2012 at 12:19 AM, iorixxx [via Lucene] <
ml-node+s472066n3985746...@n3.nabble.com> wrote:
> > I have creteria where
Can anyone please help me with this?
Rgds
AJ
On 23-May-2012, at 14:37, Jens Grivolla wrote:
> So are you even doing text search in Solr at all, or just using it as a
> key-value store?
>
> If the latter, do you have your schema configured so
> that only the search_id field is indexed (with a ke
hi iorixxx,
Thank you for your reply; I appreciate it. There are a few areas where I need a
little clarity. I am not using any queries; everything has been implemented via
the config files (schema.xml, data-config.xml, solr-config.xml). Could you
give some more hints based on the config file specifications?
> I have criteria where I am passing more than
> 10 ids in a query like
> q=(ROWINDEX:(1 2 3 4)) using SolrJ. I had increased
> the max boolean
> clause limit to 10500 and I had increased the max header
> size in Tomcat as well
> by a sufficient amount, but it is still throwing a Null
> Poin
Hi,
I am creating several cores using the following script. I use this for
testing SolrCloud and to learn about the distribution of multiple
collections.
max=500
> for ((i=2; i<=$max; ++i )) ;
> do
> curl "
> http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collectio
Team,
I have criteria where I am passing more than 10 ids in a query like
q=(ROWINDEX:(1 2 3 4)) using SolrJ. I had increased the max boolean
clause limit to 10500 and I had also increased the max header size in Tomcat
by a sufficient amount, but it is still throwing a Null Pointer Exception.
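For reference, the clause limit lives in solrconfig.xml; a sketch matching the value mentioned above (note a NullPointerException usually points at something else, such as request size or a malformed query, so this setting alone may not be the fix):

```
<!-- solrconfig.xml: raise the BooleanQuery clause limit (10500 is the value from the mail above) -->
<maxBooleanClauses>10500</maxBooleanClauses>
```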
If you want to suppress merging, set the 'mergeFactor' very high.
Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for
each segment. You would have to set the 'ulimit' for file descriptors
to 'unlimited' or 'millions'.
Later, you can call optimize with a 'maxSegments' value. Optimize
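The mergeFactor knob mentioned above is set in solrconfig.xml; a sketch, with 100 being the illustrative value from the advice:

```
<!-- solrconfig.xml (Solr 3.x, inside <indexDefaults>/<mainIndex>): a high
     mergeFactor suppresses merging at the cost of many open segment files -->
<mergeFactor>100</mergeFactor>
```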
I am trying to do a very large insertion (about 68 million documents) into a
Solr instance.
Our schema is pretty simple: about 40 fields using these types:
We are runnin
Greetings,
I'm looking for pointers on where to start when creating a
custom QueryCache.
Our usage patterns are possibly a bit unique, so let me explain the desired
use case:
Our Solr index is read-only except for dedicated periods where it is
updated and re-optimized.
On startup, I would like t
> I am using DIH to import data from
> Oracle and everything is working fine.
> The description field usually contains many lines
> (from 10-300 lines). When
> I present the results through Solr/Browse it displays the
> results. However I
> have a requirement to show only 2-3 lines as the description
Has anyone found a solution to the getTransformer error? I am getting the same
error.
Here is my output:
Problem accessing /solr/JOBS/select/. Reason:
getTransformer fails in getContentType
java.lang.RuntimeException: getTransformer fails in getContentType
at
org.apache.solr.response.X
Here is my query:
http://127.0.0.1:/solr/JOBS/select/??q=Apache&wt=xslt&tr=example.xslt
The response I get is the following. I have example.xslt in the /conf/xslt
path. What is wrong here? Thanks!
HTTP ERROR 500
Problem accessing /solr/JOBS/select/. Reason:
getTransformer fails in
We have an issue with the TermsComponent on Solr 3.6 (and 3.5): using a term list on
field id (the unique id of documents), we receive a reply saying that we have multiple
documents with the same id!
Doing a search, only one doc is returned, as expected.
After deeper investigation, this issue is "fixed" by doing an
I am using DIH to import data from Oracle and everything is working fine.
The description field usually contains many lines (from 10-300). When
I present the results through Solr/Browse it displays the results. However I
have a requirement to show only 2-3 lines as the description and provide s
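One way to meet a "show only 2-3 lines" requirement without changing the stored field is highlighting with a capped fragment size. A sketch of the extra query parameters; the field name and sizes are assumptions:

```
hl=true&hl.fl=description&hl.snippets=1&hl.fragsize=250
```

The /browse velocity templates would then render the highlight fragment instead of the full stored value.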
Hi Lee Carrol,
Will take a look at the pointers. Really appreciate your feedback, thank
you.
Regards,
Robby
On Wed, May 23, 2012 at 5:04 AM, Lee Carroll
wrote:
> Take a look at the clustering component
>
> http://wiki.apache.org/solr/ClusteringComponent
>
> Consider clustering off line and ind
> I'd guess that this is because SnowballPorterFilterFactory
> does not implement MultiTermAwareComponent. Not sure, though.
Yes, I think this hinders the automatic multiterm awareness from doing its
job.
Could a custom analyzer chain help? Like the one
described (very, very briefly, too briefly...) here:
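Since Solr 3.6 you can declare the multiterm chain explicitly with an analyzer of type "multiterm", sidestepping the automatic detection. A sketch; the field type name is made up, and only filters safe for wildcard terms (e.g. lowercasing, no stemming) belong here:

```
<fieldType name="text_de_wildcard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="multiterm">
    <!-- applied to wildcard/prefix terms: lowercase only, no stemming -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```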
Thanks Gora, it worked.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Indexing-files-using-multi-cores-could-not-fix-after-many-retries-tp3985253p3985672.html
Sent from the Solr - User mailing list archive at Nabble.com.
I'd guess that this is because SnowballPorterFilterFactory does not implement
MultiTermAwareComponent. Not sure, though.
-Michael
> Maybe a filter like ISOLatin1AccentFilter that doesn't get
> applied when
> using wildcards? How do the terms actually appear in the index?
Bär gets indexed as bar.
I do not use ISOLatin1AccentFilter. My field def is this:
Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when
using wildcards? How do the terms actually appear in the index?
Jens
On 05/23/2012 01:19 PM, spr...@gmx.eu wrote:
No one has an idea?
Thx.
The text may contain "FooBar".
When I do a wildcard search like this: "Foo*" -
Hi,
I'm in the middle of planning a new Solr setup. The situation is this:
- We currently have one document type with around 20 fields, indexed, not
stored, except for a few date fields
- We currently have indexed 400M documents across 20+ shards.
- The number of documents to be indexed is around
> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com]
> Sent: Mittwoch, 23. Mai 2012 14:02
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
>
> do umlauts arrive properly on the server side, no encoding
> issues?
Yes, works fine.
It must, s
do umlauts arrive properly on the server side, no encoding issues? Check
the query params of the response xml/json/.. set debugQuery to true as well
to see if it produces any useful diagnostic info.
On Wed, May 23, 2012 at 2:58 PM, wrote:
> No. No hits for bä*.
> It's something with the umlauts
No. No hits for bä*.
It's something with the umlauts but I have no idea what...
> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com]
> Sent: Mittwoch, 23. Mai 2012 13:36
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
>
> what about bä*->hits?
what about bä*->hits?
-- Dmitry
On Wed, May 23, 2012 at 2:19 PM, wrote:
> No one has an idea?
>
> Thx.
>
>
> > > The text may contain "FooBar".
> > >
> > > When I do a wildcard search like this: "Foo*" - no hits.
> > > When I do a wildcard search like this: "foo*" - doc is
> > > found.
> >
> >
No one has an idea?
Thx.
> > The text may contain "FooBar".
> >
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - doc is
> > found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
Well, it works in 3.6. With one
Yes, as in the linked post (completing the snippet; the fallback return value of
1.0f matches DefaultSimilarity's behavior for payload-less terms):

public class PayloadSimilarity extends DefaultSimilarity
{
    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length)
    {
        if (length > 0) {
            return PayloadHelper.decodeFloat(payload, offset);
        }
        return 1.0f;
    }
}
Hi Sohail,
In my previous mail, I mentioned storing categories as separate
records. You should store and index the Category name and MainProduct name as
a separate record, and index the ChildProduct name and MainProduct as a separate
record. When you want the count:
1. Retrieve the main product name matching the
Jens,
Yes we are doing text search.
My question to all is: is the approach of creating a core for each user a
good idea?
AJ
On Wed, May 23, 2012 at 2:37 PM, Jens Grivolla wrote:
> So are you even doing text search in Solr at all, or just using it as a
> key-value store?
>
> If the latter, do y
So are you even doing text search in Solr at all, or just using it as a
key-value store?
If the latter, do you have your schema configured so
that only the search_id field is indexed (with a keyword tokenizer) and
everything else only stored? Also, are you sure that Solr is the best
option as a
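The schema shape being suggested could be sketched like this; the field and type names are illustrative assumptions:

```
<!-- sketch: only search_id is indexed (keyword tokenizer = one exact token);
     the payload is stored but never analyzed -->
<fieldType name="exact_key" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="search_id" type="exact_key" indexed="true" stored="false"/>
<field name="payload"   type="string"    indexed="false" stored="true"/>
```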
Hello,
Set up a schema with at least 3 fields: id (integer or string, unique),
doc_contents (of type text_en, for example), and link (string).
Index each document into the doc_contents field and use highlighting when
searching (hl=true&hl.fl=doc_contents). For each hit, count how many
highlights you have go
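A sketch of what such a query could look like; host, core path, and parameter values are assumptions:

```
http://localhost:8983/solr/select?q=doc_contents:(your search terms)
    &hl=true&hl.fl=doc_contents&hl.snippets=10&fl=id,link
```

With hl.snippets set high, the number of highlight fragments returned per document approximates the per-hit match count described above.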
> In the query debug I can see that I'm using the modified
> query parser,
> however there is no debug information about the payload
> boosts.
> I've not implemented a request handler, but I'm specifying
> all the
> parameters (e.g. defType=payload plf=entityIdList...) in the
> request.
> What am
I'm trying to get the digitalpebble payload query parser working...
*I have a multivalued field with payloads:*
*with type:*
In the query debug I can see that I'm using the modified query parser,
however there
Awaiting suggestions.
On Wed, May 23, 2012 at 8:04 AM, Amit Jha wrote:
> Hi,
>
> Thanks for your advice.
> It is basically a meta search application. Users can perform a search on N
> number of data sources at a time. We broadcast Parallel search to each
> selected data sources and write da