It seems that you are using the bbyopen data. If you have made up your
mind on using the JSON data, then simply store it in ElasticSearch instead
of Solr, as it accepts any valid JSON structure. Otherwise, you can
download the XML archive from bbyopen and prepare a schema:
Here are some generic
@Prakash: Can you please format the body a bit for readability?
@Solr-Users: Is anybody else having any problems when running ZooKeeper
from the latest code in the trunk (4.x)?
On Mon, Nov 7, 2011 at 4:44 PM, prakash chandrasekaran
prakashchandraseka...@live.com wrote:
Hi all, I followed
Hello,
I have a unique dataset of 1,110,000 products, each as its own file.
It is split across three different directories of 500,000, 110,000,
and 500,000 files.
When I run:
http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true
The first 500,000 entries are
Bah it worked after cleaning it out for the 3rd time, don't know what
I did differently this time :(
<result name="response" numFound="1110983" start="0"/>
On Tue, Oct 4, 2011 at 8:00 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hello,
I have a unique dataset of 1,110,000 products, each as its
It's a rather strange stacktrace (at the bottom).
An entire 1+ dataset finishes up only to end up crashing and burning
due to a log statement :)
Based on what I can tell from the stacktrace and the 4.x trunk source
code, it seems that the following log statement dies:
The Problem:
When using DIH with trunk 4.x, I am seeing some very funny numbers
with a particularly large XML file that I'm trying to import. Usually
there are bound to be more rows than documents indexed in DIH because
of the forEach property, but my other XML files have maybe 1.5 times
for this. The fix should have two parts:
1) fix the exception
2) log and ignore exceptions in the LogProcessor
On Sat, Oct 1, 2011 at 2:02 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
It's a rather strange stacktrace (at the bottom).
An entire 1+ dataset finishes up only to end up crashing and burning
due
SOLR-2355 is definitely a step in the right direction but something I
would like to get clarified:
a) There were some fixes to it that went on the 3.4 & 3.5 branches based
on the comments section ... are they not available or not needed on
4.x trunk?
b) Does this basic implementation distribute
30, 2011 at 11:26 AM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
SOLR-2355 is definitely a step in the right direction but something I
would like to get clarified:
a) There were some fixes to it that went on the 3.4 & 3.5 branches based
on the comments section ... are they not available
At first glance it seems like a simple localization issue as indicated by this:
org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException:
EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException:
Can't find bundle for base name
@Darren: I feel that the question itself is misleading. Creating
shards is meant to separate out the data ... not keep the exact same
copy of it.
I think the two node setup that was attempted by Sam misled him and
us into thinking that configuring two nodes which are to be named
shard1 ...
Can you monitor the DB side to see what results it returned for that query?
2011/8/30 于浩 yuhao.1...@gmail.com:
I am using Solr 1.3. I update the Solr index through a delta import every two
hours, but the delta import is wasteful of database connections.
So I want to use full-import with entity name
Did you find out about this?
2011/8/2 Yury Kats yuryk...@yahoo.com:
I have multiple SolrCloud instances, each running its own Zookeeper
(Solr launched with -DzkRun).
I would like to create an ensemble out of them. I know about -DzkHost
parameter, but can I achieve the same programmatically?
A few thoughts:
1) If you place the script transformer method on the entity named x
and then pass the ${topic_tree.topic_id} to that as an argument, then
shouldn't you have everything you need to work with x's row? Even if
you can't look up at the parent, all you needed to know was the
topic_id and
Hello,
I'm using DIH in the trunk version and I have placed breakpoints in
the Solr code.
I can see that the value for a row being fed into the
ScriptTransformer instance is:
{buybackPlans.buybackPlan.type=[PSP-PRP],
buybackPlans.buybackPlan.name=[2-Year Buy Back Plan],
Hello Everyone,
I was wondering what are the various best practices that everyone
follows for indexing nested XML into Solr. Please don't feel limited
by examples, feel free to share your own experiences.
Given an XML structure such as the following:
<categoryPath>
<category>
Not sure if this is a good lead for you but when I run out-of-the-box
multi-core example-DIH instance of Solr, I often see core name thrown
about in the logs. Perhaps you can look there?
On Thu, Sep 15, 2011 at 6:50 AM, Joan joan.monp...@gmail.com wrote:
Hi,
I have multiple core in Solr and I
I am NOT claiming that making a copy of a copy field is wrong or leads
to a race condition. I don't know that. BUT did you try to copy into
the text field directly from the genre field? Instead of the
genre_search field? Did that yield working queries?
On Wed, Sep 21, 2011 at 12:16 PM, Tanner
Usually any good piece of java code refrains from capturing Throwable
so that Errors will bubble up unlike exceptions. Having said that,
perhaps someone in the list can help, if you share which particular
Solr version you are using where you suspect that the Error is being
eaten up.
On Fri, Sep
Also, you may use the ScriptTransformer to explicitly remove the field
from the document if the field is null. I do this for all my sdouble
and sdate fields ... it's a bit manual and I would like to see Solr
enhanced to simply skip stuff like this by having a flag for its DIH
code but until then it
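As a sketch of that approach (assuming plain-JavaScript object semantics for the row; inside DIH the row is actually a Java Map, so the call there is row.remove(key)):

```javascript
// Sketch of a DIH-style ScriptTransformer function that drops
// null-valued fields so Solr never sees them.
// Assumption: the row behaves like a plain JS object here; in DIH it
// is a Java Map, where you would call row.remove(key) instead.
function removeNullFields(row) {
  for (var key in row) {
    if (row[key] === null) {
      delete row[key]; // row.remove(key) in DIH proper
    }
  }
  return row;
}
```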
Hello,
I was wondering where I can find the source code for DIH? I want to
check out the source and step through it breakpoint by breakpoint to
understand it better :)
Thanks!
- Pulkit
Correct! With that additional info, plus
http://wiki.apache.org/solr/HowToContribute (ant eclipse), plus a
refreshed (close/open) eclipse project ... I'm all set.
Thanks Again.
On Wed, Sep 21, 2011 at 1:43 PM, Gora Mohanty g...@mimirtech.com wrote:
On Thu, Sep 22, 2011 at 12:08 AM, Pulkit
, that fixed it. Thanks.
On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert
tanner.post...@gmail.com wrote:
I believe that was the original configuration, but I can switch it back and
see if that yields any results.
On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
I
Hello Everyone,
I need help in:
(a) figuring out the causes of OutOfMemoryError (OOM) when I run Data
Import Handler (DIH),
(b) finding workarounds and fixes to get rid of the OOM issue per cause.
The stacktrace is at the very bottom to avoid having your eyes glaze
over and to prevent you from
Hi Hoss,
Thanks for the input!
Something rather strange happened. I fixed my regex such that instead
of returning just 1,000 ... it would return 1,000.00 and voila, it
worked! So parsing group separators is apparently already supported
then ... it's just that the format is also looking for a
The data I'm running through the DIH looks like:
<products>
  <product>
    <new>false</new>
    <active>false</active>
    <regularPrice>349.99</regularPrice>
    <salesRankShortTerm/>
  </product>
</products>
As you can see, in this particular instance of a product, there is no
value for salesRankShortTerm which
OMG, I'm so sorry, please ignore.
It's so simple, I just had to use:
row.remove( 'salesRankShortTerm' );
because the script runs at the end after the entire entity has been
processed (I suppose) rather than per field.
Thanks!
On Tue, Sep 20, 2011 at 5:42 PM, Pulkit Singhal pulkitsing...@gmail.com
Hello,
I am running a simple test after reading:
http://wiki.apache.org/solr/UpdateJSON
I am only using one object from a large json file to test and see if
the indexing works:
curl 'http://localhost:8983/solr/update/json?commit=true'
--data-binary @productSample.json -H
Hello Everyone,
I'm quite curious: how does the following data get understood and
indexed by Solr?
[{
  "id": "Fubar",
  "url": null,
  "regularPrice": 3.99,
  "offers": [
    {
      "url": "",
      "text": "On Sale",
      "id": "OS"
    }
  ]
}]
1) The field id is present as part of the main object and as part of
a
, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hello,
I am running a simple test after reading:
http://wiki.apache.org/solr/UpdateJSON
I am only using one object from a large json file to test and see if
the indexing works:
curl 'http://localhost:8983/solr/update/json?commit=true'
--data
Any updates on this topic?
On Fri, Jul 16, 2010 at 5:36 PM, P Williams
williams.tricia.l...@gmail.com wrote:
Hi All,
Has anyone gotten the DataImportHandler to work with json as input? Is
there an even easier alternative to DIH? Could you show me an example?
Many thanks,
Tricia
Ah I see now:
http://wiki.apache.org/solr/UpdateJSON#Example
It's not part of DIH, that's all.
On Sun, Sep 18, 2011 at 5:42 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Any updates on this topic?
On Fri, Jul 16, 2010 at 5:36 PM, P Williams
williams.tricia.l...@gmail.com wrote:
Hi All
My DIH's full-import logs end with trailing output saying that 1500
documents were added, which is correct because I have 16 sources, one
of them was down, and each source is supposed to give me 100
results:
(1500 adds)],optimize=} 0 0
But when I check my document count I get only 1384
Thanks Hoss. I agree that the way you restated the question is better
for getting results. BTW I think you've tipped me off to exactly what
I needed with this URL: http://bbyopen.com/
Thanks!
- Pulkit
On Fri, Sep 16, 2011 at 4:35 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Has anyone
Hello Folks,
Surprisingly, the value from the following raw data gives me an NFE
(Number Format Exception) when running the DIH (Data Import Handler):
<span class="tgProductPrice">$1,000.00</span>
The error logs look like:
Caused by: org.apache.solr.common.SolrException: Error while creating
field
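For illustration only, a sketch of why the grouped, currency-prefixed value trips a plain numeric parse, and what stripping those characters yields (parsePrice is a hypothetical helper, not DIH's own code; in DIH this cleanup would typically be a RegexTransformer step ahead of the number conversion):

```javascript
// Sketch: "$1,000.00" fails a plain parse because of the '$' prefix
// and the ',' group separator; stripping both first works.
// Assumption: '[$,]' covers all the noise in this price format.
function parsePrice(raw) {
  var cleaned = raw.replace(/[$,]/g, ''); // "$1,000.00" -> "1000.00"
  return parseFloat(cleaned);
}
```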
Hello,
I need to pull out the price and imageURL for products in an Amazon RSS feed.
PROBLEM STATEMENT:
The following:
<field column="description"
       xpath="/rss/channel/item/description"
/>
<field column="price"
Hello Everyone,
I have a goal of populating Solr with a million unique products in
order to create a test environment for a proof of concept. I started
out by using DIH with Amazon RSS feeds but I've quickly realized that
there's no way I can glean a million products from one RSS feed. And
I'd go
Ah missing } doh!
BTW I still welcome any ideas on how to build an e-commerce test base.
It doesn't have to be Amazon, that was just my approach. Anyone?
- Pulkit
On Thu, Sep 15, 2011 at 8:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Thanks for all the feedback thus far. Now to get
Thanks for all the feedback thus far. Now to get a little technical about it :)
I was thinking of feeding a file with all the Amazon tags that
yield close to roughly 5 results each, and then running
my RSS DIH off of that. I came up with the following config but
something is
Hello,
Feel free to point me to alternate sources of information if you deem
this question unworthy of the Solr list :)
But until then please hear me out!
When my config is something like:
<field column="imageUrl"
       regex=".*img src=.(.*)\.gif..alt=.*;.*"
       sourceColName="description"
/>
Cheers,
- Pulkit
On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hello,
Feel free to point me to alternate sources of information if you deem
this question unworthy of the Solr list :)
But until then please
: Is there another
option for navigating the HTML DOM tree using some well-tested transformer
or Tika or something?
Thanks!
- Pulkit
On Mon, Sep 12, 2011 at 1:45 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Given an RSS raw feed source link such as the following:
http://persistent.info/cgi-bin
This solution doesn't seem to be working for me.
I am using Solr trunk and I have the same question as Bernd with a small
twist: the field that should NOT be empty, happens to be a derived field
called price, see the config below:
entity ...
Oh, and I'm sure that I'm using Java 6 because the properties from the Solr
webpage spit out:
java.runtime.version = 1.6.0_26-b03-384-10M3425
On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
This solution doesn't seem to be working for me.
I am using Solr trunk
Hello,
1) The documented explanation of $skipDoc and $skipRow is not enough
for me to discern the difference between them:
$skipDoc: Skip the current document. Do not add it to Solr. The
value can be the String true/false.
$skipRow: Skip the current row. The document will be added with rows
from
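To make the $skipDoc behavior concrete, here is a minimal sketch in the style of a DIH ScriptTransformer (assumptions: plain-JavaScript object access and a hypothetical required price field; inside DIH the row is a Java Map, so row.get(...)/row.put(...) would be used):

```javascript
// Sketch: flag a whole document to be skipped when a required field
// is missing or empty, via the $skipDoc special command.
// Assumption: plain JS object semantics; in DIH use
// row.get('price') / row.put('$skipDoc', 'true') on the Java Map.
function skipIfNoPrice(row) {
  var price = row['price'];
  if (price === undefined || price === null || price === '') {
    row['$skipDoc'] = 'true'; // tells DIH not to add this document
  }
  return row;
}
```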
Given an RSS raw feed source link such as the following:
http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn
I can easily get to the value of the description for an item like so:
<field column="description"
Hello Bill,
I can't really answer your question about replication being supported on
Solr 3.3 (I use trunk 4.x myself) BUT I can tell you that if each Solr node
has just one core ... only then does it make sense to use
-Denable.master=true and -Denable.slave=true ... otherwise, as Yury points
out,
I don't see anywhere in:
http://issues.apache.org/jira/browse/SOLR-2305
any statement that shows the code's inclusion was decided against.
When did this happen, and what is needed from the community before
someone with the powers to do so will actually commit this?
2011/6/24 Noble Paul നോബിള്
Just to clarify, that link doesn't do anything to promote an already running
slave into a master. One would have to bounce the Solr node which has that
slave and then make the shift. It is not something that happens live
at runtime.
On Wed, Aug 10, 2011 at 4:04 PM, Akshay akm...@gmail.com wrote:
to
work perfectly well with this setup.
2011/9/9 Yury Kats yuryk...@yahoo.com:
On 9/9/2011 6:54 PM, Pulkit Singhal wrote:
Thanks Again.
Another question:
My solr.xml has:
<cores adminPath="/admin/cores" defaultCoreName="master1">
<core name="master1" instanceDir="." shard="shard1"
collection
straight on this.
Also it would be nice if I knew the code well enough to just look @ it
and give an authoritative answer. Does anyone have that kind of
expertise? Reverse-engineering is getting a bit mundane.
Thanks!
- Pulkit
On Sat, Sep 10, 2011 at 11:43 AM, Pulkit Singhal
pulkitsing...@gmail.com
Hi Yury,
How do you manage to start the instances without any issues? The way I see
it, no matter which instance is started first, the slave will complain about
not being able to find its respective master because that instance hasn't been
started yet ... no?
Thanks,
- Pulkit
2011/5/17 Yury Kats
with this thread's help here:
http://pulkitsinghal.blogspot.com/2011/09/multicore-master-slave-replication-in.html
Thanks,
- Pulkit
On Sat, Sep 10, 2011 at 2:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hi Yury,
How do you manage to start the instances without any issues? The way I see
, Sep 7, 2011 at 8:34 PM, Yury Kats yuryk...@yahoo.com wrote:
On 9/7/2011 3:18 PM, Pulkit Singhal wrote:
Hello,
I'm working off the trunk and the following wiki link:
http://wiki.apache.org/solr/SolrCloud
The wiki link has a section that seeks to quickly familiarize a user
with replication
Hello Jan,
You've made a very good point in (b). I would be happy to make the
edit to the wiki if I understood your explanation completely.
When you say that it is looking up what collection that core is part
of ... I'm curious how a core is being put under a particular
collection in the first
it in with the collection name
(myconf) without any need to specify anything at startup via -D or
statically in solr.xml file.
Validate away otherwise I'll just accept any hate mail after making
edits to the Solr wiki directly.
- Pulkit
On Fri, Sep 9, 2011 at 11:38 AM, Pulkit Singhal pulkitsing
?
- Pulkit
2011/9/9 Yury Kats yuryk...@yahoo.com:
On 9/9/2011 10:52 AM, Pulkit Singhal wrote:
Thank You Yury. After looking at your thread, there's something I must
clarify: Is solr.xml not uploaded and held in ZooKeeper?
Not as far as I understand. Cores are loaded/created by the local
Solr
-mac.local:8983_solr_ (v=0)
node_name=tiklup-mac.local:8983_solr
url=http://tiklup-mac.local:8983/solr/;
Thanks!
- Pulkit
On Fri, Sep 9, 2011 at 5:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Thanks Again.
Another question:
My solr.xml has:
<cores adminPath="/admin/cores"
Hello,
I'm working off the trunk and the following wiki link:
http://wiki.apache.org/solr/SolrCloud
The wiki link has a section that seeks to quickly familiarize a user
with replication in SolrCloud - Example B: Simple two shard cluster
with shard replicas
But after going through it, I have to
is also the application war for the program that
will communicate as the client with the Solr server.
On Thu, Feb 18, 2010 at 5:49 PM, Richard Frovarp rfrov...@apache.org wrote:
On 2/18/2010 4:22 PM, Pulkit Singhal wrote:
Hello Everyone,
I do NOT want to host Solr separately. I want to run it within
On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
Hello All,
When I use Maven or Eclipse to try and compile my bean which has the
@Field annotation as specified in http://wiki.apache.org/solr/Solrj
page ... the compiler doesn't find any class to support
I'm getting the following exception
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc'
I'm wondering what I need to do in order to add the desc field to
the Solr schema for indexing?
Hello All,
When I use Maven or Eclipse to try and compile my bean which has the
@Field annotation as specified in http://wiki.apache.org/solr/Solrj
page ... the compiler doesn't find any class to support the
annotation. What jar should we use to bring in this custom Solr
annotation?
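For what it's worth, the @Field annotation ships in the SolrJ client jar (org.apache.solr.client.solrj.beans.Field); a sketch of the Maven dependency follows, where the version shown is an assumption to be matched to your Solr release:

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <!-- assumption: pick the version matching your Solr install -->
  <version>3.4.0</version>
</dependency>
```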
:
Add desc as a field in your schema.xml
file would be my first guess.
Providing some explanation of what you're trying to do
would help diagnose your issues.
HTH
Erick
On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
I'm getting the following exception
Hello Everyone,
I do NOT want to host Solr separately. I want to run it within my war
with the Java Application which is using it. How easy/difficult is
that to setup? Can anyone with past experience on this topic, please
comment.
thanks,
- Pulkit
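One possible sketch, assuming the approach of mounting Solr's own dispatch filter inside your webapp's web.xml (SolrDispatchFilter is the servlet filter the Solr war itself uses; treat the URL pattern and the idea of sharing one war as assumptions, not a recommendation):

```xml
<!-- sketch: wire Solr's request dispatcher into your own webapp -->
<filter>
  <filter-name>SolrRequestFilter</filter-name>
  <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>SolrRequestFilter</filter-name>
  <url-pattern>/solr/*</url-pattern>
</filter-mapping>
```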
, at 22:23, Pulkit Singhal pulkitsing...@gmail.com
wrote:
Hello Everyone,
I do NOT want to host Solr separately. I want to run it within my war
with the Java Application which is using it. How easy/difficult is
that to setup? Can anyone with past experience on this topic, please
comment.
thanks