Re: Error while trying to load JSON

2012-03-16 Thread Pulkit Singhal
It seems that you are using the bbyopen data. If you have made up your mind on using the JSON data, then simply store it in ElasticSearch instead of Solr, as it does take any valid JSON structure. Otherwise, you can download the XML archive from bbyopen and prepare a schema: Here are some generic

Re: Replication fails in SolrCloud

2011-11-08 Thread Pulkit Singhal
@Prakash: Can you please format the body a bit for readability? @Solr-Users: Is anybody else having any problems when running Zookeeper from the latest code in the trunk (4.x)? On Mon, Nov 7, 2011 at 4:44 PM, prakash chandrasekaran prakashchandraseka...@live.com wrote: hi all, i followed

DIH full-import with clean=false is still removing old data

2011-10-04 Thread Pulkit Singhal
Hello, I have a unique dataset of 1,110,000 products, each as its own file. It is split across three directories of 500,000, 110,000, and 500,000 files. When I run: http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true The first 500,000 entries are
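The request in this post passes three parameters (command, clean, commit) to the DIH handler. A minimal Python sketch of building that URL so the parameter separators cannot get lost; the helper name dih_url is hypothetical, and the host, core name, and parameter values are simply the ones quoted in the post:

```python
from urllib.parse import urlencode

def dih_url(base, **params):
    """Join a handler URL and its query parameters with proper '&' separators."""
    return f"{base}?{urlencode(params)}"

# Values copied from the post; nothing is sent over the network here.
url = dih_url(
    "http://localhost:8983/solr/bbyopen/dataimport",
    command="full-import",
    clean="false",
    commit="true",
)
print(url)
```

Building the query string from a mapping keeps each parameter distinct, which makes it easier to confirm that clean=false is really reaching the handler.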

Re: DIH full-import with clean=false is still removing old data

2011-10-04 Thread Pulkit Singhal
Bah it worked after cleaning it out for the 3rd time, don't know what I did differently this time :( <result name="response" numFound="1110983" start="0"/> On Tue, Oct 4, 2011 at 8:00 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, I have a unique dataset of 1,110,000 products, each as its

Bug in DIH?

2011-10-01 Thread Pulkit Singhal
It's a rather strange stacktrace (at the bottom). An entire 1+ dataset finishes up only to end up crashing and burning due to a log statement :) Based on what I can tell from the stacktrace and the 4.x trunk source code, it seems that the following log statement dies:

Enabling the right logs for DIH

2011-10-01 Thread Pulkit Singhal
The Problem: When using DIH with trunk 4.x, I am seeing some very funny numbers with a particularly large XML file that I'm trying to import. Usually there are bound to be more rows than documents indexed in DIH because of the forEach property, but my other XML files have maybe 1.5 times

Re: Bug in DIH?

2011-10-01 Thread Pulkit Singhal
for this. The fix should have two parts: 1) fix the exception 2) log and ignore exceptions in the LogProcessor On Sat, Oct 1, 2011 at 2:02 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: It's a rather strange stacktrace (at the bottom). An entire 1+ dataset finishes up only to end up crashing and burning due

Re: basic solr cloud questions

2011-09-30 Thread Pulkit Singhal
SOLR-2355 is definitely a step in the right direction, but there is something I would like to get clarified: a) There were some fixes to it that went on the 3.4 and 3.5 branches based on the comments section ... are they not available or not needed on 4.x trunk? b) Does this basic implementation distribute

Re: basic solr cloud questions

2011-09-30 Thread Pulkit Singhal
30, 2011 at 11:26 AM, Pulkit Singhal pulkitsing...@gmail.com wrote: SOLR-2355 is definitely a step in the right direction, but there is something I would like to get clarified: a) There were some fixes to it that went on the 3.4 and 3.5 branches based on the comments section ... are they not available

Re: UIMA DictionaryAnnotator partOfSpeach

2011-09-28 Thread Pulkit Singhal
At first glance it seems like a simple localization issue as indicated by this: org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException: EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException: Can't find bundle for base name

Re: basic solr cloud questions

2011-09-28 Thread Pulkit Singhal
@Darren: I feel that the question itself is misleading. Creating shards is meant to separate out the data ... not keep the exact same copy of it. I think the two node setup that was attempted by Sam misled him and us into thinking that configuring two nodes which are to be named shard1 ...

Re: Why I can't take an full-import with entity name?

2011-09-28 Thread Pulkit Singhal
Can you monitor the DB side to see what results it returned for that query? 2011/8/30 于浩 yuhao.1...@gmail.com: I am using Solr 1.3. I update the Solr index through Solr delta import every two hours, but the delta import wastes database connections. So I want to use full-import with entity name

Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-28 Thread Pulkit Singhal
Did you find out about this? 2011/8/2 Yury Kats yuryk...@yahoo.com: I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about -DzkHost parameter, but can I achieve the same programmatically?

Re: DIH error when nested db datasource and file data source

2011-09-23 Thread Pulkit Singhal
A few thoughts: 1) If you place the script transformer method on the entity named x and then pass the ${topic_tree.topic_id} to that as an argument, then shouldn't you have everything you need to work with x's row? Even if you can't look up at the parent, all you needed to know was the topic_id and

ScriptTransformer question

2011-09-22 Thread Pulkit Singhal
Hello, I'm using DIH in the trunk version and I have placed breakpoints in the Solr code. I can see that the value for a row being fed into the ScriptTransformer instance is: {buybackPlans.buybackPlan.type=[PSP-PRP], buybackPlans.buybackPlan.name=[2-Year Buy Back Plan],

Best Practices for indexing nested XML in Solr via DIH

2011-09-21 Thread Pulkit Singhal
Hello Everyone, I was wondering what are the various best practices that everyone follows for indexing nested XML into Solr. Please don't feel limited by examples, feel free to share your own experiences. Given an xml structure such as the following: categoryPath category

Re: How to write core's name in log

2011-09-21 Thread Pulkit Singhal
Not sure if this is a good lead for you, but when I run the out-of-the-box multi-core example-DIH instance of Solr, I often see the core name thrown about in the logs. Perhaps you can look there? On Thu, Sep 15, 2011 at 6:50 AM, Joan joan.monp...@gmail.com wrote: Hi, I have multiple cores in Solr and I

Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
I am NOT claiming that making a copy of a copy field is wrong or leads to a race condition. I don't know that. BUT did you try to copy into the text field directly from the genre field? Instead of the genre_search field? Did that yield working queries? On Wed, Sep 21, 2011 at 12:16 PM, Tanner

Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Pulkit Singhal
Usually any good piece of Java code refrains from catching Throwable so that Errors will bubble up, unlike exceptions. Having said that, perhaps someone on the list can help if you share which particular Solr version you are using where you suspect that the Error is being eaten up. On Fri, Sep

Re: Solr Indexing - Null Values in date field

2011-09-21 Thread Pulkit Singhal
Also you may use the script transformer to explicitly remove the field from the document if the field is null. I do this for all my sdouble and sdate fields ... it's a bit manual and I would like to see Solr enhanced to simply skip stuff like this by having a flag for its DIH code but until then it

Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Hello, I was wondering where I can find the source code for DIH? I want to check out the source and step through it breakpoint by breakpoint to understand it better :) Thanks! - Pulkit

Re: Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Correct! With that additional info, plus http://wiki.apache.org/solr/HowToContribute (ant eclipse), plus a refreshed (close/open) eclipse project ... I'm all set. Thanks Again. On Wed, Sep 21, 2011 at 1:43 PM, Gora Mohanty g...@mimirtech.com wrote: On Thu, Sep 22, 2011 at 12:08 AM, Pulkit

Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
, that fixed it. Thanks. On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert tanner.post...@gmail.com wrote: i believe that was the original configuration, but I can switch it back and see if that yields any results. On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal pulkitsing...@gmail.com wrote: I

Troubleshooting OOM in DIH w/ FileListEntityProcessor and XPathEntityProcessor

2011-09-20 Thread Pulkit Singhal
Hello Everyone, I need help in: (a) figuring out the causes of OutOfMemoryError (OOM) when I run Data Import Handler (DIH), (b) finding workarounds and fixes to get rid of the OOM issue per cause. The stacktrace is at the very bottom to avoid having your eyes glaze over and to prevent you from

Re: How to set up the schema to avoid NumberFormatException

2011-09-20 Thread Pulkit Singhal
Hi Hoss, Thanks for the input! Something rather strange happened. I fixed my regex such that instead of returning just 1,000 ... it would return 1,000.00 and voila, it worked! So parsing group separators is apparently already supported ... it's just that the format is also looking for a

How to skip fields when using DIH?

2011-09-20 Thread Pulkit Singhal
The data I'm running through the DIH looks like: <products> <product> <new>false</new> <active>false</active> <regularPrice>349.99</regularPrice> <salesRankShortTerm/> </product> </products> As you can see, in this particular instance of a product, there is no value for salesRankShortTerm which

Re: How to skip fields when using DIH?

2011-09-20 Thread Pulkit Singhal
OMG, I'm so sorry, please ignore. It's so simple, just had to use: row.remove( 'salesRankShortTerm' ); because the script runs at the end after the entire entity has been processed (I suppose) rather than per field. Thanks! On Tue, Sep 20, 2011 at 5:42 PM, Pulkit Singhal pulkitsing...@gmail.com
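The fix above removes an empty field from the row before the document is built. The same idea, sketched in Python on a plain dict rather than in DIH's ScriptTransformer; the function name drop_empty_fields is hypothetical, and the sample row only mirrors the product data quoted in the original question:

```python
def drop_empty_fields(row, names):
    """Remove the named fields from a row when they hold no value."""
    for name in names:
        value = row.get(name)
        if value is None or value == "":
            # pop with a default so a missing key is not an error
            row.pop(name, None)
    return row

# Sample row modelled on the product from the question; the empty
# salesRankShortTerm would otherwise fail numeric field parsing.
row = {
    "new": "false",
    "active": "false",
    "regularPrice": "349.99",
    "salesRankShortTerm": "",
}
cleaned = drop_empty_fields(row, ["salesRankShortTerm"])
print(cleaned)
```

Operating on the whole row, rather than on one field at a time, matches the observation in the post that the script sees the entity's row after it has been processed.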

JSON indexing failing...

2011-09-19 Thread Pulkit Singhal
Hello, I am running a simple test after reading: http://wiki.apache.org/solr/UpdateJSON I am only using one object from a large json file to test and see if the indexing works: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @productSample.json -H

How does Solr deal with JSON data?

2011-09-19 Thread Pulkit Singhal
Hello Everyone, I'm quite curious how the following data gets understood and indexed by Solr: [{ "id": "Fubar", "url": null, "regularPrice": 3.99, "offers": [ { "url": "", "text": "On Sale", "id": "OS" } ] }] 1) The field id is present as part of the main object and as part of a
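How Solr itself treats the nested offers object is the open question in this post, and the sketch below does not answer it. It only shows one client-side option, under the assumption that the document is flattened before indexing so that the two id keys cannot collide. The dotted field names (offers.id and so on) are hypothetical and would need matching definitions in the schema:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists of dicts become multi-valued."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Collect each child's fields into a list under the dotted name.
            for child in value:
                for k, v in flatten(child, name + ".").items():
                    flat.setdefault(k, []).append(v)
        elif value is not None:
            # Null values are dropped rather than indexed.
            flat[name] = value
    return flat

raw = ('[{"id": "Fubar", "url": null, "regularPrice": 3.99, '
       '"offers": [{"url": "", "text": "On Sale", "id": "OS"}]}]')
doc = json.loads(raw)[0]
flat = flatten(doc)
print(flat)
```

After flattening, the top-level id stays a single value while the nested one lands under its own name, so the ambiguity raised in point 1 does not arise.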

Re: JSON indexing failing...

2011-09-19 Thread Pulkit Singhal
, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, I am running a simple test after reading: http://wiki.apache.org/solr/UpdateJSON I am only using one object from a large json file to test and see if the indexing works: curl 'http://localhost:8983/solr/update/json?commit=true' --data

Re: JSON and DataImportHandler

2011-09-18 Thread Pulkit Singhal
Any updates on this topic? On Fri, Jul 16, 2010 at 5:36 PM, P Williams williams.tricia.l...@gmail.com wrote: Hi All, Has anyone gotten the DataImportHandler to work with JSON as input? Is there an even easier alternative to DIH? Could you show me an example? Many thanks, Tricia

Re: JSON and DataImportHandler

2011-09-18 Thread Pulkit Singhal
Ah I see now: http://wiki.apache.org/solr/UpdateJSON#Example Not part of DIH that's all. On Sun, Sep 18, 2011 at 5:42 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Any updates on this topic? On Fri, Jul 16, 2010 at 5:36 PM, P Williams williams.tricia.l...@gmail.com wrote: Hi All

Miscellaneous DIH related questions

2011-09-17 Thread Pulkit Singhal
My DIH's full-import logs end with trailing output saying that 1500 documents were added, which is correct because I have 16 sources, one of them was down, and each source is supposed to give me 100 results: (1500 adds)],optimize=} 0 0 But when I check my document count I get only 1384

Re: Generating large datasets for Solr proof-of-concept

2011-09-17 Thread Pulkit Singhal
Thanks Hoss. I agree that the way you restated the question is better for getting results. BTW I think you've tipped me off to exactly what I needed with this URL: http://bbyopen.com/ Thanks! - Pulkit On Fri, Sep 16, 2011 at 4:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Has anyone

How to set up the schema to avoid NumberFormatException

2011-09-16 Thread Pulkit Singhal
Hello Folks, Surprisingly, the value from the following raw data gives me an NFE (NumberFormatException) when running the DIH (Data Import Handler): <span class="tgProductPrice">$1,000.00</span> The error logs look like: Caused by: org.apache.solr.common.SolrException: Error while creating field
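The raw value here carries a currency symbol and a group separator, neither of which a plain numeric parser accepts. A minimal Python sketch of normalizing such a value before it reaches a numeric field; this is an illustration of the cleanup step, not the DIH configuration from the thread, and parse_price is a hypothetical helper:

```python
import re
from decimal import Decimal

# A dollar sign, optional whitespace, then digits with optional commas
# and an optional fractional part.
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")

def parse_price(text):
    """Return the first dollar amount in text as a Decimal, or None if absent."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Strip group separators so the remainder is a plain decimal literal.
    return Decimal(match.group(1).replace(",", ""))

price = parse_price('<span class="tgProductPrice">$1,000.00</span>')
print(price)
```

Decimal is used instead of float so that the cents survive exactly, which matters if the value is later compared or summed.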

[DIH] How to combine Regex and HTML transformers

2011-09-15 Thread Pulkit Singhal
Hello, I need to pull out the price and imageURL for products in an Amazon RSS feed. PROBLEM STATEMENT: The following: <field column="description" xpath="/rss/channel/item/description" /> <field column="price"

Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Hello Everyone, I have a goal of populating Solr with a million unique products in order to create a test environment for a proof of concept. I started out by using DIH with Amazon RSS feeds but I've quickly realized that there's no way I can glean a million products from one RSS feed. And I'd go

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Ah, missing } doh! BTW I still welcome any ideas on how to build an e-commerce test base. It doesn't have to be Amazon; that was just my approach. Anyone? - Pulkit On Thu, Sep 15, 2011 at 8:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Thanks for all the feedback thus far. Now to get

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Thanks for all the feedback thus far. Now to get a little technical about it :) I was thinking of putting all the Amazon tags that yield roughly 5 results each into a file and then running my RSS DIH off of that. I came up with the following config but something is

RegexTransformer - need help with regex value

2011-09-14 Thread Pulkit Singhal
Hello, Feel free to point me to alternate sources of information if you deem this question unworthy of the Solr list :) But until then please hear me out! When my config is something like: field column=imageUrl regex=.*img src=.(.*)\.gif..alt=.*
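To see what the capture group in this pattern actually picks up, the regex can be tried outside Solr. The sketch below runs it with Python's re module; DIH's RegexTransformer uses Java regular expressions, which behave the same for this particular pattern as far as I know. The pattern is copied as it appears in the (truncated) post, and the sample HTML string is made up:

```python
import re

# Pattern as quoted in the post; the single dots stand in for quote
# characters and the space between the attributes.
IMAGE_RE = re.compile(r".*img src=.(.*)\.gif..alt=.*")

# Hypothetical description fragment, not taken from the real feed.
sample = '<img src="http://example.com/cover.gif" alt="cover">'

match = IMAGE_RE.match(sample)
image_base = match.group(1) if match else None
print(image_base)
```

Note that the group ends before the literal \.gif, so the captured value is the URL without its extension; moving \.gif inside the parentheses would keep it.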

Re: RegexTransformer - need help with regex value

2011-09-14 Thread Pulkit Singhal
;.* sourceColName=description / Cheers, - Pulkit On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, Feel free to point me to alternate sources of information if you deem this question unworthy of the Solr list :) But until then please

Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-13 Thread Pulkit Singhal
: Is there another option for navigating the HTML DOM tree using some well-tested transformer or Tika or something? Thanks! - Pulkit On Mon, Sep 12, 2011 at 1:45 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Given an RSS raw feed source link such as the following: http://persistent.info/cgi-bin

Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
This solution doesn't seem to be working for me. I am using Solr trunk and I have the same question as Bernd with a small twist: the field that should NOT be empty, happens to be a derived field called price, see the config below: entity ...

Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
Oh and I'm sure that I'm using Java 6 because the properties from the Solr webpage spit out: java.runtime.version = 1.6.0_26-b03-384-10M3425 On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: This solution doesn't seem to be working for me. I am using Solr trunk

DIH skipping imports with skipDoc vs skipRow

2011-09-13 Thread Pulkit Singhal
Hello, 1) The documented explanation of skipDoc and skipRow is not enough for me to discern the difference between them: $skipDoc : Skip the current document. Do not add it to Solr. The value can be String true/false $skipRow : Skip the current row. The document will be added with rows from

How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-12 Thread Pulkit Singhal
Given an RSS raw feed source link such as the following: http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn I can easily get to the value of the description for an item like so: field column=description

Re: Parameter not working for master/slave

2011-09-12 Thread Pulkit Singhal
Hello Bill, I can't really answer your question about replication being supported on Solr 3.3 (I use trunk 4.x myself) BUT I can tell you that if each Solr node has just one core ... only then does it make sense to use -Denable.master=true and -Denable.slave=true ... otherwise, as Yury points out,

Re: Re; DIH Scheduling

2011-09-12 Thread Pulkit Singhal
I don't see anywhere in: http://issues.apache.org/jira/browse/SOLR-2305 any statement showing that the code's inclusion was decided against. When did this happen, and what is needed from the community before someone with the powers to do so will actually commit this? 2011/6/24 Noble Paul നോബിള്‍

Re: Example Solr Config on EC2

2011-09-11 Thread Pulkit Singhal
Just to clarify, that link doesn't do anything to promote an already running slave into a master. One would have to bounce the Solr node which has that slave and then make the shift. It is not something that happens at runtime live. On Wed, Aug 10, 2011 at 4:04 PM, Akshay akm...@gmail.com wrote:

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
to work perfectly well with this setup. 2011/9/9 Yury Kats yuryk...@yahoo.com: On 9/9/2011 6:54 PM, Pulkit Singhal wrote: Thanks Again. Another question: My solr.xml has:   cores adminPath=/admin/cores defaultCoreName=master1     core name=master1 instanceDir=. shard=shard1 collection

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
straight on this. Also it would be nice if I knew the code well enough to just look @ it and give an authoritative answer. Does anyone have that kind of expertise? Reverse-engineering is getting a bit mundane. Thanks! - Pulkit On Sat, Sep 10, 2011 at 11:43 AM, Pulkit Singhal pulkitsing...@gmail.com

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
straight on this. Also it would be nice if I knew the code well enough to just look @ it and give an authoritative answer. Does anyone have that kind of expertise? Reverse-engineering is getting a bit mundane. Thanks! - Pulkit On Sat, Sep 10, 2011 at 11:43 AM, Pulkit Singhal pulkitsing

Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Pulkit Singhal
Hi Yury, How do you manage to start the instances without any issues? The way I see it, no matter which instance is started first, the slave will complain about not being able to find its respective master because that instance hasn't been started yet ... no? Thanks, - Pulkit 2011/5/17 Yury Kats

Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Pulkit Singhal
with this thread's help here: http://pulkitsinghal.blogspot.com/2011/09/multicore-master-slave-replication-in.html Thanks, - Pulkit On Sat, Sep 10, 2011 at 2:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hi Yury, How do you manage to start the instances without any issues? The way I see

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-09 Thread Pulkit Singhal
, Sep 7, 2011 at 8:34 PM, Yury Kats yuryk...@yahoo.com wrote: On 9/7/2011 3:18 PM, Pulkit Singhal wrote: Hello, I'm working off the trunk and the following wiki link: http://wiki.apache.org/solr/SolrCloud The wiki link has a section that seeks to quickly familiarize a user with replication

Re: SolrCloud Feedback

2011-09-09 Thread Pulkit Singhal
Hello Jan, You've made a very good point in (b). I would be happy to make the edit to the wiki if I understood your explanation completely. When you say that it is looking up what collection that core is part of ... I'm curious how a core is being put under a particular collection in the first

Re: SolrCloud Feedback

2011-09-09 Thread Pulkit Singhal
it in with the collection name (myconf) without any need to specify anything at startup via -D or statically in solr.xml file. Validate away otherwise I'll just accept any hate mail after making edits to the Solr wiki directly. - Pulkit On Fri, Sep 9, 2011 at 11:38 AM, Pulkit Singhal pulkitsing

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-09 Thread Pulkit Singhal
? - Pulkit 2011/9/9 Yury Kats yuryk...@yahoo.com: On 9/9/2011 10:52 AM, Pulkit Singhal wrote: Thank You Yury. After looking at your thread, there's something I must clarify: Is solr.xml not uploaded and held in ZooKeeper? Not as far as I understand. Cores are loaded/created by the local Solr

Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-09 Thread Pulkit Singhal
-mac.local:8983_solr_ (v=0) node_name=tiklup-mac.local:8983_solr url=http://tiklup-mac.local:8983/solr/; Thanks! - Pulkit On Fri, Sep 9, 2011 at 5:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Thanks Again. Another question: My solr.xml has:  cores adminPath=/admin/cores

Solr Cloud - is replication really a feature on the trunk?

2011-09-07 Thread Pulkit Singhal
Hello, I'm working off the trunk and the following wiki link: http://wiki.apache.org/solr/SolrCloud The wiki link has a section that seeks to quickly familiarize a user with replication in SolrCloud - Example B: Simple two shard cluster with shard replicas But after going through it, I have to

Re: Run Solr within my war

2010-02-19 Thread Pulkit Singhal
is also the application war for the program that will communicate as the client with the Solr server. On Thu, Feb 18, 2010 at 5:49 PM, Richard Frovarp rfrov...@apache.org wrote: On 2/18/2010 4:22 PM, Pulkit Singhal wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within

Re: @Field annotation support

2010-02-19 Thread Pulkit Singhal
On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support

Schema error unknown field

2010-02-18 Thread Pulkit Singhal
I'm getting the following exception SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' I'm wondering what I need to do in order to add the desc field to the Solr schema for indexing?

@Field annotation support

2010-02-18 Thread Pulkit Singhal
Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation?

Re: Schema error unknown field

2010-02-18 Thread Pulkit Singhal
: Add desc as a field in your schema.xml file would be my first guess. Providing some explanation of what you're trying to do would help diagnose your issues. HTH Erick On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: I'm getting the following exception

Run Solr within my war

2010-02-18 Thread Pulkit Singhal
Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java application which is using it. How easy/difficult is that to set up? Can anyone with past experience on this topic please comment? Thanks, - Pulkit

Re: Run Solr within my war

2010-02-18 Thread Pulkit Singhal
, at 22:23, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java Application which is using it. How easy/difficult is that to setup? Can anyone with past experience on this topic, please comment. thanks