Mark,

Nothing appears to be wrong in the logs. I wiped the indexes and imported
37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
has issues with the results being inconsistent.

Let me run my setup by you to see whether that is the issue.

On one machine, I have three zookeeper instances, four solr instances, and
a data directory for solr and zookeeper config data.

Step 1. I modified each zoo.cfg configuration file to have:

Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
================
tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk1_data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
contents:
==============================================================
1

Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
==============
tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk2_data
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
contents:
==============================================================
2

Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
================
tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk3_data
clientPort=2183
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
contents:
====================================================
3
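
The three zoo.cfg files above differ only in dataDir and clientPort, so they
could also be generated instead of edited by hand. This is only a minimal
sketch, assuming Python is available; render_zoo_cfg, render_myid and the
data_dir argument are names made up for illustration, and the values are the
ones listed above.

```python
# Sketch only: render the zoo.cfg text and the myid value for each of the
# three ZooKeeper instances described above. render_zoo_cfg and render_myid
# are hypothetical helpers, not part of ZooKeeper.

def render_zoo_cfg(server_id, data_dir="[DATA_DIRECTORY]"):
    """Return the zoo.cfg contents for instance 1, 2 or 3."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}/zk{server_id}_data",
        # Client ports are 2181, 2182, 2183 because all instances share a host.
        f"clientPort={2180 + server_id}",
    ]
    # Every instance lists the same three ensemble members.
    for n in (1, 2, 3):
        lines.append(f"server.{n}=localhost:{2887 + n}:{3887 + n}")
    return "\n".join(lines) + "\n"


def render_myid(server_id):
    """Return the contents of the myid file for the given instance."""
    return f"{server_id}\n"


if __name__ == "__main__":
    for sid in (1, 2, 3):
        print(f"--- zookeeper{sid}/conf/zoo.cfg ---")
        print(render_zoo_cfg(sid))
```

The number in each myid file has to match the N of that instance's server.N
line, which is why both are derived from the same server_id here.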

Step 2 - SOLR Build
===============

I pulled the latest SOLR trunk down. I built it with the following commands:

           ant example dist

I modified the solr.war files and added the solr cell and extraction
libraries to WEB-INF/lib. I couldn't get the extraction to work
any other way. Will Zookeeper pick up jar files stored with the rest of the
configuration files in Zookeeper?

I copied the contents of the example directory to each of my SOLR
directories.

Step 3 - Starting Zookeeper instances
===========================

I ran the following commands to start the zookeeper instances:

start .\zookeeper1\bin\zkServer.cmd
start .\zookeeper2\bin\zkServer.cmd
start .\zookeeper3\bin\zkServer.cmd

Step 4 - Start Main SOLR instance
==========================
I ran the following command to start the main SOLR instance:

java -Djetty.port=8081 -Dhostport=8081
-Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

Starts up fine.

Step 5 - Start the Remaining 3 SOLR Instances
==================================
I ran the following commands to start the other 3 instances from their home
directories:

java -Djetty.port=8082 -Dhostport=8082
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8083 -Dhostport=8083
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8084 -Dhostport=8084
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

All start up without issue.
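
To compare what each node holds on its own, the distrib=false check suggested
earlier in this thread could be scripted roughly as below. This is a sketch
under assumptions, not something taken from my setup: the /solr/select path,
the wt=json response format, and the helper names node_query_url and
num_found are all assumed for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The four Jetty ports used for the SOLR instances above.
PORTS = (8081, 8082, 8083, 8084)


def node_query_url(port, host="localhost"):
    """Build a non-distributed match-all query URL for a single node.

    Assumption: the default core answers on /solr/select.
    """
    params = urlencode({"q": "*:*", "distrib": "false", "rows": 0, "wt": "json"})
    return f"http://{host}:{port}/solr/select?{params}"


def num_found(port):
    """Return the document count reported by one node (needs a running node)."""
    with urlopen(node_query_url(port), timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]
```

Calling num_found(port) for each entry in PORTS should give matching counts
for 8081 and 8083 (shard 1) and for 8082 and 8084 (shard 2) if replication
is working.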

Step 6 - Modified solrconfig.xml to have a custom request handler
===============================================

<requestHandler name="/update/sharepoint" startup="lazy"
class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
     <str name="update.chain">sharepoint-pipeline</str>
     <str name="fmap.content">text</str>
     <str name="lowernames">true</str>
     <str name="uprefix">ignored</str>
     <str name="captureAttr">true</str>
     <str name="fmap.a">links</str>
     <str name="fmap.div">ignored</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="sharepoint-pipeline">
   <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">id</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">url</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory"/>
   <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
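
The intent of this chain, as I understand it, is that the id is derived from
the url, so re-sending the same file replaces the earlier document instead of
adding a second one. A rough illustration of that idea follows. It is a
sketch only: MD5 stands in for Lookup3Signature (a different hash), and
signature_id and index are hypothetical names, not Solr APIs.

```python
import hashlib


def signature_id(url):
    """Stand-in for the signature step: hash the url field into an id.

    MD5 is used only because it ships with Python; the chain above names
    Lookup3Signature, which produces different values.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def index(docs):
    """Simulate overwriting duplicates: a later doc with the same id wins."""
    store = {}
    for doc in docs:
        store[signature_id(doc["url"])] = doc
    return store
```

Two documents with the same url end up under one id, so the store holds a
single entry for them.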


Hopefully this will shed some light on why my configuration is having
issues.

Thanks for your help.

Matt



On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Hmm...this is very strange - there is nothing interesting in any of the
> logs?
>
> In clusterstate.json, all of the shards have an active state?
>
>
> There are quite a few of us doing exactly this setup recently, so there
> must be something we are missing here...
>
> Any info you can offer might help.
>
> - Mark
>
> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>
> > Mark,
> >
> > I got the codebase from 2/26/2012, and I got the same inconsistent
> > results.
> >
> > I have solr running on four ports 8081-8084
> >
> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
> >
> > 8083 - is assigned to shard 1
> > 8084 - is assigned to shard 2
> >
> > queries come in and sometimes the console windows for 8081 and 8083 scroll,
> > so they seem to be responding to the query, but there are no results.
> >
> > if the queries run on 8081/8082 or 8081/8084 then results come back ok.
> >
> > The query is nothing more than: q=*:*
> >
> > Regards,
> >
> > Matt
> >
> >
> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
> > mpar...@apogeeintegration.com> wrote:
> >
> >> I'll have to check on the commit situation. We have been pushing data
> >> from SharePoint the last week or so. Would that somehow block the
> >> documents moving between the solr instances?
> >>
> >> I'll try another version tomorrow. Thanks for the suggestions.
> >>
> >> On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >>> Hmmm...all of that looks pretty normal...
> >>>
> >>> Did a commit somehow fail on the other machine? When you view the stats
> >>> for the update handler, are there a lot of pending adds for one of the
> >>> nodes? Do the commit counts match across nodes?
> >>>
> >>> You can also query an individual node with distrib=false to check that.
> >>>
> >>> If your build is a month old, I'd honestly recommend you try upgrading
> >>> as well.
> >>>
> >>> - Mark
> >>>
> >>> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
> >>>
> >>>> Here is most of the cluster state:
> >>>>
> >>>> Connected to Zookeeper
> >>>> localhost:2181, localhost:2182, localhost:2183
> >>>>
> >>>> /(v=0 children=7) ""
> >>>>  /CONFIGS(v=0, children=1)
> >>>>     /CONFIGURATION(v=0 children=25)
> >>>>            <<<<< all the configuration files, velocity info, xslt, etc. >>>>>
> >>>> /NODE_STATES(v=0 children=4)
> >>>>    MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
> >>>> "state":"active","core":"","collection":"collection1","node_name:"..."
> >>>>    MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
> >>>> "state":"active","core":"","collection":"collection1","node_name:"..."
> >>>>    MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
> >>>> "state":"active","core":"","collection":"collection1","node_name:"..."
> >>>>    MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
> >>>> "state":"active","core":"","collection":"collection1","node_name:"..."
> >>>> /ZOOKEEPER (v=0 children=1)
> >>>>    QUOTA(v=0)
> >>>>
> >>>>
> >>>> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
> >>>> /LIVE_NODES (v=0 children=4)
> >>>>    MACHINE1:8083_SOLR(ephemeral v=0)
> >>>>    MACHINE1:8082_SOLR(ephemeral v=0)
> >>>>    MACHINE1:8081_SOLR(ephemeral v=0)
> >>>>    MACHINE1:8084_SOLR(ephemeral v=0)
> >>>> /COLLECTIONS (v=1 children=1)
> >>>>    COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
> >>>>        LEADER_ELECT(v=0 children=2)
> >>>>            SHARD1(V=0 children=1)
> >>>>                ELECTION(v=0 children=2)
> >>>>
> >>>> 87186203314552835-MACHINE1:8081_SOLR_-N_0000000096(ephemeral v=0)
> >>>>
> >>>> 87186203314552836-MACHINE1:8083_SOLR_-N_0000000084(ephemeral v=0)
> >>>>            SHARD2(v=0 children=1)
> >>>>                ELECTION(v=0 children=2)
> >>>>
> >>>> 231301391392833539-MACHINE1:8084_SOLR_-N_0000000085(ephemeral v=0)
> >>>>
> >>>> 159243797356740611-MACHINE1:8082_SOLR_-N_0000000084(ephemeral v=0)
> >>>>        LEADERS (v=0 children=2)
> >>>>            SHARD1 (ephemeral
> >>>> v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
> >>>> http://MACHINE1:8081/solr"}";
> >>>>            SHARD2 (ephemeral
> >>>> v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
> >>>> http://MACHINE1:8082/solr"}";
> >>>> /OVERSEER_ELECT (v=0 children=2)
> >>>>    ELECTION (v=0 children=4)
> >>>>        231301391392833539-MACHINE1:8084_SOLR_-N_0000000251(ephemeral v=0)
> >>>>        87186203314552835-MACHINE1:8081_SOLR_-N_0000000248(ephemeral v=0)
> >>>>        159243797356740611-MACHINE1:8082_SOLR_-N_0000000250(ephemeral v=0)
> >>>>        87186203314552836-MACHINE1:8083_SOLR_-N_0000000249(ephemeral v=0)
> >>>>    LEADER (ephemeral
> >>>> v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_000000248"}"
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>
> >>>>>
> >>>>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
> >>>>>
> >>>>>> Thanks for your reply Mark.
> >>>>>>
> >>>>>> I believe the build was towards the beginning of the month. The
> >>>>>> solr.spec.version is 4.0.0.2012.01.10.38.09
> >>>>>>
> >>>>>> I cannot access the clusterstate.json contents. I clicked on it a
> >>>>>> couple of times, but nothing happens. Is that stored on disk somewhere?
> >>>>>
> >>>>> Are you using the new admin UI? That has recently been updated to work
> >>>>> better with cloud - it had some troubles not too long ago. If you are,
> >>>>> you should try using the old admin UI's zookeeper page - that should
> >>>>> show the cluster state.
> >>>>>
> >>>>> That being said, there have been a lot of bug fixes over the past month -
> >>>>> so you may just want to update to a recent version.
> >>>>>
> >>>>>>
> >>>>>> I configured a custom request handler to calculate a unique document
> >>>>>> id based on the file's url.
> >>>>>>
> >>>>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hey Matt - is your build recent?
> >>>>>>>
> >>>>>>> Can you visit the cloud/zookeeper page in the admin and send the
> >>>>>>> contents of the clusterstate.json node?
> >>>>>>>
> >>>>>>> Are you using a custom index chain or anything out of the ordinary?
> >>>>>>>
> >>>>>>>
> >>>>>>> - Mark
> >>>>>>>
> >>>>>>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> >>>>>>>
> >>>>>>>> TWIMC:
> >>>>>>>>
> >>>>>>>> Environment
> >>>>>>>> =========
> >>>>>>>> Apache SOLR rev-1236154
> >>>>>>>> Apache Zookeeper 3.3.4
> >>>>>>>> Windows 7
> >>>>>>>> JDK 1.6.0_23.b05
> >>>>>>>>
> >>>>>>>> I have built a SOLR Cloud instance with 4 nodes using the embedded
> >>>>>>>> Jetty servers.
> >>>>>>>>
> >>>>>>>> I created a 3 node zookeeper ensemble to manage the solr
> >>>>>>>> configuration data.
> >>>>>>>>
> >>>>>>>> All the instances run on one server so I've had to move ports around
> >>>>>>>> for the various applications.
> >>>>>>>>
> >>>>>>>> I start the 3 zookeeper nodes.
> >>>>>>>>
> >>>>>>>> I started the first instance of solr cloud with the parameter to
> >>>>>>>> have two shards.
> >>>>>>>>
> >>>>>>>> Then I start the remaining 3 solr nodes.
> >>>>>>>>
> >>>>>>>> The system comes up fine. No errors thrown.
> >>>>>>>>
> >>>>>>>> I can view the solr cloud console and I can see the SOLR
> >>>>>>>> configuration files managed by ZooKeeper.
> >>>>>>>>
> >>>>>>>> I published data into the SOLR Cloud instances from SharePoint using
> >>>>>>>> Apache Manifold 0.4-incubating. Manifold is set up to publish the data
> >>>>>>>> into collection1, which is the only collection defined in the cluster.
> >>>>>>>>
> >>>>>>>> When I query the data from collection1 as per the solr wiki, the
> >>>>>>>> results are inconsistent. Sometimes all the results are there, other
> >>>>>>>> times nothing comes back at all.
> >>>>>>>>
> >>>>>>>> It seems to be having an issue auto replicating the data across the
> >>>>>>>> cloud.
> >>>>>>>>
> >>>>>>>> Is there some specific setting I might have missed? Based upon what I
> >>>>>>>> read, I thought that SOLR cloud would take care of distributing and
> >>>>>>>> replicating the data automatically. Do you have to tell it what shard
> >>>>>>>> to publish the data into as well?
> >>>>>>>>
> >>>>>>>> Any help would be appreciated.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Matt
> >>>>>>>>
> >>>>>>>> ------------------------------
> >>>>>>>> This e-mail and any files transmitted with it may be proprietary.
> >>>>>>> Please note that any views or opinions presented in this e-mail are
> >>>>> solely
> >>>>>>> those of the author and do not necessarily represent those of
> Apogee
> >>>>>>> Integration.
> >>>>>>>
> >>>>>>> - Mark Miller
> >>>>>>> lucidimagination.com
> >>>>>>
> >>>>>> Matt
> >>>>>
> >>>>> - Mark Miller
> >>>>> lucidimagination.com
> >>>
> >>> - Mark Miller
> >>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com