Really good point about the ids; I completely overlooked that.
I will give it a try.
Thanks again.

On Thu, Feb 16, 2012 at 5:00 PM, Dmitry Kan <dmitry....@gmail.com> wrote:

> Each document in SOLR will correspond to one db record, and since both
> databases have the same schema, you can't index two records from two
> databases into the same SOLR document.
>
> So after indexing, you should have 7k different documents, each of which
> holds data from a db record.
>
> Also, one problem I see here is that since the record id in each table is
> unique only within the table and (most probably) not globally, there will
> be collisions. To avoid this, I would prepend each record id with some static
> value, like: concat("t1", CONVERT(id, CHAR(8))).
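The `concat("t1", CONVERT(id, CHAR(8)))` expression is MySQL-style syntax, while the config in this thread uses the SQL Server JDBC driver. In SQL Server, `CONVERT` takes the target type first and `+` concatenates strings, so an untested sketch of the same idea could look like this. The `s-`/`p-` prefixes are arbitrary choices, and only the id and serial columns are shown; the remaining columns and JOINs would stay as in the original queries.

```xml
<!-- Untested sketch: two sibling entities inside <document> whose ids cannot
     collide, because each database gets its own (arbitrary) prefix. -->
<entity name="ms"
        dataSource="s"
        query="SELECT 's-' + CONVERT(VARCHAR(20), m.id) AS id,
                      m.serial AS m_machine_serial
               FROM Machine AS m">
  <field column="id"/>
  <field column="m_machine_serial"/>
</entity>

<entity name="mp"
        dataSource="p"
        query="SELECT 'p-' + CONVERT(VARCHAR(20), m.id) AS id,
                      m.serial AS m_machine_serial
               FROM Machine AS m">
  <field column="id"/>
  <field column="m_machine_serial"/>
</entity>
```

The prefixed values are strings, so this assumes the `id` field in schema.xml is a string type rather than a numeric one. DIH's `TemplateTransformer` should also be able to build the same value on the Solr side, e.g. `<field column="id" template="s-${ms.id}"/>` with `TemplateTransformer` added to the entity's `transformer` list.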
>
> Dmitry
>
> On Thu, Feb 16, 2012 at 4:47 PM, Radu Toev <radut...@gmail.com> wrote:
>
> > I'm not sure I follow.
> > The idea is to have only one document. Do the multiple documents then have
> > the same structure (different datasources), and if so, how are they
> > actually indexed?
> >
> > Thanks.
> >
> > On Thu, Feb 16, 2012 at 4:40 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> >
> > > I think the problem here is that you are initially trying to create
> > > separate documents for two different tables, while your config aims to
> > > create only one document. Here is one solution (not tried by me):
> > >
> > > ------
> > > You can have multiple documents generated by the same data-config:
> > >
> > > <dataConfig>
> > >  <dataSource name="ds1" .../>
> > >  <dataSource name="ds2" .../>
> > >  <dataSource name="ds3" .../>
> > >  <document>
> > >   <entity ... rootEntity="false">
> > >       <entity ...> <!-- this is a document -->
> > >          <entity .../> <!-- sets unique id -->
> > >       </entity>
> > >       <entity ...> <!-- this is another document -->
> > >          <entity .../> <!-- sets unique id -->
> > >       </entity>
> > >   </entity>
> > >  </document>
> > > </dataConfig>
> > >
> > > It's the rootEntity="false" that makes the child entities documents.
> > > ------
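Applied to the two databases from this thread, the layout above might look like the following untested sketch. The wrapper entity name `root` and its one-row query `SELECT 1 AS dummy` are hypothetical; the wrapper should return exactly one row, because nested entities are re-run for every row of their parent. The ids carry the per-database prefixes suggested elsewhere in this thread.

```xml
<dataConfig>
  <dataSource name="s" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <dataSource name="p" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <document>
    <!-- Hypothetical wrapper: rootEntity="false" means it produces no documents
         itself; each row of its direct children becomes one Solr document. -->
    <entity name="root" dataSource="s" rootEntity="false" query="SELECT 1 AS dummy">
      <entity name="ms" dataSource="s"
              query="SELECT 's-' + CONVERT(VARCHAR(20), m.id) AS id,
                            m.serial AS m_machine_serial
                     FROM Machine AS m">
        <field column="id"/>
        <field column="m_machine_serial"/>
      </entity>
      <entity name="mp" dataSource="p"
              query="SELECT 'p-' + CONVERT(VARCHAR(20), m.id) AS id,
                            m.serial AS m_machine_serial
                     FROM Machine AS m">
        <field column="id"/>
        <field column="m_machine_serial"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

By default every entity placed directly under `<document>` is already a root entity, so the wrapper mostly changes how the entities are grouped; what actually keeps both sets of rows in the index is that the ids no longer collide.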
> > >
> > > Dmitry
> > >
> > > On Thu, Feb 16, 2012 at 2:37 PM, Radu Toev <radut...@gmail.com> wrote:
> > >
> > > > <dataConfig>
> > > >  <dataSource
> > > >     name="s"
> > > >     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> > > >     url=""
> > > >     user=""
> > > >     password=""/>
> > > >  <dataSource
> > > >     name="p"
> > > >     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> > > >     url=""
> > > >     user=""
> > > >     password=""/>
> > > >  <document>
> > > >    <entity name="ms"
> > > >        dataSource="s"
> > > > query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
> > > > m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
> > > > m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
> > > > m.contract as m_contract,
> > > >   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> > > > sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> > > > c.clusterMinor as m_c_cluster_minor, c.country as m_c_country,
> > > > c.code as m_c_code
> > > >   FROM Machine AS m
> > > >   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
> > > >   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
> > > >   LEFT JOIN Platform AS p ON m.fk_platform = p.id
> > > >   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
> > > >   LEFT JOIN Country AS c ON fk_country = c.id"
> > > > readOnly="true"
> > > > transformer="DateFormatTransformer">
> > > > <field column="id" />
> > > > <field column="m_machine_serial"/>
> > > > <field column="m_machine_ivk"/>
> > > > <field column="m_sitename"/>
> > > > <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> > > > <field column="m_hotsite"/>
> > > > <field column="m_guardian"/>
> > > > <field column="m_warranty"/>
> > > > <field column="m_contract"/>
> > > > <field column="m_st_name"/>
> > > > <field column="m_pm_name"/>
> > > > <field column="m_p_name"/>
> > > > <field column="m_sv_name"/>
> > > > <field column="m_c_cluster_major"/>
> > > > <field column="m_c_cluster_minor"/>
> > > > <field column="m_c_country"/>
> > > > <field column="m_c_code"/>
> > > >   </entity>
> > > >
> > > >   <entity name="mp"
> > > >        dataSource="p"
> > > > query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
> > > > m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
> > > > m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
> > > > m.contract as m_contract,
> > > >   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> > > > sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> > > > c.clusterMinor as m_c_cluster_minor, c.country as m_c_country,
> > > > c.code as m_c_code
> > > >   FROM Machine AS m
> > > >   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
> > > >   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
> > > >   LEFT JOIN Platform AS p ON m.fk_platform = p.id
> > > >   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
> > > >   LEFT JOIN Country AS c ON fk_country = c.id"
> > > > readOnly="true"
> > > > transformer="DateFormatTransformer">
> > > > <field column="id" />
> > > > <field column="m_machine_serial"/>
> > > > <field column="m_machine_ivk"/>
> > > > <field column="m_sitename"/>
> > > > <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> > > > <field column="m_hotsite"/>
> > > > <field column="m_guardian"/>
> > > > <field column="m_warranty"/>
> > > > <field column="m_contract"/>
> > > > <field column="m_st_name"/>
> > > > <field column="m_pm_name"/>
> > > > <field column="m_p_name"/>
> > > > <field column="m_sv_name"/>
> > > > <field column="m_c_cluster_major"/>
> > > > <field column="m_c_cluster_minor"/>
> > > > <field column="m_c_country"/>
> > > > <field column="m_c_code"/>
> > > >   </entity>
> > > >  </document>
> > > > </dataConfig>
> > > >
> > > > I've removed the connection params.
> > > > The unique key is id.
> > > >
> > > > On Thu, Feb 16, 2012 at 2:27 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> > > >
> > > > > OK, maybe you can show the db-data-config.xml just in case?
> > > > > Also, in schema.xml, does your <uniqueKey> correspond to the unique
> > > > > field in the db?
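For reference, the relevant schema.xml pieces would typically look like this sketch, which assumes the stock `string` field type (`solr.StrField`) is defined in the schema. When a new document arrives with a `<uniqueKey>` value that is already in the index, Solr replaces the older document instead of adding a second one, which is how colliding ids from two databases end up as fewer documents than rows.

```xml
<!-- schema.xml fragment: both elements live inside the <schema> root element. -->
<fields>
  <!-- A string type also accepts prefixed ids such as "s-42". -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>

<!-- Must name the field that the DIH config fills with a globally unique value. -->
<uniqueKey>id</uniqueKey>
```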
> > > > >
> > > > > On Thu, Feb 16, 2012 at 2:13 PM, Radu Toev <radut...@gmail.com> wrote:
> > > > >
> > > > > > I tried running with just one datasource (the one that has 6k
> > > > > > entries) and it indexes them OK.
> > > > > > The same with the 1k database on its own: it indexes OK.
> > > > > >
> > > > > > On Thu, Feb 16, 2012 at 2:11 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> > > > > >
> > > > > > > It sounds a bit as if SOLR stopped processing data once it had
> > > > > > > queried everything from the smaller dataset. That's why you have 2000.
> > > > > > > If you just have a handler pointed to the bigger data set (6k), do you
> > > > > > > manage to get all 6k db entries into Solr?
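One way to point a handler at just the bigger data set is to register a second DataImportHandler with its own config file containing only the `s` dataSource and its entity. The handler name `/dataimport-s` and the file name `data-config-s.xml` below are hypothetical.

```xml
<!-- solrconfig.xml fragment (sketch). -->
<!-- Existing handler, using the combined config. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

<!-- Hypothetical handler whose config holds only the 6k database. -->
<requestHandler name="/dataimport-s"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-s.xml</str>
  </lst>
</requestHandler>
```

Without a second handler, a single top-level entity can also be run from the existing one with `command=full-import&entity=ms`; a query with `q=*:*&rows=0` then reports the total document count in `numFound`.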
> > > > > > >
> > > > > > > On Thu, Feb 16, 2012 at 1:46 PM, Radu Toev <radut...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1. Nothing in the logs
> > > > > > > > 2. No.
> > > > > > > >
> > > > > > > > On Thu, Feb 16, 2012 at 12:44 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > 1. Do you see any errors / exceptions in the logs?
> > > > > > > > > 2. Could you have duplicates?
> > > > > > > > >
> > > > > > > > > On Thu, Feb 16, 2012 at 10:15 AM, Radu Toev <radut...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > I created a data-config.xml file where I define a datasource and an
> > > > > > > > > > entity with 12 fields.
> > > > > > > > > > In my use case I have 2 databases with the same schema, so I want to
> > > > > > > > > > combine the 2 databases in one index.
> > > > > > > > > > I defined a second dataSource tag and duplicated the entity with its
> > > > > > > > > > fields (changed the name and the datasource).
> > > > > > > > > > What I'm expecting is to get around 7k results (I have around 6k in
> > > > > > > > > > the first db and 1k in the second). However, I'm getting a total of 2k.
> > > > > > > > > > Where could the problem be?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Dmitry Kan
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
