Hi All,

I am new to Solr. I am looking to use Solr to index data that is partitioned across multiple databases and tables, and I have some questions regarding dataconfig.xml. The questions are listed at the end.
Let's say I have 3 MySQL databases, each with 3 tables:

Db1 : Tbl1, Tbl2, Tbl3
Db2 : Tbl1, Tbl2, Tbl3
Db3 : Tbl1, Tbl2, Tbl3

All databases have the same number of tables, with the same table names as shown above. All tables have exactly the same structure as well. Each table has three fields: id, name, category.

Since the data is distributed this way, I have no single place to search for a particular record by 'name'. I must look for it in all 9 tables. This does not scale: with, say, 20 databases of 20 tables each, I would need 400 queries to find a single record. Solr seemed like the solution.

I followed the wiki tutorials:

http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DIHQuickStart
http://wiki.apache.org/solr/DataImportHandlerFaq

The following are my config files so far:

================
solrconfig.xml
================
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

================
dataconfig.xml (so far)
================
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db1" user="user-name" password="password" />
  <dataSource type="JdbcDataSource" name="ds2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db2" user="user-name" password="password" />
  <dataSource type="JdbcDataSource" name="ds3" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db3" user="user-name" password="password" />
  <document>
    <entity name="record11" dataSource="ds1" query="select id,name,category from Tbl1"></entity>
    <entity name="record12" dataSource="ds1" query="select id,name,category from Tbl2"></entity>
    <entity name="record13" dataSource="ds1" query="select id,name,category from Tbl3"></entity>
    <entity name="record21" dataSource="ds2" query="select id,name,category from Tbl1"></entity>
    <entity name="record22" dataSource="ds2" query="select id,name,category from Tbl2"></entity>
    <entity name="record23" dataSource="ds2" query="select id,name,category from Tbl3"></entity>
    <entity name="record31" dataSource="ds3" query="select id,name,category from Tbl1"></entity>
    <entity name="record32" dataSource="ds3" query="select id,name,category from Tbl2"></entity>
    <entity name="record33" dataSource="ds3" query="select id,name,category from Tbl3"></entity>
  </document>
</dataConfig>

================
Doubts/Questions:
================
- Is this the right way to index this data?
- Is there a better way to achieve this? 20 databases with 20 tables each translates to 400 entity lines in the XML, and 200 databases with 200 tables each would mean 40000. Will Solr continue to work and index properly with 40000 entity rows without running out of memory?
- I would really like to be able to search through the complete data set for a 'name' and do things like 'category' filtering easily, independent of the entity name or data source. For me they are all records of the same type.

Thanks and Best Regards,
Jayant
--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
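
P.S. To make the scaling concern concrete: I would not write 400 entity lines by hand but generate them. Below is a minimal sketch of such a generator, assuming the Db1..DbN / Tbl1..TblM naming and the credentials from my example above. The function name build_data_config and the underscore in the entity names (record1_1 instead of record11, to avoid ambiguity once there are 10 or more databases or tables) are my own choices for illustration, not anything Solr requires. It only produces the XML text; it does not answer whether Solr copes well with that many entities.

```python
from xml.sax.saxutils import quoteattr


def build_data_config(num_dbs, num_tables, host="localhost",
                      user="user-name", password="password"):
    """Return data config XML text for num_dbs databases x num_tables tables.

    Assumes databases are named Db1..DbN and tables Tbl1..TblM, all with
    the columns id, name, category (as in the example in this post).
    """
    lines = ["<dataConfig>"]

    # One JdbcDataSource per database, named ds1..dsN.
    for d in range(1, num_dbs + 1):
        url = "jdbc:mysql://%s/Db%d" % (host, d)
        lines.append(
            '  <dataSource type="JdbcDataSource" name="ds%d" '
            'driver="com.mysql.jdbc.Driver" url=%s user=%s password=%s />'
            % (d, quoteattr(url), quoteattr(user), quoteattr(password)))

    lines.append("  <document>")
    # One entity per (database, table) pair.
    for d in range(1, num_dbs + 1):
        for t in range(1, num_tables + 1):
            lines.append(
                '    <entity name="record%d_%d" dataSource="ds%d" '
                'query="select id,name,category from Tbl%d"></entity>'
                % (d, t, d, t))
    lines.append("  </document>")
    lines.append("</dataConfig>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 20 databases x 20 tables gives the 400 entities mentioned above.
    print(build_data_config(20, 20), end="")
```

Redirecting the script's output to a file gives a config in the same shape as the one I pasted above, just longer.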