Here's a quick demo I wrote at one point. I haven't run it in a while, but you should be able to get the idea.
package jdbc; import org.apache.solr.client.solrj.SolrServerException; import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer; import org.apache.solr.client.solrj.impl.XMLResponseParser; import org.apache.solr.common.SolrInputDocument; import java.io.IOException; import java.sql.*; import java.util.ArrayList; import java.util.Collection; public class Indexer { public static void main(String[] args) { startIndex("http://localhost:8983/solr"); } private static void startIndex(String url) { Connection con = DataSource.getConnection(); try { long start = System.currentTimeMillis(); // Create a multi-threaded communications channel to the Solr server. Full interface (3.3) at: // http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(url, 10, 4); // You may want to set these timeouts higer, Solr occasionally will have long pauses while // segments merge. server.setSoTimeout(1000); // socket read timeout server.setConnectionTimeout(100); //server.setDefaultMaxConnectionsPerHost(100); //server.setMaxTotalConnections(100); //server.setFollowRedirects(false); // defaults to false // allowCompression defaults to false. // Server side must support gzip or deflate for this to have any effect. //server.setAllowCompression(true); server.setMaxRetries(1); // defaults to 0. > 1 not recommended. server.setParser(new XMLResponseParser()); // binary parser is used by default doDocuments(server, con); server.commit(); // Only needs to be done at the end, autocommit or commitWithin should // do the rest. long endTime = System.currentTimeMillis(); System.out.println("Total Time Taken->" + (endTime - start) + " mils"); } catch (Exception e) { e.printStackTrace(); String msg = e.getMessage(); System.out.println(msg); } } private static void doDocuments(StreamingUpdateSolrServer server, Connection con) throws SQLException, IOException, SolrServerException { Statement st = con.createStatement(); ResultSet rs = st.executeQuery("select id,title,text from test"); // SolrInputDocument interface (3.3) at // http://lucene.apache.org/solr/api/org/apache/solr/common/SolrInputDocument.html Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(); int total = 0; int counter = 0; while (rs.next()) { SolrInputDocument doc = new SolrInputDocument(); // DO NOT move this outside the while loop // or be sure to call doc.clear() String id = rs.getString("id"); String title = rs.getString("title"); String text = rs.getString("text"); doc.addField("id", id); doc.addField("title", title); doc.addField("text", text); docs.add(doc); ++counter; ++total; if (counter > 1000) { // Completely arbitrary, just batch up more than one document for throughput! server.add(docs); docs.clear(); counter = 0; } } System.out.println("Total " + total + " Docs added succesfully"); } } // Trivial class showing connecting to a MySql database server via jdbc... class DataSource { public static Connection getConnection() { Connection conn = null; try { Class.forName("com.mysql.jdbc.Driver").newInstance(); System.out.println("Driver Loaded......"); conn = DriverManager.getConnection("jdbc:mysql://172.16.0.169:3306/test?" + "user=testuser&password=test123"); System.out.println("Connection build......"); } catch (Exception ex) { System.out.println(ex); } return conn; } public static void closeConnection(Connection con) { try { if (con != null) con.close(); } catch (SQLException e) { e.printStackTrace(); } } } On Mon, Dec 12, 2011 at 2:57 PM, Brian Lamb <brian.l...@journalexperts.com> wrote: > Thanks all. Erick, is there documentation on doing things with SolrJ and a > JDBC connection? > > On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson > <erickerick...@gmail.com>wrote: > >> You might want to consider just doing the whole >> thing in SolrJ with a JDBC connection. When things >> get complex, it's sometimes more straightforward. >> >> Best >> Erick... >> >> P.S. Yes, it's pretty standard to have a single >> field be the destination for several copyField >> directives. >> >> On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty <g...@mimirtech.com> wrote: >> > On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb >> > <brian.l...@journalexperts.com> wrote: >> >> Hi all, >> >> >> >> I have a few questions about how the MySQL data import works. It seems >> it >> >> creates a separate connection for each entity I create. Is there any >> way to >> >> avoid this? >> > >> > Not sure, but I do not think that it is possible. However, from your >> description >> > below, I think that you are unnecessarily multiplying entities. >> > >> >> By nature of my schema, I have several multivalued fields. Each one I >> >> populate with a separate entity. Is there a better way to do it? For >> >> example, could I pull in all the singular data in one sitting and then >> come >> >> back in later and populate with the multivalued items. >> > >> > Not quite sure as to what you mean. Would it be possible for you >> > to post your schema.xml, and the DIH configuration file? Preferably, >> > put these on pastebin.com, and send us links. Also, you should >> > obfuscate details like access passwords. >> > >> >> An alternate approach in some cases would be to do a GROUP_CONCAT and >> then >> >> populate the multivalued column with some transformation. Is that >> possible? >> > [...] >> > >> > This is how we have been handling it. A complete description would >> > be long, but here is the gist of it: >> > * A transformer will be needed. In this case, we found it easiest >> > to use a Java-based transformer. Thus, your entity should include >> > something like >> > <entity name="myname" dataSource="mysource" >> > transformer="com.mycompany.search.solr.handler.JobsNumericTransformer...> >> > ... >> > </entity> >> > Here, the class name to be used for the transformer attribute follows >> > the usual Java rules, and the .jar needs to be made available to Solr. >> > * The SELECT statement for the entity looks something like >> > select group_concat( myfield SEPARATOR '@||@')... >> > The separator should be something that does not occur in your >> > normal data stream. >> > * Within the entity, define >> > <field column="myfield"/> >> > * There are complications involved if NULL values are allowed >> > for the field, in which case you would need to use COALESCE, >> > maybe along with CAST >> > * The transformer would look up "myfield", split along the separator, >> > and populate the multi-valued field. >> > >> > This *is* a little complicated, so I would also like to hear about >> > possible alternatives. >> > >> > Regards, >> > Gora >>