On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb <brian.l...@journalexperts.com> wrote:
> Hi all,
>
> I have a few questions about how the MySQL data import works. It seems it
> creates a separate connection for each entity I create. Is there any way to
> avoid this?

Not sure, but I do not think that it is possible. However, from your
description below, I think that you are unnecessarily multiplying
entities.

> By nature of my schema, I have several multivalued fields. Each one I
> populate with a separate entity. Is there a better way to do it? For
> example, could I pull in all the singular data in one sitting and then come
> back in later and populate with the multivalued items.

Not quite sure as to what you mean. Would it be possible for you
to post your schema.xml, and the DIH configuration file? Preferably,
put these on pastebin.com, and send us links. Also, you should
obfuscate details like access passwords.

> An alternate approach in some cases would be to do a GROUP_CONCAT and then
> populate the multivalued column with some transformation. Is that possible?
[...]

This is how we have been handling it. A complete description would
be long, but here is the gist of it:

* A transformer will be needed. In this case, we found it easiest
  to use a Java-based transformer. Thus, your entity should include
  something like

    <entity name="myname" dataSource="mysource"
        transformer="com.mycompany.search.solr.handler.JobsNumericTransformer...>
    ...
    </entity>

  Here, the class name to be used for the transformer attribute follows
  the usual Java rules, and the .jar needs to be made available to Solr.
* The SELECT statement for the entity looks something like

    select group_concat( myfield SEPARATOR '@||@')...

  The separator should be something that does not occur in your
  normal data stream.
* Within the entity, define

    <field column="myfield"/>

* There are complications involved if NULL values are allowed
  for the field, in which case you would need to use COALESCE,
  maybe along with CAST.
* The transformer would look up "myfield", split along the separator,
  and populate the multi-valued field.

This *is* a little complicated, so I would also like to hear about
possible alternatives.

Regards,
Gora
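
P.S. To make the last step above a little more concrete, here is a minimal
sketch of what such a transformer might look like. It is not our production
class: the name GroupConcatSplitTransformer is made up for illustration, and
the column name and separator are hard-coded to match the examples above. It
also assumes that DIH accepts a plain class exposing a
transformRow(Map<String, Object>) method, which is how I understand custom
transformers to work; please check this against the DIH documentation for
your Solr version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical example class; the name, column, and separator are assumptions
// chosen to match the snippets in this mail.
public class GroupConcatSplitTransformer {
    // Must match the SEPARATOR used in the GROUP_CONCAT of the entity query.
    private static final String SEPARATOR = "@||@";
    // Must match the <field column="..."/> declared in the entity.
    private static final String COLUMN = "myfield";

    // Called once per row; replaces the concatenated string stored under
    // COLUMN with a list of its individual values.
    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get(COLUMN);
        if (raw == null) {
            // NULL from the database: leave the row untouched.
            return row;
        }
        List<String> values = new ArrayList<>();
        // Pattern.quote is needed because '|' is a regex metacharacter.
        for (String part : raw.toString().split(Pattern.quote(SEPARATOR))) {
            if (!part.isEmpty()) {
                values.add(part);
            }
        }
        // A List value is what lets the column fill a multi-valued field.
        row.put(COLUMN, values);
        return row;
    }
}
```

With this sketch, a row whose "myfield" column holds "a@||@b@||@c" would end
up holding the list [a, b, c] instead. Empty pieces are skipped, so a
COALESCE to an empty string on the SQL side yields an empty list rather than
a single blank value.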