On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
<brian.l...@journalexperts.com> wrote:
> Hi all,
>
> I have a few questions about how the MySQL data import works. It seems it
> creates a separate connection for each entity I create. Is there any way to
> avoid this?

Not sure, but I do not think that it is possible. However, from your description
below, I think that you are unnecessarily multiplying entities.

> By nature of my schema, I have several multivalued fields. Each one I
> populate with a separate entity. Is there a better way to do it? For
> example, could I pull in all the singular data in one sitting and then come
> back in later and populate with the multivalued items.

Not quite sure as to what you mean. Would it be possible for you
to post your schema.xml, and the DIH configuration file? Preferably,
put these on pastebin.com, and send us links. Also, you should
obfuscate details like access passwords.

> An alternate approach in some cases would be to do a GROUP_CONCAT and then
> populate the multivalued column with some transformation. Is that possible?
[...]

This is how we have been handling it. A complete description would
be long, but here is the gist of it:
* A transformer will be needed. In this case, we found it easiest
  to use a Java-based transformer. Thus, your entity should include
  something like
  <entity name="myname" dataSource="mysource"
          transformer="com.mycompany.search.solr.handler.JobsNumericTransformer" ...>
    ...
  </entity>
  Here, the value of the transformer attribute is the fully qualified
  Java class name, and the .jar containing that class needs to be made
  available to Solr.
* The SELECT statement for the entity looks something like
  select group_concat( myfield SEPARATOR '@||@')...
  The separator should be something that does not occur in your
  normal data stream.
* Within the entity, define
   <field column="myfield"/>
* There are complications if NULL values are allowed for the field,
   in which case you would need to use COALESCE, maybe along with
   CAST.
* The transformer would look up "myfield", split along the separator,
   and populate the multi-valued field.
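
To make the last step concrete, here is a minimal sketch of such a
transformer. It is not our actual class: the name
SplitMultiValueTransformer is made up, and the column name and
separator are simply the ones used in the bullets above. As far as I
know, DIH will also accept a plain class that has a
transformRow(Map<String, Object>) method and call it by reflection, so
the sketch has no Solr imports; please check this against your Solr
version. It also assumes that the JDBC driver hands the GROUP_CONCAT
result back as a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical transformer: class name, column name and separator are
// illustrative and must match your own DIH configuration and SELECT.
// Declared without "public" so the sketch compiles in any file; put it
// in its own file and make it public before pointing DIH at it.
class SplitMultiValueTransformer {
    private static final String COLUMN = "myfield";
    private static final String SEPARATOR = "@||@";

    // Called once for every row returned by the entity's query.
    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get(COLUMN);
        if (raw == null) {
            // No values for this row: leave the row untouched.
            return row;
        }
        // '|' is a regex metacharacter, so quote the separator.
        String[] parts = raw.toString().split(Pattern.quote(SEPARATOR));
        List<String> values = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                values.add(part);
            }
        }
        // Replacing the column with a List yields one value per element.
        row.put(COLUMN, values);
        return row;
    }
}
```

For this to work, the corresponding field in schema.xml has to be
declared with multiValued="true".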

This *is* a little complicated, so I would also like to hear about
possible alternatives.

Regards,
Gora
