Re: Data Import from RDBMS+File

Alexandre Rafalovitch Mon, 08 Jul 2013 09:24:06 -0700

Yes, you can mix and match data sources in nested entities. Just make
sure that you declare all of your data sources at the top of the file and
refer to them properly. As per the documentation:
"Ensure that the dataSource is of type DataSource<Reader> (FileDataSource,
URLDataSource)". So you need to declare one of those at the top of the file,
next to the JDBC one.


I would also recommend giving both data sources explicit names and
referring to them by name in every entity declaration. By default, DIH will
find the JDBC data source and use that, but this can cause subtle bugs later
when multiple data sources are introduced.
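
As a rough sketch only (untested), a config along these lines shows the idea.
The data source names "db" and "files" and the "file_name" column are
assumptions made up for illustration; the connection details and the base URL
are copied from your example, and URLDataSource is picked because your file
is served over HTTP:

```xml
<dataConfig>
  <!-- Both data sources are declared up front with explicit names.
       The names "db" and "files" are illustrative, not required values. -->
  <dataSource name="db"
              type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db_1"
              user="root"
              password=""/>
  <dataSource name="files"
              type="URLDataSource"
              encoding="UTF-8"/>

  <document>
    <!-- Outer entity reads rows from the database. "file_name" is a
         hypothetical column holding the name of the file to index. -->
    <entity name="table_1_fetch"
            dataSource="db"
            query="SELECT field_1, file_name FROM table_1">

      <!-- Inner entity fetches one file per row through the Reader-based
           data source and exposes its content as "plainText". -->
      <entity name="table_2_from_file_fetch"
              processor="PlainTextEntityProcessor"
              dataSource="files"
              url="http://localhost/project_1/files/${table_1_fetch.file_name}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the files live on the local disk rather than behind a web server, a
FileDataSource should fit the same slot, with the url attribute holding a
file path instead.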

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 8, 2013 at 11:18 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:

> ok great.....
>
> can I use this EntityProcessor within JdbcDataSource?
>
> Like this:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               driver="com.mysql.jdbc.Driver"
>               url="jdbc:mysql://localhost/db_1"
>               user="root"
>               password=""
>               autoCommit="true"
>               />
>
>   <document>
>
>   <entity name="table_1_fetch"
>           query="SELECT field_1 FROM table_1 WHERE ('${dataimporter.request.clean}' != 'false' OR added_on > '${dataimporter.last_index_time}')">
>
>     <entity name="genesis_case_documents"
>             query="SELECT original_document FROM case_documents WHERE case_md5 = '${genesis_case_info.case_md5}'">
>     </entity>
>     <entity processor="PlainTextEntityProcessor"
>             name="table_2_from_file_fetch"
>             url="http://localhost/project_1/files/a.txt"
>             dataSource="data-source-name">
>       <field column="plainText" name="text"/>
>     </entity>
>
>
>
> By the way, I currently load the field into "text_en_splitting" as defined
> in schema.xml...
>
>
>
>
> On Mon, Jul 8, 2013 at 7:59 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> > http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor or
> > http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor ?
> >
> > The file name gets exposed as a ${entityname.fieldname} variable. You can
> > probably copy/manipulate it with a transformer on the outer entity
> > before it hits the inner one.
> >
> > Regards,
> >   Alex.
> >
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > > On Mon, Jul 8, 2013 at 10:42 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:
> >
> > > > On this page (http://wiki.apache.org/solr/DataImportHandler), I can't see
> > > > how it's possible. Perhaps there is another guide...
> > > >
> > > > Basically, this is what I am doing:
> > > > Index data from multiple tables into Solr (see here
> > > > http://wiki.apache.org/solr/DIHQuickStart). I need to skip one very big,
> > > > heavy table, as it only has one field, which is a complete file. So I want
> > > > to skip the step of loading that file per record into my RDB and then
> > > > indexing it. Instead, I want to directly index that file with the rest of
> > > > the records coming from the database.
> > >
> > >
> > >
> > >
> > > > On Mon, Jul 8, 2013 at 7:30 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> > >
> > > > Did you have a chance to look at DIH with nested entities yet? That's
> > > > probably the way to go to start out.
> > > >
> > > > > Or a custom client, of course. Or ETL solutions that support Solr (e.g.
> > > > > Apache Flume - not personally tested yet).
> > > >
> > > > Regards,
> > > >    Alex.
> > > >
> > > > Personal website: http://www.outerthoughts.com/
> > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > > > - Time is the quality of nature that keeps events from happening all at
> > > > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> > > >
> > > >
> > > > > On Mon, Jul 8, 2013 at 10:08 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > > I am looking for a way to import/index data such that I load data from
> > > > > > table_1 and, instead of joining with table_2, import the rest of the
> > > > > > "joined" data from a file. The name of the file comes from a field
> > > > > > in table_1.
> > > > > >
> > > > > > Is it possible? And is it easily possible?
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Raheel Hasan
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Raheel Hasan
> > >
> >
>
>
>
> --
> Regards,
> Raheel Hasan
>
