Thank you for your response.  I will look into this link.  Also, sorry I did 
not specify the file type.   I am working with XML files.




~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Monday, June 12, 2017 1:26 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file

The Solr 6.5.1 DIH setup has a somewhat broken RSS example (redone as an Atom 
example in 6.6) that shows how to fetch content from an https URL. You can see 
the Atom example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml


The main issue, however, is that you have not said what format the list of 
files on the server is in. Is it a plain list? Is it XML listing the files? 
Are you doing a directory listing?
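
If the answer turns out to be "XML listing the files", one possible shape for 
the config is sketched here. It is a sketch under assumptions, not a tested 
setup: the index URL https://files.example.com/index.xml, the /files/file/@url 
layout of that index, and the names "remote" and "filelist" are all 
hypothetical. The inner entity reuses the forEach and two of the field paths 
from the config quoted further down.

```xml
<dataConfig>
  <!-- URLDataSource fetches over http/https; the Atom example uses it as well -->
  <dataSource name="remote" type="URLDataSource" encoding="UTF-8" />
  <document>
    <!-- Outer entity: reads a hypothetical XML index with one <file url="..."/> per file -->
    <entity name="filelist"
            processor="XPathEntityProcessor"
            dataSource="remote"
            rootEntity="false"
            url="https://files.example.com/index.xml"
            forEach="/files/file">
      <field column="fileUrl" xpath="/files/file/@url" />

      <!-- Inner entity: fetches and parses each file named by the outer entity -->
      <entity name="xml"
              processor="XPathEntityProcessor"
              dataSource="remote"
              stream="true"
              url="${filelist.fileUrl}"
              forEach="/eflow/section | /eflow/section/item">
        <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
        <field column="itemId" xpath="/eflow/section/item/@id" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that the outer XPathEntityProcessor needs its own forEach, which is 
probably the attribute it complained about when it was swapped in for the 
FileListEntityProcessor. If the list is a plain-text list or an HTML directory 
listing instead, this outer entity would not apply as written.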

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 June 2017 at 14:11, Miller, William K - Norman, OK - Contractor 
<william.k.mil...@usps.gov.invalid> wrote:
> Thank you for your response.  That is the issue that I am having.  I cannot 
> figure out how to get the list of files from the remote server.  I have tried 
> changing the parent Entity Processor to the XPathEntityProcessor and the 
> baseDir to a URL using https.  This did not work, as it was looking for a 
> "foreach" attribute.  Is there an Entity Processor that can be used to get 
> the list of files from an https source, or am I going to have to use SolrJ or 
> create a custom entity processor?
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
> Subject: Re: DIH issue with streaming xml file
>
> How do you get a list of URLs for the files on the remote server? That's 
> probably the first issue. Once you have the URLs in an outer entity or two, 
> you can feed them one by one into the inner entity.
>
> Regards,
>    Alex.
>
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
>
> On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor < 
> william.k.mil...@usps.gov.invalid> wrote:
>
>> I am using Solr 6.5.1 and importing XML files with the
>> DataImportHandler.  I want to get the files from a remote server, but
>> I am dealing with multiple XML files in multiple folders.  I am using
>> a nested entity in my dataConfig.  Below is an example of how I have
>> my dataConfig set up; I got most of it from an online reference.  In
>> this example I am getting the XML files from a folder on the Solr
>> server, but as mentioned above I want to get them from a remote
>> server.  I have looked at the different Entity Processors for the DIH,
>> but have not found one that seems to work.  Is there a way to
>> configure the config below to do this?
>>
>>
>>
>>
>>
>> <dataConfig>
>>   <dataSource name="hbk" encoding="UTF-8" type="FileDataSource" />
>>   <document name="hbk">
>>     <!--
>>       Pickupdir fetches all files matching the filename regex in the
>>       supplied directory and passes them to other entities which parse
>>       the file contents.
>>     -->
>>     <entity
>>         name="pickupdir"
>>         processor="FileListEntityProcessor"
>>         rootEntity="false"
>>         dataSource="null"
>>         fileName="^[\w\d-]+\.xml$"
>>         baseDir="/var/solr/data/hbk/data/xml/"
>>         recursive="true">
>>       <!-- Pickupxmlfile parses standard Solr update XML. -->
>>       <entity
>>           name="xml"
>>           pk="itemId"
>>           processor="XPathEntityProcessor"
>>           transformer="RegexTransformer,TemplateTransformer"
>>           datasource="pickupdir"
>>           stream="true"
>>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>>           url="${pickupdir.fileAbsolutePath}"
>>           forEach="/eflow/section | /eflow/section/item">
>>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>>
>>         <field column="itemId" xpath="/eflow/section/item/@id" />
>>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>>         <field column="itemType" xpath="/eflow/section/item/@type" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
