Seems to be working now!!! Thanks a lot!!!

On Wed, Aug 11, 2021 at 6:22 PM ritika jain <[email protected]> wrote:
> Hi,
>
> Yes, this works. The only difference is that when a single file is ingested,
> the ingested document is C:/Users/Dell/Desktop/abc.txt/ with an UNWANTED
> slash at the end.
>
> *The file spec part should include the file name.* I have tried it this way,
> but I am getting "Access denied". I have also checked that all access is
> granted to the user who is accessing it.
>
> On Wed, Aug 11, 2021 at 4:43 PM Karl Wright <[email protected]> wrote:
>
>> The "path" attribute is not meant to include terminal file names, only
>> directories. I'm surprised that this works at all. The file spec part
>> should include the file name.
>>
>> Karl
>>
>>
>> On Wed, Aug 11, 2021 at 2:14 AM ritika jain <[email protected]> wrote:
>>
>>> *Dynamic Job*
>>>
>>> {"job":{"_children_":[{"_type_":"id","_value_":"1628595470228"},{"_type_":"description","_value_":"DEMo TEMP API-1628595484"},{"_type_":"repository_connection","_value_":"Demo_Repo"},{"_type_":"document_specification","_children_":[{"_type_":"startpoint","include":[{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pdf","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dotx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dot","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.ppt","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pptx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wpd","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp5","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp4","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp6","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp7","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.png","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpeg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.gif","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.bmp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpeg","_value_":"","_attribute_type":"file"},{"_attribute_filespec":"*","_value_":"","_attribute_type":"directory"}],"_attribute_path":"windows\/Job\/Demo School Network\/Information\/restpuntion.docx","_value_":""},{"_type_":"maxlength","_value_":"","_attribute_value":"2000000"},{"_type_":"security","_value_":"","_attribute_value":"on"},{"_type_":"sharesecurity","_value_":"","_attribute_value":"on"},{"_type_":"parentfoldersecurity","_value_":"","_attribute_value":"on"}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Tika"},{"_type_":"stage_specification","_children_":[{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"ignoreException","_value_":"","_attribute_value":"true"},{"_type_":"lowerNames","_value_":"","_attribute_value":"false"},{"_type_":"writeLimit","_value_":"","_attribute_value":""},{"_type_":"boilerplateprocessor","_value_":"","_attribute_value":"de.l3s.boilerpipe.extractors.KeepEverythingExtractor"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"1"},{"_type_":"stage_prerequisite","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Metadata Adjuster"},{"_type_":"stage_specification","_children_":[{"_type_":"expression","_attribute_parameter":"d_connector_type","_value_":"","_attribute_value":"FileShare"},{"_type_":"expression","_attribute_parameter":"d_description","_value_":"","_attribute_value":"\"${dc:description}\""},{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"filterEmpty","_value_":"","_attribute_value":"true"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"2"},{"_type_":"stage_prerequisite","_value_":"1"},{"_type_":"stage_isoutput","_value_":"true"},{"_type_":"stage_connectionname","_value_":"Deltares_Output"},{"_type_":"stage_specification"}]},{"_type_":"start_mode","_value_":"manual"},{"_type_":"run_mode","_value_":"scan once"},{"_type_":"hopcount_mode","_value_":"accurate"},{"_type_":"priority","_value_":"1"},{"_type_":"recrawl_interval","_value_":"86400000"},{"_type_":"max_recrawl_interval","_value_":"infinite"},{"_type_":"expiration_interval","_value_":"infinite"},{"_type_":"reseed_interval","_value_":"3600000"}]}}
>>>
>>>
>>> *Other Manual Job*
>>>
>>> {"job":{"_children_":[{"_type_":"id","_value_":"1599130705168"},{"_type_":"description","_value_":"Demo_job"},{"_type_":"repository_connection","_value_":"mas_Repo"},{"_type_":"document_specification","_children_":[{"_type_":"startpoint","include":[{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pdf","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dot","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dotx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wpd","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pptx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.ppt","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp4","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp5","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp6","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp7","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.png","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpeg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.bmp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.gif","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpeg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpg","_value_":"","_attribute_type":"file"},{"_attribute_filespec":"*","_value_":"","_attribute_type":"directory"}],"_attribute_path":"windows\/Job\/Demo School Network\/Information","_value_":""},{"_type_":"maxlength","_value_":"","_attribute_value":"5000000"},{"_type_":"security","_value_":"","_attribute_value":"on"},{"_type_":"sharesecurity","_value_":"","_attribute_value":"on"},{"_type_":"parentfoldersecurity","_value_":"","_attribute_value":"off"}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Tika"},{"_type_":"stage_specification","_children_":[{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"lowerNames","_value_":"","_attribute_value":"false"},{"_type_":"writeLimit","_value_":"","_attribute_value":""},{"_type_":"ignoreException","_value_":"","_attribute_value":"true"},{"_type_":"boilerplateprocessor","_value_":"","_attribute_value":"de.l3s.boilerpipe.extractors.KeepEverythingExtractor"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"1"},{"_type_":"stage_prerequisite","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Metadata Adjuster"},{"_type_":"stage_specification","_children_":[{"_type_":"expression","_attribute_parameter":"d_connector_type","_value_":"","_attribute_value":"FileShare"},{"_type_":"expression","_attribute_parameter":"d_description","_value_":"","_attribute_value":"\"${dc:description}\""},{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"filterEmpty","_value_":"","_attribute_value":"true"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"2"},{"_type_":"stage_prerequisite","_value_":"1"},{"_type_":"stage_isoutput","_value_":"true"},{"_type_":"stage_connectionname","_value_":"Deltares_Output"},{"_type_":"stage_specification"}]},{"_type_":"start_mode","_value_":"manual"},{"_type_":"run_mode","_value_":"scan once"},{"_type_":"hopcount_mode","_value_":"accurate"},{"_type_":"priority","_value_":"5"},{"_type_":"recrawl_interval","_value_":"86400000"},{"_type_":"max_recrawl_interval","_value_":"infinite"},{"_type_":"expiration_interval","_value_":"infinite"},{"_type_":"reseed_interval","_value_":"3600000"}]}}
>>>
>>> Basically these two job structures are identical, except that the path is
>>> specified as 1) the complete path down to the file, and 2) only the path
>>> down to the folder.
>>>
>>> In the first case the ingested file has a slash at the end; in the second
>>> case it does not.
>>>
>>>
>>> Thanks,
>>>
>>> Ritika
>>>
>>>
>>> On Tue, Aug 10, 2021 at 6:52 PM Karl Wright <[email protected]> wrote:
>>>
>>>> I am sorry, but I'm having trouble understanding how exactly you are
>>>> configuring the JCIFS connector in these two cases. Can you view the job
>>>> in each case and provide cut-and-paste of the view?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Aug 10, 2021 at 9:09 AM ritika jain <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using the Windows Shares connector in ManifoldCF 2.14, with
>>>>> Elasticsearch as the output.
>>>>> I have created a dynamic ManifoldCF job API through which a job is
>>>>> created in ManifoldCF with an inclusion list and a path, where only the
>>>>> particular file path is specified. Example file path:
>>>>> C:/Users/Dell/Desktop/abc.txt.
>>>>>
>>>>> A job will be created to crawl only this single file.
>>>>>
>>>>> *Issue:*
>>>>> When this job ingests the document into Elasticsearch, a slash gets
>>>>> appended at the end.
>>>>>
>>>>> *Ingested file:* C:/Users/Dell/Desktop/abc.txt/
>>>>>
>>>>> But when the same file is crawled via a job configured in the ManifoldCF
>>>>> UI, with the path specified only down to the folder (manual job creation
>>>>> does not allow a path down to a particular file, only down to folders),
>>>>> no / is appended.
>>>>>
>>>>> *Ingested file in this case:*
>>>>> C:/Users/Dell/Desktop/abc.txt
>>>>> which is the expected original file.
>>>>>
>>>>> *Query*
>>>>> Why is this the case? It makes searching in ES ambiguous.
>>>>>
>>>>> Thanks
>>>>> Ritika
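Karl's advice in this thread (the "path" attribute names only the directory, and the file name belongs in the file spec) can be sketched in Python as a small helper that builds the `startpoint` entry for a single-file job. This is a hypothetical sketch, not ManifoldCF code: the names `split_start_point` and `single_file_startpoint` are invented for illustration, the `_attribute_*` keys are copied from the job JSON posted above, the leading "/" on the file spec only mirrors the posted filespecs (`\/*.pdf`), and leaving out the `"*"` directory include so the crawl does not descend into subfolders is an assumption about the connector's behaviour.

```python
from __future__ import annotations

import json
import posixpath


def split_start_point(full_path: str) -> tuple[str, str]:
    """Split a share-relative file path into (directory path, file spec).

    Hypothetical helper: the directory is meant for "_attribute_path" and the
    file name for "_attribute_filespec", per the advice in the thread.
    """
    # Drop any trailing slash so the last component is the file name.
    trimmed = full_path.rstrip("/")
    directory, filename = posixpath.split(trimmed)
    if not filename:
        raise ValueError(f"no file name in {full_path!r}")
    # Leading "/" mirrors the filespecs in the posted jobs; an assumption.
    return directory, "/" + filename


def single_file_startpoint(full_path: str) -> dict:
    """Build a startpoint entry shaped like the ones in the posted job JSON."""
    directory, filespec = split_start_point(full_path)
    return {
        "_type_": "startpoint",
        "include": [
            {
                "_attribute_indexable": "yes",
                "_attribute_filespec": filespec,
                "_value_": "",
                "_attribute_type": "file",
            }
            # No {"_attribute_filespec": "*", "_attribute_type": "directory"}
            # include here: assumed to keep the crawl out of subfolders.
        ],
        "_attribute_path": directory,
        "_value_": "",
    }


if __name__ == "__main__":
    entry = single_file_startpoint(
        "windows/Job/Demo School Network/Information/restpuntion.docx"
    )
    print(json.dumps(entry))
```

Python's `json.dumps` writes `/` unescaped, which is equivalent JSON to the `\/` in the posted jobs. Ritika reported "Access denied" on her first attempt with this layout, so share permissions may still need to be checked separately.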
