Thanks for your advice. I also found this method, which so far has been able to traverse all the documents in the folder so that they can be indexed in Solr:

    public static void showFiles(File[] files) {
        if (files == null) {
            return; // listFiles() returns null if the directory cannot be read.
        }
        for (File file : files) {
            if (file.isDirectory()) {
                System.out.println("Directory: " + file.getName());
                showFiles(file.listFiles()); // Calls same method again.
            } else {
                System.out.println("File: " + file.getName());
            }
        }
    }

The problem with this is that it indexes every file regardless of format, instead of only the formats that post.jar accepts. So I guess I still have to "steal" some code from there to detect the file format?

As for files that contain non-English characters (e.g. Chinese characters), it is currently not able to read them; they all come out as a series of "???". Any idea how to solve this problem?

Thank you.

Regards,
Edwin

On 16 October 2015 at 21:16, Duck Geraint (ext) GBJH <geraint.d...@syngenta.com> wrote:

> Also, check this link for SolrJ example code (including the recursion):
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Geraint
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
> -----Original Message-----
> From: Jan Høydahl [mailto:jan....@cominvent.com]
> Sent: 16 October 2015 12:14
> To: solr-user@lucene.apache.org
> Subject: Re: Recursively scan documents for indexing in a folder in SolrJ
>
> SolrJ does not have any file crawler built in.
> But you are free to steal code from SimplePostTool.java related to
> directory traversal, and then index each document found using SolrJ.
>
> Note that SimplePostTool.java tries to be smart about which endpoint to
> post files to: xml, csv and json content is posted to /update, while
> office docs go to /update/extract.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 16 Oct 2015 at 05:22, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi,
> >
> > I understand that in SimplePostTool (post.jar), there is this command
> > to automatically detect content types in a folder, and recursively
> > scan it for documents for indexing into a collection:
> > bin/post -c gettingstarted afolder/
> >
> > This has been useful for me to do mass indexing of all the files that
> > are in the folder. Now that I'm moving to production, I plan to use
> > SolrJ to do the indexing, as it can do more things like robustness
> > checks and retries for documents that fail to index.
> >
> > However, I can't seem to find a way to do the same in SolrJ. Is it
> > possible for this to be done in SolrJ? I'm using Solr 5.3.0.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
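
For reference, here is a minimal sketch of how the traversal could skip unwanted formats and pick an endpoint per file, along the lines Jan describes. It is not taken from SimplePostTool.java. The class name FileFilterSketch, the ALLOWED extension list and the helper names are assumptions made for illustration, and the actual SolrJ call is left out so the sketch runs with the JDK alone.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FileFilterSketch {
    // Hypothetical whitelist of extensions to index; adjust to your needs.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "xml", "json", "csv", "pdf", "doc", "docx", "txt", "html"));

    // Per the thread: xml, csv and json go to /update, the rest to /update/extract.
    static final Set<String> STRUCTURED = new HashSet<>(Arrays.asList("xml", "json", "csv"));

    // Returns the lower-cased text after the last dot, or "" if there is no dot.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    static boolean isAllowed(String name) {
        return ALLOWED.contains(extensionOf(name));
    }

    static String endpointFor(String name) {
        return STRUCTURED.contains(extensionOf(name)) ? "/update" : "/update/extract";
    }

    // Recursively collects files whose extension is on the whitelist.
    static void collect(File[] files, List<File> out) {
        if (files == null) {
            return; // listFiles() returns null for unreadable directories
        }
        for (File f : files) {
            if (f.isDirectory()) {
                collect(f.listFiles(), out);
            } else if (isAllowed(f.getName())) {
                out.add(f);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> found = new ArrayList<>();
        collect(root.listFiles(), found);
        for (File f : found) {
            // A real indexer would send the file to Solr here via SolrJ.
            System.out.println(endpointFor(f.getName()) + " <- " + f.getPath());
        }
    }
}
```

Running it with a folder path as the only argument prints one line per accepted file together with the endpoint it would be sent to. Matching on the file extension is the simplest option; it does not inspect file contents, so a mislabeled file would still be passed through.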