Thanks Karl.
Ameya

On Thu, Jul 31, 2014 at 4:24 PM, Karl Wright <daddy...@gmail.com> wrote:

> Hi Ameya,
>
> You cannot just comment out that line; instead you must supply an input
> stream. But you can create a null input stream, for example:
>
> data.setBinary(new ByteArrayInputStream(new byte[0]),0);
>
> Karl
>
> On Thu, Jul 31, 2014 at 4:22 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> long fileBytes = file.length();
>> RepositoryDocument data = new RepositoryDocument();
>> data.setBinary(is,fileBytes);
>> String fileName = file.getName();
>> data.setFileName(fileName);
>> data.setMimeType(mapExtensionToMimeType(fileName));
>> <<<<<<<<<<<<<<<<<<<<<<<<<<<
>>
>> Do I just need to comment out the 3rd line, i.e.
>> data.setBinary(is,fileBytes); ?
>>
>> Thanks,
>> Ameya
>>
>> On Thu, Jul 31, 2014 at 4:17 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>
>>> I could not locate exactly where this is happening.
>>>
>>> Can you please help me out with the changes?
>>>
>>> Thanks,
>>> Ameya
>>>
>>> On Thu, Jul 31, 2014 at 4:10 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> Hi Ameya,
>>>>
>>>> Since you are already modifying the connector for your purposes,
>>>> nothing is stopping you from modifying it further to not fetch the
>>>> document and instead substitute an empty input stream.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Jul 31, 2014 at 3:03 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have modified the code a little to add different metadata fields,
>>>>> such as below (FileConnector.java):
>>>>>
>>>>> data.addField("created", new Date((attr.creationTime().toMillis())));
>>>>> data.addField("last_accessed", new Date(attr.lastAccessTime().toMillis()));
>>>>> data.addField("last_modified", new Date(file.lastModified()));
>>>>> data.addField("size", file.length());
>>>>>
>>>>> which are being passed to Solr.
>>>>>
>>>>> Now can I stop MCF from reading a file and sending its content, and
>>>>> just pass the above information to Solr?
>>>>>
>>>>> Thanks,
>>>>> Ameya
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:57 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ameya,
>>>>>>
>>>>>> The file system connector does not retrieve any metadata for a
>>>>>> document at all. So I'm not sure what metadata you are talking about.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:44 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>>>
>>>>>>> So the thing here is that I am not looking for the data or content
>>>>>>> of any of the files. I am just interested in the metadata of each
>>>>>>> file.
>>>>>>>
>>>>>>> So I thought it should be possible to not read any file, and just
>>>>>>> get the metadata of the file and give it to Solr.
>>>>>>>
>>>>>>> This should save lots of time.
>>>>>>>
>>>>>>> Is it possible to do this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ameya
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:13 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Ameya,
>>>>>>>>
>>>>>>>> (1) Please look at the Simple History report. Note what kinds of
>>>>>>>> documents are being fetched, what kinds are being indexed, and how
>>>>>>>> long it is taking. I have noted from your previous posts that you
>>>>>>>> seem to be indexing a lot of very large EXE files. This is useless
>>>>>>>> and you should be excluding them.
>>>>>>>>
>>>>>>>> (2) Please look in the manifoldcf.log file for evidence that
>>>>>>>> fetches and/or Solr indexing requests are being retried due to
>>>>>>>> errors. It doesn't take many documents being chronically retried
>>>>>>>> before forward progress drops to near zero.
>>>>>>>>
>>>>>>>> (3) If you look into (1) & (2) and everything seems fine, it may
>>>>>>>> be a misalignment between availability of several kinds of
>>>>>>>> resources that is the problem.
>>>>>>>> Please get a thread dump of the agents process while it is
>>>>>>>> crawling, using jstack. Post that thread dump and we can tell you
>>>>>>>> what to look at next.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 2:07 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am using the file system connector to index my entire C drive,
>>>>>>>>> using Solr as the output connector.
>>>>>>>>>
>>>>>>>>> The initial 100000 documents were crawled and indexed successfully
>>>>>>>>> in a couple of hours, but after that indexing slowed down badly
>>>>>>>>> (around 15-20 documents per min).
>>>>>>>>>
>>>>>>>>> I am not able to figure out whether there is an issue with MCF or
>>>>>>>>> Solr.
>>>>>>>>>
>>>>>>>>> Can you advise me how to proceed with this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ameya
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
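For reference, the metadata-only idea discussed in this thread (read the file attributes, never open the file, and hand over an empty stream of length 0) can be sketched with the plain JDK. This is only an illustration under stated assumptions: the class name MetadataOnlySketch and its two helper methods are hypothetical and are not part of ManifoldCF, the map stands in for the data.addField(...) calls quoted above, and last_modified is read from the same attribute object rather than from file.lastModified().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF: gathers the metadata fields
// mentioned in the thread without reading the file's content.
final class MetadataOnlySketch {

  // Reads only file attributes; the file body is never opened.
  static Map<String, Object> collectMetadata(Path path) throws IOException {
    BasicFileAttributes attr = Files.readAttributes(path, BasicFileAttributes.class);
    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("created", new Date(attr.creationTime().toMillis()));
    fields.put("last_accessed", new Date(attr.lastAccessTime().toMillis()));
    fields.put("last_modified", new Date(attr.lastModifiedTime().toMillis()));
    fields.put("size", attr.size());
    return fields;
  }

  // The "null input stream" from the thread: zero bytes, immediately at EOF.
  static InputStream emptyContent() {
    return new ByteArrayInputStream(new byte[0]);
  }

  public static void main(String[] args) throws IOException {
    Path path = Paths.get(args.length > 0 ? args[0] : ".");
    // In the modified connector, each entry would presumably become a
    // data.addField(name, value) call, and the stream would be passed as
    // data.setBinary(emptyContent(), 0), as suggested earlier in the thread.
    for (Map.Entry<String, Object> e : collectMetadata(path).entrySet()) {
      System.out.println(e.getKey() + " = " + e.getValue());
    }
  }
}
```

Note that the "size" field still reports the real file length, even though the length passed alongside the empty stream is 0.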