Also, please send the file logs/manifoldcf.log as well -- as a text file.

Karl

On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddy...@gmail.com> wrote:

> Could you please get a thread dump and send that to me? Please send as a
> text file not a screen shot.
>
> To get a thread dump, get the process ID of the agents process, and use
> the jdk's jstack utility to obtain the dump.
>
> Thanks,
> Karl
>
> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>
>> yeah.. i thought so that it should not effect in 4000 documents.
>>
>> I am using filesystem connector to crawl all of my C drive and output
>> connection is null.
>>
>> There are no error logs in MCF. MCF is standstill at same screen since
>> half an hour.
>>
>> Attaching some snapshots for your reference.
>>
>> Thanks,
>> Ameya
>>
>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Hi Ameya,
>>>
>>> 4000 documents is nothing at all. We have load tests which I run on
>>> every release that include more than 100000 documents on a crawl.
>>>
>>> Can you be more specific about the case that you say "hung up"?
>>> Specifically:
>>>
>>> (1) What kind of crawl is this? SharePoint? Web?
>>> (2) Are there any errors in the manifoldcf log?
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I spent some time going through PostgreSQL 9.3 manual.
>>>> I configured PostgreSQL for MCF and saw the significant change in
>>>> performance time.
>>>>
>>>> I ran it yesterday for some 4000 documents. When i started running
>>>> again today, the performance was very poor and after 200 documents,
>>>> it hung up.
>>>>
>>>> Is it because of periodic maintenance it needs? Also, i would want to
>>>> know where and how exactly VACUUM FULL command needs to be used?
>>>>
>>>> Thanks,
>>>> Ameya
>>>>
>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>
>>>>> It is fine; I am running Postgresql 9.3 here.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>>
>>>>>> is PostgreySQL 9.3 version good because i already have it in my
>>>>>> machine.. Though documentation says "ManifoldCF has been tested
>>>>>> against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL. "
>>>>>>
>>>>>> Ameya
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>
>>>>>>> If you haven't configured MCF to use PostgreSQL, then you are using
>>>>>>> Derby, which is not recommended for production use.
>>>>>>>
>>>>>>> Instructions on how to set up MCF to use PostgreSQL are available on
>>>>>>> the MCF site on the how-to-build-and-deploy page. Configuring
>>>>>>> PostgreSQL for millions or tens of millions of documents will require
>>>>>>> someone to learn about PostgreSQL and how to administer it. The
>>>>>>> how-to-build-and-deploy page provides some (old) guidelines and
>>>>>>> hints, but if I were you I'd read the postgresql manual for the
>>>>>>> version you install.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ooh ok.
>>>>>>>>
>>>>>>>> Actually i have never configured PostgreySQL yet. i am simply using
>>>>>>>> binary distribution of MCF to configure file system connectors to
>>>>>>>> connect to Solr.
>>>>>>>>
>>>>>>>> Do i need to configure PostgreySQL?? How can i proceed from here to
>>>>>>>> check performance measurements?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ameya
>>>>>>>>
>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yes. Also have a look at the how-to-build-and-deploy page for
>>>>>>>>> hints on how to configure PostgreSQL for maximum performance.
>>>>>>>>>
>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>> database. If you are using PostgreSQL, which is the fastest
>>>>>>>>> ManifoldCF choice, you should be able to see in the logs when
>>>>>>>>> queries take a long time, or when indexes are automatically
>>>>>>>>> rebuilt. Could you provide any information as to what your
>>>>>>>>> overall system setup looks like?
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aw...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>
>>>>>>>>>> This page?
>>>>>>>>>>
>>>>>>>>>> Ameya
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>
>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I have millions of documents to crawl and send them to Solr.
>>>>>>>>>>>
>>>>>>>>>>> But when i run it for thousands documents, it takes too much
>>>>>>>>>>> time for it or sometimes it even hangs up.
>>>>>>>>>>>
>>>>>>>>>>> So what could be the way to reduce the performance time?
>>>>>>>>>>>
>>>>>>>>>>> Also, i do not need content of the documents, i just need
>>>>>>>>>>> metadata, so can i skip content part from reading and fetching
>>>>>>>>>>> and will that improve performance time?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Ameya
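
For reference, the thread-dump step requested in the thread (find the process ID of the agents process, then run the JDK's jstack against it and send the result as a text file) might be scripted roughly as below. This is only a sketch under assumptions: it assumes the JDK tools `jps` and `jstack` are on the PATH, the `-l` option (long listing with lock information) and the output file name `thread-dump.txt` are choices made here rather than anything specified in the thread, and which `jps` entry corresponds to the agents process depends on how ManifoldCF was started.

```python
import subprocess
from pathlib import Path


def jstack_command(pid):
    """Build the jstack invocation for one JVM process ID."""
    # -l requests the long listing, which adds lock information to the dump.
    return ["jstack", "-l", str(pid)]


def list_java_processes():
    """Return the output of `jps -l`: one line per JVM, PID then main class or jar."""
    result = subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    )
    return result.stdout


def write_thread_dump(pid, out_path="thread-dump.txt"):
    """Run jstack against `pid` and save the dump as a plain text file."""
    result = subprocess.run(
        jstack_command(pid), capture_output=True, text=True, check=True
    )
    path = Path(out_path)  # hypothetical file name; any text file will do
    path.write_text(result.stdout)
    return path
```

A typical use would be to print `list_java_processes()`, pick the PID of the agents process from that listing, and then call `write_thread_dump(pid)` to produce a text file that can be attached to a reply along with logs/manifoldcf.log.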