Hi Karl,
Here is the explain. There isn't such sort...
mysql> explain SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 FORCE INDEX (i1358228295210) WHERE t0 IN ('P','G') AND
t0.checkaction='R' AND t0.checktime<=1358649661663 AND EXISTS(SELECT 'x'
FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND
t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events
t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority
ASC LIMIT 4800;
+----+--------------------+-------+--------+----------------------------------------------+----------------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys
| key | key_len | ref | rows
| Extra |
+----+--------------------+-------+--------+----------------------------------------------+----------------+---------+-------------------------+------+-------------+
| 1 | PRIMARY | t0 | index | NULL
| I1358228295210 | 25 | NULL | 4800 |
Using where |
| 4 | DEPENDENT SUBQUERY | t3 | ref | I1358228295216
| I1358228295216 | 8 | manifoldcf.t0.id | 1 |
|
| 4 | DEPENDENT SUBQUERY | t4 | eq_ref | PRIMARY
| PRIMARY | 767 | manifoldcf.t3.eventname | 1
| Using index |
| 3 | DEPENDENT SUBQUERY | t2 | ref |
I1358228295209,I1358228295212,I1358228295211 | I1358228295209 | 122 |
manifoldcf.t0.dochash | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t1 | eq_ref | PRIMARY,I1358228295219
| PRIMARY | 8 | manifoldcf.t0.jobid | 1 |
Using where |
+----+--------------------+-------+--------+----------------------------------------------+----------------+---------+-------------------------+------+-------------+
5 rows in set (0.00 sec)
Regards,
Shigeki
2013/1/21 Karl Wright <[email protected]>
> takes too long
> MIME-Version: 1.0
> Content-Type: multipart/alternative; boundary=14dae9399bd9676e5704d3c356e9
>
> --14dae9399bd9676e5704d3c356e9
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: 7bit
>
> Can you get an EXPLAIN for this query? It sounds like it is
> disregarding the hint for some reason.
>
> Karl
>
> Sent from my Windows Phone
> From: Shigeki Kobayashi
> Sent: 1/20/2013 9:37 PM
> To: [email protected]
> Subject: Re: Crawling new/updated files using Windows share connection
> takes too long
> Hi Karl.
>
> I configured MySQL 5.5 to run MCF this time.
> The version of MCF is trunk 1.1dev downloaded on Dec, 12th. , which you
> fixed
> the slow query using "FORCE INDEX". Solr is 4.0
>
> I thought is was fixed but the log shows that the following are slow
> queries.
> -------------------------------------------------------------------
> # Time: 130120 11:41:10
> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
> # Query_time: 8.761087 Lock_time: 0.000163 Rows_sent: 17 Rows_examined:
> 6365233
> SET timestamp=1358649670;
> SELECT t0.id
> ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
> FROM jobqueue t0 FORCE INDEX (i1358228295210) WHERE t0.status IN ('P','G')
> AND t0.checkaction='R' AND t0.checktime<=1358649661663 AND EXISTS(SELECT
> 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND
> t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
> t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
> t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events
> t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority
> ASC LIMIT 4800;
>
> # Time: 130120 11:41:18
> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
> # Query_time: 7.714277 Lock_time: 0.000123 Rows_sent: 0 Rows_examined:
> 6365182
> SET timestamp=1358649678;
> SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 FORCE INDEX
> (i1358228295210) WHERE status IN ('P','G') AND checkaction='R' AND
> checktime<=1358649661663 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status
> IN ('A','a') AND t1.id=t0.jobid) ORDER BY docpriority ASC LIMIT 1;
>
> Regards,
>
>
> Shigeki
>
>
>
> 2013/1/18 Karl Wright <[email protected]>
>
> > Hi Shigeki,
> >
> > What database is ManifoldCF configured to use in this case? Do you
> > see any indication of slow queries in the ManifoldCF log?
> >
> >
> > Karl
> >
> > On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi
> > <[email protected]> wrote:
> > > Hello
> > >
> > >
> > > I would like some advice to improve crawling time of new/updated files
> > using
> > > Windows share connection.
> > >
> > > I crawl file in Windows server and index them into Solr.
> > >
> > > Currently, the second crawling of two hundred thousands files takes
> > over 5
> > > hours, even though any files are not updated, created, deleted.
> > >
> > > I assume MCF does the following processes (let me know if I am wrong)
> > >
> > > - obtain updated time of a file
> > > - compare the updated time with the one MCF obtained last time
> crawling(
> > > probably stored in DB)
> > > - if they are different MCF recognizes the file is to be indexed.
> > >
> > > If the above processes are done for two thousands files, what part of
> the
> > > processes could take time the most? obtaining updated time? reading
> data
> > > from DB? what could be done to increase the crawling time do you think?
> > >
> > > Please give me some advice.
> > >
> > >
> > > Regards,
> > >
> > > Shigeki
> > >
> > >
> >
>
> --14dae9399bd9676e5704d3c356e9
> Content-Type: text/html; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
>
> <html><head><meta content=3D"text/html; charset=3Dutf-8"
> http-equiv=3D"Cont=
> ent-Type"></head><body><div><div style=3D"font-family: Calibri,sans-serif;
> =
> font-size: 11pt;"><br>Can you get an EXPLAIN for this query? It
> sound=
> s like it is disregarding the hint for some
> reason.<br><br>Karl<br><br>Sent=
> from my Windows Phone<br></div></div><hr><span style=3D"font-family:
> Tahom=
> a,sans-serif; font-size: 10pt; font-weight: bold;">From: </span><span
> style=
> =3D"font-family: Tahoma,sans-serif; font-size: 10pt;">Shigeki
> Kobayashi</sp=
> an><br><span style=3D"font-family: Tahoma,sans-serif; font-size: 10pt;
> font=
> -weight: bold;">Sent: </span><span style=3D"font-family:
> Tahoma,sans-serif;=
> font-size: 10pt;">1/20/2013 9:37 PM</span><br><span style=3D"font-family:
> =
> Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">To: </span><span
> st=
> yle=3D"font-family: Tahoma,sans-serif; font-size: 10pt;">[email protected]
> =
> ache.org</span><br><span style=3D"font-family: Tahoma,sans-serif;
> font-size=
> : 10pt; font-weight: bold;">Subject: </span><span style=3D"font-family:
> Tah=
> oma,sans-serif; font-size: 10pt;">Re: Crawling new/updated files using
> Wind=
> ows share connection takes too long</span><br><br></body></html><div
> dir=3D=
> "ltr">Hi Karl.<div><br></div><div>I configured MySQL 5.5 to run MCF this
> ti=
> me.</div><div>The version of MCF is trunk 1.1dev downloaded on Dec, 12th.
> ,=
> which you fixed</div><div style>the slow query using "FORCE
> INDEX&quo=
> t;. Solr is 4.0</div>
>
> <div style><br></div><div style>I thought is was fixed but the log shows
> th=
> at =C2=A0the following are slow queries.=C2=A0</div><div><div
> class=3D"gmai=
>
> l_extra">------------------------------------------------------------------=
> -</div>
> <div class=3D"gmail_extra">
> <div class=3D"gmail_extra"># Time: 130120 11:41:10</div><div
> class=3D"gmail=
> _extra"># User@Host: manifoldcf[manifoldcf] @ localhost
> [127.0.0.1]</div><d=
> iv class=3D"gmail_extra"># Query_time: 8.761087 =C2=A0Lock_time: 0.000163
> R=
> ows_sent: 17 =C2=A0Rows_examined: 6365233</div>
>
> <div class=3D"gmail_extra">SET timestamp=3D1358649670;</div><div
> class=3D"g=
> mail_extra">SELECT <a href=3D"http://t0.id">t0.id
> </a>,t0.jobid,t0.dochash,t=
> 0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0
> =
> FORCE INDEX (i1358228295210) WHERE t0.status IN ('P','G')
> A=
> ND t0.checkaction=3D'R' AND t0.checktime<=3D1358649661663 AND
> EX=
> ISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
> ('A','a=
> ') AND <a href=3D"http://t1.id">t1.id</a>=3Dt0.jobid AND
> t1.priority=3D=
> 5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
> t2.dochash=3Dt0=
> .dochash AND t2.status IN
> ('A','F','a','f',=
> 'D','d') AND t2.jobid!=3Dt0.jobid) AND NOT EXISTS(SELECT
> &#=
> 39;x' FROM prereqevents t3,events t4 WHERE <a href=3D"http://t0.id
> ">t0.=
> id</a>=3Dt3.owner AND t3.eventname=3D<a href=3D"http://t4.name">t4.name
> </a>=
> ) ORDER BY t0.docpriority ASC LIMIT 4800;</div>
>
> <div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"># Time:
> 130=
> 120 11:41:18<br></div><div class=3D"gmail_extra"># User@Host:
> manifoldcf[ma=
> nifoldcf] @ localhost [127.0.0.1]<br></div><div class=3D"gmail_extra">#
> Que=
> ry_time: 7.714277 =C2=A0Lock_time: 0.000123 Rows_sent: 0
> =C2=A0Rows_examine=
> d: 6365182</div>
>
> <div class=3D"gmail_extra">SET timestamp=3D1358649678;</div><div
> class=3D"g=
> mail_extra">SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 FORCE
> I=
> NDEX (i1358228295210) WHERE status IN ('P','G') AND
> checkac=
> tion=3D'R' AND checktime<=3D1358649661663 AND EXISTS(SELECT
> '=
> ;x' FROM jobs t1 WHERE t1.status IN ('A','a') AND <a
> hr=
> ef=3D"http://t1.id">t1.id</a>=3Dt0.jobid) =C2=A0ORDER BY docpriority ASC
> LI=
> MIT 1;</div>
>
> <div><br></div><div class=3D"gmail_extra" style>Regards,</div><div
> class=3D=
> "gmail_extra"><br></div><div class=3D"gmail_extra"><br></div><div
> class=3D"=
> gmail_extra" style>Shigeki</div><div class=3D"gmail_extra"><br></div><div
> c=
> lass=3D"gmail_extra">
>
> <br></div><br><div class=3D"gmail_quote">2013/1/18 Karl Wright <span
> dir=3D=
> "ltr"><<a href=3D"mailto:[email protected]"
> target=3D"_blank">daddywri@=
> gmail.com</a>></span><br><blockquote class=3D"gmail_quote"
> style=3D"marg=
> in:0px 0px 0px
> 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,20=
> 4);border-left-style:solid;padding-left:1ex">
>
> Hi Shigeki,<br>
> <br>
> What database is ManifoldCF configured to use in this case? =C2=A0Do
> you<br=
> >
> see any indication of slow queries in the ManifoldCF log?<br>
> <br>
> <br>
> Karl<br>
> <br>
> On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi<br>
> <<a href=3D"mailto:[email protected]
> ">shigeki.kobayash=
> [email protected]</a>> wrote:<br>
> > Hello<br>
> ><br>
> ><br>
> > I would like some advice to improve crawling time of new/updated
> files=
> using<br>
> > Windows share connection.<br>
> ><br>
> > I crawl file in Windows server and index them into Solr.<br>
> ><br>
> > Currently, the second crawling of two hundred thousands files takes =
> =C2=A0over 5<br>
> > hours, even though any files are not updated, created, deleted.<br>
> ><br>
> > I assume MCF does the following processes (let me know if I am
> wrong)<=
> br>
> ><br>
> > - obtain updated time of a file<br>
> > - compare the updated time with the one MCF obtained last time
> crawlin=
> g(<br>
> > probably stored in DB)<br>
> > - if they are different MCF recognizes the file is to be indexed.<br>
> ><br>
> > If the above processes are done for two thousands files, what part of
> =
> the<br>
> > processes could take time the most? obtaining updated time? reading
> da=
> ta<br>
> > from DB? what could be done to increase the crawling time do you
> think=
> ?<br>
> ><br>
> > Please give me some advice.<br>
> ><br>
> ><br>
> > Regards,<br>
> ><br>
> > Shigeki<br>
> ><br>
> ><br>
> </blockquote></div><br><br clear=3D"all"><div><br></div><br><div><font
> face=
> =3D"'ms gothic', monospace">=C2=A0</font></div><div><br></div>
> </div></div></div>
>
> --14dae9399bd9676e5704d3c356e9--
>