Hi giocondo, Ilya, Sasha,

It seems to me that the problem is not in distinct-values:

<operation name="PPFnDistinctValues" position="11:14" time="3.207"
calls="467">

<operation name="PPVariable" descriptor="1" variable-name="tmp"
position="11:30" time="2.282" calls="22341"/>

</operation>


means that fn:distinct-values() itself took only about 1 second: 3.207 s in
total, of which 2.282 s was spent in its PPVariable child.

The actual problem is here:

<operation name="PPIntersect" doc-order="false" position="12:50"
time="20.731" calls="22806">

<operation name="PPSXptr" position="12:50" time="17.913" calls="10342429">

<operation name="PPVariable" descriptor="1" variable-name="tmp"
position="12:50" time="2.116" calls="10410906"/>

</operation>

....


We have a PPSXptr inside the "for" loop that sorts the same sequence of
nodes (the $tmp variable) again and again, on every iteration, before the
intersect is performed.
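
The call counts in the profile are consistent with this reading. A small
check in Python, assuming that one full read of $tmp costs 22341 calls (the
count reported under PPFnDistinctValues) and that the extra call in each
count only signals the end of the sequence:

```python
# Call counts copied from the profile of the second query.
tmp_reads_in_distinct_values = 22341   # PPVariable "tmp" under PPFnDistinctValues
tmp_reads_in_intersect = 10410906      # PPVariable "tmp" under the first PPSXptr
distinct_values_calls = 467            # PPFnDistinctValues

# How many complete passes over $tmp does the intersect branch make?
full_scans, remainder = divmod(tmp_reads_in_intersect,
                               tmp_reads_in_distinct_values)

print(full_scans, remainder)  # prints "466 0"
```

So $tmp appears to be read in full 466 times, once per iteration of the
loop over the distinct values (467 calls minus the terminating one, if the
assumption above holds).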

Sasha, Ilya, perhaps we could optimize this with an additional PPSXptr flag
that forces it to behave like PPStore (reusing the previous result)? Or,
more generally (and useful even if we implement XQuery 1.1 group by), we
should insert PPStore(s) as high as possible within the second branch of
the PPReturn. Sasha, as far as I remember, some analysis of this kind is
already implemented, isn't it?
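
To make the idea concrete, here is a rough sketch in Python, not Sedna
code. SortOnce, its "store" flag and count_intersections are hypothetical
names invented for this illustration; the sketch only shows that caching
the sorted, loop-invariant input changes the number of sorts and not the
result:

```python
from typing import Callable, Iterable, List, Optional


class SortOnce:
    """Hypothetical stand-in for a sorting operator with a 'store' flag."""

    def __init__(self, source: Callable[[], Iterable[int]], store: bool):
        self._source = source
        self._store = store
        self._cached: Optional[List[int]] = None
        self.sort_runs = 0  # how many times the input was actually sorted

    def result(self) -> List[int]:
        # With the flag set, reuse the result of the previous run.
        if self._store and self._cached is not None:
            return self._cached
        self.sort_runs += 1
        ordered = sorted(self._source())
        if self._store:
            self._cached = ordered
        return ordered


def count_intersections(op: SortOnce, groups: List[List[int]]) -> List[int]:
    """For each group, count the items of the sorted input that it contains."""
    totals = []
    for group in groups:  # plays the role of the "for" over distinct values
        members = set(group)
        totals.append(sum(1 for node in op.result() if node in members))
    return totals


tmp = [5, 3, 9, 1, 7]                    # stands in for $tmp
groups = [[1, 3], [7, 9, 11], [5]]       # stands in for the index-scan results

plain = SortOnce(lambda: tmp, store=False)
stored = SortOnce(lambda: tmp, store=True)

plain_totals = count_intersections(plain, groups)
stored_totals = count_intersections(stored, groups)
```

Both variants produce the same totals; the plain one sorts once per
iteration, the stored one sorts only once for the whole loop.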


Ivan Shcheklein,
Sedna Team


> Hello,
>
>  This is a common problem in XQuery. There is hardly a way of
> optimizing it without introducing a special "group by" operator to
> FLWOR. We are currently working on it.
>
>  Which version of Sedna do you use? The development branch contains
> an optimized "distinct-values" function, so it may be faster.
>
> Ilya Taranov,
> Sedna Team.
>
> On Mon, Mar 14, 2011 at 3:04 PM, giocondo sticca
> <[email protected]> wrote:
> > Hi,
> >
> > I have a performance problem with this query:
> >
> > declare namespace s="http://www.schemata.it/lml/1.0";
> > declare namespace l="http://www.schemata.it/lml/1.0/linker";
> >
> > let $tmp := for $b in ftindex-scan("ft_body","art. 15","nosort")/../s:meta/s:classification/s:index
> >
> > return $b
> >
> > return
> >
> > <result>
> > {for $i in distinct-values($tmp)
> > return
> > <section><name>{$i}</name><total>{count($tmp[.=$i])}</total></section>
> > }
> > </result>
> >
> > Total time: 56500ms
> >
> > I have also tried this variant, but the total time is still too high:
> >
> > declare namespace s="http://www.schemata.it/lml/1.0";
> > declare namespace l="http://www.schemata.it/lml/1.0/linker";
> >
> > let $tmp := for $b in ftindex-scan("ft_body","art. 15","nosort")/../s:meta/s:classification/s:index
> >
> > return $b
> >
> > return
> >
> > <result>
> > {for $sec in distinct-values($tmp)
> > return <section><name>{$sec}</name><total>{count($tmp intersect index-scan("test",$sec,"EQ"))}</total></section>
> > }
> > </result>
> >
> > Total time: 20453ms
> >
> > Any suggestion ?
> >
> > Thanks.
> >
> > P.S.
> >
> > This is the profile of the second query:
> >
> > <profile xmlns="http://www.modis.ispras.ru/sedna">
> >   <total-time>23.970</total-time>
> > </profile><prolog xmlns="http://www.modis.ispras.ru/sedna">
> >   <namespace prefix="l" uri="http://www.schemata.it/lml/1.0/linker"/>
> >   <namespace prefix="s" uri="http://www.schemata.it/lml/1.0"/>
> > </prolog><query xmlns="http://www.modis.ispras.ru/sedna">
> >   <operation xmlns="" name="PPQueryRoot" time="23.970" calls="1">
> >     <operation name="PPLet" position="4:5" time="23.968" calls="2">
> >       <produces>
> >         <variable descriptor="1" name="tmp"/>
> >       </produces>
> >       <operation name="PPReturn" position="4:17" time="2.272"
> calls="22341">
> >         <produces>
> >           <variable descriptor="0" name="b"/>
> >         </produces>
> >         <operation name="PPDDO" position="4:93" time="2.255"
> calls="22341">
> >           <operation name="PPAxisChild" step="child::element(s:index)"
> > position="4:93" time="2.109" calls="22341">
> >             <operation name="PPAxisChild"
> > step="child::element(s:classification)" position="4:76" time="1.986"
> > calls="22341">
> >               <operation name="PPAxisChild" step="child::element(s:meta)"
> > position="4:69" time="1.820" calls="22341">
> >                 <operation name="PPAxisParent" step="parent::node()"
> > position="4:66" time="1.634" calls="22341">
> >                   <operation name="PPSeqChecker" mode="node"
> position="4:66"
> > time="1.460" calls="22341">
> >                     <operation name="PPFtIndexScan" position="4:23"
> > time="1.456" calls="22341">
> >                       <operation name="PPConst" type="xs:string"
> > value="ft_body" position="4:36" time="0.000" calls="2"/>
> >                       <operation name="PPConst" type="xs:string"
> value="art.
> > 15" position="4:46" time="0.000" calls="2"/>
> >                       <operation name="PPConst" type="xs:string"
> > value="nosort" position="4:56" time="0.000" calls="2"/>
> >                     </operation>
> >                   </operation>
> >                 </operation>
> >               </operation>
> >             </operation>
> >           </operation>
> >         </operation>
> >         <operation name="PPVariable" descriptor="0" variable-name="b"
> > position="6:8" time="0.004" calls="44680"/>
> >       </operation>
> >       <operation name="PPElementConstructor" element-name="result"
> > deep-copy="true" namespace-inside="false" position="10:1" time="23.968"
> > calls="2">
> >         <operation name="PPSequence" position="10:1" time="23.962"
> > calls="467">
> >           <operation name="PPSpaceSequence" doc-order="false"
> > position="11:1" time="23.962" calls="467">
> >             <operation name="PPReturn" position="11:6" time="23.962"
> > calls="467">
> >               <produces>
> >                 <variable descriptor="2" name="sec"/>
> >               </produces>
> >               <operation name="PPFnDistinctValues" position="11:14"
> > time="3.207" calls="467">
> >                 <operation name="PPVariable" descriptor="1"
> > variable-name="tmp" position="11:30" time="2.282" calls="22341"/>
> >               </operation>
> >               <operation name="PPElementConstructor"
> element-name="section"
> > deep-copy="false" namespace-inside="false" position="12:8" time="20.755"
> > calls="932">
> >                 <operation name="PPSequence" position="12:8"
> time="20.752"
> > calls="1398">
> >                   <operation name="PPElementConstructor"
> element-name="name"
> > deep-copy="false" namespace-inside="false" position="12:17" time="0.006"
> > calls="932">
> >                     <operation name="PPSequence" position="12:17"
> > time="0.001" calls="932">
> >                       <operation name="PPSpaceSequence" doc-order="false"
> > position="12:23" time="0.001" calls="932">
> >                         <operation name="PPVariable" descriptor="2"
> > variable-name="sec" position="12:24" time="0.001" calls="932"/>
> >                       </operation>
> >                     </operation>
> >                   </operation>
> >                   <operation name="PPElementConstructor"
> > element-name="total" deep-copy="false" namespace-inside="false"
> > position="12:36" time="20.745" calls="932">
> >                     <operation name="PPSequence" position="12:36"
> > time="20.737" calls="932">
> >                       <operation name="PPSpaceSequence" doc-order="false"
> > position="12:43" time="20.736" calls="932">
> >                         <operation name="PPFnCount" position="12:44"
> > time="20.736" calls="932">
> >                           <operation name="PPIntersect" doc-order="false"
> > position="12:50" time="20.731" calls="22806">
> >                             <operation name="PPSXptr" position="12:50"
> > time="17.913" calls="10342429">
> >                               <operation name="PPVariable" descriptor="1"
> > variable-name="tmp" position="12:50" time="2.116" calls="10410906"/>
> >                             </operation>
> >                             <operation name="PPSXptr" position="12:65"
> > time="1.511" calls="487246">
> >                               <operation name="PPIndexScan"
> > index-scan-condition="EQ" position="12:65" time="0.222" calls="487266">
> >                                 <operation name="PPConst"
> type="xs:string"
> > value="test" position="12:76" time="0.000" calls="932"/>
> >                                 <operation name="PPVariable"
> descriptor="2"
> > variable-name="sec" position="12:83" time="0.000" calls="932"/>
> >                                 <operation name="PPConst"
> type="xs:integer"
> > value="0" position="12:65" time="0.000" calls="0"/>
> >                               </operation>
> >                             </operation>
> >                           </operation>
> >                         </operation>
> >                       </operation>
> >                     </operation>
> >                   </operation>
> >                 </operation>
> >               </operation>
> >             </operation>
> >           </operation>
> >         </operation>
> >       </operation>
> >     </operation>
> >   </operation>
> > </query>
> >
> > All the tests were performed on a DELL PowerEdge 2009 (Quad-Core XEON
> > E5410, 4GB RAM, 3 x 750GB Raid SAS) with Debian Lenny (Kernel version
> > 2.6.26-2-amd64) and Sedna 3.4.228
> >
> >
> ------------------------------------------------------------------------------
> > Colocation vs. Managed Hosting
> > A question and answer guide to determining the best fit
> > for your organization - today and in the future.
> > http://p.sf.net/sfu/internap-sfd2d
> > _______________________________________________
> > Sedna-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/sedna-discussion
> >
> >
>
>
>
> --
> Thanks in advance,
> Ilya Taranov.