Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/StandardTokenizer.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/StandardTokenizer.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/StandardTokenizer.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,163 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Analysis::StandardTokenizer – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Analysis/">Analysis</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::StandardTokenizer - Split a string into tokens.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; + +# Then... 
once you have a tokenizer, put it into a PolyAnalyzer: +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Generically, +“tokenizing” is a process of breaking up a string into an array of “tokens”. +For instance, +the string “three blind mice” might be tokenized into “three”, +“blind”, +“mice”.</p> + +<p>Lucy::Analysis::StandardTokenizer breaks up the text at the word boundaries defined in Unicode Standard Annex #29. +It then returns those words that contain alphabetic or numeric characters.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new;</pre> + +<p>Constructor. +Takes no arguments.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $standard_tokenizer->transform($inversion);</pre> + +<p>Takes a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::StandardTokenizer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/Token.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/Token.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Analysis/Token.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,242 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Analysis::Token – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Analysis/">Analysis</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" 
name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Token - Unit of text.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre> my $token = Lucy::Analysis::Token->new( + text => 'blind', + start_offset => 8, + end_offset => 13, + ); + + $token->set_text('mice');</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses. 
+Each Token has 5 attributes: <code>text</code>, +<code>start_offset</code>, +<code>end_offset</code>, +<code>boost</code>, +and <code>pos_inc</code>.</p> + +<p>The <code>text</code> attribute is a Unicode string encoded as UTF-8.</p> + +<p><code>start_offset</code> is the start point of the token text, +measured in Unicode code points from the top of the stored field; <code>end_offset</code> delimits the corresponding closing boundary. +<code>start_offset</code> and <code>end_offset</code> locate the Token within a larger context, +even if the Token’s text attribute gets modified – by stemming, +for instance. +The Token for “beating” in the text “beating a dead horse” begins life with a start_offset of 0 and an end_offset of 7; after stemming, +the text is “beat”, +but the start_offset is still 0 and the end_offset is still 7. +This allows “beating” to be highlighted correctly after a search matches “beat”.</p> + +<p><code>boost</code> is a per-token weight. +Use this when you want to assign more or less importance to a particular token, +as you might for emboldened text within an HTML document, +for example. +(Note: The field this token belongs to must be spec’d to use a posting of type RichPosting.)</p> + +<p><code>pos_inc</code> is the POSition INCrement, +measured in Tokens. +This attribute, +which defaults to 1, +is an advanced tool for manipulating phrase matching. +Ordinarily, +Tokens are assigned consecutive position numbers: 0, +1, +and 2 for <code>"three blind mice"</code>. 
+However, +if you set the position increment for “blind” to, +say, +1000, +then the three tokens will end up assigned to positions 0, +1, +and 1001 – and will no longer produce a phrase match for the query <code>"three blind mice"</code>.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $token = Lucy::Analysis::Token->new( + text => $text, # required + start_offset => $start_offset, # required + end_offset => $end_offset, # required + boost => 1.0, # optional + pos_inc => 1, # optional +);</pre> + +<ul> +<li><b>text</b> - A string.</li> + +<li><b>start_offset</b> - Start offset into the original document in Unicode code points.</li> + +<li><b>end_offset</b> - End offset into the original document in Unicode code points.</li> + +<li><b>boost</b> - Per-token weight.</li> + +<li><b>pos_inc</b> - Position increment for phrase matching.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="get_text" +>get_text</a></h3> + +<pre>my $text = $token->get_text;</pre> + +<p>Get the token's text.</p> + +<h3><a class='u' +name="set_text" +>set_text</a></h3> + +<pre>$token->set_text($text);</pre> + +<p>Set the token's text.</p> + +<h3><a class='u' +name="get_start_offset" +>get_start_offset</a></h3> + +<pre>my $int = $token->get_start_offset();</pre> + +<h3><a class='u' +name="get_end_offset" +>get_end_offset</a></h3> + +<pre>my $int = $token->get_end_offset();</pre> + +<h3><a class='u' +name="get_boost" +>get_boost</a></h3> + +<pre>my $float = $token->get_boost();</pre> + +<h3><a class='u' +name="get_pos_inc" +>get_pos_inc</a></h3> + +<pre>my $int = $token->get_pos_inc();</pre> + +<h3><a class='u' +name="get_len" +>get_len</a></h3> + +<pre>my $int = $token->get_len();</pre> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Token isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div 
class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,140 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Cookbook – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a 
href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div 
id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook - Apache Lucy recipes</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Cookbook provides thematic documentation covering some of Apache Lucy’s more sophisticated features. +For a step-by-step introduction to Lucy, +see <a href="../../Lucy/Docs/Tutorial.html" class="podlinkpod" +>Tutorial</a>.</p> + +<h3><a class='u' +name="Chapters" +>Chapters</a></h3> + +<ul> +<li><a href="../../Lucy/Docs/Cookbook/FastUpdates.html" class="podlinkpod" +>FastUpdates</a> - While index updates are fast on average, +worst-case update performance may be significantly slower. +To make index updates consistently quick, +we must manually intervene to control the process of index segment consolidation.</li> + +<li><a href="../../Lucy/Docs/Cookbook/CustomQuery.html" class="podlinkpod" +>CustomQuery</a> - Explore Lucy’s support for custom query types by creating a “PrefixQuery” class to handle trailing wildcards.</li> + +<li><a href="../../Lucy/Docs/Cookbook/CustomQueryParser.html" class="podlinkpod" +>CustomQueryParser</a> - Define your own custom search query syntax using <a href="../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a> and Parse::RecDescent.</li> +</ul> + +<h3><a class='u' +name="Materials" +>Materials</a></h3> + +<p>Some of the recipes in the Cookbook reference the completed <a href="../../Lucy/Docs/Tutorial.html" class="podlinkpod" +>Tutorial</a> application. 
+These materials can be found in the <code>sample</code> directory at the root of the Lucy distribution:</p> + +<pre>sample/indexer.pl # indexing app +sample/search.cgi # search app +sample/us_constitution # corpus</pre> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQuery.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQuery.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQuery.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,409 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Cookbook::CustomQuery – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" 
class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Cookbook/">Cookbook</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related 
Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook::CustomQuery - Sample subclass of Query</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Explore Apache Lucy’s support for custom query types by creating a “PrefixQuery” class to handle trailing wildcards.</p> + +<pre>my $prefix_query = PrefixQuery->new( + field => 'content', + query_string => 'foo*', +); +my $hits = $searcher->hits( query => $prefix_query ); +...</pre> + +<h3><a class='u' +name="Query,_Compiler,_and_Matcher" +>Query, +Compiler, +and Matcher</a></h3> + +<p>To add support for a new query type, +we need three classes: a Query, +a Compiler, +and a Matcher.</p> + +<ul> +<li>PrefixQuery - a subclass of <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a>, +and the only class that client code will deal with directly.</li> + +<li>PrefixCompiler - a subclass of <a href="../../../Lucy/Search/Compiler.html" class="podlinkpod" +>Compiler</a>, +whose primary role is to compile a PrefixQuery to a PrefixMatcher.</li> + +<li>PrefixMatcher - a subclass of <a href="../../../Lucy/Search/Matcher.html" class="podlinkpod" +>Matcher</a>, +which does the heavy lifting: it applies the query to individual documents and assigns a score to each match.</li> +</ul> + +<p>The PrefixQuery class on its own isn’t enough because a Query object’s role is limited to expressing an abstract specification for the search. 
+A Query is basically nothing but metadata; execution is left to the Query’s companion Compiler and Matcher.</p> + +<p>Here’s a simplified sketch illustrating how a Searcher’s hits() method ties together the three classes.</p> + +<pre>sub hits { + my ( $self, $query ) = @_; + my $compiler = $query->make_compiler( + searcher => $self, + boost => $query->get_boost, + ); + my $matcher = $compiler->make_matcher( + reader => $self->get_reader, + need_score => 1, + ); + my @hits = $matcher->capture_hits; + return \@hits; +}</pre> + +<h4><a class='u' +name="PrefixQuery" +>PrefixQuery</a></h4> + +<p>Our PrefixQuery class will have two attributes: a query string and a field name.</p> + +<pre>package PrefixQuery; +use base qw( Lucy::Search::Query ); +use Carp; +use Scalar::Util qw( blessed ); + +# Inside-out member vars and hand-rolled accessors. +my %query_string; +my %field; +sub get_query_string { my $self = shift; return $query_string{$$self} } +sub get_field { my $self = shift; return $field{$$self} }</pre> + +<p>PrefixQuery’s constructor collects and validates the attributes.</p> + +<pre>sub new { + my ( $class, %args ) = @_; + my $query_string = delete $args{query_string}; + my $field = delete $args{field}; + my $self = $class->SUPER::new(%args); + confess("'query_string' param is required") + unless defined $query_string; + confess("Invalid query_string: '$query_string'") + unless $query_string =~ /\*\s*$/; + confess("'field' param is required") + unless defined $field; + $query_string{$$self} = $query_string; + $field{$$self} = $field; + return $self; +}</pre> + +<p>Since this is an inside-out class, +we’ll need a destructor:</p> + +<pre>sub DESTROY { + my $self = shift; + delete $query_string{$$self}; + delete $field{$$self}; + $self->SUPER::DESTROY; +}</pre> + +<p>The equals() method determines whether two Queries are logically equivalent:</p> + +<pre>sub equals { + my ( $self, $other ) = @_; + return 0 unless blessed($other); + return 0 unless 
$other->isa("PrefixQuery"); + return 0 unless $field{$$self} eq $field{$$other}; + return 0 unless $query_string{$$self} eq $query_string{$$other}; + return 1; +}</pre> + +<p>The last thing we’ll need is a make_compiler() factory method which kicks out a subclass of <a href="../../../Lucy/Search/Compiler.html" class="podlinkpod" +>Compiler</a>.</p> + +<pre>sub make_compiler { + my ( $self, %args ) = @_; + my $subordinate = delete $args{subordinate}; + my $compiler = PrefixCompiler->new( %args, parent => $self ); + $compiler->normalize unless $subordinate; + return $compiler; +}</pre> + +<h4><a class='u' +name="PrefixCompiler" +>PrefixCompiler</a></h4> + +<p>PrefixQuery’s make_compiler() method will be called internally at search-time by objects which subclass <a href="../../../Lucy/Search/Searcher.html" class="podlinkpod" +>Searcher</a> – such as <a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearchers</a>.</p> + +<p>A Searcher is associated with a particular collection of documents. +These documents may all reside in one index, +as with IndexSearcher, +or they may be spread out across multiple indexes on one or more machines, +as with LucyX::Remote::ClusterSearcher.</p> + +<p>Searcher objects have access to certain statistical information about the collections they represent; for instance, +a Searcher can tell you how many documents are in the collection…</p> + +<pre>my $maximum_number_of_docs_in_collection = $searcher->doc_max;</pre> + +<p>… or how many documents a specific term appears in:</p> + +<pre>my $term_appears_in_this_many_docs = $searcher->doc_freq( + field => 'content', + term => 'foo', +);</pre> + +<p>Such information can be used by sophisticated Compiler implementations to assign more or less heft to individual queries or sub-queries. 
+However, +we’re not going to bother with weighting for this demo; we’ll just assign a fixed score of 1.0 to each matching document.</p> + +<p>We don’t need to write a constructor, +as it will suffice to inherit new() from Lucy::Search::Compiler. +The only method we need to implement for PrefixCompiler is make_matcher().</p> + +<pre>package PrefixCompiler; +use base qw( Lucy::Search::Compiler ); + +sub make_matcher { + my ( $self, %args ) = @_; + my $seg_reader = $args{reader}; + + # Retrieve low-level components LexiconReader and PostingListReader. + my $lex_reader + = $seg_reader->obtain("Lucy::Index::LexiconReader"); + my $plist_reader + = $seg_reader->obtain("Lucy::Index::PostingListReader"); + + # Acquire a Lexicon and seek it to our query string. + my $substring = $self->get_parent->get_query_string; + $substring =~ s/\*\s*$//; + my $field = $self->get_parent->get_field; + my $lexicon = $lex_reader->lexicon( field => $field ); + return unless $lexicon; + $lexicon->seek($substring); + + # Accumulate PostingLists for each matching term. + my @posting_lists; + while ( defined( my $term = $lexicon->get_term ) ) { + last unless $term =~ /^\Q$substring/; + my $posting_list = $plist_reader->posting_list( + field => $field, + term => $term, + ); + if ($posting_list) { + push @posting_lists, $posting_list; + } + last unless $lexicon->next; + } + return unless @posting_lists; + + return PrefixMatcher->new( posting_lists => \@posting_lists ); +}</pre> + +<p>PrefixCompiler gets access to a <a href="../../../Lucy/Index/SegReader.html" class="podlinkpod" +>SegReader</a> object when make_matcher() gets called. 
+From the SegReader and its sub-components <a href="../../../Lucy/Index/LexiconReader.html" class="podlinkpod" +>LexiconReader</a> and <a href="../../../Lucy/Index/PostingListReader.html" class="podlinkpod" +>PostingListReader</a>, +we acquire a <a href="../../../Lucy/Index/Lexicon.html" class="podlinkpod" +>Lexicon</a>, +scan through the Lexicon’s unique terms, +and acquire a <a href="../../../Lucy/Index/PostingList.html" class="podlinkpod" +>PostingList</a> for each term that matches our prefix.</p> + +<p>Each of these PostingList objects represents a set of documents which match the query.</p> + +<h4><a class='u' +name="PrefixMatcher" +>PrefixMatcher</a></h4> + +<p>The Matcher subclass is the most involved.</p> + +<pre>package PrefixMatcher; +use base qw( Lucy::Search::Matcher ); + +# Inside-out member vars. +my %doc_ids; +my %tick; + +sub new { + my ( $class, %args ) = @_; + my $posting_lists = delete $args{posting_lists}; + my $self = $class->SUPER::new(%args); + + # Cheesy but simple way of interleaving PostingList doc sets. + my %all_doc_ids; + for my $posting_list (@$posting_lists) { + while ( my $doc_id = $posting_list->next ) { + $all_doc_ids{$doc_id} = undef; + } + } + my @doc_ids = sort { $a <=> $b } keys %all_doc_ids; + $doc_ids{$$self} = \@doc_ids; + + # Track our position within the array of doc ids. 
+ $tick{$$self} = -1; + + return $self; +} + +sub DESTROY { + my $self = shift; + delete $doc_ids{$$self}; + delete $tick{$$self}; + $self->SUPER::DESTROY; +}</pre> + +<p>The doc ids must be in order, +or some will be ignored; hence the <code>sort</code> above.</p> + +<p>In addition to the constructor and destructor, +there are three methods that must be overridden.</p> + +<p>next() advances the Matcher to the next valid matching doc.</p> + +<pre>sub next { + my $self = shift; + my $doc_ids = $doc_ids{$$self}; + my $tick = ++$tick{$$self}; + return 0 if $tick >= scalar @$doc_ids; + return $doc_ids->[$tick]; +}</pre> + +<p>get_doc_id() returns the current document id, +or 0 if the Matcher is exhausted. +(<a href="../../../Lucy/Docs/DocIDs.html" class="podlinkpod" +>Document numbers</a> start at 1, +so 0 is a sentinel.)</p> + +<pre>sub get_doc_id { + my $self = shift; + my $tick = $tick{$$self}; + my $doc_ids = $doc_ids{$$self}; + return $tick < scalar @$doc_ids ? $doc_ids->[$tick] : 0; +}</pre> + +<p>score() conveys the relevance score of the current match. +We’ll just return a fixed score of 1.0:</p> + +<pre>sub score { 1.0 }</pre> + +<h3><a class='u' +name="Usage" +>Usage</a></h3> + +<p>To get a basic feel for PrefixQuery, +insert the FlatQueryParser module described in <a href="../../../Lucy/Docs/Cookbook/CustomQueryParser.html" class="podlinkpod" +>CustomQueryParser</a> (which supports PrefixQuery) into the search.cgi sample app.</p> + +<pre>my $parser = FlatQueryParser->new( schema => $searcher->get_schema ); +my $query = $parser->parse($q);</pre> + +<p>If you’re planning on using PrefixQuery in earnest, +though, +you may want to change up analyzers to avoid stemming, +because stemming – another approach to prefix conflation – is not perfectly compatible with prefix searches.</p> + +<pre># Polyanalyzer with no SnowballStemmer. 
+my $analyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ + Lucy::Analysis::StandardTokenizer->new, + Lucy::Analysis::Normalizer->new, + ], +);</pre> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,327 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Cookbook::CustomQueryParser – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a 
href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Cookbook/">Cookbook</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a 
href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook::CustomQueryParser - Sample subclass of QueryParser.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Implement a custom search query language using a subclass of <a href="../../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a>.</p> + +<h3><a class='u' +name="The_language" +>The language</a></h3> + +<p>At first, +our query language will support only simple term queries and phrases delimited by double quotes. +For simplicity’s sake, +it will not support parenthetical groupings, +boolean operators, +or prepended plus/minus. +The results for all subqueries will be unioned together – i.e. +joined using an OR – which is usually the best approach for small-to-medium-sized document collections.</p> + +<p>Later, +we’ll add support for trailing wildcards.</p> + +<h3><a class='u' +name="Single-field_parser" +>Single-field parser</a></h3> + +<p>Our initial parser implementation will generate queries against a single fixed field, +“content”, +and it will analyze text using a fixed choice of English EasyAnalyzer. 
+We won’t subclass Lucy::Search::QueryParser just yet.</p> + +<pre>package FlatQueryParser; +use Lucy::Search::TermQuery; +use Lucy::Search::PhraseQuery; +use Lucy::Search::ORQuery; +use Carp; + +sub new { + my $analyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'en', + ); + return bless { + field => 'content', + analyzer => $analyzer, + }, __PACKAGE__; +}</pre> + +<p>Some private helper subs for creating TermQuery and PhraseQuery objects will help keep the size of our main parse() subroutine down:</p> + +<pre>sub _make_term_query { + my ( $self, $term ) = @_; + return Lucy::Search::TermQuery->new( + field => $self->{field}, + term => $term, + ); +} + +sub _make_phrase_query { + my ( $self, $terms ) = @_; + return Lucy::Search::PhraseQuery->new( + field => $self->{field}, + terms => $terms, + ); +}</pre> + +<p>Our private _tokenize() method treats double-quote delimited material as a single token and splits on whitespace everywhere else.</p> + +<pre>sub _tokenize { + my ( $self, $query_string ) = @_; + my @tokens; + while ( length $query_string ) { + if ( $query_string =~ s/^\s+// ) { + next; # skip whitespace + } + elsif ( $query_string =~ s/^("[^"]*(?:"|$))// ) { + push @tokens, $1; # double-quoted phrase + } + else { + $query_string =~ s/(\S+)//; + push @tokens, $1; # single word + } + } + return \@tokens; +}</pre> + +<p>The main parsing routine creates an array of tokens by calling _tokenize(), +runs the tokens through the EasyAnalyzer, +creates TermQuery or PhraseQuery objects according to how many tokens emerge from the EasyAnalyzer’s split() method, +and adds each of the sub-queries to the primary ORQuery.</p> + +<pre>sub parse { + my ( $self, $query_string ) = @_; + my $tokens = $self->_tokenize($query_string); + my $analyzer = $self->{analyzer}; + my $or_query = Lucy::Search::ORQuery->new; + + for my $token (@$tokens) { + if ( $token =~ s/^"// ) { + $token =~ s/"$//; + my $terms = $analyzer->split($token); + my $query = 
$self->_make_phrase_query($terms); + $or_query->add_child($query); + } + else { + my $terms = $analyzer->split($token); + if ( @$terms == 1 ) { + my $query = $self->_make_term_query( $terms->[0] ); + $or_query->add_child($query); + } + elsif ( @$terms > 1 ) { + my $query = $self->_make_phrase_query($terms); + $or_query->add_child($query); + } + } + } + + return $or_query; +}</pre> + +<h3><a class='u' +name="Multi-field_parser" +>Multi-field parser</a></h3> + +<p>Most often, +the end user will want their search query to match not only a single ‘content’ field, +but also ‘title’ and so on. +To make that happen, +we have to turn queries such as this…</p> + +<pre>foo AND NOT bar</pre> + +<p>… into the logical equivalent of this:</p> + +<pre>(title:foo OR content:foo) AND NOT (title:bar OR content:bar)</pre> + +<p>Rather than continue with our own from-scratch parser class and write the routines to accomplish that expansion, +we’re now going to subclass Lucy::Search::QueryParser and take advantage of some of its existing methods.</p> + +<p>Our first parser implementation had the “content” field name and the choice of English EasyAnalyzer hard-coded for simplicity, +but we don’t need to do that once we subclass Lucy::Search::QueryParser. +QueryParser’s constructor – which we will inherit, +allowing us to eliminate our own constructor – requires a Schema which conveys field and Analyzer information, +so we can just defer to that.</p> + +<pre>package FlatQueryParser; +use base qw( Lucy::Search::QueryParser ); +use Lucy::Search::TermQuery; +use Lucy::Search::PhraseQuery; +use Lucy::Search::ORQuery; +use PrefixQuery; +use Carp; + +# Inherit new()</pre> + +<p>We’re also going to jettison our _make_term_query() and _make_phrase_query() helper subs and chop our parse() subroutine way down. 
+Our revised parse() routine will generate Lucy::Search::LeafQuery objects instead of TermQueries and PhraseQueries:</p> + +<pre>sub parse { + my ( $self, $query_string ) = @_; + my $tokens = $self->_tokenize($query_string); + my $or_query = Lucy::Search::ORQuery->new; + for my $token (@$tokens) { + my $leaf_query = Lucy::Search::LeafQuery->new( text => $token ); + $or_query->add_child($leaf_query); + } + return $self->expand($or_query); +}</pre> + +<p>The magic happens in QueryParser’s expand() method, +which walks the ORQuery object we supply to it looking for LeafQuery objects, +and calls expand_leaf() for each one it finds. +expand_leaf() performs field-specific analysis, +decides whether each query should be a TermQuery or a PhraseQuery, +and if multiple fields are required, +creates an ORQuery which mults out e.g. +<code>foo</code> into <code>(title:foo OR content:foo)</code>.</p> + +<h3><a class='u' +name="Extending_the_query_language" +>Extending the query language</a></h3> + +<p>To add support for trailing wildcards to our query language, +we need to override expand_leaf() to accommodate PrefixQuery, +while deferring to the parent class implementation on TermQuery and PhraseQuery.</p> + +<pre>sub expand_leaf { + my ( $self, $leaf_query ) = @_; + my $text = $leaf_query->get_text; + if ( $text =~ /\*$/ ) { + my $or_query = Lucy::Search::ORQuery->new; + for my $field ( @{ $self->get_fields } ) { + my $prefix_query = PrefixQuery->new( + field => $field, + query_string => $text, + ); + $or_query->add_child($prefix_query); + } + return $or_query; + } + else { + return $self->SUPER::expand_leaf($leaf_query); + } +}</pre> + +<p>Ordinarily, +those asterisks would have been stripped when running tokens through the EasyAnalyzer – query strings containing “foo*” would produce TermQueries for the term “foo”. 
+Our override intercepts tokens with trailing asterisks and processes them as PrefixQueries before <code>SUPER::expand_leaf</code> can discard them, +so that a search for “foo*” can match “food”, +“foosball”, +and so on.</p> + +<h3><a class='u' +name="Usage" +>Usage</a></h3> + +<p>Insert our custom parser into the search.cgi sample app to get a feel for how it behaves:</p> + +<pre>my $parser = FlatQueryParser->new( schema => $searcher->get_schema ); +my $query = $parser->parse( decode( 'UTF-8', $cgi->param('q') || '' ) ); +my $hits = $searcher->hits( + query => $query, + offset => $offset, + num_wanted => $page_size, +); +...</pre> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,258 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Cookbook::FastUpdates – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a 
href="/docs/0.5.0/perl/Lucy/Docs/Cookbook/">Cookbook</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>While index updates are fast on average, +worst-case update performance may be significantly slower. 
+To make index updates consistently quick, +we must manually intervene to control the process of index segment consolidation.</p> + +<h3><a class='u' +name="The_problem" +>The problem</a></h3> + +<p>Ordinarily, +modifying an index is cheap. +New data is added to new segments, +and the time to write a new segment scales more or less linearly with the number of documents added during the indexing session.</p> + +<p>Deletions are also cheap most of the time, +because we don’t remove documents immediately but instead mark them as deleted, +and adding the deletion mark is cheap.</p> + +<p>However, +as new segments are added and the deletion rate for existing segments increases, +search-time performance slowly begins to degrade. +At some point, +it becomes necessary to consolidate existing segments, +rewriting their data into a new segment.</p> + +<p>If the recycled segments are small, +the time it takes to rewrite them may not be significant. +Every once in a while, +though, +a large amount of data must be rewritten.</p> + +<h3><a class='u' +name="Procrastinating_and_playing_catch-up" +>Procrastinating and playing catch-up</a></h3> + +<p>The simplest way to force fast index updates is to avoid rewriting anything.</p> + +<p>Indexer relies upon <a href="../../../Lucy/Index/IndexManager.html" class="podlinkpod" +>IndexManager</a>’s <a href="../../../Lucy/Index/IndexManager.html#recycle" class="podlinkpod" +>recycle()</a> method to tell it which segments should be consolidated. +If we subclass IndexManager and override the method so that it always returns an empty array, +we get consistently quick performance:</p> + +<pre>package NoMergeManager; +use base qw( Lucy::Index::IndexManager ); +sub recycle { [] } + +package main; +my $indexer = Lucy::Index::Indexer->new( + index => '/path/to/index', + manager => NoMergeManager->new, +); +... +$indexer->commit;</pre> + +<p>However, +we can’t procrastinate forever. 
+Eventually, +we’ll have to run an ordinary, +uncontrolled indexing session, +potentially triggering a large rewrite of lots of small and/or degraded segments:</p> + +<pre>my $indexer = Lucy::Index::Indexer->new( + index => '/path/to/index', + # manager => NoMergeManager->new, +); +... +$indexer->commit;</pre> + +<h3><a class='u' +name="Acceptable_worst-case_update_time,_slower_degradation" +>Acceptable worst-case update time, +slower degradation</a></h3> + +<p>Never merging anything at all in the main indexing process is probably overkill. +Small segments are relatively cheap to merge; we just need to guard against the big rewrites.</p> + +<p>Setting a ceiling on the number of documents in the segments to be recycled allows us to avoid a mass proliferation of tiny, +single-document segments, +while still offering decent worst-case update speed:</p> + +<pre>package LightMergeManager; +use base qw( Lucy::Index::IndexManager ); + +sub recycle { + my $self = shift; + my $seg_readers = $self->SUPER::recycle(@_); + @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers; + return $seg_readers; +}</pre> + +<p>However, +we still have to consolidate every once in a while, +and while that happens content updates will be locked out.</p> + +<h3><a class='u' +name="Background_merging" +>Background merging</a></h3> + +<p>If it’s not acceptable to lock out updates while the index consolidation process runs, +the alternative is to move the consolidation process out of band, +using <a href="../../../Lucy/Index/BackgroundMerger.html" class="podlinkpod" +>BackgroundMerger</a>.</p> + +<p>It’s never safe to have more than one Indexer attempting to modify the content of an index at the same time, +but a BackgroundMerger and an Indexer can operate simultaneously:</p> + +<pre># Indexing process. 
+use Scalar::Util qw( blessed ); +my $retries = 0; +while (1) { + eval { + my $indexer = Lucy::Index::Indexer->new( + index => '/path/to/index', + manager => LightMergeManager->new, + ); + $indexer->add_doc($doc); + $indexer->commit; + }; + last unless $@; + if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) { + # Catch LockErr. + warn "Couldn't get lock ($retries retries)"; + $retries++; + } + else { + die "Write failed: $@"; + } +} + +# Background merge process. +my $manager = Lucy::Index::IndexManager->new; +$manager->set_write_lock_timeout(60_000); +my $bg_merger = Lucy::Index::BackgroundMerger->new( + index => '/path/to/index', + manager => $manager, +); +$bg_merger->commit;</pre> + +<p>The exception handling code becomes useful once you have more than one index modification process happening simultaneously. +By default, +Indexer tries several times to acquire a write lock over the span of one second, +then holds it until <a href="../../../Lucy/Index/Indexer.html#commit" class="podlinkpod" +>commit()</a> completes. +BackgroundMerger handles most of its work without the write lock, +but it does need it briefly once at the beginning and once again near the end. +Under normal loads, +the internal retry logic will resolve conflicts, +but if it’s not acceptable to miss an insert, +you probably want to catch <a href="../../../Lucy/Store/LockErr.html" class="podlinkpod" +>LockErr</a> exceptions thrown by Indexer. +In contrast, +a LockErr from BackgroundMerger probably just needs to be logged.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,142 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::DevGuide – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a 
href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::DevGuide - Quick-start guide to hacking on Apache Lucy.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Apache 
Lucy code base is organized into roughly four layers:</p> + +<ul> +<li>Charmonizer - compiler and OS configuration probing.</li> + +<li>Clownfish - header files.</li> + +<li>C - implementation files.</li> + +<li>Host - binding language.</li> +</ul> + +<p>Charmonizer is a configuration prober which writes a single header file, +“charmony.h”, +describing the build environment and facilitating cross-platform development. +It’s similar to Autoconf or Metaconfig, +but written in pure C.</p> + +<p>The “.cfh” files within the Lucy core are Clownfish header files. +Clownfish is a purpose-built, +declaration-only language which superimposes a single-inheritance object model on top of C, +and which is specifically designed to co-exist happily with a variety of “host” languages and to allow limited run-time dynamic subclassing. +For more information see the Clownfish docs, +but if there’s one thing you should know about Clownfish OO before you start hacking, +it’s that method calls are differentiated from functions by capitalization:</p> + +<pre>Indexer_Add_Doc <-- Method, typically uses dynamic dispatch. +Indexer_add_doc <-- Function, always a direct invocation.</pre> + +<p>The C files within the Lucy core are where most of Lucy’s low-level functionality lies. +They implement the interface defined by the Clownfish header files.</p> + +<p>The C core is intentionally left incomplete, +however; to be usable, +it must be bound to a “host” language. +(In this context, +even C is considered a “host” which must implement the missing pieces and be “bound” to the core.) Some of the binding code is autogenerated by Clownfish on a spec customized for each language. 
+Other pieces are hand-coded in either C (using the host’s C API) or the host language itself.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,135 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::DocIDs – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a 
href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::DocIDs - Characteristics of Apache Lucy document ids.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<h3><a class='u' +name="Document_ids_are_signed_32-bit_integers" +>Document ids are signed 32-bit integers</a></h3> + +<p>Document ids in Apache Lucy start at 1. +Because 0 is never a valid doc id, +we can use it as a sentinel value:</p> + +<pre>while ( my $doc_id = $posting_list->next ) { + ... +}</pre> + +<h3><a class='u' +name="Document_ids_are_ephemeral" +>Document ids are ephemeral</a></h3> + +<p>The document ids used by Lucy are associated with a single index snapshot. +The moment an index is updated, +the mapping of document ids to documents is subject to change.</p> + +<p>Since IndexReader objects represent a point-in-time view of an index, +document ids are guaranteed to remain static for the life of the reader. +However, +because they are not permanent, +Lucy document ids cannot be used as foreign keys to locate records in external data sources. +If you truly need a primary key field, +you must define it and populate it yourself.</p> + +<p>Furthermore, +the order of document ids does not tell you anything about the sequence in which documents were added to the index.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>