Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,358 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::FileFormat – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" 
name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::FileFormat - Overview of index file format</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>It is not necessary to understand the current implementation details of the index file format in order to use Apache Lucy effectively, +but it may be helpful if you are interested in tweaking for high performance, +exotic usage, +or debugging and development.</p> + +<p>On a file system, +an index is a directory. 
+The files inside have a hierarchical relationship: an index is made up of “segments”, +each of which is an independent inverted index with its own subdirectory; each segment is made up of several component parts.</p> + +<pre>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| </pre> + +<h3><a class='u' +name="Write-once_philosophy" +>Write-once philosophy</a></h3> + +<p>All segment directory names consist of the string “seg_” followed by a number in base 36: seg_1, +seg_5m, +seg_p9s2 and so on, +with higher numbers indicating more recent segments. +Once a segment is finished and committed, +its name is never re-used and its files are never modified.</p> + +<p>Old segments become obsolete and can be removed when their data has been consolidated into new segments during the process of segment merging and optimization. +A fully-optimized index has only one segment.</p> + +<h3><a class='u' +name="Top-level_entries" +>Top-level entries</a></h3> + +<p>There are a handful of “top-level” files and directories which belong to the entire index rather than to a particular segment.</p> + +<h4><a class='u' +name="snapshot_XXX.json" +>snapshot_XXX.json</a></h4> + +<p>A “snapshot” file, +e.g. +<code>snapshot_m7p.json</code>, +is a list of index files and directories. +Because index files, +once written, +are never modified, +the list of entries in a snapshot defines a point-in-time view of the data in an index.</p> + +<p>Like segment directories, +snapshot files also utilize the unique-base-36-number naming convention; the higher the number, +the more recent the file. 
+The appearance of a new snapshot file within the index directory constitutes an index update. +While a new segment is being written new files may be added to the index directory, +but until a new snapshot file gets written, +a Searcher opening the index for reading won’t know about them.</p> + +<h4><a class='u' +name="schema_XXX.json" +>schema_XXX.json</a></h4> + +<p>The schema file is a Schema object describing the index’s format, +serialized as JSON. +It, +too, +is versioned, +and a given snapshot file will reference one and only one schema file.</p> + +<h4><a class='u' +name="locks" +>locks</a></h4> + +<p>By default, +only one indexing process may safely modify the index at any given time. +Processes reserve an index by laying claim to the <code>write.lock</code> file within the <code>locks/</code> directory. +A smattering of other lock files may be used from time to time, +as well.</p> + +<h3><a class='u' +name="A_segment(8217)s_component_parts" +>A segment’s component parts</a></h3> + +<p>By default, +each segment has up to five logical components: lexicon, +postings, +document storage, +highlight data, +and deletions. +Binary data from these components gets stored in virtual files within the “cf.dat” compound file; metadata is stored in a shared “segmeta.json” file.</p> + +<h4><a class='u' +name="segmeta.json" +>segmeta.json</a></h4> + +<p>The segmeta.json file is a central repository for segment metadata. +In addition to information such as document counts and field numbers, +it also warehouses arbitrary metadata on behalf of individual index components.</p> + +<h4><a class='u' +name="Lexicon" +>Lexicon</a></h4> + +<p>Each indexed field gets its own lexicon in each segment. +The exact files involved depend on the field’s type, +but generally speaking there will be two parts. +First, +there’s a primary <code>lexicon-XXX.dat</code> file which houses a complete term list associating terms with corpus frequency statistics, +postings file locations, +etc. 
+Second, +one or more “lexicon index” files may be present which contain periodic samples from the primary lexicon file to facilitate fast lookups.</p> + +<h4><a class='u' +name="Postings" +>Postings</a></h4> + +<p>“Posting” is a technical term from the field of <a href="../../Lucy/Docs/IRTheory.html" class="podlinkpod" +>information retrieval</a>, +defined as a single instance of one term indexing one document. +If you are looking at the index in the back of a book, +and you see that “freedom” is referenced on pages 8, +86, +and 240, +that would be three postings, +which taken together form a “posting list”. +The same terminology applies to an index in electronic form.</p> + +<p>Each segment has one postings file per indexed field. +When a search is performed for a single term, +first that term is looked up in the lexicon. +If the term exists in the segment, +the record in the lexicon will contain information about which postings file to look at and where to look.</p> + +<p>The first thing any posting record tells you is a document id. +By iterating over all the postings associated with a term, +you can find all the documents that match that term, +a process which is analogous to looking up page numbers in a book’s index. +However, +each posting record typically contains other information in addition to document id, +e.g. 
+the positions at which the term occurs within the field.</p> + +<h4><a class='u' +name="Documents" +>Documents</a></h4> + +<p>The document storage section is a simple database, +organized into two files:</p> + +<ul> +<li><b>documents.dat</b> - Serialized documents.</li> + +<li><b>documents.ix</b> - Document storage index, +a solid array of 64-bit integers where each integer location corresponds to a document id, +and the value at that location points at a file position in the documents.dat file.</li> +</ul> + +<h4><a class='u' +name="Highlight_data" +>Highlight data</a></h4> + +<p>The files which store data used for excerpting and highlighting are organized similarly to the files used to store documents.</p> + +<ul> +<li><b>highlight.dat</b> - Chunks of serialized highlight data, +one per doc id.</li> + +<li><b>highlight.ix</b> - Highlight data index – as with the <code>documents.ix</code> file, +a solid array of 64-bit file pointers.</li> +</ul> + +<h4><a class='u' +name="Deletions" +>Deletions</a></h4> + +<p>When a document is “deleted” from a segment, +it is not actually purged right away; it is merely marked as “deleted” via a deletions file. +Deletions files contain bit vectors with one bit for each document in the segment; if bit #254 is set then document 254 is deleted, +and if that document turns up in a search it will be masked out.</p> + +<p>It is only when a segment’s contents are rewritten to a new segment during the segment-merging process that deleted documents truly go away.</p> + +<h3><a class='u' +name="Compound_Files" +>Compound Files</a></h3> + +<p>If you peer inside an index directory, +you won’t actually find any files named “documents.dat”, +“highlight.ix”, +etc. +unless there is an indexing process underway. 
+What you will find instead is one “cf.dat” and one “cfmeta.json” file per segment.</p> + +<p>To minimize the need for file descriptors at search-time, +all per-segment binary data files are concatenated together in “cf.dat” at the close of each indexing session. +Information about where each file begins and ends is stored in <code>cfmeta.json</code>. +When the segment is opened for reading, +a single file descriptor per “cf.dat” file can be shared among several readers.</p> + +<h3><a class='u' +name="A_Typical_Search" +>A Typical Search</a></h3> + +<p>Here’s a simplified narrative, +dramatizing how a search for “freedom” against a given segment plays out:</p> + +<ul> +<li>The searcher asks the relevant Lexicon Index, +“Do you know anything about ‘freedom’?” Lexicon Index replies, +“Can’t say for sure, +but if the main Lexicon file does, +‘freedom’ is probably somewhere around byte 21008”.</li> + +<li>The main Lexicon tells the searcher “One moment, +let me scan our records… Yes, +we have 2 documents which contain ‘freedom’. +You’ll find them in seg_6/postings-4.dat starting at byte 66991.”</li> + +<li>The Postings file says “Yep, +we have ‘freedom’, +all right! +Document id 40 has 1 ‘freedom’, +and document 44 has 8. +If you need to know more, +like if any ‘freedom’ is part of the phrase ‘freedom of speech’, +ask me about positions!”</li> + +<li>If the searcher is only looking for ‘freedom’ in isolation, +that’s where it stops. +It now knows enough to assign the documents scores against “freedom”, +with the 8-freedom document likely ranking higher than the single-freedom document.</li> +</ul> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,181 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::FileLocking – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" 
name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::FileLocking - Manage indexes on shared volumes.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Normally, +index locking is an invisible process. 
+Exclusive write access is controlled via lockfiles within the index directory and problems only arise if multiple processes attempt to acquire the write lock simultaneously; search-time processes do not ordinarily require locking at all.</p> + +<p>On shared volumes, +however, +the default locking mechanism fails, +and manual intervention becomes necessary.</p> + +<p>Both read and write applications accessing an index on a shared volume need to identify themselves with a unique <code>host</code> id, +e.g. +hostname or ip address. +Knowing the host id makes it possible to tell which lockfiles belong to other machines and therefore must not be removed when the lockfile’s pid number appears not to correspond to an active process.</p> + +<p>At index-time, +the danger is that multiple indexing processes from different machines which fail to specify a unique <code>host</code> id can delete each other’s lockfiles and then attempt to modify the index at the same time, +causing index corruption. +The search-time problem is more complex.</p> + +<p>Once an index file is no longer listed in the most recent snapshot, +Indexer attempts to delete it as part of a post-<a href="lucy:Indexer.Commit" class="podlinkurl" +>lucy:Indexer.Commit</a> cleanup routine. +It is possible that at the moment an Indexer is deleting files which it believes are no longer needed, +a Searcher referencing an earlier snapshot is in fact using them. +The more often that an index is either updated or searched, +the more likely it is that this conflict will arise from time to time.</p> + +<p>Ordinarily, +the deletion attempts are not a problem. +On a typical unix volume, +the files will be deleted in name only: any process which holds an open filehandle against a given file will continue to have access, +and the file won’t actually get vaporized until the last filehandle is cleared. +Thanks to “delete on last close semantics”, +an Indexer can’t truly delete the file out from underneath an active Searcher. 
+On Windows, +where file deletion fails whenever any process holds an open handle, +the situation is different but still workable: Indexer just keeps retrying after each commit until deletion finally succeeds.</p> + +<p>On NFS, +however, +the system breaks, +because NFS allows files to be deleted out from underneath active processes. +Should this happen, +the unlucky read process will crash with a “Stale NFS filehandle” exception.</p> + +<p>Under normal circumstances, +it is neither necessary nor desirable for IndexReaders to secure read locks against an index, +but for NFS we have to make an exception. +LockFactory’s <a href="lucy:LockFactory.Make_Shared_Lock" class="podlinkurl" +>lucy:LockFactory.Make_Shared_Lock</a> method exists for this reason; supplying an IndexManager instance to IndexReader’s constructor activates an internal locking mechanism using <a href="lucy:LockFactory.Make_Shared_Lock" class="podlinkurl" +>lucy:LockFactory.Make_Shared_Lock</a> which prevents concurrent indexing processes from deleting files that are needed by active readers.</p> + +<pre>use Sys::Hostname qw( hostname );
+my $hostname = hostname() or die "Can't get unique hostname";
+my $manager = Lucy::Index::IndexManager->new( host => $hostname );
+
+# Index time:
+my $indexer = Lucy::Index::Indexer->new(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+
+# Search time:
+my $reader = Lucy::Index::IndexReader->open(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );</pre> + +<p>Since shared locks are implemented using lockfiles located in the index directory (as are exclusive locks), +reader applications must have write access for read locking to work. +Stale lock files from crashed processes are ordinarily cleared away the next time the same machine – as identified by the <code>host</code> parameter – opens another IndexReader. 
+(The classic technique of timing out lock files is not feasible because search processes may lie dormant indefinitely.) However, +please be aware that if the last thing a given machine does is crash, +lock files belonging to it may persist, +preventing deletion of obsolete index data.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,157 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::IRTheory – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div 
id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a 
href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::IRTheory - Crash course in information retrieval</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p> + +<h3><a class='u' +name="Terminology" +>Terminology</a></h3> + +<p>Lucy uses some terminology from the field of information retrieval which may be unfamiliar to many users. +“Document” and “term” mean pretty much what you’d expect them to, +but others such as “posting” and “inverted index” need a formal introduction:</p> + +<ul> +<li><i>document</i> - An atomic unit of retrieval.</li> + +<li><i>term</i> - An attribute which describes a document.</li> + +<li><i>posting</i> - One term indexing one document.</li> + +<li><i>term list</i> - The complete list of terms which describe a document.</li> + +<li><i>posting list</i> - The complete list of documents which a term indexes.</li> + +<li><i>inverted index</i> - A data structure which maps from terms to documents.</li> +</ul> + +<p>Since Lucy is a practical implementation of IR theory, +it loads these abstract, +distilled definitions down with useful traits. +For instance, +a “posting” in its most rarefied form is simply a term-document pairing; in Lucy, +the class MatchPosting fills this role. 
+However, +by associating additional information with a posting like the number of times the term occurs in the document, +we can turn it into a ScorePosting, +making it possible to rank documents by relevance rather than just list documents which happen to match in no particular order.</p> + +<h3><a class='u' +name="TF/IDF_ranking_algorithm" +>TF/IDF ranking algorithm</a></h3> + +<p>Lucy uses a variant of the well-established “Term Frequency / Inverse Document Frequency” weighting scheme. +A thorough treatment of TF/IDF is too ambitious for our present purposes, +but in a nutshell, +it means that…</p> + +<ul> +<li>in a search for <code>skate park</code>, +documents which score well for the comparatively rare term <code>skate</code> will rank higher than documents which score well for the more common term <code>park</code>.</li> + +<li>a 10-word text which has one occurrence each of both <code>skate</code> and <code>park</code> will rank higher than a 1000-word text which also contains one occurrence of each.</li> +</ul> + +<p>A web search for “tf idf” will turn up many excellent explanations of the algorithm.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,165 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" 
id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial - Step-by-step introduction to Apache Lucy.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Explore Apache Lucy’s basic functionality by starting with a minimalist CGI search app based on Lucy::Simple and transforming it, +step by step, +into an “advanced search” interface utilizing more flexible core modules like <a 
href="../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a>.</p> + +<h3><a class='u' +name="Chapters" +>Chapters</a></h3> + +<ul> +<li><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html" class="podlinkpod" +>SimpleTutorial</a> - Build a bare-bones search app using Lucy::Simple.</li> + +<li><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html" class="podlinkpod" +>BeyondSimpleTutorial</a> - Rebuild the app using core classes like <a href="../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> in place of Lucy::Simple.</li> + +<li><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html" class="podlinkpod" +>FieldTypeTutorial</a> - Experiment with different field characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html" class="podlinkpod" +>FieldType</a>.</li> + +<li><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html" class="podlinkpod" +>AnalysisTutorial</a> - Examine how the choice of <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Analyzer</a> subclass affects search results.</li> + +<li><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html" class="podlinkpod" +>HighlighterTutorial</a> - Augment search results with highlighted excerpts.</li> + +<li><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html" class="podlinkpod" +>QueryObjectsTutorial</a> - Unlock advanced search features by using Query objects instead of query strings.</li> +</ul> + +<h3><a class='u' +name="Source_materials" +>Source materials</a></h3> + +<p>The source material used by the tutorial app – a multi-text-file presentation of the United States constitution – can be found in the <code>sample</code> directory at the root of the Lucy distribution, +along with finished indexing and search apps.</p> + +<pre>sample/indexer.pl # indexing app 
+sample/search.cgi # search app +sample/us_constitution # corpus</pre> + +<h3><a class='u' +name="Conventions" +>Conventions</a></h3> + +<p>The user is expected to be familiar with OO Perl and basic CGI programming.</p> + +<p>The code in this tutorial assumes a Unix-flavored operating system and the Apache webserver, +but will work with minor modifications on other setups.</p> + +<h3><a class='u' +name="See_also" +>See also</a></h3> + +<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html" class="podlinkpod" +>Cookbook</a>.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,195 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::AnalysisTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a 
href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::AnalysisTutorial - How to choose and use Analyzers.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Try swapping out the EasyAnalyzer in our Schema for a <a 
href="../../../Lucy/Analysis/StandardTokenizer.html" class="podlinkpod" +>StandardTokenizer</a>:</p> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; +my $type = Lucy::Plan::FullTextType->new( + analyzer => $tokenizer, +);</pre> + +<p>Search for <code>senate</code>, +<code>Senate</code>, +and <code>Senator</code> before and after making the change and re-indexing.</p> + +<p>Under EasyAnalyzer, +the results are identical for all three searches, +but under StandardTokenizer, +searches are case-sensitive, +and the result sets for <code>Senate</code> and <code>Senator</code> are distinct.</p> + +<h3><a class='u' +name="EasyAnalyzer" +>EasyAnalyzer</a></h3> + +<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> is performing more aggressive processing than StandardTokenizer. +In addition to tokenizing, +it’s also converting all text to lower case so that searches are case-insensitive, +and using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>, +in this case).</p> + +<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package. +In this case, +it’s three-in-one, +since specifying an EasyAnalyzer with <code>language => 'en'</code> is equivalent to this snippet creating a <a href="../../../Lucy/Analysis/PolyAnalyzer.html" class="podlinkpod" +>PolyAnalyzer</a>:</p> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; +my $normalizer = Lucy::Analysis::Normalizer->new; +my $stemmer = Lucy::Analysis::SnowballStemmer->new( language => 'en' ); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], +);</pre> + +<p>You can add or subtract Analyzers from there if you like. 
+Try adding a fourth Analyzer, +a SnowballStopFilter for suppressing “stopwords” like “the”, +“if”, +and “maybe”.</p> + +<pre>my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( + language => 'en', +); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ], +);</pre> + +<p>Also, +try removing the SnowballStemmer.</p> + +<pre>my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer ], +);</pre> + +<p>The original choice of a stock English EasyAnalyzer probably still yields the best results for this document collection, +but you get the idea: sometimes you want a different Analyzer.</p> + +<h3><a class='u' +name="When_the_best_Analyzer_is_no_Analyzer" +>When the best Analyzer is no Analyzer</a></h3> + +<p>Sometimes you don’t want an Analyzer at all. +That was true for our “url” field because we didn’t need it to be searchable, +but it’s also true for certain types of searchable fields. 
+For instance, +“category” fields are often set up to match exactly or not at all, +as are fields like “last_name” (because you may not want to conflate results for “Humphrey” and “Humphries”).</p> + +<p>To specify that there should be no analysis performed at all, +use StringType:</p> + +<pre>my $type = Lucy::Plan::StringType->new; +$schema->spec_field( name => 'category', type => $type );</pre> + +<h3><a class='u' +name="Highlighting_up_next" +>Highlighting up next</a></h3> + +<p>In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html" class="podlinkpod" +>HighlighterTutorial</a>, +we’ll add highlighted excerpts from the “content” field to our search results.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,246 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a 
href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::BeyondSimpleTutorial - A more flexible app structure.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<h3><a class='u' +name="Goal" +>Goal</a></h3> + +<p>In this tutorial chapter, +we’ll 
refactor the apps we built in <a href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html" class="podlinkpod" +>SimpleTutorial</a> so that they look exactly the same from the end user’s point of view, +but offer the developer greater possibilities for expansion.</p> + +<p>To achieve this, +we’ll ditch Lucy::Simple and replace it with the classes that it uses internally:</p> + +<ul> +<li><a href="../../../Lucy/Plan/Schema.html" class="podlinkpod" +>Schema</a> - Plan out your index.</li> + +<li><a href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod" +>FullTextType</a> - Field type for full text search.</li> + +<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li> + +<li><a href="../../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a> - Manipulate index content.</li> + +<li><a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> - Search an index.</li> + +<li><a href="../../../Lucy/Search/Hits.html" class="podlinkpod" +>Hits</a> - Iterate over hits returned by a Searcher.</li> +</ul> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p>After we load our modules…</p> + +<pre>use Lucy::Plan::Schema; +use Lucy::Plan::FullTextType; +use Lucy::Analysis::EasyAnalyzer; +use Lucy::Index::Indexer;</pre> + +<p>… the first item we’re going to need is a <a href="../../../Lucy/Plan/Schema.html" class="podlinkpod" +>Schema</a>.</p> + +<p>The primary job of a Schema is to specify what fields are available and how they’re defined. +We’ll start off with three fields: title, +content and url.</p> + +<pre># Create Schema. 
+my $schema = Lucy::Plan::Schema->new; +my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'en', +); +my $type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type ); +$schema->spec_field( name => 'url', type => $type );</pre> + +<p>All of the fields are spec’d out using the <a href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod" +>FullTextType</a> FieldType, +indicating that they will be searchable as “full text” – which means that they can be searched for individual words. +The “analyzer”, +which is unique to FullTextType fields, +is what breaks up the text into searchable tokens.</p> + +<p>Next, +we’ll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a>. +The substitution will be straightforward because Simple has merely been serving as a thin wrapper around an inner Indexer, +and we’ll just be peeling away the wrapper.</p> + +<p>First, +replace the constructor:</p> + +<pre># Create Indexer. +my $indexer = Lucy::Index::Indexer->new( + index => $path_to_index, + schema => $schema, + create => 1, + truncate => 1, +);</pre> + +<p>Next, +call <a href="../../../Lucy/Index/Indexer.html#add_doc" class="podlinkpod" +>add_doc()</a> on the <code>indexer</code> object wherever we were previously adding documents through the <code>lucy</code> object:</p> + +<pre>foreach my $filename (@filenames) { + my $doc = parse_file($filename); + $indexer->add_doc($doc); +}</pre> + +<p>There’s only one extra step required: at the end of the app, +you must call commit() explicitly to close the indexing session and commit your changes. 
+(Lucy::Simple hides this detail, +calling commit() implicitly when it needs to).</p> + +<pre>$indexer->commit;</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>In our search app as in our indexing app, +Lucy::Simple has served as a thin wrapper – this time around <a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> and <a href="../../../Lucy/Search/Hits.html" class="podlinkpod" +>Hits</a>. +Swapping out Simple for these two classes is also straightforward:</p> + +<pre>use Lucy::Search::IndexSearcher; + +my $searcher = Lucy::Search::IndexSearcher->new( + index => $path_to_index, +); +my $hits = $searcher->hits( # returns a Hits object, not a hit count + query => $q, + offset => $offset, + num_wanted => $page_size, +); +my $hit_count = $hits->total_hits; # get the hit count here + +... + +while ( my $hit = $hits->next ) { + ... +}</pre> + +<h3><a class='u' +name="Hooray!" +>Hooray!</a></h3> + +<p>Congratulations! +Your apps do the same thing as before… but now they’ll be easier to customize.</p> + +<p>In our next chapter, +<a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html" class="podlinkpod" +>FieldTypeTutorial</a>, +we’ll explore how to assign different behaviors to different fields.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,169 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::FieldTypeTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a 
href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::FieldTypeTutorial - Specify per-field properties and behaviors.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Schema we used in the last chapter specifies three fields:</p> + +<pre>my 
$type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type ); +$schema->spec_field( name => 'url', type => $type );</pre> + +<p>Since they are all defined as “full text” fields, +they are all searchable – including the <code>url</code> field, +a dubious choice. +Some URLs contain meaningful information, +but these don’t, +really:</p> + +<pre>http://example.com/us_constitution/amend1.txt</pre> + +<p>We may as well not bother indexing the URL content. +To achieve that we need to assign the <code>url</code> field to a different FieldType.</p> + +<h3><a class='u' +name="StringType" +>StringType</a></h3> + +<p>Instead of FullTextType, +we’ll use a <a href="../../../Lucy/Plan/StringType.html" class="podlinkpod" +>StringType</a>, +which doesn’t use an Analyzer to break up text into individual tokens. +Furthermore, +we’ll mark this StringType as unindexed, +so that its content won’t be searchable at all.</p> + +<pre>my $url_type = Lucy::Plan::StringType->new( indexed => 0 ); +$schema->spec_field( name => 'url', type => $url_type );</pre> + +<p>To observe the change in behavior, +try searching for <code>us_constitution</code> both before and after changing the Schema and re-indexing.</p> + +<h3><a class='u' +name="Toggling_(8216)stored(8217)" +>Toggling ‘stored’</a></h3> + +<p>For a taste of other FieldType possibilities, +try turning off <code>stored</code> for one or more fields.</p> + +<pre>my $content_type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, + stored => 0, +);</pre> + +<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page, +but since we’re not displaying <code>content</code>, +turning it off for <code>content</code> has no effect – except on index size.</p> + +<h3><a class='u' +name="Analyzers_up_next" +>Analyzers up next</a></h3> + +<p>Analyzers play a crucial role 
in the behavior of FullTextType fields. +In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html" class="podlinkpod" +>AnalysisTutorial</a>, +we’ll see how changing up the Analyzer changes search results.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,164 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::HighlighterTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" 
class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a 
href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::HighlighterTutorial - Augment search results with highlighted excerpts.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Adding relevant excerpts with highlighted search terms to your search results display makes it much easier for end users to scan the page and assess which hits look promising, +dramatically improving their search experience.</p> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p><a href="../../../Lucy/Highlight/Highlighter.html" class="podlinkpod" +>Highlighter</a> uses information generated at index time. 
+To save resources, +highlighting is disabled by default and must be turned on for individual fields.</p> + +<pre>my $highlightable = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, + highlightable => 1, +); +$schema->spec_field( name => 'content', type => $highlightable );</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>To add highlighting and excerpting to the search.cgi sample app, +create a <code>$highlighter</code> object outside the hits iterating loop…</p> + +<pre>my $highlighter = Lucy::Highlight::Highlighter->new( + searcher => $searcher, + query => $q, + field => 'content' +);</pre> + +<p>… then modify the loop and the per-hit display to generate and include the excerpt.</p> + +<pre># Create result list. +my $report = ''; +while ( my $hit = $hits->next ) { + my $score = sprintf( "%0.3f", $hit->get_score ); + my $excerpt = $highlighter->create_excerpt($hit); + $report .= qq| + <p> + <a href="$hit->{url}"><strong>$hit->{title}</strong></a> + <em>$score</em> + <br /> + $excerpt + <br /> + <span class="excerptURL">$hit->{url}</span> + </p> + |; +}</pre> + +<h3><a class='u' +name="Next_chapter:_Query_objects" +>Next chapter: Query objects</a></h3> + +<p>Our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html" class="podlinkpod" +>QueryObjectsTutorial</a>, +illustrates how to build an “advanced search” interface using <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a> objects instead of query strings.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Wed Sep 28 12:07:48 2016 @@ -0,0 +1,290 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::QueryObjectsTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- 
lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/perl/">Perl</a> » <a href="/docs/0.5.0/perl/Lucy/">Lucy</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" 
+>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::QueryObjectsTutorial - Use Query objects instead of query strings.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Until now, +our search app has had only a single search box. +In this tutorial chapter, +we’ll move towards an “advanced search” interface, +by adding a “category” drop-down menu. +Three new classes will be required:</p> + +<ul> +<li><a href="../../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a> - Turn a query string into a <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a> object.</li> + +<li><a href="../../../Lucy/Search/TermQuery.html" class="podlinkpod" +>TermQuery</a> - Query for a specific term within a specific field.</li> + +<li><a href="../../../Lucy/Search/ANDQuery.html" class="podlinkpod" +>ANDQuery</a> - “AND” together multiple Query objects to produce an intersected result set.</li> +</ul> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p>Our new “category” field will be a StringType field rather than a FullTextType field, +because we will only be looking for exact matches. +It needs to be indexed, +but since we won’t display its value, +it doesn’t need to be stored.</p> + +<pre>my $cat_type = Lucy::Plan::StringType->new( stored => 0 ); +$schema->spec_field( name => 'category', type => $cat_type );</pre> + +<p>There will be three possible values: “article”, +“amendment”, +and “preamble”, +which we’ll hack out of the source file’s name during our <code>parse_file</code> subroutine:</p> + +<pre>my $category + = $filename =~ /art/ ? 'article' + : $filename =~ /amend/ ? 'amendment' + : $filename =~ /preamble/ ? 
'preamble' + : die "Can't derive category for $filename"; +return { + title => $title, + content => $bodytext, + url => "/us_constitution/$filename", + category => $category, +};</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>The “category” constraint will be added to our search interface using an HTML “select” element (this routine will need to be integrated into the HTML generation section of search.cgi):</p> + +<pre># Build up the HTML "select" object for the "category" field. +sub generate_category_select { + my $cat = shift; + my $select = qq| + <select name="category"> + <option value="">All Sections</option> + <option value="article">Articles</option> + <option value="amendment">Amendments</option> + </select>|; + if ($cat) { + $select =~ s/"$cat"/"$cat" selected/; + } + return $select; +}</pre> + +<p>We’ll start off by loading our new modules and extracting our new CGI parameter.</p> + +<pre>use Lucy::Search::QueryParser; +use Lucy::Search::TermQuery; +use Lucy::Search::ANDQuery; + +... + +my $category = decode( "UTF-8", $cgi->param('category') || '' );</pre> + +<p>QueryParser’s constructor requires a “schema” argument. +We can get that from our IndexSearcher:</p> + +<pre># Create an IndexSearcher and a QueryParser. +my $searcher = Lucy::Search::IndexSearcher->new( + index => $path_to_index, +); +my $qparser = Lucy::Search::QueryParser->new( + schema => $searcher->get_schema, +);</pre> + +<p>Previously, +we have been handing raw query strings to IndexSearcher. +Behind the scenes, +IndexSearcher has been using a QueryParser to turn those query strings into Query objects. 
+Now, +we will bring QueryParser into the foreground and parse the strings explicitly.</p> + +<pre>my $query = $qparser->parse($q);</pre> + +<p>If the user has specified a category, +we’ll use an ANDQuery to join our parsed query together with a TermQuery representing the category.</p> + +<pre>if ($category) { + my $category_query = Lucy::Search::TermQuery->new( + field => 'category', + term => $category, + ); + $query = Lucy::Search::ANDQuery->new( + children => [ $query, $category_query ] + ); +}</pre> + +<p>Now when we execute the query…</p> + +<pre># Execute the Query and get a Hits object. +my $hits = $searcher->hits( + query => $query, + offset => $offset, + num_wanted => $page_size, +);</pre> + +<p>… we’ll get a result set which is the intersection of the parsed query and the category query.</p> + +<h3><a class='u' +name="Using_TermQuery_with_full_text_fields" +>Using TermQuery with full text fields</a></h3> + +<p>When querying full text fields, +the easiest approach is to create Query objects using QueryParser. +Sometimes, though, you want to create a TermQuery directly for a single term in a FullTextType field. +In that case, +the search term has to be run through the field’s analyzer so that it is normalized in the same way as the field’s content.</p> + +<pre>sub make_term_query { + my ($field, $term) = @_; + + my $token; + my $type = $schema->fetch_type($field); + + if ( $type->isa('Lucy::Plan::FullTextType') ) { + # Run the term through the full text analysis chain. + my $analyzer = $type->get_analyzer; + my $tokens = $analyzer->split($term); + + if ( @$tokens != 1 ) { + # If the term expands to more than one token, or no + # tokens at all, it will never match a token in the + # full text field. + return Lucy::Search::NoMatchQuery->new; + } + + $token = $tokens->[0]; + } + else { + # Exact match for other types. 
+ $token = $term; + } + + return Lucy::Search::TermQuery->new( + field => $field, + term => $token, + ); +}</pre> + +<h3><a class='u' +name="Congratulations!" +>Congratulations!</a></h3> + +<p>You’ve made it to the end of the tutorial.</p> + +<h3><a class='u' +name="See_Also" +>See Also</a></h3> + +<p>For additional thematic documentation, +see the Apache Lucy <a href="../../../Lucy/Docs/Cookbook.html" class="podlinkpod" +>Cookbook</a>.</p> + +<p>ANDQuery has a companion class, +<a href="../../../Lucy/Search/ORQuery.html" class="podlinkpod" +>ORQuery</a>, +and a close relative, +<a href="../../../Lucy/Search/RequiredOptionalQuery.html" class="podlinkpod" +>RequiredOptionalQuery</a>.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>