Re: [CODE4LIB] Providing Search Across PDFs

2013-02-21 Thread Jay Luker
On Wed, Feb 20, 2013 at 2:33 PM, Nathan Tallman wrote:
> @Péter: The VuFind solution I mentioned is very similar to what you use here. It uses Aperture (although soon to use Tika instead) to grab the full-text and shoves everything inside a Solr index. The import is managed through a PHP scr
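
As a rough illustration of the "grab the full-text and shove it into a Solr index" step described above, here is a minimal Python sketch. It is not the VuFind import script: it assumes the text has already been extracted from each PDF (for example by Tika), and the field names `id` and `fulltext`, the helper name `build_solr_docs`, and the example.org URLs are all hypothetical. It only builds the JSON body that could be posted to a Solr update handler; it does not contact a server.

```python
import json


def build_solr_docs(extracted):
    """Build a JSON array of documents for a Solr update request.

    `extracted` maps a PDF URL to its already-extracted text.
    The field names "id" and "fulltext" are assumptions and must
    match whatever the target Solr schema actually defines.
    """
    docs = []
    for url, text in extracted.items():
        docs.append({
            "id": url,
            # Collapse runs of whitespace left over from PDF extraction.
            "fulltext": " ".join(text.split()),
        })
    return json.dumps(docs)


# Hypothetical extracted text for two PDFs linked from finding aids.
extracted_text = {
    "https://example.org/findingaids/coll1/box1.pdf": "Correspondence,\n 1901-1910",
    "https://example.org/findingaids/coll1/box2.pdf": "Photographs   and   maps",
}
payload = build_solr_docs(extracted_text)
```

Using the PDF URL as the document `id` is one possible choice; it would let a search result link straight back to the file from the finding aid.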

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-21 Thread gibert julien
As for the Google Custom Search solution, I'd add that it sometimes yields weird results: for instance, we indexed a site and, for a given search term, Google says "about 16 results" (we have 10 hits displayed on the page), and when we click on page 2, it says "about 12 results" (showing the

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Wilhelmina Randtke
Yes, Google Custom Search is not too bad, if your PDFs are sorted meaningfully by directory, and if you submit a site map to Google for more complete indexing. You can use Xenu to make a site map, put the site map online as a static XML file, and then use Google Webmaster Tools to pass the locatio
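
For the sitemap step mentioned above, a static XML file can also be produced with a few lines of Python instead of a crawler. This is only a sketch under assumptions: the list of PDF URLs is hypothetical and would have to come from your own finding aids, and `build_sitemap` is an illustrative helper name, not part of any tool named in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical PDF URLs; replace with the real links from the finding aids.
pdf_urls = [
    "https://example.org/findingaids/coll1/box1.pdf",
    "https://example.org/findingaids/coll2/folder3.pdf",
]
sitemap_xml = build_sitemap(pdf_urls)
```

The resulting string can be written to a file such as sitemap.xml and served as static content.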

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Nathan Tallman
@Jason and @Michele: I'd rather stay away from a Google solution, because they don't index everything. Our sitemap is submitted nightly, and out of about 6000 URLs only 1500 are indexed. I can't make sure Google indexes the PDFs or be sure that they always will. (If I'm misunderstandin

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Péter Király
> Sent: Wednesday, February 20, 2013 12:54 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Providing Search Across PDFs
>
> My institution is looking for ways to provide search across PDFs through our website. Specifically, PDFs linked from finding aids. Ideally sear

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Michele R Combs
What about just a Google site search?

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nathan Tallman
Sent: Wednesday, February 20, 2013 12:54 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Providing Search Across PDFs

My institution is

Re: [CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Jason Griffey
This might not fit your need exactly, but a Google Custom Search (http://www.google.com/cse/) should do the job. You can have the Custom Search only index a given directory, or only PDFs, whichever is more useful.

Jason

On Wed, Feb 20, 2013 at 12:53 PM, Nathan Tallman wrote:
> My institution

[CODE4LIB] Providing Search Across PDFs

2013-02-20 Thread Nathan Tallman
My institution is looking for ways to provide search across PDFs through our website. Specifically, PDFs linked from finding aids. Ideally searching within a collection's PDFs or possibly across all PDFs linked from all finding aids. We do not have a CMS or a digital repository. A digital reposito