Re: [CODE4LIB] indexing word documents using solr

2015-02-10 Thread Erik Hatcher
On Feb 10, 2015, at 12:43, Eric Lease Morgan emor...@nd.edu wrote: On Feb 10, 2015, at 11:46 AM, Erik Hatcher erikhatc...@mac.com wrote: First, with Solr 5, it’s this easy: Where can I download Solr 5 because none of the other versions seem to be complete. —ELM It's not yet released

Re: [CODE4LIB] Restrict solr index results based on client IP

2015-01-07 Thread Erik Hatcher
Post processing results as in #1 has big disadvantages as you can’t easily “fill back in” as those docs that were removed and may have been accounted for in facet counts for example. #2 would be my recommendation as well. There is an open issue to create an IP(v6) field type in Solr, with a

Re: [CODE4LIB] Restrict solr index results based on client IP

2015-01-07 Thread Erik Hatcher
I meant to include this link in my first reply, sorry: https://issues.apache.org/jira/browse/SOLR-6741 On Jan 7, 2015, at 11:53 AM, Erik Hatcher erikhatc...@mac.com wrote: Post processing results as in #1 has big disadvantages as you can’t

Re: [CODE4LIB] MARC reporting engine

2014-11-03 Thread Erik Hatcher
I’m surprised you didn’t recommend going straight to Solr and doing the reporting from there :) Index into Solr using your MARC library of choice (e.g. solrmarc) and then get all authorities using facet.field=authorities (or whatever field name used). Erik On Nov 2, 2014, at 7:24
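A rough sketch of what that facet request might look like from a script, for anyone wanting to try it. The host, core name ("marc"), and the sample response values are assumptions made up for illustration; "authorities" is just the field name used in the message above. The query parameters themselves (q, rows, wt, facet, facet.field, facet.limit, facet.mincount) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_report_url(base_url, field, limit=-1):
    """Build a Solr select URL that returns only facet counts for `field`."""
    params = {
        "q": "*:*",            # match every document
        "rows": 0,             # no documents needed, only the facet counts
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # -1 asks Solr for all values, not just the top N
        "facet.mincount": 1,   # skip values that occur in no documents
    }
    return f"{base_url}/select?{urlencode(params)}"


def facet_pairs(flat):
    """Turn Solr's default flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical host and core name, for illustration only.
url = facet_report_url("http://localhost:8983/solr/marc", "authorities")

# Made-up sample of what facet_counts.facet_fields.authorities could contain.
sample = ["Twain, Mark", 12, "Austen, Jane", 7]
report = facet_pairs(sample)
```

Fetching `url` and passing the list found under facet_counts / facet_fields / authorities to `facet_pairs` would give a (heading, count) report without pulling any records back.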

Re: [CODE4LIB] solr computation field norm problem

2013-09-26 Thread Erik Hatcher
Nicolas - Lucene 4 still encodes norms, as described here: http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html#encodeNormValue%28float%29 using this function:

[CODE4LIB]

2012-11-28 Thread Erik Hatcher
/2013_preconference_proposals -Shaun My understanding is that all of the pre-conference proposals are going to happen (note to self: ask Erik Hatcher whether the evening solr session could happen at a bar somewhere). The RailsBridge workshop in particular is aimed at folks who are new to Rails

Re: [CODE4LIB] extracting tiff info

2012-11-20 Thread Erik Hatcher
There's Tika http://tika.apache.org/, which has command-line capabilities. I just launched the UI app, dropped a TIFF on it, and got this output: Bits Per Sample: 8 8 8 8 bits/component/pixel Compression: LZW Content-Length: 262844 Content-Type: image/tiff Orientation: Top, left side

Re: [CODE4LIB] New Newcomer Dinner option

2012-02-04 Thread Erik Hatcher
Looks like some MARC records I've seen. On Feb 4, 2012, at 16:19, Cary Gordon listu...@chillco.com wrote: Probably their cat… They need this: http://www.bitboost.com/pawsense/ On Sat, Feb 4, 2012 at 12:49 PM, Eric Lease Morgan emor...@nd.edu wrote: LlkjyYYYYyetyeyppf Prpfc

Re: [CODE4LIB] Another Sharpie Opportunity

2012-02-03 Thread Erik Hatcher
canadian_snacks++ unless you mean poutine ;) but if you're talking Dangerous Dan's Diner, +1: http://www.dangerousdansdiner.com/

Re: [CODE4LIB] code4lib conference '12 - Solr pre-conference

2012-02-02 Thread Erik Hatcher
first come first serve - Since I'm not going to be making it to Seattle, I will gladly donate my conference slot to whoever 1) can make it and 2) e-mails me first @ erik.hatc...@lucidimagination.com On Feb 1, 2012, at 19:02 , Erik Hatcher wrote: Regretfully I must cancel my trip to Seattle

Re: [CODE4LIB] my conference slot?

2012-02-02 Thread Erik Hatcher
Don't sweat it Elizabeth... this is the case of the sharpie marker. If someone takes my slot, just pretend they're me as far as everything on your side goes and they sharpie their name on a badge. But no one has responded to me anyway. I know it's rough running an event (my company runs two

[CODE4LIB] code4lib conference '12 - Solr pre-conference

2012-02-01 Thread Erik Hatcher
Regretfully I must cancel my trip to Seattle, a bummer on several levels as I always love code4lib conferences, the people, the topics, and was also looking forward to enjoying downtown Seattle a bit too. Last minute urgent business duties call, alas. I have alerted the code4libcon e-mail

Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-06 Thread Erik Hatcher
I'm with jrock on this one. But maybe I'm a luddite that didn't get the memo either (but I am credited for being one of the instrumental folks in the Ajax world, heh - in one or more of the Ajax books out there, us old timers called it remote scripting). What I hate hate hate about seeing

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Erik Hatcher
On Jun 14, 2011, at 08:10 , Keith Jenkins wrote: Does Solr support Soundex? (Soundex was originally developed to assist with alternate spellings of names) Indeed. And several other phonetic algorithms:
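For readers unfamiliar with Soundex, here is a minimal plain-Python sketch of the classic American Soundex rules, to show why alternate spellings of a name collapse to the same key. This is not Solr's code (Solr's phonetic filter delegates to the Apache Commons Codec encoders); the function name and the sample names are just for illustration.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex key for `name`."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # but its code still blocks a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")        # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad or truncate to letter + 3 digits


same_key = soundex("Robert") == soundex("Rupert")
```

Both "Robert" and "Rupert" encode to R163, which is the kind of match an author search would want; the trade-off is that unrelated names can collide too.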

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Erik Hatcher
for PhoneticFilterFactory, which may or may not differ depending on encoder chosen?) documented? On 6/14/2011 8:31 AM, Erik Hatcher wrote: On Jun 14, 2011, at 08:10 , Keith Jenkins wrote: Does Solr support Soundex? (Soundex was originally developed to assist with alternate spellings of names) Indeed

[CODE4LIB] japanese (Solr) analysis

2011-04-04 Thread Erik Hatcher
I'm trying to cull together the best practices for indexing/searching Japanese text. For those of you using Solr, what analyzer/field-type definition do you have for Japanese? Thanks for sharing! Erik

Re: [CODE4LIB] A suggested role for text mining in library catalogs?

2011-02-22 Thread Erik Hatcher
Solr _can_ use stemming, but to do it with POS would be flakey I'd think. Is work a verb or noun? Some of the (Solr-using) customers that I work with have done POS tagging (using tools like BasisTech Solr plugins for entity tagging). Payloads can be assigned to terms during indexing and then

[CODE4LIB] [code4libcon] QA fodder for the What's New in Solr 1.4 preconference??

2011-01-27 Thread Erik Hatcher
Just like I did last year, I'm requesting folks send me (on or off-list, as appropriate) issues/questions regarding Solr that I can factor into the session on Feb. 7 in Bloomington. Suggestions on specifics you'd like covered will be eagerly accepted and factored in too. Last year I had a ton

Re: [CODE4LIB] javascript testing?

2011-01-11 Thread Erik Hatcher
Here at Lucid we've got some Jasmine going on for LWE JS testing. Erik On Jan 11, 2011, at 21:25, Gabriel Farrell gsf...@gmail.com wrote: I like QUnit because it's minimal and I'm used to unit testing. A lot of people are jumping on Jasmine, though. It might be more your style if you're

[CODE4LIB] [JOB] Directory, Online Library Environment, University of Virginia

2010-07-19 Thread Erik Hatcher
I'm passing this on from contacts at UVa, please use the contact info below to follow up. == DIRECTOR, ONLINE LIBRARY ENVIRONMENT University of Virginia Library The University of Virginia Library seeks a strong technical leader for the position of Director

[CODE4LIB] evening CrossFit excursion

2010-02-22 Thread Erik Hatcher
I posted to the blog and update: http://wiki.code4lib.org/index.php/C4L2010_social_activities#CrossFit_Asheville If you're one of the few, the proud, the insane, meet me in the lobby at 5:45pm. I'll depart at 6pm. Gym is really close. Erik

[CODE4LIB] exercising at code4libcon next week

2010-02-17 Thread Erik Hatcher
code4libcon is about here, yay! I'm kinda in a fitness craze right now, and will be doing some training in Asheville. Monday night, 6:30pm, I'm going to the CrossFit Asheville gym - http://www.crossfitasheville.com/ I contacted them and they said that was a good time to come. I'll likely

Re: [CODE4LIB] preconference proposals - solr

2009-11-13 Thread Erik Hatcher
will leave this session with enough information to start running a solr service with your own data. 2. Morning session - solr black belt Instructors: Erik Hatcher (and Naomi Dushay? she has offered to help, if that's of interest) Amaze your friends with your ability to combine boolean and weighted

Re: [CODE4LIB] preconference proposals - solr

2009-11-13 Thread Erik Hatcher
On Nov 13, 2009, at 11:42 AM, Walter Lewis wrote: On 13 Nov 09, at 11:25 AM, Bess Sadler wrote: 1. Morning session - solr white belt [delightful descriptions snipped] 2. Morning session - solr black belt 3. Afternoon session - Blacklight Is there any chance that the black belt session needs

Re: [CODE4LIB] preconference proposals

2009-11-12 Thread Erik Hatcher
On Nov 11, 2009, at 6:46 PM, Naomi Dushay wrote: What do you think about the Solr part having some specific goodies like: +1 to it all! lots on dismax magic how to do fielded searching (author/title/subject) with dismax how to do browsing (termsComponent query, then fielded query to get

Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt

2009-11-12 Thread Erik Hatcher
I often recommend against stop word removal altogether. Is there any reason you need to remove them? The primary reason stop words get removed is to increase performance of queries with very common terms. If you are encountering that, using Solr's CommonGramsFilter(Factory) is a good
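To illustrate the common-grams idea mentioned above: instead of dropping common words, the filter keeps every token and also emits a combined token wherever a common word sits next to a neighbor, so phrase queries can match on the rarer combined term. The following is a simplified Python sketch of that idea, not Solr's implementation; the function name and word list are assumptions.

```python
def common_grams(tokens, common_words):
    """Emit each token, plus a joined bigram wherever a common word is involved."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)                      # the original token is always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")   # e.g. "the_quick"
    return out


grams = common_grams(["the", "quick", "brown", "fox"], {"the"})
# grams == ["the", "the_quick", "quick", "brown", "fox"]
```

Nothing is removed from the index, so searches on the common words alone still work; the extra bigram terms are what make phrases containing them cheaper to evaluate.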

Re: [CODE4LIB] preconference proposals

2009-11-10 Thread Erik Hatcher
I'm interested in presenting something Solr+library related at c4l10. I'm soliciting ideas from the community on what angle makes the most sense. At first I was thinking a regular conference talk proposal, but perhaps a preconference session would be better. I could be game for a half day

Re: [CODE4LIB] Greenstone: tweaking Lucene indexing

2009-09-29 Thread Erik Hatcher
The Lucene Highlighter doesn't require that the text you want highlighted be stored. In fact, you can pass in any arbitrary text to the Highlighter. See the various getBestFragments from the Highlighter class:

Re: [CODE4LIB] Greenstone: tweaking Lucene indexing

2009-09-29 Thread Erik Hatcher
On Sep 29, 2009, at 7:33 AM, Yitzchak Schaffer wrote: Erik Hatcher wrote: The Lucene Highlighter doesn't require that the text you want highlighted be stored. In fact, you can pass in any arbitrary text to the Highlighter. Thanks Erik, What I'm looking for is to return the context

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread Erik Hatcher
Here's a post on how easy it is to send PDF documents to Solr from Java: http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/ Not only can you post PDF (and other rich content) files to Solr for indexing, you can

Re: [CODE4LIB] Usability evaluation of library online catalogues

2008-02-05 Thread Erik Hatcher
On Feb 4, 2008, at 4:12 PM, David Fiander wrote: Actually, the idea of using AJAX to create a way to add and remove limits diagonally is exactly what U Virginia's blacklight interface does, although with a slightly different interface: http://blacklight.betech.virginia.edu/ David - that

Re: [CODE4LIB] arg! classpaths!

2008-01-26 Thread Erik Hatcher
Sadly the Lucene demo is not all that great. I recommend you start with Solr rather than Lucene directly. Erik On Jan 26, 2008, at 9:30 AM, Eric Lease Morgan wrote: (Arg! Classpaths!) Please tell me why Java throws the NoClassDefFoundError error when I think I have set up my

Re: [CODE4LIB] Getting started with SOLR

2007-11-23 Thread Erik Hatcher
On Nov 22, 2007, at 3:41 PM, Kent Fitch wrote: On Nov 23, 2007 4:11 AM, Binkley, Peter [EMAIL PROTECTED] wrote: ... If you use boost on the date field the way you suggest, remember you'll have to reindex from scratch every year to adjust the boost as items age. Or maybe just use a method

[CODE4LIB] Portland library geeks?

2007-05-17 Thread Erik Hatcher
I'll be in Portland later today through Monday for RailsConf. The schedule is really tight, but if there are some library geeks in the area that want to get together around a pool table let me know. Erik

Re: [CODE4LIB] pspell aspell: make your own word lists/dictionaries

2007-04-03 Thread Erik Hatcher
Martin has created a Google Group for his spell checker, and discussions have been ongoing since c4lcon about how to contribute it to Lucene. You can learn more about it here: http://groups.google.com/group/spelt Martin has packaged the code with tests for folks to try it out easily.

Re: [CODE4LIB] Video encoding done - Mashup idea request

2007-03-16 Thread Erik Hatcher
Slides schmides. :) Just having slides synched to a speaker works for some cases, but for those of us that love doing live demos, coding on the fly, and just flat out winging it, the slides are often just barely related to what's being said. Having the actual screen being presented is all that

Re: [CODE4LIB] Flamenco

2007-03-07 Thread Erik Hatcher
On Mar 7, 2007, at 6:55 AM, K.G. Schneider wrote: A mention of the Flamenco project (open source faceted navigation) on Catalogablog made me wonder if anyone on c4l had looked at this: http://flamenco.berkeley.edu/ Of course! Many of us have been all over Flamenco since we first saw it.

Re: [CODE4LIB] Preconference

2007-02-13 Thread Erik Hatcher
On Feb 13, 2007, at 9:47 AM, Susan E Teague Rector/FS/VCU wrote: Are we supposed to be using a predefined set of data for the preconference or can we use our own data? Susan - I'm going to package up a lot of stuff (Solr, sample datasets, Luke, etc) to help everyone get started, but bringing

Re: [CODE4LIB] Preconference

2007-02-13 Thread Erik Hatcher
On Feb 13, 2007, at 10:58 AM, Jonathan Rochkind wrote: If we bring MARCXML and/or MODS, can we assume that there will be people who can help us process that data into something useable by Solr? That would be nice, at any rate. Yes, yes you can make such an assumption. However, I want this

Re: [CODE4LIB] Preconference

2007-02-13 Thread Erik Hatcher
rate. Jonathan Erik Hatcher wrote: On Feb 13, 2007, at 9:47 AM, Susan E Teague Rector/FS/VCU wrote: Are we supposed to be using a predefined set of data for the preconference or can we use our own data? Susan - I'm going to package up a lot of stuff (Solr, sample datasets, Luke, etc) to help

Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-19 Thread Erik Hatcher
On Jan 17, 2007, at 3:26 PM, Andrew Nagy wrote: One thing I am hoping that can come out of the preconference is a standard XSLT doc. I sat down with my metadata librarian to develop our XSLT doc -- determining what fields are to be searchable what fields should be left out to help speed up

Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-19 Thread Erik Hatcher
, 2007, at 4:07 AM, Erik Hatcher wrote: On Jan 17, 2007, at 3:26 PM, Andrew Nagy wrote: One thing I am hoping that can come out of the preconference is a standard XSLT doc. I sat down with my metadata librarian to develop our XSLT doc -- determining what fields are to be searchable what fields

Re: [CODE4LIB] Lucene Newbie Question

2007-01-11 Thread Erik Hatcher
Andrew, On Jan 11, 2007, at 10:47 AM, Andrew Darby wrote: Hello, all. I'm trying to get started with Lucene for the Code4Lib preconference Excellent!!! and was wondering if someone could help. Of course I'm trying to do the first example from the Lucene site

Re: [CODE4LIB] Lucene Newbie Question

2007-01-11 Thread Erik Hatcher
On Jan 11, 2007, at 12:10 PM, Andrew Darby wrote: Thanks Erik and Bess. Erik: Lamentably, your java -cp lucene-core-2.0.0.jar:lucene-demos-2.0.0.jar org.apache.lucene.demo.IndexFiles src/ threw the same error. That is probably due to your environment CLASSPATH (I told you it was trouble!

Re: [CODE4LIB] Lucene Newbie Question

2007-01-11 Thread Erik Hatcher
On Jan 11, 2007, at 2:54 PM, Erik Hatcher wrote: On Jan 11, 2007, at 12:10 PM, Andrew Darby wrote: Thanks Erik and Bess. Erik: Lamentably, your java -cp lucene-core-2.0.0.jar:lucene-demos-2.0.0.jar org.apache.lucene.demo.IndexFiles src/ threw the same error. That is probably due to your

[CODE4LIB] solrb DSL collaboration

2007-01-09 Thread Erik Hatcher
A little edgy in #code4lib today about where we* are going with solrb (the Ruby/Solr domain-specific language API), so we're going to add a bit of process by fleshing it out via the solrb section of the Solr wiki. Below is the first draft, though I've revised it some since then slightly. Click

[CODE4LIB] Solr Flare

2007-01-02 Thread Erik Hatcher
code4libers, I've kicked off a sub-project of Solr called Flare. It has several goals, including to be a Solr Ruby DSL and to achieve a general purpose user interface framework to include faceted browsing, suggest interfaces, as well as the folksonomy angle of tagging/annotating results. At

[CODE4LIB] Fwd: Solr 1.1 released

2006-12-23 Thread Erik Hatcher
The state of Solr's official release was asked about on #code4lib the other day. Here ya go, hot off the press Begin forwarded message: From: Yonik Seeley [EMAIL PROTECTED] Date: December 22, 2006 5:07:49 PM EST To: solr-user@lucene.apache.org, solr-dev@lucene.apache.org Subject: Solr

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Erik Hatcher
On Nov 29, 2006, at 10:27 AM, Art Rhyno wrote: I am so behind in e-mail that I might be treading on ground that is worn out on this, but I would add to Eric's list that I don't care about the indexer if: Here's how Lucene/Solr fares on these points: * the indexer has an open and configurable

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 5:44 PM, Kevin S. Clarke wrote: Is there a standard for specifying how textual analysis works as well, so that tokenization can be standardized across these XQuery engines as well? Not that I know. What I've seen so far is that tokenization is implementation specific.

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 3:28 PM, Andrew Nagy wrote: The major problem with it all is the ugly mess that is marcxml This brings up an interesting point about just dropping our source XML data into an XML-savvy database and using XQuery on it. Maybe y'all have much cleaner data than I've seen, but

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:46 PM, Binkley, Peter wrote: You've got enough flexibility in the way you set up your Lucene index, and Lucene search results give you access to the term weights for each hit, It does? so you can tell which fields actually matched. You can? I'm curious how you're

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 6:12 PM, Binkley, Peter wrote: Fair point, and that's how my current solr-based project works. I'm thinking I would like the other advantages of an XML db: the ability to run xqueries, batch updates, etc., alongside the Lucene searching. And I want them integrated under the

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:04 PM, Jonathan Rochkind wrote: Bess Sadler wrote: application. That way you can use solr / lucene for search, faceted browse, etc, and your XML database only for known item retrieval, which it is generally able to do without performance issues. I'm hopping up and down

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:49 PM, Andrew Nagy wrote: My only concern about lucene is the lack of a standard query language. I went down the native XML database path because of XQuery and XSL, does something like lucene and solr offer a strong query language? Is it a standard? What if someone