Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Eric Hellman
On Jul 18, 2007, at 10:04 PM, Eric Hellman wrote: Also, even in (many) scholarly journals, editorial consistency is almost unbelievably poor -- lots of times, the rules just aren't followed. Punctuation gets missed, journal names (especially abbreviations!) are misspelled... and so on. Rule-based
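The point about misspelled and inconsistently abbreviated journal names can be illustrated with a minimal Python sketch (not from the thread). It normalizes a journal string by lowercasing it and stripping punctuation, then falls back to fuzzy matching with the standard library's difflib.get_close_matches. The KNOWN_JOURNALS table, the match_journal function, and the 0.8 cutoff are hypothetical choices for illustration; a real system would need a much larger authority list of titles and abbreviations.

```python
import difflib
import re
from typing import Optional

# Hypothetical authority table: normalized variant -> canonical title.
# A real system would load thousands of titles and abbreviations.
KNOWN_JOURNALS = {
    "j am chem soc": "Journal of the American Chemical Society",
    "journal of the american chemical society": "Journal of the American Chemical Society",
    "phys rev lett": "Physical Review Letters",
    "physical review letters": "Physical Review Letters",
}


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def match_journal(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a sloppy journal string to a canonical title, or None if unknown."""
    key = normalize(raw)
    # Exact hit after normalization handles missing or extra periods.
    if key in KNOWN_JOURNALS:
        return KNOWN_JOURNALS[key]
    # Fuzzy fallback tolerates small misspellings.
    close = difflib.get_close_matches(key, list(KNOWN_JOURNALS), n=1, cutoff=cutoff)
    return KNOWN_JOURNALS[close[0]] if close else None
```

With this table, "J. Am. Chem. Soc." and "J Am Chem Soc" both normalize to the same key, and a typo such as "Physical Reveiw Letters" still resolves through the fuzzy fallback. The cutoff trades false matches against missed ones and would need tuning on real data.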

Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Joe Atzberger
On 7/20/07, Eric Hellman [EMAIL PROTECTED] wrote: Have people been able to do a decent job of identifying parts of speech in natural language? I think trying to import broad NLP findings into our narrower problem of citation parsing is not likely to be fruitful but on the other hand
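One way to read the part-of-speech analogy: instead of tagging words as nouns or verbs, a citation parser could tag each token with a coarse "shape" class and let a statistical sequence labeller (for example an HMM or CRF, which is an assumption here and not something proposed in the thread) learn which runs of tokens form authors, year, title, and so on. The Python sketch below shows only that feature-extraction step; the class names (YEAR, NUM, PUNCT, INITIAL, CAP, LOWER) and the token_features function are hypothetical.

```python
import re
from typing import List, Tuple

# Words, or single non-space punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_features(citation: str) -> List[Tuple[str, str]]:
    """Split a citation into tokens and tag each with a coarse shape class."""
    tagged = []
    for tok in TOKEN_RE.findall(citation):
        if re.fullmatch(r"(1[5-9]|20)\d\d", tok):
            cls = "YEAR"       # four digits in a plausible publication range
        elif tok.isdigit():
            cls = "NUM"        # volume, issue, or page numbers
        elif re.fullmatch(r"[^\w\s]", tok):
            cls = "PUNCT"
        elif len(tok) == 1 and tok.isupper():
            cls = "INITIAL"    # likely an author's initial
        elif tok[0].isupper():
            cls = "CAP"
        else:
            cls = "LOWER"
        tagged.append((tok, cls))
    return tagged
```

For "Smith, J. (2005). Title. Nature, 12, 45-67." this yields ("Smith", "CAP"), (",", "PUNCT"), ("J", "INITIAL"), ..., ("2005", "YEAR"), and the three page and volume numbers as NUM. The tags alone do not parse anything; they are the kind of input a trained labeller would consume.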

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Steve Toub
Godmar Back wrote: A year or so ago a couple of students looked into this for LibX. There are a number of systems that people have published about, although some are not available and none worked very well or were easy to get to work. The systems also varied in their computational complexity,

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Jonathan Rochkind
Nice, that might be what I need. Maybe I'll take a look at the LibX code, it's open source, right? Google Scholar has no API--you're screen scraping it? Jonathan Godmar Back wrote: A year or so ago a couple of students looked into this for LibX. There are a number of systems that people have

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Godmar Back
On 7/18/07, Steve Toub [EMAIL PROTECTED] wrote: Agreed that a lookup against something like Google Scholar, Web of Science, or a set of federated search targets, for instance, may yield better results. We've discussed it but haven't done any testing. Use your LibX edition, Steve. I can also send a

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Godmar Back
On 7/18/07, Jonathan Rochkind [EMAIL PROTECTED] wrote: Nice, that might be what I need. Maybe I'll take a look at the LibX code, it's open source, right? Google Scholar has no API--you're screen scraping it? Yes and yes. - Godmar

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Alberto Accomazzi
Hi Jonathan, There is a Perl module by Mike Jewell which was written for this purpose: http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/ I am not using the code, so I can't comment on how well it may work for your purpose, but it's probably worth a look. -- Alberto On 7/17/07,

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Jonathan Rochkind
Ha! If it's not too difficult, then with all the time you've spent looking at it extensively, how come you don't have a solution yet? Just kidding. :) Jonathan Nathan Vack wrote: We've looked at this pretty extensively, and we're pretty certain there's nothing downloadable that does a good

Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Eric Hellman
Having written a pretty decent citation parser 10 years ago (in AppleScript!), and having seen a lot of people take whacks at the problem, I have to say that it's pretty easy to write one that works on 70-80% of citations, particularly if you stick to one scholarly subject area. On the other
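To make the "easy 70-80%" point concrete, here is a minimal rule-based sketch in Python (not code from the thread). It assumes a single APA-like journal style, "Authors (Year). Title. Journal, Volume(Issue), Start-End.", and uses one regular expression; the pattern, the parse_citation function, and the field names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern for one APA-like journal citation style:
#   Authors (Year). Title. Journal, Volume(Issue), Start-End.
APA_LIKE = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*"
    r"(?P<title>.+?)\.\s*"
    r"(?P<journal>[^,]+),\s*"
    r"(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?,\s*"
    r"(?P<spage>\d+)\s*[-\u2013]\s*(?P<epage>\d+)\.?\s*$"
)


def parse_citation(text: str) -> Optional[dict]:
    """Return a dict of citation fields, or None if the text doesn't fit the pattern."""
    m = APA_LIKE.match(text.strip())
    return m.groupdict() if m else None
```

A made-up example such as "Smith, J., & Jones, K. (2005). Parsing references with regular expressions. Journal of Library Automation, 12(3), 45-67." comes back with year "2005", journal "Journal of Library Automation", volume "12", issue "3", and pages "45" to "67". The brittleness shows up quickly: a title containing a period (for example "U.S. policy") ends the title group early, and a citation in any other style returns None, which is where the remaining 20-30% lives.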

[CODE4LIB] Citation parsing?

2007-07-17 Thread Jonathan Rochkind
Does anyone have any decent open source code to parse a citation? I'm talking about a completely narrative citation like someone might cut-and-paste from a bibliography or web page. I realize there are a number of different formats this could be in (not to mention the human error problems that