On 7/20/07, Eric Hellman <[EMAIL PROTECTED]> wrote:
Have people been able to do a decent job of identifying parts of
speech in natural language?
I think trying to import broad NLP findings into our narrower problem of
citation parsing is not likely to be fruitful but on the other hand
ste
On Jul 20, 2007, at 9:14 AM, Eric Hellman wrote:
Heuristics are perhaps the only way to deal with lack of consistent
format. (e.g. "a cluster of words including 'journal of' is likely
to contain a journal name")
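To make the "journal of" heuristic above concrete, here is a minimal sketch in Python. It is not taken from any parser mentioned in this thread; the table name HEURISTICS, the function guess_fields, and the patterns themselves are all hypothetical and deliberately crude.

```python
import re

# Hypothetical heuristic table: (field label, compiled pattern).
# Each entry is an assumption made for illustration, not a tested rule set.
HEURISTICS = [
    # "Journal of" / "J. of" followed by one or more capitalized words.
    ("journal", re.compile(r"\b(?:Journal|J\.) of(?: [A-Z][\w&-]*)+")),
    # A plausible four-digit publication year.
    ("year", re.compile(r"\b(?:19|20)\d{2}\b")),
    # A page range such as 45-67 (hyphen or en dash).
    ("pages", re.compile(r"\b\d+\s*[-\u2013]\s*\d+\b")),
]


def guess_fields(citation):
    """Return {label: first matching substring} for each heuristic that fires."""
    found = {}
    for label, pattern in HEURISTICS:
        match = pattern.search(citation)
        if match:
            found[label] = match.group(0)
    return found


if __name__ == "__main__":
    sample = ("Smith, J. (1999). Parsing references. "
              "Journal of Applied Bibliography, 12(3), 45-67.")
    print(guess_fields(sample))
```

Keeping the rules in one table rather than scattering them through the code is only one possible way to keep a growing set of heuristics manageable. Note that the journal pattern misses names with lowercase words after "of" (for example "Journal of the ..."), which is the kind of gap that keeps simple parsers from handling every citation.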
You're right; in a lot of ways, it depends on what you consider a
heuristic; every
On Jul 18, 2007, at 10:04 PM, Eric Hellman wrote:
Also, even in (many) scholarly journals, editorial consistency is
almost unbelievably poor -- lots of times, the rules just aren't
followed. Punctuation gets missed, journal names (especially
abbreviations!) are misspelled... and so on. Rule-based
On Jul 18, 2007, at 10:04 PM, Eric Hellman wrote:
Anyway, almost all parsers rely on a set of heuristics. I have not
seen any parsers that do a good job of managing their heuristics in a
scalable way. A successful open-source attack on this problem would
have the following characteristics:
1. a
Having written a pretty decent citation parser 10 years ago (in
Applescript!), and having seen a lot of people take whacks at the
problem, I have to say that it's pretty easy to write one that works
on 70-80% of citations, particularly if you stick to one scholarly
subject area. On the other hand,
It's on our list of Big Problems To Solve; I'm hoping to have time to
tackle it later this year :)
-n
On Jul 18, 2007, at 12:57 PM, Jonathan Rochkind wrote:
Ha! If it's not too difficult, then with all the time you've spent
"looking at it extensively", how come you don't have a solution yet?
Just kidding. :)
Jonathan
Nathan Vack wrote:
We've looked at this pretty extensively, and we're pretty certain
there's nothing downloadable that does a "good enough" job. However,
it's by no means impossible -- it seems to be undergrad thesis-level
work in Singapore:
http://wing.comp.nus.edu.sg/parsCit/
There used to be a paper describing
Hi Jonathan,
There is a Perl module by Mike Jewell which was written for this purpose:
http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/
I am not using the code, so I can't comment on how well it may work for
your purpose, but it's probably worth a look.
-- Alberto
On 7/17/07, Jon
On 7/18/07, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
Nice, that might be what I need. Maybe I'll take a look at the LibX
code, it's open source, right?
Google Scholar has no API--you're screen scraping it?
Yes and yes.
- Godmar
On 7/18/07, Steve Toub <[EMAIL PROTECTED]> wrote:
Agreed that a lookup against something like Google Scholar, Web of
Science, or a set of federated search targets instance may yield better
results. We've discussed it but haven't done any testing.
Use your LibX edition, Steve. I can also send a dra
Godmar Back wrote:
A year or so ago a couple of students looked into this for LibX. There
are a number of systems that people have published about, although
some are not available and none worked very well or were easy to get
to work. The systems also varied in their computational complexity,
with some not suitable
Does anyone have any decent open source code to parse a citation? I'm
talking about a completely narrative citation like someone might
cut-and-paste from a bibliography or web page. I realize there are a
number of different formats this could be in (not to mention the human
error problems that alw