Josh,
None of those pieces is an IR, but do you think
that when taken as a whole they could comprise an IR?
Yes. I think it’s very healthy to think of the IR as a set of services, rather
than a single software product. And I really like the idea of using your
catalog as the discovery environme
If the issue is that you don't want to actually have to separate out
additional columns and/or use some sort of temp table to hold them, you
can work around that by pulling out some substrings in the order by. I
work primarily in SQL but I'm pretty sure patindex ("pattern index") works
in MySQL a
Wow, options popping out of the woodwork left and right! We'll probably
try out several of these and see which suits best.
Thanks a lot, everyone!
Will
On 2017-10-25 16:31, Ray Voelker wrote:
Figured I'd chime in with something I spent entirely way too much time
on
(probably). A JavaScript c
Figured I'd chime in with something I spent entirely way too much time on
(probably). A JavaScript class to normalize and sort LC call numbers
https://github.com/rayvoelker/js-loc-callnumbers
Not sure if that helps you in your particular situation, but it might give
you a place to start along wit
If you're using PHP you could use natsort() on the result,
http://php.net/manual/en/function.natsort.php
Alternatively, you could try ordering by the length of the VARCHAR field first,
ORDER BY LENGTH(field), field
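
For comparison, here is a minimal Python sketch of the same length-first ordering applied after the rows have been fetched. The function name and the sample values are made up for illustration; they are not from the thread.

```python
def length_then_value(values):
    """Mimic ORDER BY LENGTH(field), field: shorter strings sort first,
    and strings of equal length are compared character by character."""
    return sorted(values, key=lambda v: (len(v), v))


# Hypothetical sample data: plain string sorting would put QA100 before QA76.
numbers = ["QA76", "QA9", "QA100"]
print(length_then_value(numbers))  # ['QA9', 'QA76', 'QA100']
```

This only behaves well when the values share the same class letters. A shorter value such as "Z5" would sort ahead of "QA76", so it is a quick fix rather than a full call number sort.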
Sent from mobile
> On Oct 25, 2017, at 2:11 PM, Jodie Gambill wrote:
>
> Hi
Hi Bryan,
I agree that a repository is more than documents, and in this model we
would still do metadata, indexing, etc. It would just be handled by a
different piece. Instead of having one system that does it all (like
DSpace), we'd use the library catalog for metadata and indexing, backup
tools
Will -- I use this sortLC PHP script that I ported from Michael Doran's Perl
version:
PHP: https://github.com/kenirwin/Weeding-Helper/blob/master/sortLC.php
Perl: https://rocky.uta.edu/doran/sortlc/
They have their flaws, but I find them to work pretty well.
I hope this helps!
Ken
-Origi
Hi Will -
I had a similar task a few years ago on a small project, though we only
used the classMark and classNum (from your example) parts of the call
number for what we needed. I implemented it as you outlined above, with two
separate fields to enable proper sorting -- classMark as varchar and
cl
The best way is probably to normalize the call numbers into a sortable
string outside of MySQL, save that string to a sortable_callnumber field in
your database, and sort by that.
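
As a rough illustration of that idea, the sketch below builds a sortable string in Python. It is a simplified assumption about how such a key could look, not the algorithm from any script or post linked in this thread: the regular expression, the padding widths, and the function name are all hypothetical, and real call numbers have edge cases this does not handle.

```python
import re

# Assumed shape: 1-3 class letters, a class number with an optional
# decimal part, then everything else (cutters, dates, volume info).
_LC_PATTERN = re.compile(r"^\s*([A-Za-z]{1,3})\s*(\d{1,4})(?:\.(\d+))?\s*(.*)$")


def sortable_callnumber(raw):
    """Return a string whose plain text ordering approximates shelf order."""
    match = _LC_PATTERN.match(raw)
    if match is None:
        # Not recognisably LC: fall back to a trimmed, upper-cased copy.
        return raw.strip().upper()
    letters, whole, fraction, rest = match.groups()
    # Collapse internal runs of whitespace in the remainder.
    rest = " ".join(rest.upper().split())
    # Pad letters to 3 characters and the class number to 4 digits so that
    # "Q" sorts before "QA" and 9 sorts before 76.
    key = f"{letters.upper():<3}{int(whole):04d}.{fraction or '0'} {rest}"
    return key.rstrip()


# Hypothetical sample call numbers, sorted by the generated key.
shelf = ["QA76.73 .P98", "QA9 .A5", "Q175 .K95", "QA76.5 .B3"]
for number in sorted(shelf, key=sortable_callnumber):
    print(number)
```

The returned value is what would be stored in the sortable_callnumber column, so the database can use an ordinary ORDER BY on it.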
Normalizing call numbers (
http://robotlibrarian.billdueber.com/2008/11/normalizing-loc-call-numbers-for-sorting/)
turn
Josh,
There's nothing wrong with what you are describing if it's all your institution
needs, but I would be careful about promoting that as an IR. An IR is much more
than a bunch of documents. The metadata modelling, preservation features and
indexing that you want to leave out are what makes it
We have a small web app with a MySQL backend, containing lists of books
that have been reported lost and tracking our efforts at locating them.
Access Services requested that the list of currently missing books be
sorted according to LC call number. So we did, but the results are
ordered in t
Thank you for all the replies. It all makes me feel as if I’m on the right
track. Again, thank you. —Eric M.
We're a mid-sized university library (10,000 FTE) trying to get an IR off
the ground to showcase student and faculty research. We've had a DSpace
instance running for several years, but we use so few of its features that
DSpace ends up being more trouble than it is worth. In particular, it's
very f
> On Oct 25, 2017, at 6:55 AM, Edward Summers wrote:
>
> Recognizing that ethics and web archiving is a rapidly evolving field and
> that it might not fit directly into your primary work/research interests we
> wanted to keep the proposal process simple. We just need 100 words from you
> abou
It turns out it's straightforward to reimplement the default fingerprinting
algorithm that OpenRefine uses. We did that here to help catch those sorts
of trivial spelling differences in user searches in order to provide
best-bet suggestions for some of our most popular stuff. Here's my
reimplementa
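
Separately from the reimplementation referenced above, here is a minimal Python sketch of a fingerprint-style key. It follows the general steps OpenRefine documents for its fingerprint method (trim, lowercase, strip punctuation and control characters, fold accented characters to ASCII, then sort and de-duplicate the tokens), but it is an approximation written for illustration and may differ from OpenRefine's behaviour in edge cases.

```python
import unicodedata


def fingerprint(value):
    """Build a key that is identical for trivially different spellings."""
    text = value.strip().lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop punctuation (P*) and control/format characters (C*), but keep
    # whitespace so that tokens stay separated.
    text = "".join(
        ch
        for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith(("P", "C"))
    )
    # Sort the unique tokens so that word order no longer matters.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint."""
    groups = {}
    for value in values:
        groups.setdefault(fingerprint(value), []).append(value)
    return groups


print(cluster(["South Bend, IN", "south bend in", "South Bend"]))
```

Values whose keys collide are candidates for being the same thing; a person or a stricter rule still has to decide which spelling wins.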
On Wed, Oct 25, 2017 at 8:57 AM, Eric Lease Morgan wrote:
> ...My bibliographic data is fraught with inconsistencies. For example, a
> publisher’s name may be recorded one way, another way, or a third way. The
> same goes for things like publisher place: South Bend; South Bend, IN;
> South Bend,
Eric,
You can actually use OpenRefine programmatically. There are multiple
client libraries for it.
https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Developers#known-client-libraries-for-refine
Trevor Muñoz wrote a great blog post about his work doing just that
http://trevormunoz
I actually love the approach Mark writes about here. It was partly what
inspired me to do this work in MarcEdit -- albeit, in a light-weight way --
so as not to incur any additional dependencies.
--tr
On Wed, Oct 25, 2017 at 12:23 PM, Phillips, Mark
wrote:
> Of possible interest is some work we've
Of possible interest is some work we've done to take the clustering
capabilities of OpenRefine and bake them into our metadata editing interface
for The Portal to Texas History and the UNT Digital Library.
We've focused a bit on interfaces which might be of interest. I've written a
bit about i
Hi Eric,
I am planning to work on detecting such anomalies. So far I have
thought about the following approaches:
- n-gram analysis
- basket analysis
- similarity detection of Solr
- final state automat
The tools I will use: Apache Solr and Apache Spark. I haven't started
yet the implement
Unfortunately -- not in a language you likely would want to use. But I've
been working on doing this in MarcEdit 7, and to do it, I found that I got
a lot of mileage using the Levenshtein distance algorithm (which I
prefer). You can usually find these in a variety of languages. The
approach that
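
For readers who have not met it, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below is the standard dynamic-programming version, offered as an illustration only; it is not MarcEdit's code, and the sample strings are made up.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical publisher strings that differ by one dropped letter.
print(levenshtein("Notre Dame Press", "Notre Dame Pres"))
```

A small distance relative to the string length suggests two values may be variants of one another; the threshold to use is a judgment call that depends on the data.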
Has anybody here played with any clustering techniques for normalizing
bibliographic data?
My bibliographic data is fraught with inconsistencies. For example, a
publisher’s name may be recorded one way, another way, or a third way. The same
goes for things like publisher place: South Bend; Sout
Once again, this is a periodic reminder of why *formally* organizing (which may
or may not involve incorporation in a State) is such a great idea.
This kind of thing (Who is a member of the community? How do they vote? How do
you determine whether a vote is properly held and binding?) is *preci
Forum on Ethics and Archiving the Web
New Museum, New York City, March 22-24
Proposals due by November 14 (funding available)
http://rhizome.org/editorial/2017/oct/24/open-call-national-forum-on-ethics-and-archiving-the-web/
The dramatic rise in the public’s use of the web and social media to docu
24 matches