CV Mining: Which CMS?

Luca Dini Fri, 02 Mar 2012 04:01:46 -0800

Hi Andreas,

That's a good question. So far we have not been using any CMS, just a SOLR-based application with faceting in the style of ajax-solr or VuFind. Now we would like to move to a CMS to allow real CV management, not just uploading and searching. So the basic elements for the choice are:

1) Faceted Search
===============

We noticed that users of this kind of application rapidly become confident with this search modality. However, it looks like neither Alfresco nor Nuxeo has anything comparable. Nuxeo has a kind of faceted navigation implementation, but it is not really faceted search, in the sense that it makes every value of each field selectable as a facet, irrespective of whether selecting it returns any documents. Moreover, it does not show the number of documents that would be returned by selecting a given facet. And it does not seem that NEs (named entities) are integrated into this kind of search. Alfresco seems to support facets only via the Lucid Imagination integration, which is proprietary. However, it is based on SOLR, so it might not be impossible to integrate a view built with ajax-solr.
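To make the distinction concrete, here is a minimal, self-contained Java sketch (Java 16+) of the faceting behaviour we are after. It is our own illustration, not Solr, Nuxeo or Alfresco code, and the `Cv` record with its `city` and `skill` fields is hypothetical. Facet values are computed from the current result set, so a value that matches no document is never offered, and each offered value carries its document count; selecting a value and faceting again gives the drill-down.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class FacetCounts {

    // Hypothetical, minimal CV representation used only for this illustration.
    record Cv(String id, String city, String skill) {}

    // Counts how many documents in the current result set carry each value of a field.
    // Values that match no document never appear, so a zero-hit facet cannot be offered.
    static Map<String, Integer> facet(List<Cv> results, Function<Cv, String> field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Cv cv : results) {
            counts.merge(field.apply(cv), 1, Integer::sum);
        }
        return counts;
    }

    // Selecting a facet value narrows the result set; facets are then recomputed on it.
    static List<Cv> select(List<Cv> results, Function<Cv, String> field, String value) {
        return results.stream().filter(cv -> field.apply(cv).equals(value)).toList();
    }

    public static void main(String[] args) {
        List<Cv> results = List.of(
                new Cv("1", "Paris", "Java"),
                new Cv("2", "Paris", "SQL Server"),
                new Cv("3", "Lyon", "Java"));
        System.out.println(facet(results, Cv::city));  // {Lyon=1, Paris=2}
        System.out.println(facet(results, Cv::skill)); // {Java=2, SQL Server=1}
        // Drill-down: skills among the Paris CVs only.
        System.out.println(facet(select(results, Cv::city, "Paris"), Cv::skill)); // {Java=1, SQL Server=1}
    }
}
```

In SOLR, on which ajax-solr builds, this corresponds to field faceting (`facet.field`) with `facet.mincount=1`, so that values with no matching documents are left out of the facet list.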

2) Stanbol Integration
===================

The interesting claim that we would like to verify is that, by using Stanbol, the integration between a semantic system and a CMS becomes much easier. Thus we need to evaluate how far the various Stanbol/CMS integrations that have already been done can support a central use of data coming from Stanbol, and not just an accessory enrichment.


3) Non-functional constraints
=========================

Even in the best of worlds, I think it would be over-optimistic to assume that no intervention on the source code of the CMS will be required. As we are mainly Java programmers, a CMS implemented in Java is a prerequisite.

In the last week we have been investigating all these directions, and so far it seems that Alfresco with a specially designed interface is probably a wise choice. But we haven't made a choice yet, so any advice would be extremely valuable.

Many thanks,
Luca


On 01/03/2012 20:52, Andreas Kuckartz wrote:

Hi Luca,

which CMS do you intend to use for the project?

Cheers,
Andreas
---

On 01.03.2012 15:44, Luca Dini wrote:

Dear All,
Please let me introduce a new early adopter project in which we will
be involved. I look forward to a great and intellectually inspiring
exchange with you all.
Kind regards,
Luca

The project (run by CELI under the umbrella of the IKS early adopter
program) aims to integrate Stanbol technology with a specific context
of use, i.e. CV management via CMS and semantic technologies. The
crucial challenge of this integration is the parametrization of
Stanbol to deal with information that has been automatically
extracted from CVs. Besides the direct integration results, which will
be distributed under the same conditions as the Stanbol software, the
early adopter project will produce two additional by-products:

     The provision to Stanbol of classes allowing the connection with
Linguagrid (www.linguagrid.org) and possibly LanguageGrid
(http://langrid.org/en/index.html).
     The verification of the extensibility of Stanbol to languages
other than English (the project will concern CVs written in French).

We envisage two prototypical use cases, which are described
below:
Use-Case 1: Human Resources Department

The context is that of the Human Resources department of a big company
or of a recruitment company. The basic goal is to provide them with an
open source document management system able to deal intelligently
with unstructured CVs (or "résumés"), i.e. CVs that come as Microsoft
Word, PDF, OpenOffice etc. files. Each time a new CV arrives, it is
inserted into the document base. Behind the scenes, this is not just
adding a document but passing it to a Stanbol server, which enhances
it with structured information.

This structured information might include:

     experience of the candidate
     skills of the candidate
     education level
     reference data (name, address etc.)
     contact data

Some of these data might be slightly more structured than just named
entities, but they are definitely within the representational power of
RDF. Some of them could be semantically enriched even further, by
providing external information on companies, places, specific
technologies etc.
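As an illustration of the "passing it to a Stanbol server" step, here is a minimal Java sketch (Java 11+, using the JDK's `java.net.http` client). It assumes a Stanbol instance listening at `http://localhost:8080` with the RESTful `/enhancer` endpoint, which accepts text via POST and returns the enhancements as RDF in the format requested through the `Accept` header; host, port, sample text and the choice of Turtle are assumptions to adapt to the actual deployment. Binary formats such as Word or PDF would first need text extraction, either on our side or by a suitable engine in the Stanbol chain.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StanbolEnhance {

    // Assumed location of a local Stanbol server; adjust to the real deployment.
    static final URI ENHANCER = URI.create("http://localhost:8080/enhancer");

    // Builds a POST that sends plain CV text to the enhancer and asks for RDF (Turtle) back.
    static HttpRequest enhancementRequest(String cvText) {
        return HttpRequest.newBuilder(ENHANCER)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(cvText)) // UTF-8 by default
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires a running Stanbol server at the assumed address.
        HttpRequest request = enhancementRequest("Jean Dupont, Java developer, Paris, since 2009.");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // RDF graph with the enhancement annotations
    }
}
```

The CMS side would then map the returned annotations onto the CV fields (skills, education, contact data, etc.) and store them next to the original document.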

As a result, HR personnel would be able to formulate queries such as
(just as an illustration):

     All CVs of people living in Paris older than 27
     All CVs of people with skills in SQL Server and Java
     All people who have worked in a high-tech company since November
2011.

....
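Once a CV has been turned into structured data, the first two example queries reduce to simple filters. The following Java sketch (Java 16+) assumes a hypothetical `CandidateProfile` record already populated from the enhancement results; it is not a Stanbol or CMS API, and in a real system such queries would more likely run as SPARQL or SOLR queries over the enriched index. The sample profiles are invented, and age is computed against a fixed reference date so the result is reproducible.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class CvQueries {

    // Hypothetical structured view of a CV, as it might be filled from Stanbol enhancements.
    record CandidateProfile(String name, String city, LocalDate birthDate, Set<String> skills) {}

    // "People living in <city> older than <years>", evaluated at a given reference date.
    static Predicate<CandidateProfile> livesInAndOlderThan(String city, int years, LocalDate today) {
        return p -> p.city().equals(city)
                && Period.between(p.birthDate(), today).getYears() > years;
    }

    // "People with skills in all of <required>".
    static Predicate<CandidateProfile> hasAllSkills(Set<String> required) {
        return p -> p.skills().containsAll(required);
    }

    // Runs a query and returns the names of the matching candidates, in input order.
    static List<String> names(List<CandidateProfile> cvs, Predicate<CandidateProfile> query) {
        return cvs.stream().filter(query).map(CandidateProfile::name).toList();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2012, 3, 2);
        List<CandidateProfile> cvs = List.of(
                new CandidateProfile("Alice", "Paris", LocalDate.of(1980, 5, 1), Set.of("Java", "SQL Server")),
                new CandidateProfile("Bruno", "Paris", LocalDate.of(1990, 1, 1), Set.of("Java")),
                new CandidateProfile("Chloe", "Lyon", LocalDate.of(1975, 7, 9), Set.of("SQL Server", "Java")));

        System.out.println(names(cvs, livesInAndOlderThan("Paris", 27, today)));     // [Alice]
        System.out.println(names(cvs, hasAllSkills(Set.of("SQL Server", "Java")))); // [Alice, Chloe]
    }
}
```

Keeping each query as a `Predicate` means criteria can be combined with `and`/`or`, which is roughly what a search GUI would do when the user adds constraints one by one.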

In terms of GUI, the user will be presented with a system that allows
easy search and easy population of CV data.


Use-Case 2: Employment Administration

In the second use case we take into account the needs of public
agencies whose institutional role is to reintegrate into the labor
market people who lose their jobs or who are looking for their
first job. In particular we are considering institutions such as the
French Pôle emploi (http://www.pole-emploi.fr/accueil/ ,
http://fr.wikipedia.org/wiki/P%C3%B4le_emploi). This institution is in
charge of matching supply and demand on the labor market, in
particular by directing candidates to the right potential employer,
suggesting possible training, helping to shape their skills,
etc. In many cases these agencies are managed at a local rather than a
national level, as the labor market is affected by regional
constraints. In this use case the parametrized CMS has a double goal:

     As in the previous case, to allow fast and intelligent
retrieval of CVs from the document base in order to answer potential
employers' needs.
     To perform Business Intelligence-like tasks over the
structured information extracted from the mass of analyzed CVs. Of
course, performing BI analysis is out of the scope of this proposal,
but structuring CV information into ontology-based classes is
definitely the first step in this direction.




Challenges

From a technical point of view, the most interesting challenge consists
in integrating the set of Stanbol enhancers with the semantic web
services provided at www.linguagrid.org. In principle, this should not
differ from the integrations already made with the OpenCalais and
Zemanta web services. However, there are at least three major
challenges:

     Multilinguality. The extraction will consider French documents
rather than English ones. Moreover, in a second phase (not covered by
the present project), the whole system could be extended to Italian and
French.
     Ontological extension. While CVs typically contain quite a lot of
named entities that are already covered by Stanbol (e.g. geographical
names, time expressions, company names, person names), there are
entities, such as skills and education, that will need some ontology
extension.
     Structural complexity. In a CV, entity instances are linked to
each other in a structurally complex way. For instance, places are not
just a flat list of geographical entities, but are likely to be
connected with periods, job types, companies, etc. Handling this
structural complexity represents an important challenge.
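To illustrate the difference between a flat list of named entities and the linked structure described above, here is a hypothetical Java model (Java 16+) in which each work experience ties a company, a place, a job title and a period together. The type names are our own illustration, not a Stanbol or ontology vocabulary, and the sample data is invented. Questions such as "where did this person work as a Java developer?" or the "worked ... since November 2011" query from Use-Case 1 can only be answered because these entities are linked inside the same experience.

```java
import java.time.YearMonth;
import java.util.List;

public class CvStructure {

    // A time span on a CV; 'end' is null for a current position.
    record Span(YearMonth start, YearMonth end) {
        // True if the position was still held at or after the given month.
        boolean activeSince(YearMonth since) {
            return end == null || !end.isBefore(since);
        }
    }

    // One work experience links entities that a flat NE list would keep apart.
    record Experience(String company, String place, String jobTitle, Span period) {}

    record Cv(String candidate, List<Experience> experiences) {}

    // Places where the candidate held a given job title.
    static List<String> placesForJob(Cv cv, String jobTitle) {
        return cv.experiences().stream()
                .filter(e -> e.jobTitle().equals(jobTitle))
                .map(Experience::place)
                .toList();
    }

    // Companies the candidate has worked for at some point since the given month.
    static List<String> companiesSince(Cv cv, YearMonth since) {
        return cv.experiences().stream()
                .filter(e -> e.period().activeSince(since))
                .map(Experience::company)
                .toList();
    }

    public static void main(String[] args) {
        Cv cv = new Cv("Candidate A", List.of(
                new Experience("Company X", "Paris", "Java developer",
                        new Span(YearMonth.of(2008, 9), YearMonth.of(2011, 6))),
                new Experience("Company Y", "Grenoble", "Software architect",
                        new Span(YearMonth.of(2011, 7), null))));

        System.out.println(placesForJob(cv, "Java developer"));             // [Paris]
        System.out.println(companiesSince(cv, YearMonth.of(2011, 11)));     // [Company Y]
    }
}
```

In RDF terms, each `Experience` would roughly correspond to an intermediate resource linking the candidate to a company, a place, a job type and a time interval, rather than to separate, unrelated entity annotations.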
