CV Mining (Early adopter program)

Luca Dini Thu, 01 Mar 2012 06:45:02 -0800

Dear All,

Please let me introduce a new early adopter project in which we will be involved. I look forward to a great and intellectually inspiring exchange with you all.

Kind regards,
Luca

The project (run by CELI under the umbrella of the IKS early adopter program) aims to integrate Stanbol technology with a specific context of use, i.e. CV management via CMS and semantic technologies. The crucial challenge of this integration is the parametrization of Stanbol to deal with information that has been automatically extracted from CVs. Besides the direct integration results, which will be distributed under the same conditions as the Stanbol software, the early adoption project will produce two additional by-products:

    The provision to Stanbol of classes allowing the connection with Linguagrid (www.linguagrid.org) and possibly LanguageGrid (http://langrid.org/en/index.html).
    The verification of the extensibility of Stanbol to languages other than English (the project will concern CVs written in French).

We envisage two prototypical use cases, which are described below:

Use-Case 1: Human Resources Department

The context is that of the Human Resources department of a big company, or of any recruitment company. The basic goal is to provide them with an open source document management system able to deal intelligently with unstructured CVs (or "resumes"), i.e. CVs which come in Microsoft Word, PDF, Open Office etc. Each time a new CV arrives, it is inserted in the document base. Behind the scenes this does not just add a document: the CV is passed to a Stanbol server, which enhances it with structured information.


This might represent:

    experiences of the candidate
    skills of the candidate
    Education level
    reference data (name, address etc.)
    contact data

Some of these data might be slightly more structured than just named entities, but definitely within the representational power of RDF. Some of them could be further semantically enriched by adding external information on companies, places, specific technologies etc.
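To make the idea of "structured information" a little more concrete, here is a minimal sketch of how extracted CV fields could be flattened into subject/predicate/object triples, which is the basic shape of RDF data. The namespace, the property names and the sample person are all invented for illustration; they are not part of Stanbol or of any agreed ontology.

```python
# Hypothetical namespace and property names, chosen for illustration only.
EX = "http://example.org/cv/"


def cv_to_triples(cv_id, cv):
    """Flatten a dict of extracted CV fields into (subject, predicate, object) triples."""
    subject = EX + cv_id
    triples = []
    for field_name, value in cv.items():
        # A field may hold one value or a list of values (e.g. several skills).
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, EX + field_name, v))
    return triples


# Invented sample data standing in for the output of an extraction step.
sample_cv = {
    "name": "Jean Dupont",
    "city": "Paris",
    "skill": ["Java", "SQL Server"],
    "educationLevel": "Master",
}

triples = cv_to_triples("cv-001", sample_cv)
for s, p, o in triples:
    print(s, p, repr(o))
```

A real integration would presumably rely on an RDF library and a shared vocabulary rather than on plain tuples; the point here is only the shape of the data.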

As a result, personnel at the HR department would be able to formulate queries such as (just as an illustration):


    All CVs of people living in Paris older than 27 years
    All CVs of people with skills in SQL Server and Java
    All people who have worked in a high-tech company since November 2011.

....
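As a rough sketch, the three example queries can be written as filters over structured CV records. Everything here is an assumption made for illustration: the record layout (`CV`, `Job`), the sample people, and the reading of "since November 2011" as "had a job in that sector still running on or after that date". An actual system would more likely query an RDF store.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Job:
    company: str
    sector: str
    start: date
    end: Optional[date] = None  # None means the job is ongoing


@dataclass
class CV:
    name: str
    city: str
    birth_date: date
    skills: set
    jobs: list = field(default_factory=list)


def age_on(birth_date, today):
    """Age in full years on the given day."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def living_in_and_older_than(cvs, city, min_age, today):
    return [cv for cv in cvs
            if cv.city == city and age_on(cv.birth_date, today) > min_age]


def with_all_skills(cvs, required):
    return [cv for cv in cvs if set(required) <= cv.skills]


def worked_in_sector_since(cvs, sector, since):
    """CVs with a job in `sector` that was still running on or after `since`."""
    return [cv for cv in cvs
            if any(j.sector == sector and (j.end is None or j.end >= since)
                   for j in cv.jobs)]


# Invented sample records.
cvs = [
    CV("Alice Martin", "Paris", date(1980, 5, 17), {"Java", "SQL Server"},
       [Job("Acme Tech", "high tech", date(2010, 1, 1))]),
    CV("Bruno Petit", "Lyon", date(1990, 3, 2), {"Java"},
       [Job("Boulangerie Petit", "food", date(2008, 9, 1), date(2011, 6, 30))]),
    CV("Chloe Bernard", "Paris", date(1988, 12, 1), {"Python", "SQL Server"},
       [Job("Nano Systems", "high tech", date(2009, 2, 1), date(2011, 12, 15))]),
]
today = date(2012, 3, 1)

q1 = [cv.name for cv in living_in_and_older_than(cvs, "Paris", 27, today)]
q2 = [cv.name for cv in with_all_skills(cvs, ["SQL Server", "Java"])]
q3 = [cv.name for cv in worked_in_sector_since(cvs, "high tech", date(2011, 11, 1))]
print(q1, q2, q3)
```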

In terms of GUI, the user will be presented with a system that allows easy search and easy population of CV data.



Use-Case 2: Employment Administration

In the second use case we take into account the needs of public agencies with the institutional role of reintegrating into the labor market persons who lose their job or who are looking for their first job. In particular we are considering institutions such as the French Pôle emploi (http://www.pole-emploi.fr/accueil/, http://fr.wikipedia.org/wiki/P%C3%B4le_emploi). This institution is in charge of matching demand and supply on the labor market, in particular by directing candidates to the right potential employer, suggesting possible educational training, shaping their skills, etc. In many cases these agencies are managed at a local rather than a national level, as the labor market is affected by regional constraints. In this use case the parametrized CMS has a double goal:

    Much like in the previous case, to allow the fast and intelligent retrieval of CVs out of the document base in order to answer potential employer needs.
    To be able to perform Business Intelligence like tasks over the structured information provided by the mass of analyzed CVs. Of course performing BI analysis is out of the scope of this proposal, but the structuring of CV information into ontology based classes is definitely the first step in this direction.





Challenges

From a technical point of view the most interesting challenge consists in integrating the set of Stanbol enhancers with the semantic web services provided at www.linguagrid.org. In principle this integration should not differ from what has already been done with the OpenCalais WS and Zemanta WS. However, there are at least two major challenges:

    Multilinguality. The extraction will consider French documents rather than English ones. Moreover, in a second phase (not covered by the present project), the whole system could be extended to Italian and French.
    Ontological extension. While CVs typically contain quite a lot of named entities which are already covered by Stanbol (e.g. geographical names, time expressions, company names, person names), there are entities which will need some ontology extension, such as skills and education.
    Structural Complexity. In a CV, instances of entities are linked to each other in a structurally complex way. For instance, places are not just a flat list of geographical entities: they are likely to be connected with periods, with job types, with companies, etc. Handling this structural complexity represents an important challenge.
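The structural complexity point can be illustrated with a small sketch contrasting a flat list of recognized entities with records that keep place, company, period and job type linked together. The entity labels, the `Experience` record and the sample values are all hypothetical; they only show why the links matter.

```python
from dataclasses import dataclass

# Flat output: the entities are found, but nothing says which place
# goes with which company or which period (invented sample data).
flat_entities = [
    ("Place", "Grenoble"), ("Place", "Paris"),
    ("Organization", "Nano Systems"), ("Organization", "Acme Tech"),
    ("Period", "2008-2010"), ("Period", "2010-2012"),
]


@dataclass(frozen=True)
class Experience:
    """One work experience, keeping the related entities linked together."""
    place: str
    company: str
    period: tuple  # (start_year, end_year), both inclusive
    job_type: str


experiences = [
    Experience("Grenoble", "Nano Systems", (2008, 2010), "developer"),
    Experience("Paris", "Acme Tech", (2010, 2012), "project manager"),
]


def places_worked_in(experiences, year):
    """Places where the candidate was working during the given year."""
    return [e.place for e in experiences
            if e.period[0] <= year <= e.period[1]]


print(places_worked_in(experiences, 2009))
```

A question such as "where was this person working in 2009?" cannot be answered from `flat_entities` alone, while the linked records answer it directly.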




--
*************************************
Luca Dini
CELI France SAS

Grenoble:
12-14 rue Claude Genin
38000 Grenoble

Paris:
33 Avenue Philippe Auguste
75011 Paris

tel: 00 33 476 24 23 80
www.celi-france.com/
www.celi.it/
research.celi.it

*************************************
