Hi Maven cracks
At a customer site there is a custom, company-wide dictionary available for
spellchecking. This dictionary is managed in a proprietary application from
where you can export it. For the webapp we're building we need to transform this
dictionary into a very simple format: a single file with one dictionary entry
per line. The export format is somewhat special: it's spread over a bunch of
files (one for each letter of the alphabet), contains additional syllabication
info that we don't need, and has some comments that have to be removed.
The specifics of the format aren't really that important here though...
After some testing I came up with the following short bash script that fulfills
all my needs:
8<-----------------------------------------------------------
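tmp_folder=target/dict
cls_folder=target/classes
mkdir -p "$tmp_folder"
mkdir -p "$cls_folder"
# Concatenate all exported files into a single file
cat src/main/dictionary/*.lst > "$tmp_folder/tmp1.dict"
# Strip every '~' and '?' character
sed 's/[~?]//g' "$tmp_folder/tmp1.dict" > "$tmp_folder/tmp2.dict"
# Drop everything from the first space to the end of the line
sed 's/ .*$//' "$tmp_folder/tmp2.dict" > "$tmp_folder/tmp3.dict"
# Sort the entries and remove duplicates
sort -u -o "$cls_folder/my.dict" "$tmp_folder/tmp3.dict"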
8<-----------------------------------------------------------
(In other words: take all files src/main/dictionary/*.lst, concatenate them into one
single file, match some strings with simple regexes and remove those, and
finally sort the dictionary entries and remove all duplicates.)
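The same steps could probably also be written as one pipeline without the
intermediate files. This is only a sketch: the function name build_dict and its
two arguments (source folder, output file) are made up for illustration, and it
is just as Unix-only as the script above.

```shell
# Hypothetical helper, not part of the actual build.
# $1: folder containing the exported *.lst files
# $2: path of the dictionary file to write
build_dict() {
  # Concatenate all files, strip '~' and '?', cut each line at the
  # first space, then sort and remove duplicates.
  cat "$1"/*.lst \
    | sed -e 's/[~?]//g' -e 's/ .*$//' \
    | sort -u > "$2"
}
```

It would be called as: build_dict src/main/dictionary target/classes/my.dict
(after the mkdir -p for target/classes).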
This script is then called from within Maven with exec-maven-plugin. Afterwards,
maven-jar-plugin wraps the file in a simple jar, so the dictionary can be easily
consumed in Java using getClassLoader().getResourceAsStream().
Now all is well and good, and this script even performs sufficiently well given about 1.6
million dictionary entries (~38MB). But of course it's not really the Maven way
to do things, especially because it's not portable: you need some kind
of Unix-like environment in place for this script to work.
I've given this some thought, but I can't seem to find a possible combination of
Maven plugins that's able to do what four lines of bash script achieve so elegantly.
I'd really like to hear your ideas on this matter...
Bye,
Thomas