Hi Maven experts,

At a customer site there is a custom, company-wide dictionary available for
spellchecking. This dictionary is managed in a proprietary application from
which you can export it. For the webapp we're building we need to transform
this dictionary into a very simple format: a single file with one dictionary
entry per line. The export format is somewhat special: it's spread over a bunch
of files (one for each letter of the alphabet), contains additional
syllabication info that we don't need, and has some comments that have to be
removed. The specifics of the format aren't really that important here
though...

After some testing I came up with the following short bash script that fulfills
all my needs:

8<-----------------------------------------------------------
tmp_folder=target/dict
cls_folder=target/classes
mkdir -p $tmp_folder
mkdir -p $cls_folder

cat src/main/dictionary/*.lst > $tmp_folder/tmp1.dict
sed "s/[~?]//g" $tmp_folder/tmp1.dict > $tmp_folder/tmp2.dict
sed "s/  .*$//g" $tmp_folder/tmp2.dict > $tmp_folder/tmp3.dict
sort -u -o $cls_folder/my.dict $tmp_folder/tmp3.dict
8<-----------------------------------------------------------

(In other words: take all files src/main/dictionary/*.lst, concatenate them
into one single file, remove the strings matched by two simple regexes, and
finally sort the dictionary entries and remove all duplicates.)
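
To make the individual steps easier to follow, here is a minimal sketch of the
same transformation written as one pipeline and run against a tiny made-up
sample. The sample entries, the temporary directory and the file names a.lst
and b.lst are assumptions for illustration only; the real export format may
look different.

```shell
#!/bin/sh
# Sketch only: the sample data below is hypothetical, not the real export.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"

# Two made-up export files: "~" and "?" stand in for syllabication marks,
# and everything after two spaces stands in for a comment.
printf 'ta~ble  a comment\nap?ple\n' > "$work/src/a.lst"
printf 'ba~na~na\nap?ple  duplicate entry\n' > "$work/src/b.lst"

# Concatenate, strip the marks, strip trailing comments, then sort and
# deduplicate. LC_ALL=C pins the sort order so it does not depend on locale.
cat "$work"/src/*.lst \
  | sed -e 's/[~?]//g' -e 's/  .*$//' \
  | LC_ALL=C sort -u > "$work/out/my.dict"

cat "$work/out/my.dict"
```

For this sample the result is the three lines apple, banana and table. The
pipeline avoids the intermediate tmp*.dict files but is otherwise meant to
behave like the script above.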

This script is then called from within Maven via exec-maven-plugin. Afterwards
maven-jar-plugin wraps the file in a simple jar, so the dictionary can be
easily consumed in Java using getClassLoader().getResourceAsStream().

This all works nicely, and the script even performs sufficiently well given
about 1.6 million dictionary entries (~38MB). But of course it's not really the
Maven way to do things, especially because it's not portable: you need some
kind of Unix-like environment in place for this script to work.

I've given this some thought, but I can't seem to find a combination of Maven
plugins that's able to do what four lines of bash script achieve so elegantly.

I'd really like to hear your ideas on this matter...


Bye,
 Thomas
