My comments on RC1 are below.  i don't feel comfortable voting for it in 
its current state...


1) release naming: should probably be apache-tika-0.2-src.jar  i seem to 
recall someone somewhere saying that was important for apache releases 
(and it's more consistent with the 0.1 release)

2) release file format: the 0.1 release seems to have been a tar.gz ... 
was a conscious choice made by the community to switch to distributing as a 
src jar? otherwise you may want to publish both, or stick with tar.gz for 
consistency (the docs on the website refer to the tarball when giving 
examples of downloading and verifying)

3) incubator refs: as mentioned before, there are a lot of references to 
the incubator that should be switched to point to lucene...

[EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ grep -lir incubator .
./pom.xml
./src/site/apt/download.apt
./src/site/apt/index.apt
./README.txt
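
A rough sketch of how to see the individual references rather than just the file names, so each one can be checked and repointed. The function name is made up for illustration; -r, -i and -n are ordinary grep options (recurse, ignore case, print line numbers):

```shell
# Hypothetical helper: list every "incubator" mention under a directory
# as file:line:text, so each hit can be reviewed individually.
# Defaults to the current directory when no argument is given.
list_incubator_refs() {
    grep -rin incubator "${1:-.}"
}
```

Run from the top of the unpacked release as "list_incubator_refs ." to get one line per reference.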

4) user docs: (I think grant may have already mentioned this) The 
README.txt file talks about building Tika, but there doesn't seem to be 
anything in the release that describes how to use Tika ... has any thought 
been given to including more docs in the release itself? -- 
gettingstarted.html perhaps? ... at the very least a paragraph should be 
added to the README referring to the gettingstarted.html page.  

Personally, i think including documentation.html and formats.html in the 
release are also important -- they're going to change between releases, 
probably more than the "getting started" type info, and should be 
"versioned" so moving forward people with older versions won't get 
misled by the docs on the site.

5) artifacts missing: i tried following along with the gettingstarted.html 
(my first time using maven BTW so i may have messed something up) and ran 
into a snag... "mvn install" downloaded a bunch of dependencies (i think 
they were maven's own dependencies since i'd never used it before), ran 
some tests (these definitely had tika in the name) then downloaded some 
more things, then told me it was installing tika-0.2.jar in my ~/.m2 
directory.  When i looked at the next section "Build artifacts" it referred 
to 3 jars in my target directory -- but i only have one...

[EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ find target -name \*jar
target/tika-0.2.jar

...is the gettingstarted.html wrong, or did the build not run correctly?

6) RAT: Apache RAT noticed the following files missing license info...

 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tika.svg
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tikaNoText.svg
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML.html
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML_utf8.html
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testRTF.rtf
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testTXT.txt
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXHTML.html
 !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXML.xml

...I don't know if i've ever heard an opinion on needing to include the 
ASL header in *.svg files (they are xml, but they are also clearly 
generated by inkscape), but I do remember someone pointing out that test 
data files in formats that are capable of containing comments in them (ie: 
xml, html, etc...) should include the ASL header, such as...

http://svn.apache.org/repos/asf/lucene/solr/trunk/example/exampledocs/hd.xml
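
As a quick cross-check while adding headers, something like the sketch below could list the xml/html files that still lack one. It is only a rough approximation of what RAT does: it assumes the standard header's opening words "Licensed to the Apache Software Foundation" appear somewhere in the file, and the function name is made up for illustration.

```shell
# Hypothetical helper: print the XML/HTML files under a directory that do
# not contain the opening words of the ASL header anywhere in the file.
# grep -L lists files with no matching line; find limits the file types.
missing_asl_header() {
    find "${1:-.}" -type f \( -name '*.xml' -o -name '*.html' \) \
        -exec grep -L 'Licensed to the Apache Software Foundation' {} +
}
```

Running "missing_asl_header src/test/resources" from the top of the release should print nothing once every such file carries the header; RAT remains the real check.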

7) javadocs: maybe this is something that is obvious to maven users, and 
as a non-maven user i just don't know the magic incantation, but i 
couldn't find any generated javadocs in the release (or in the "target" 
directory after running "mvn install") ... since Tika is primarily a 
library people will use in java apps, this seems kind of important.  If 
there is a magic maven incantation to build these, let's include the 
instructions somewhere (since the gettingstarted guide suggests that maven 
is necessary to build tika, but not to use it (per the Artifacts and Ant 
sections))

FWIW: browsing the nightly snapshot javadocs online i really wasn't even 
sure where i should start.  My suggestion: documentation.html would be 
damn near perfect as an overview.html javadoc file.


-Hoss
