My comments on RC1 are below. i don't feel comfortable voting for it in its current state...

1) release naming: should probably be apache-tika-0.2-src.jar. i seem to recall someone somewhere saying that was important for apache releases (and it's more consistent with the 0.1 release).

2) release file format: the 0.1 release seems to have been a tar.gz ... was a conscious choice made by the community to switch to distributing as a src jar? otherwise you may want to publish both, or stick with tar.gz for consistency (the docs on the website refer to the tarball when giving examples of downloading and verifying).

3) incubator refs: as mentioned before, there are a lot of references to the incubator that should be switched to point to lucene...

[EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ grep -lir incubator .
./pom.xml
./src/site/apt/download.apt
./src/site/apt/index.apt
./README.txt

4) user docs: (I think grant may have already mentioned this) The README.txt file talks about building Tika, but there doesn't seem to be anything in the release that describes how to use Tika ... has any thought been given to including more docs in the release itself? -- gettingstarted.html perhaps? ... at the very least a paragraph should be added to the README referring to the gettingstarted.html page. Personally, i think including documentation.html and formats.html in the release is also important -- they're going to change between releases, probably more than the "getting started" type info, and should be "versioned" so moving forward people with older versions won't get misled by the docs on the site.

5) artifacts missing: i tried following along with the gettingstarted.html (my first time using maven BTW so i may have messed something up) and ran into a snag... "mvn install" downloaded a bunch of dependencies (i think they were maven's own dependencies since i'd never used it before), ran some tests (these definitely had tika in the name), then downloaded some more things, then told me it was installing tika-0.2.jar in my ~/.m2 directory.
When i looked at the next section, "Build artifacts", it referred to 3 jars in my target directory -- but i only have one...

[EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ find target -name \*jar
target/tika-0.2.jar

...is the gettingstarted.html wrong, or did the build not run correctly?

6) RAT: Apache RAT noticed the following files missing license info...

!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tika.svg
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tikaNoText.svg
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML.html
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML_utf8.html
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testRTF.rtf
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testTXT.txt
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXHTML.html
!????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXML.xml

...I don't know if i've ever heard an opinion on needing to include the ASL header in *.svg files (they are xml, but they are also clearly generated by inkscape), but I do remember someone pointing out that test data files in formats that are capable of containing comments (i.e. xml, html, etc...) should include the ASL header, such as...

http://svn.apache.org/repos/asf/lucene/solr/trunk/example/exampledocs/hd.xml

7) javadocs: maybe this is something that is obvious to maven users, and as a non-maven user i just don't know the magic incantation, but i couldn't find any generated javadocs in the release (or in the "target" directory after running "mvn install") ... since Tika is primarily a library people will use in java apps, this seems kind of important.
If there is a magic maven incantation to build these, let's include the instructions somewhere (since the gettingstarted guide suggests that maven is necessary to build tika, but not to use it, per the Artifacts and Ant sections).

FWIW: browsing the nightly snapshot javadocs online i really wasn't even sure where i should start. My suggestion: documentation.html would be damn near perfect as an overview.html javadoc file.

-Hoss
