[ https://issues.apache.org/jira/browse/LUCENE-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir reassigned LUCENE-2946:
-----------------------------------

    Assignee: Robert Muir

> change file format documentation from "bit-for-bit" to highlevel
> ----------------------------------------------------------------
>
>                 Key: LUCENE-2946
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2946
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: general/website
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>
> While reviewing website docs in LUCENE-2924, I noticed the existing
> fileformats documentation is going to be pretty hopeless for 4.0.
> Before, it described the format "bit-for-bit", but with flexible
> indexing this is somewhat silly (and who really wants a bit-for-bit
> explanation of some of the new formats!)
> I think it would be much better to give a high-level overview, perhaps
> linking to javadocs or even source code for the low-level details.
> We probably should delay this until 4.0 is really close in sight (since
> things are changing so fast), but we can go ahead and think about it
> some now.
> For example:
> * high level explanation of what a codec is, and the various subsystems
>   one is usually composed of (terms index, terms data, skiplist impl,
>   postings impl, etc). We can reiterate that you can make your own, and
>   hopefully this kind of documentation will actually encourage that.
> * high level explanation of what StandardCodec is "composed of". For
>   example, assume it's Variable Terms Index, Block Terms Reader,
>   PForDelta docs and freqs, and Simple64 positions. I think really this
>   is the only codec we should try to "diagram" in any way.
> * high level explanation (probably with links) of some of the
>   components. For example, we could explain what the purpose of a Terms
>   Index is, and that this implementation uses a finite state transducer
>   to find the terms block for a given term. In this case maybe we have
>   an image, now that Dawid made the toDot useful.
> * high level explanation (probably with links) of some of the
>   compression algorithms. For example, we could explain the basics of
>   the available algorithms we have (vbyte/simple/for/pfor/...) and what
>   their advantages and disadvantages are.
> Some of the things I mentioned here are probably optional; for
> instance, I think it's "enough" to give a high-level overview of
> StandardCodec, but I can't help but think that explaining some of the
> architecture will be useful for new developers.
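
As an illustration of the kind of high-level explanation proposed for the compression algorithms, here is a minimal sketch of the simplest one named above, vbyte (variable-byte) encoding. It assumes the common layout of seven payload bits per byte, low-order bits first, with the high bit of each byte marking that more bytes follow. The class name VByteSketch and its methods are hypothetical and written only for this example; this is not Lucene's own code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of variable-byte (vbyte) integer encoding.
final class VByteSketch {

    // Encode an int using 7 payload bits per byte, low-order bits first.
    // The high bit of a byte is set when at least one more byte follows.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // payload bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a single value from the start of the given byte array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16384, Integer.MAX_VALUE};
        for (int v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length
                    + " byte(s), decodes to " + decode(enc));
        }
    }
}
```

The trade-off this makes visible is the one such documentation would presumably spell out: small values cost a single byte, but decoding has to branch on every byte, which is the overhead that block-oriented schemes such as for/pfor are meant to reduce.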