I've noticed how big our jars can get, and I opened a ticket about decreasing the amount of duplication with libraries some time ago, but it hasn't been a priority yet. (https://issues.apache.org/jira/browse/THRIFT-447 and https://issues.apache.org/jira/browse/THRIFT-701 are the relevant tickets.)

I'm all for making some changes, but is 1.6MB of jar really a problem for you? I know that personally my project depends on 30MB of jar, only 2MB of which is my Thrift stuff.

I'd love to work with you to get a patch in to extract some of the redundant code. I doubt it will be that hard to do - someone just has to take a look at it. Feel free to email me off-list if you would like to chat.

I have to imagine you could fix Thrift a lot faster than you could build a competing system from scratch.

-Bryan

On Mon, Mar 22, 2010 at 7:43 AM, tomer filiba <[email protected]> wrote:

> if you recall, i'm working on a project called xthrift, which adds passing
> objects by-reference on top of thrift. the project seemed very promising up
> until yesterday, when i realized thrift generates way too much code to make
> it feasible.
>
> i made a test case of 6 classes, each with 6 methods and 6 attributes, and
> 6 service functions that expose those. i attached the thrift file that's
> generated from my xthrift file -- it contains around 100 functions.
>
> generating java code using the thrift compiler yields a 2.2 MB java source
> file! when compiled, it yields a 1.6 MB jar! in csharp and python, the
> situation is slightly better: ~700 KB. just for the sake of entropy,
> compressing (bz2) the generated java code yields a 34 KB file (the ratio
> is 65!)
>
> for our project, which contains ~100 classes, each with ~10 methods and ~5
> attributes, plus ~50 functions, the generated java code would weigh tens if
> not hundreds of MBs, which is unacceptable, of course.
>
> looking at the generated code, it's easy to spot the redundancy: thrift
> employs a "full beta-reduction policy", i.e., it doesn't encapsulate common
> functionality into functions, instead it just repeats it over and over.
> this yields ~80,000 lines of code that mostly repeat one another.
>
> judging from the code size, i understand thrift is not meant to handle more
> than ~50 functions per project, unless you are willing to accept tens of MBs
> of library footprint.[1]
> is there any "compiler switch" or planned feature to eliminate this code
> bloat?
>
> if not, my company will have to drop thrift and adopt an in-house solution
> (which we really hoped to avoid...)
>
>
> thanks in advance,
> -tomer
>
> [1] a 100 MB library, on today's hardware, is not unheard of, but our
> project's RAM footprint is ~30 MB... it would be a pity to require such a
> big footprint just for glue code.
>
>
>
> An NCO and a Gentleman
>
