Hi, I think you can reduce cull time with the build KdTrees option in the osgDB registry (osgDB::Registry::instance()->setBuildKdTreesHint(osgDB::ReaderWriter::Options::BUILD_KDTREES)) or the OSG_BUILD_KDTREES environment variable, if you're not already using it. As for draw, I believe it's related to the large number of state changes, so you should try to merge StateSets. A large number of primitive sets is kinda bad, but with display lists (at least on NVIDIA hardware) it actually doesn't hurt that much.
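To make "merge StateSets" a bit more concrete: as I understand it, the cull traversal groups drawables by StateSet object into state graphs (the "State graphs" stat), so two StateSets with identical settings that are separate objects still end up as separate entries and cost extra work at draw time. Below is a standalone toy model in plain C++, not OSG code; `ToyState`, `ToyDrawable`, `shareDuplicateStates` and `stateSwitchesAfterSort` are hypothetical names. It shares identical states, which is the idea behind the osgUtil::Optimizer SHARE_DUPLICATE_STATE pass, and counts state switches after sorting by state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an osg::StateSet, reduced to settings mentioned in
// the thread: back-face culling on/off, REPEAT wrap mode, and a texture name.
struct ToyState {
    bool cullBackFaces;
    bool repeatWrap;
    std::string texture;
    bool operator<(const ToyState& o) const {
        return std::tie(cullBackFaces, repeatWrap, texture) <
               std::tie(o.cullBackFaces, o.repeatWrap, o.texture);
    }
};

// A drawable refers to its state by index, much as an osg::Drawable
// refers to its osg::StateSet by pointer.
struct ToyDrawable {
    std::size_t stateIndex;
};

// Point every drawable whose state is identical to an earlier one at that
// earlier state, and return how many distinct states remain in use.
std::size_t shareDuplicateStates(const std::vector<ToyState>& states,
                                 std::vector<ToyDrawable>& drawables) {
    std::map<ToyState, std::size_t> canonical;  // content -> first index seen
    for (auto& d : drawables) {
        assert(d.stateIndex < states.size());
        auto it = canonical.emplace(states[d.stateIndex], d.stateIndex).first;
        d.stateIndex = it->second;
    }
    return canonical.size();
}

// Sort drawables by state (roughly what grouping into state graphs achieves)
// and count how many times the active state has to be switched.
std::size_t stateSwitchesAfterSort(std::vector<ToyDrawable> drawables) {
    std::sort(drawables.begin(), drawables.end(),
              [](const ToyDrawable& a, const ToyDrawable& b) {
                  return a.stateIndex < b.stateIndex;
              });
    std::size_t switches = 0;
    for (std::size_t i = 0; i < drawables.size(); ++i) {
        if (i == 0 || drawables[i].stateIndex != drawables[i - 1].stateIndex)
            ++switches;
    }
    return switches;
}
```

With two identical "brick" states and one "water" state, sorting alone still needs three switches; after sharing, only two distinct states remain and so only two switches. States that really differ (culling on vs. off, REPEAT vs. clamped) stay separate, which matches the limit described in the quoted message.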
22.01.2011, 00:13, "Jean-Sébastien Guay" <jean-sebastien.g...@cm-labs.com>:
> Hi all,
>
> I thought I had a pretty firm grasp on what to optimize given a certain
> set of scene stats, but I've optimized what I can and I'm still getting
> little improvement in results. So I'll explain my situation here and
> hope you guys have some good suggestions. Sorry if this is a long
> message, but I prefer to give all the relevant data now rather than get
> asked later.
>
> The whole scene is about a 200m x 200m square (apart from the ocean and
> skydome, but these are not significant; I have removed them and confirmed
> that the situation is the same). The worst-case viewpoint is a flying
> view where the whole scene could be visible at once. So I need to
> balance culling cost with draw cost, since in some views we will see
> only part of the scene (so we should be able to cull away at least part
> of what's not visible) and in the flying view everything is visible, so
> we shouldn't waste too much time doing cull tests which we know will not
> cull anything.
>
> The other thing is that there are a lot of dynamic objects, so there are
> a lot of transforms. But I can't change this; it's part of our simulation.
>
> So, after doing some optimization (removing redundant groups, building
> texture atlases where possible, merging geodes and geometry, generating
> triangle strips, most of which I did with the osgUtil::Optimizer), I get
> the following stats, which I'll talk about a bit later:
>
> Scene stats:
>     StateSets            1345
>     Groups                392
>     Transforms            672
>     Geodes                992
>     Geometry              992
>     Vertices           139859
>     Primitives          87444
>
> Camera stats:
>     State graphs         1282
>     Drawables            2151
>     PrimitiveSets       73953
>     Triangles            3538
>     Tri. Strips        211091
>     Tri. Fans              16
>     Quads               11526
>     Quad Strips           534
>     Total primitives   226705
>
> And, both in our simulator and in osgViewer, for the same scene and same
> viewpoint, I get:
>
> FPS: ~35
> Cull: 5.4ms
> Draw: 19ms
> GPU: 19ms
>
> This is on a pretty good machine: Core i7 920, GeForce GTX 260.
>
> First of all, the stats above tell me that the "Primitives" part of the
> scene stats refers to primitive sets, not individual primitives, since the
> camera stats tell me there are over 226000 primitives in the current view.
>
> As you can see, the number of primitiveSets is very high. If I
> understand correctly, each PrimitiveSet will result in an OpenGL draw
> call, and since my draw time is what's high now, I would want to reduce
> that (since I'm currently at about 3 primitives per primitiveSet on
> average). If I remove triangle strip generation from the optimizer
> options, the stats become:
>
> Scene stats:
>     StateSets            1345
>     Groups                392
>     Transforms            672
>     Geodes                992
>     Geometry              992
>     Vertices           190392
>     Primitives          51197
>
> Camera stats:
>     State graphs         1254
>     Drawables            2117
>     PrimitiveSets        4899
>     Triangles           17122
>     Tri. Strips           191
>     Tri. Fans            7212
>     Quads              106464
>     Quad Strips           534
>     Total primitives   131523
>
> This indicates to me that the tristrip visitor in the optimizer does a
> pretty bad job. I looked at an .osg dump, and it seems to generate a
> separate strip for each quad (so one strip for 4 vertices), which is
> ridiculous... But that's a subject for another day.
>
> When I disabled the tristripper, there was a massive decrease in the
> number of primitiveSets (and even in the number of primitives); however,
> there was no significant change in the frame rate and timings. I don't
> understand this. I would have expected, with more primitives per
> primitiveSet (I'm now at about 26 prims per primSet on average, as
> opposed to around 3 before) and far fewer draw calls, that the draw time
> would have been much lower. That's not what happens in practice.
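On the tristripper emitting one 4-vertex strip per quad: one common workaround is to unroll all those tiny strips into a single indexed triangle list, which in OSG terms would presumably be one osg::DrawElementsUInt with GL_TRIANGLES instead of thousands of separate strip primitive sets. Here is a standalone sketch in plain C++; `stripsToTriangles` is a hypothetical helper, not an osgUtil API. Triangle k of a strip uses vertices k, k+1 and k+2, with the first two swapped on odd k so every emitted triangle keeps the strip's winding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll a set of triangle strips (each a list of vertex indices) into one
// indexed GL_TRIANGLES-style list, so all of them could be drawn in one call.
// Hypothetical helper for illustration only.
std::vector<unsigned> stripsToTriangles(
        const std::vector<std::vector<unsigned>>& strips) {
    std::vector<unsigned> triangles;
    for (const auto& strip : strips) {
        for (std::size_t k = 0; k + 2 < strip.size(); ++k) {
            // Odd triangles in a strip have reversed winding; swap the first
            // two vertices so every emitted triangle faces the same way.
            unsigned a = (k % 2 == 0) ? strip[k] : strip[k + 1];
            unsigned b = (k % 2 == 0) ? strip[k + 1] : strip[k];
            unsigned c = strip[k + 2];
            // Skip degenerate triangles (e.g. from repeated stitching vertices).
            if (a == b || b == c || a == c) continue;
            triangles.push_back(a);
            triangles.push_back(b);
            triangles.push_back(c);
        }
    }
    assert(triangles.size() % 3 == 0);  // whole triangles only
    return triangles;
}
```

Two 4-vertex strips {0,1,2,3} and {4,5,6,7} become the index list 0 1 2, 2 1 3, 4 5 6, 6 5 7: four triangles in one primitive set. Dropping degenerate triangles lets strips that were stitched together with repeated vertices unroll cleanly too.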
>
> My previous attempts at optimizing (using the osgUtil::Optimizer) were
> also centered around lowering the number of primitives (by creating
> texture atlases and sharing state so the merging of geodes and geometry
> objects gave good results). And even though that also lowered the
> numbers (I started at around 2215 Geodes and 2521 Geometry objects in
> the same scene; compare that to 992 each now), it also had underwhelming
> results in practice.
>
> Clearly there is more than one primitiveSet per Geometry in the above
> stats. What I see in the dumped .osg file is that there are often
> things like:
>
>     PrimitiveSets 4
>     {
>       DrawArrays TRIANGLES 0 12
>       DrawArrays QUADS 12 152
>       DrawArrays TRIANGLES 164 12
>       DrawArrays QUADS 176 152
>     }
>
> I would expect, by reordering the vertex/color/normal/texCoord data, I
> would be able to get only 2 primitiveSets there, one TRIANGLES and one
> QUADS. Am I wrong? Why does the osgUtil::Optimizer not do this already
> when merging Geometry objects? I expect it's because it's easier not to
> do it, but still, it gives sub-optimal results...
>
> Of course I can't do that for strips or fans, unless I insert new
> vertices to restart the strip. Again, this is something that could be
> done, but might bring diminishing returns in my case given that my own
> scene contains many more triangles and quads than strips and fans (when
> I turn off tristripping).
>
> So, first of all, am I on the right track trying to reduce the number of
> primitiveSets? Do you think that, on current hardware, disabling
> tristripping is a good idea?
>
> Why, when disabling tristripping reduced the number of primitiveSets
> from 73953 to 4899, didn't I see an increase in performance?
>
> Is there some other way to find out what's going on and see what I
> can improve to increase the performance?
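On the question above about collapsing the four interleaved DrawArrays into one TRIANGLES and one QUADS set: for independent-primitive modes this should only be a matter of reordering the vertex data. A sketch in plain C++, not OSG code, assuming every attribute array (vertices, normals, colors, texcoords) is bound per vertex; `ToyDrawArrays`, `Regrouped`, `regroupByMode` and `permute` are hypothetical names. It builds one permutation, applies it to each array, and emits one contiguous range per mode. As the message says, strips and fans can't be concatenated this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of one "DrawArrays MODE first count" line in a .osg dump.
struct ToyDrawArrays {
    std::string mode;   // independent-primitive modes only: TRIANGLES, QUADS, ...
    std::size_t first;  // index of the first vertex in the range
    std::size_t count;  // number of vertices in the range
};

struct Regrouped {
    std::vector<std::size_t> newToOld;  // newToOld[i] = old index of new vertex i
    std::vector<ToyDrawArrays> sets;    // one contiguous range per mode
};

// Collect all ranges of the same mode next to each other, in order of each
// mode's first appearance. Not valid for strips or fans, whose ranges cannot
// simply be concatenated.
Regrouped regroupByMode(const std::vector<ToyDrawArrays>& sets) {
    std::vector<std::string> modes;
    for (const auto& s : sets) {
        bool seen = false;
        for (const auto& m : modes) seen = seen || (m == s.mode);
        if (!seen) modes.push_back(s.mode);
    }

    Regrouped out;
    for (const auto& mode : modes) {
        ToyDrawArrays merged{mode, out.newToOld.size(), 0};
        for (const auto& s : sets) {
            if (s.mode != mode) continue;
            for (std::size_t i = 0; i < s.count; ++i)
                out.newToOld.push_back(s.first + i);
            merged.count += s.count;
        }
        out.sets.push_back(merged);
    }
    return out;
}

// Apply the same permutation to every per-vertex array (vertices, normals,
// colors, texcoords) so attributes stay matched to their vertices.
template <typename T>
std::vector<T> permute(const std::vector<T>& data,
                       const std::vector<std::size_t>& newToOld) {
    std::vector<T> result;
    result.reserve(newToOld.size());
    for (std::size_t oldIndex : newToOld) {
        assert(oldIndex < data.size());
        result.push_back(data[oldIndex]);
    }
    return result;
}
```

For the dumped example this gives TRIANGLES 0 24 and QUADS 24 304 over the same 328 vertices. An attribute with a per-primitive-set binding would need separate handling, which is why the sketch assumes per-vertex binding throughout.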
> I've tried running our app in gDEBugger, which tipped me off that I was
> batching poorly when using triangle strips (about 3 prims per
> primitiveSet, as I said above). Turning off triangle strips improved the
> situation (as gDEBugger sees it), but not by that much, which is probably
> consistent with what I'm seeing in practice, but I'm no closer to finding
> out what to improve next. Whatever still can't be merged differs in its
> StateSet settings (backface culling on vs. off, can't use a texture
> atlas because the wrap mode is set to REPEAT, etc.), so I don't think
> osgUtil::Optimizer can help me improve the situation further...
>
> I have looked at video memory usage, by the way, and I'm fine in that
> respect, so I don't think I'm getting any thrashing or paging between
> video RAM and main RAM at runtime. Also, I'm using display lists for
> most of the objects in the scene; I tried using Vertex Buffer Objects
> and it actually slowed things down.
>
> I should also mention that these results are obtained using
> osgShadow::LightSpacePerspectiveShadowMap. I can run the dumped .osg
> file with
>
>     osgshadow --lispsm --noUpdate --mapres 2048 <dumped_file>.osg
>
> and I get the results above, which are pretty similar to our simulator.
> If I run the same data file in plain osgViewer without shadows, it runs
> at a solid 60Hz, with stats and timings:
>
> Scene stats:
>     StateSets            1345
>     Groups                392
>     Transforms            672
>     Geodes                992
>     Geometry              992
>     Vertices           190392
>     Primitives          51197
>
> Camera stats:
>     State graphs          321
>     Drawables             810
>     PrimitiveSets        1774
>     Triangles            7243
>     Tri. Strips            85
>     Tri. Fans            2508
>     Quads               39370
>     Quad Strips           178
>     Total primitives    49384
>
> FPS: 60
> Cull: 1.7ms
> Draw: 8ms
> GPU: 6.8ms
>
> (that's the no-tristrips version, so compare these stats to the second
> set of stats from the top, not the first)
>
> I would have expected most numbers there to be half what they were with
> shadows enabled, but as you can see they're consistently less than half,
> so shadows added more than 100% overhead... Note that even if they added
> exactly 100% overhead, I would still be at 16ms draw, which is too much,
> but I'm just mentioning it in case it prompts some other suggestions.
>
> I'm not sure I could send my whole scene to everyone on the list, but I
> might be able to send it to someone if they want to see it firsthand.
> Just the bare .osg file, without any textures and without the ocean and
> skydome, shows the problem adequately well.
>
> Thanks in advance for any suggestions you might have. I really need to
> improve this, and I've been working on it for a while already with only
> a small improvement to show for my time...
>
> J-S
>
> --
> ______________________________________________________
> Jean-Sebastien Guay    jean-sebastien.g...@cm-labs.com
>                                http://www.cm-labs.com/
>                     http://whitestar02.webhop.org/
> _______________________________________________
> osg-users mailing list
> osg-users@lists.openscenegraph.org
> http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org