POI 3.12

In looking at the styles.xml file generated by POI, It appears that a lot of 
borders and fills are created that are not used. Maybe POI needs a method to 
trim off those unused fills and borders on save. This method would also be able 
to normalize the fills and borders to remove duplicates and system colors (i.e. 
fgColor = 64 or bgColor = 65). I notice that Excel does this when it saves a 
spreadsheet created by POI. Here is an example of the fills, first as created 
by POI, second as saved by Excel (same spreadsheet):

POI 3.12
       <fills count="9">
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="darkGray" />
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="64" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="64" />
                           <bgColor indexed="65" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill>
                           <fgColor indexed="64" />
                           <bgColor indexed="65" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="44" />
                           <bgColor indexed="65" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="solid">
                           <fgColor indexed="44" />
                           <bgColor indexed="65" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <bgColor indexed="65" />
                     </patternFill>
              </fill>
       </fills>

Excel 2010 Save As (same spreadsheet)
                <fills count="4">
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="gray125" />
              </fill>
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="solid">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
       </fills>

Note, not even Excel was perfect as it left a duplicate fill (0, and 2). 
Borders was even more dramatic POI generated 94, Excel saved 13. And now that I 
look at it, POI generated 42 styles while Excel kept 23. I understand that you 
are keeping all the fills borders and styles as they are specified, and 
re-using them as you get to a final setting, but normalizing and trimming seem 
to be what you are missing. That can also be applied to fonts and formats.

Some additional thoughts, based on not yet looking at all the code. I can see 
three steps to collecting all the garbage: Normalization, Duplicate removal, 
Orphan removal. Normalization would involve setting default values, and 
removing system colors. On the fill above, the POI generated fill would look 
like this after normalization:

       <fills count="9">
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="darkGray" />
              </fill>
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="solid">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="none" />
              </fill>
       </fills>

Duplicate removal would remove the many duplicates left by normalization. This 
would have to be carried back to the styles themselves and potentially to 
individual cells in the spreadsheet that contain fill id's. After duplicate 
removal you would have something like the following in the fills:

       <fills count="4">
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="darkGray" />
              </fill>
              <fill>
                     <patternFill patternType="none">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
              <fill>
                     <patternFill patternType="solid">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
       </fills>

And I just happen to know that my spreadsheet never uses fill "none" - 44, so 
that could d be stripped out during orphan removal to bring the fill to:

       <fills count="3">
              <fill>
                     <patternFill patternType="none" />
              </fill>
              <fill>
                     <patternFill patternType="darkGray" />
              </fill>
              <fill>
                     <patternFill patternType="solid">
                           <fgColor indexed="44" />
                     </patternFill>
              </fill>
       </fills>

I suspect that the same process could be used for borders, fonts, formats and 
styles where the order of operation would be normalization and duplicate 
removal for fills, borders, fonts, formats, then styles. This followed by 
orphan removal for styles then fills, borders, fonts, and formats.

Reply via email to