Thanks for the submission. I have opened bug 60416 [1] to work on this
and committed your patch.

Discussion can take place either on the bug or on the
d...@poi.apache.org mailing list. I have bcc'd user@poi.apache.org out
of the email responses.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60416

On Thu, Nov 24, 2016 at 3:24 AM, Dmitry Katsubo <dm...@mail.ru.invalid> wrote:
> Dear POI community,
>
> I am trying to use SXSSFWorkbook (streaming version of workbook) to generate 
> XLSX spreadsheet with many rows and hyperlinks in the cells. It turned out 
> that effective streaming of such document is not possible, as hyperlinks are 
> accumulated in XSSFSheet#hyperlinks (see [1]). It's clear why this is 
> necessary: <hyperlinks> XML element goes after <sheetData> in <worksheet>, so 
> they need to be somehow preserved until all data rows are written. However it 
> turned out that each hyperlink is 2KB size, while the "useful" data (the 
> string link itself) is about 300 bytes (75% memory overhead). For document 
> with 60000 hyperlinks, the memory used is 132MB (see attached 
> poi-hyperlink-dump.png). The most of memory is consumed by Cur$Locations 
> which has 7 arrays, 32 elements each, consuming 1KB in total (see attached 
> poi-hyperlink-dump-detailed.png). The stack trace is the following:
>
> Thread [main] (Suspended (breakpoint at line 493 in 
> org.apache.xmlbeans.impl.store.Cur$Locations))
>         
> org.apache.xmlbeans.impl.store.Cur$Locations.<init>(org.apache.xmlbeans.impl.store.Locale)
>  line: 493
>         
> org.apache.xmlbeans.impl.store.Locale.<init>(org.apache.xmlbeans.SchemaTypeLoader,
>  org.apache.xmlbeans.XmlOptions) line: 169
>         
> org.apache.xmlbeans.impl.store.Locale.getLocale(org.apache.xmlbeans.SchemaTypeLoader,
>  org.apache.xmlbeans.XmlOptions) line: 241
>         
> org.apache.xmlbeans.impl.store.Locale.newInstance(org.apache.xmlbeans.SchemaTypeLoader,
>  org.apache.xmlbeans.SchemaType, org.apache.xmlbeans.XmlOptions) line: 592
>         
> org.apache.xmlbeans.impl.schema.SchemaTypeLoaderImpl(org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase).newInstance(org.apache.xmlbeans.SchemaType,
>  org.apache.xmlbeans.XmlOptions) line: 198
>         
> org.apache.poi.POIXMLTypeLoader.newInstance(org.apache.xmlbeans.SchemaType, 
> org.apache.xmlbeans.XmlOptions) line: 84
>         
> org.openxmlformats.schemas.spreadsheetml.x2006.main.CTHyperlink$Factory.newInstance()
>  line: not available
>         
> org.apache.poi.xssf.usermodel.XSSFHyperlink.<init>(org.apache.poi.common.usermodel.HyperlinkType)
>  line: 60
>         
> org.apache.poi.xssf.usermodel.XSSFCreationHelper.createHyperlink(org.apache.poi.common.usermodel.HyperlinkType)
>  line: 78
>         
> org.apache.poi.xssf.streaming.SXSSFCreationHelper.createHyperlink(org.apache.poi.common.usermodel.HyperlinkType)
>  line: 83
>
> By looking into tacktrace one can see that there is chance to optimize memory 
> in Locale#getLocale() by taking advantage of 
> options.hasOption(USE_SAME_LOCALE). However this option is not documented – 
> what could be the side effects?
>
> Another opportunity is to refactor XSSFHyperlink so that it does not create 
> CTHyperlink immediately but postpones that until that is really needed 
> (SXSSFWorkbook#write() is called), more specifically, when 
> XSSFHyperlink#getCTHyperlink() is called. However SXSSFCell#setHyperlink() 
> misuses this method [3] in a sense that it should use API of XSSFHyperlink 
> the same way as XSSFCell#setHyperlink() does (see SXSSFCell.java.patch).
>
> Summarizing above I would say that the following could be improved right now:
>
> * Document USE_SAME_LOCALE option. Provide a description of what could go 
> wrong if this options is used, especially when used in conjunction with 
> UNSYNCHRONIZED. Also the option assumes that Locale object could be somehow 
> instantiated but it's constructor is private and one cannot do something like 
> POIXMLTypeLoader.DEFAULT_XML_OPTIONS.put(Locale.USE_SAME_LOCALE, 
> Locale.getLocale(XmlBeans.getContextTypeLoader(), 
> POIXMLTypeLoader.DEFAULT_XML_OPTIONS));
> * Apply minor correction to documentation [4]:
>    there are still things that still may consume a large amount of memory 
> based on which features you are using, e.g. merged regions, comments, ...
>    there are still things that still may consume a large amount of memory 
> based on which features you are using, e.g. merged regions, hyperlinks, 
> comments, ...
> * Apply the attached SXSSFCell.java.patch.
>
> More advanced improvement of SXSSF would be to dump all problematic objects 
> (regions, hyperlinks, comments, ...) into temporary files and then merge them 
> into destination XSLX spreadsheet – then it would be a real streaming.
>
> POI v3.15. Links are to POI 3.12, but there are no conceptual differences.
>
> [1] 
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi-ooxml/3.12/org/apache/poi/xssf/usermodel/XSSFSheet.java#137
> [2] 
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.xmlbeans/xmlbeans/2.4.0/org/apache/xmlbeans/impl/store/Locale.java#213
> [3] 
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi-ooxml/3.12/org/apache/poi/xssf/streaming/SXSSFCell.java#603
> [4] https://poi.apache.org/spreadsheet/how-to.html#sxssf
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
> For additional commands, e-mail: user-h...@poi.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
For additional commands, e-mail: user-h...@poi.apache.org

Reply via email to