Re: [iText-questions] Some questions for converting HTML to PDF using HTMLWorker

SeungHyun Park Sun, 29 Oct 2006 22:12:02 -0800

The mail daemon may have filtered out the embedded HTML file.

The HTML source is as follows:

<HEAD> iText Test</HEAD>
<BODY>
<IMG src="http://www.tis-co.com/images/shots/RS1.gif" width="400" height="300">
Test HTML sample 
Test HTML sample 
 
Test HTML sample

<P><IMG src="http://www.atmos.ucla.edu/~brianpm/figures/ape_figs/ape_climo_lathgt_flat_Q.pdf.png"></P>

<TABLE height=201 cellSpacing=0 cellPadding=0 width=429 border=1>
<TBODY>
<TR>
<TD>
1</TD>
<TD>
2</TD>
<TD>
3</TD>
<TD>
4</TD></TR>
<TR>
<TD>
asdf</TD>
<TD>
sdf</TD>
<TD>
sdf</TD>
<TD>
sdf</TD></TR>
<TR>
<TD>
dfdf</TD>
<TD>
dfdf</TD>
<TD>
dfdf</TD>
<TD>
 </TD></TR>
<TR>
<TD>
 </TD>
<TD>
 </TD>
<TD>
 </TD>
<TD>
 </TD></TR></TBODY></TABLE>
 
<A href="http://www.google.com">Google</A>

</BODY>

2006/10/30, SeungHyun Park <[EMAIL PROTECTED]>:

I am developing a program to convert HTML source to PDF.

I searched the mailing list and found the HTMLWorker and HTMLParser classes.

HTMLParser does not seem to support CJK strings (when I tested it, all CJK strings came out blank), so I decided to use HTMLWorker.

I wrote the following code (using iTextSharp 3.1.5):

===============================================================================
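Imports System.IO
Imports System.Collections
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Private Sub Test_HTMLWorker()
    ' Read the whole HTML file into memory, then release the file handles.
    Dim fs As New FileStream("test.html", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim sr As New StreamReader(fs, System.Text.Encoding.Default)
    Dim sReader As New StringReader(sr.ReadToEnd())
    sr.Close()
    fs.Close()

    ' A4 page with a margin of 20 on every side.
    Dim document As Document = New Document(PageSize.A4, 20, 20, 20, 20)

    PdfWriter.GetInstance(document, New FileStream("test_output.pdf", FileMode.Create))

    ' Register the Gulim font for CJK text. VB string literals have no
    ' backslash escapes, so single backslashes are used in the path.
    FontFactory.Register("c:\windows\fonts\gulim.ttc")

    ' Styles applied to the body tag: font face, encoding and leading.
    Dim st As StyleSheet = New StyleSheet
    st.LoadTagStyle("body", "face", "Gulim")
    st.LoadTagStyle("body", "encoding", "Identity-H")
    st.LoadTagStyle("body", "leading", "12,0")

    document.Open()

    Dim worker As html.simpleparser.HTMLWorker = New html.simpleparser.HTMLWorker(document)

    ' Parse the HTML into a list of iText elements.
    Dim p As ArrayList = worker.ParseToList(sReader, st)

    ' Add each parsed element, followed by an empty paragraph as a separator.
    For k As Integer = 0 To p.Count - 1
        document.Add(DirectCast(p.Item(k), IElement))
        document.Add(New Paragraph(vbCrLf))
    Next

    document.Close()

    sReader.Close()
End Sub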
=================================================================================

This code works fine for HTML sources that contain only text.

However, it does not work for HTML sources with img tags: the layout of the generated PDF file differs from that of the original HTML source.

Also, if I omit the width and height attributes from an img tag, that image is not inserted into the generated PDF file.

I think this problem arises because HTMLWorker may not take the space occupied by an image into account, especially when the img tag is nested inside another tag.

I then tried to insert space equal to the height of the image, but the position of the image did not change (I did manage to find the chunk objects that contain the images).

I have attached a sample HTML file and the generated PDF files for your testing.

If you could take a few minutes to answer my questions, I would really appreciate it.

Best regards,

S. H. Park
