<HEAD> iText Test</HEAD>
<BODY>
<P><IMG src="http://www.tis-co.com/images/shots/RS1.gif" width="400" height="300"></P>
<P><FONT size=6>Test HTML sample</FONT></P> <BR/>
<P><FONT face=Arial size=4><STRONG>Test HTML sample</STRONG></FONT></P><BR/>
<P> </P>
<P><FONT face=Arial>Test HTML sample</FONT></P><BR/>
<P><IMG src="http://www.atmos.ucla.edu/~brianpm/figures/ape_figs/ape_climo_lathgt_flat_Q.pdf.png"></P>
<P>
<TABLE height=201 cellSpacing=0 cellPadding=0 width=429 border=1>
<TBODY>
<TR>
<TD>
<P>1</P></TD>
<TD>
<P>2</P></TD>
<TD>
<P>3</P></TD>
<TD>
<P>4</P></TD></TR>
<TR>
<TD>
<P>asdf</P></TD>
<TD>
<P>sdf</P></TD>
<TD>
<P>sdf</P></TD>
<TD>
<P>sdf</P></TD></TR>
<TR>
<TD>
<P>dfdf</P></TD>
<TD>
<P>dfdf</P></TD>
<TD>
<P>dfdf</P></TD>
<TD>
<P> </P></TD></TR>
<TR>
<TD>
<P> </P></TD>
<TD>
<P> </P></TD>
<TD>
<P> </P></TD>
<TD>
<P> </P></TD></TR></TBODY></TABLE></P>
<P> </P>
<P><A href="http://www.google.com">Google</A></P>
<P> </P>
<P>
</P>
<P> </P>
</BODY>
I am developing a program to convert HTML source to PDF.
I searched the mailing list and found the HTMLWorker and HTMLParser classes.
HTMLParser may not support CJK strings (when I tested it, all CJK strings came out blank), so I decided to use HTMLWorker.
I wrote the following code (using iTextSharp 3.1.5):
===============================================================================
Imports System.Collections
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Private Sub Test_HTMLWorker()
    ' Read the whole HTML file into memory so the file handles can be closed early.
    Dim fs As New FileStream("test.html", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim sr As New StreamReader(fs, System.Text.Encoding.Default)
    Dim sReader As New StringReader(sr.ReadToEnd)
    sr.Close()
    fs.Close()

    ' A4 page with 20-point margins on every side.
    Dim document As Document = New Document(PageSize.A4, 20, 20, 20, 20)
    PdfWriter.GetInstance(document, New FileStream("test_output.pdf", FileMode.Create))

    ' Register a CJK-capable font and make it the default for the body tag.
    FontFactory.Register("c:\windows\fonts\gulim.ttc")
    Dim st As StyleSheet = New StyleSheet
    st.LoadTagStyle("body", "face", "Gulim")
    st.LoadTagStyle("body", "encoding", "Identity-H")
    st.LoadTagStyle("body", "leading", "12,0")

    document.Open()
    Dim worker As html.simpleparser.HTMLWorker = New html.simpleparser.HTMLWorker(document)
    Dim p As ArrayList = worker.ParseToList(sReader, st)
    ' Add each parsed element, followed by a blank paragraph as a spacer.
    For k As Integer = 0 To p.Count - 1
        document.Add(p.Item(k))
        document.Add(New Paragraph(vbCrLf))
    Next
    document.Close()
    sReader.Close()
End Sub
=================================================================================
This code works fine for HTML sources that contain only text.
However, it does not work for HTML sources with img tags: the layout of the generated PDF file differs from that of the original HTML source.
Also, if I do not specify the width and height attributes on an img tag, the image is not inserted into the generated PDF file.
I think the problem is that HTMLWorker may not account for the space an image occupies, especially when the img tag is inside a <p> tag.
I then tried to insert space equal to the height of the image, but the position of the image did not change (I did succeed in finding the chunk objects that hold the images).
I attached a sample HTML file and the generated PDF files for your test.
If you could take a few minutes to answer my questions, I would really appreciate it.
Best regards,
S. H. Park
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
