
Convert HTML Content to PDF format using Java

I have researched how to convert HTML to PDF.
I found two approaches:

1. Using Tidy and XSL-FO.
2. Using the xhtmlrenderer project.

Basically, the first approach is:

1. Your HTML needs to be valid XHTML; for this you can use Tidy.
2. Next, transform the new XHTML document into XSL-FO. You can review this link to find the stylesheet resource (XHTML to XSL-FO).
3. Finally, convert the XSL-FO document into a PDF document.

Two links that could help with this approach:

http://www.onjava.com/lpt/a/3924
http://www.javaworld.com/javaworld/jw-04-2006/jw-0410-html.html
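The first two steps can be sketched with the JAXP classes that ship with the JDK. This is a minimal sketch, not the method from the linked articles: parsing the input as XML fails when the HTML is not well-formed (which is the point where Tidy is needed), and the stylesheet is a hypothetical toy that only turns each XHTML <p> into an <fo:block>, standing in for a real XHTML-to-XSL-FO stylesheet. The class and method names are illustrative. Step 3 needs an XSL-FO processor and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlToFo {

    // Hypothetical toy stylesheet: wraps the text of every XHTML <p> in an <fo:block>.
    // A real XHTML-to-XSL-FO stylesheet covers many more elements and styles.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:template match='/'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'><fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='//h:p'/>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='h:p'><fo:block><xsl:value-of select='.'/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    // Step 1: parse the page as XML; this throws if the HTML is not well-formed XHTML.
    public static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so the h: prefix in the stylesheet matches XHTML elements
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    // Step 2: apply the XSLT stylesheet and return the resulting XSL-FO document as text.
    public static String toFo(Document xhtml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(xhtml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hello PDF</p></body></html>";
        System.out.println(toFo(parseXhtml(xhtml)));
    }
}
```

The XSL-FO text produced in step 2 is what an XSL-FO processor would then turn into a PDF in step 3.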

The second approach uses the xhtmlrenderer project
(https://xhtmlrenderer.dev.java.net/).
It is easier than the first approach: the tool hides the steps mentioned above and uses CSS for styling.
The project uses a CSS parser (http://sourceforge.net/projects/cssparser/).
The only problem I found was using an external CSS file in the HTML file.

In the example that comes with the project, I modified the following lines:

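...
// relative path to the stylesheet; it has to be resolved against a base URL
String css = "../webapps/myContext/PDFservlet.css";

// put in some style
buf.append("<head><link rel='stylesheet' type='text/css' " +
        "href='" + css + "' media='print'/></head>");
...
Document doc = builder.parse(input);
ITextRenderer renderer = new ITextRenderer();
// the second argument is the base URL used to resolve relative links (null here)
renderer.setDocument(doc, null);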

I deployed the project in Tomcat 5.5. The problem was that the renderer could not find the CSS file (a matter of the URI and the base URL).
Also, if your HTML is not valid XHTML, you should use Tidy (JTidy).
Both approaches make use of the iText library
(www.lowagie.com/iText/).
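The CSS problem comes down to how a relative href is resolved against a base URL. The sketch below mimics that resolution with java.net.URI only; it does not call xhtmlrenderer. The base URL file:/opt/tomcat/bin/ is a hypothetical example of a Tomcat working directory, and resolveCss and linkTag are illustrative helper names, not part of any library.

```java
import java.io.File;
import java.net.URI;

public class CssBaseUrl {

    // Resolve a stylesheet href the way relative links are resolved in general:
    // against the document's base URL. Without a base URL, a relative href such as
    // "../webapps/..." has nothing to be resolved against.
    public static URI resolveCss(String baseUrl, String href) {
        URI hrefUri = URI.create(href);
        if (hrefUri.isAbsolute()) {
            return hrefUri; // already absolute, so no base URL is needed
        }
        if (baseUrl == null) {
            throw new IllegalArgumentException("relative href '" + href + "' needs a base URL");
        }
        return URI.create(baseUrl).resolve(hrefUri);
    }

    // One possible workaround: write an absolute file: URL into the <link> tag.
    public static String linkTag(File cssFile) {
        String href = cssFile.getAbsoluteFile().toURI().toString();
        return "<link rel='stylesheet' type='text/css' href='" + href + "' media='print'/>";
    }

    public static void main(String[] args) {
        // prints file:/opt/tomcat/webapps/myContext/PDFservlet.css
        System.out.println(resolveCss("file:/opt/tomcat/bin/", "../webapps/myContext/PDFservlet.css"));
        System.out.println(linkTag(new File("PDFservlet.css")));
    }
}
```

With xhtmlrenderer, the second argument of renderer.setDocument(doc, url) is the document's base URL, so passing a real base URL instead of null, or writing an absolute href as linkTag does, should avoid depending on whatever directory the renderer falls back to.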



