Here's the problem...
We need an imaging solution that will process e-mail messages and attachments and convert them to PDF or TIF so that they can be stored and viewed through a management system. A single message could consist of any number of documents in varying formats (text, .doc, HTML, .xls, etc.), and I need to convert anything that can be converted into a PDF document. This all needs to happen on the fly.
The Tools
I am finally putting together a number of tools to use for this task. First is the MIME processing required to extract the different parts of a message.
Mime4Net is likely going to be my tool for parsing MIME messages. I plan to save plain-text e-mails as .txt files and HTML e-mails as .html files. Attachments will be extracted as they are. (Mime4Net is a commercial product, but there is also an open-source tool called SharpMime that works, though it is not as refined as Mime4Net.)
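Mime4Net and SharpMime are .NET libraries, so the following is not their API. It is a minimal sketch of the same extraction step using Python's standard `email` package, assuming the raw message is available as bytes. The function name `extract_parts` and the `body1.txt` / `body2.html` naming scheme are hypothetical.

```python
import os
from email import policy
from email.parser import BytesParser


def extract_parts(raw_bytes, out_dir):
    """Write the parts of one raw MIME message to files in out_dir.

    Hypothetical naming: unnamed text/plain bodies -> bodyN.txt,
    unnamed text/html bodies -> bodyN.html, attachments keep their
    own filename. Returns the written paths in message order.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    body_count = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers; walk() visits their children next
        ctype = part.get_content_type()
        filename = part.get_filename()
        if filename:
            # Attachment: extract the decoded bytes as-is
            name = os.path.basename(filename)  # drop any path in the header
            data = part.get_payload(decode=True)
        elif ctype in ("text/plain", "text/html"):
            # Message body: plain text as .txt, HTML as .html
            body_count += 1
            ext = ".txt" if ctype == "text/plain" else ".html"
            name = f"body{body_count}{ext}"
            data = part.get_content().encode("utf-8")
        else:
            continue  # unnamed non-text parts (e.g. inline images) are skipped
        path = os.path.join(out_dir, name)
        with open(path, "wb") as fh:
            fh.write(data)
        written.append(path)
    return written
```

Two attachments with the same filename would overwrite each other in this sketch; a production version would need a unique naming scheme per message.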
The next step is to convert the messages (.txt and .html) and the attached files. Right now, I am only really concerned with converting .txt, .doc, .tif, and other image files (.jpg, .gif, etc.).
This is where things get tricky. There are posts all over the web about various tools for this, but I have not found any straightforward solution. In the end, it seems that the only way to get an accurate reproduction of the original document is to open the document in its native application and print it to a PostScript file. Ghostscript comes with a print driver for performing this task, and ASPAlliance has a good walkthrough of the setup.
The PostScript file can then be converted to PDF using Ghostscript, an "interpreter for the PostScript language and for PDF".
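The plan below calls Ghostscript through a C# wrapper; as a rough, language-neutral illustration of the same conversion, here is the equivalent Ghostscript command line driven from Python's `subprocess` instead. The flags are standard Ghostscript options. The executable name is an assumption (`gs` on Unix-like systems, typically `gswin32c` or `gswin64c` on Windows), and the helper names are hypothetical.

```python
import shutil
import subprocess


def ghostscript_command(ps_path, pdf_path, gs_exe="gs"):
    """Build a Ghostscript command line that converts PostScript to PDF.

    -dBATCH / -dNOPAUSE: process the input and exit without prompting
    -dSAFER:             restrict file operations by the PostScript program
    -sDEVICE=pdfwrite:   Ghostscript's PDF output device
    """
    return [
        gs_exe,
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def ps_to_pdf(ps_path, pdf_path, gs_exe="gs"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    if shutil.which(gs_exe) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_exe}")
    subprocess.run(ghostscript_command(ps_path, pdf_path, gs_exe), check=True)
```

For example, `ps_to_pdf("body1.txt.ps", "body1.txt.pdf")` would produce a single PDF from the printed text body, provided Ghostscript is on the PATH.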
Finally, I will need to merge all of the PDF documents for each message into a single file. This can be accomplished using iTextSharp.
The Solution
After some prototyping, I think the entire process for one message will look like this:
- Extract the various message parts
- Open and print all .txt, .doc, .jpg, .gif files to PostScript
- Convert PostScript files to PDF using a C# Ghostscript Wrapper
- Merge PDF files using iTextSharp
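The four steps above can be sketched as a dry-run planner that, given the files extracted from one message, decides which ones go through the print-to-PostScript and Ghostscript steps and what the final merge list would be. It does not print or convert anything. The function name `plan_message`, the `.ps`/`.pdf` output naming, and the handling of attachments that are already PDFs are assumptions; the extension set is the union of the two lists above.

```python
import os

# Extensions the post plans to print to PostScript (union of the two
# lists above). Assumption: matched case-insensitively on the extension.
PRINTABLE = {".txt", ".doc", ".tif", ".jpg", ".gif"}


def plan_message(extracted_files):
    """Dry-run plan for one message; nothing is printed or converted.

    extracted_files: paths from the MIME extraction step, in message order.
    Output paths append .ps/.pdf to the full source name so that
    report.txt and report.doc do not collide.
    """
    plan = {"print": [], "convert": [], "merge": [], "skipped": []}
    for path in extracted_files:
        ext = os.path.splitext(path)[1].lower()
        if ext == ".pdf":
            # Assumption: attachments that are already PDF go straight to the merge
            plan["merge"].append(path)
        elif ext in PRINTABLE:
            ps_path, pdf_path = path + ".ps", path + ".pdf"
            plan["print"].append((path, ps_path))        # native app -> PostScript
            plan["convert"].append((ps_path, pdf_path))  # Ghostscript -> PDF
            plan["merge"].append(pdf_path)               # input to the PDF merge
        else:
            plan["skipped"].append(path)  # no conversion path yet (e.g. .html, .xls)
    return plan
```

Keeping the merge list in the same order as the extracted files means the combined PDF follows the original message, assuming extraction returns the body before the attachments.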
More to come in later posts.
== Edit ==
I just found a commercial tool, activePDF, that promises to convert just about anything to PDF.