HowTo: Convert a Word document to clean HTML

Ever needed to convert a Word document to HTML, knowing that Word itself produces some of the worst HTML code ever? There is a simple trick to get this done.

  1. Make sure you have a Gmail account
  2. Send the Word document, as an attachment, to your Gmail account
  3. Sign in to Gmail and view the email with the attachment
  4. Click the ‘View as HTML’ link at the bottom of the email
  5. With the page open, choose ‘View source’ in your browser
  6. Copy the code

That’s all there is to it. Google produces fairly clean HTML from a Word document, though it may need updating to meet the XHTML 1.0 specification, since Google uses some deprecated tags.
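If you want to automate part of that XHTML cleanup, here is a minimal sketch using Python's standard html.parser module. It assumes the copied source has been pasted into a string, and the list of deprecated tags (font, center, u, s, strike, basefont) is my own assumption about what needs removing, not a list taken from Google's output. It drops those tags while keeping their text, lowercases tag names, quotes every attribute, and writes empty elements such as br in the self-closing XHTML form. It does not close unclosed p or li tags, strip deprecated attributes like align or bgcolor, or add a DOCTYPE, so treat it as a first pass rather than a full XHTML converter.

```python
from html import escape
from html.parser import HTMLParser

# Assumed list of presentational tags to unwrap (their contents are kept).
DEPRECATED = {"font", "center", "u", "s", "strike", "basefont"}
# Empty elements that XHTML requires to be written as <tag />.
VOID = {"br", "hr", "img", "input", "meta", "link", "col", "area", "base", "param"}


class XhtmlCleaner(HTMLParser):
    def __init__(self):
        # Keep entity references as-is instead of decoding them into text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag in DEPRECATED:
            return  # drop the tag itself; its text still reaches handle_data
        # XHTML needs quoted values; a bare attribute like "nowrap" becomes nowrap="nowrap".
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name, quote=True)}"'
            for name, value in attrs
        )
        closer = " />" if tag in VOID else ">"
        self.out.append(f"<{tag}{attr_text}{closer}")

    def handle_endtag(self, tag):
        # Void elements were already self-closed; deprecated tags were dropped.
        if tag in DEPRECATED or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(source):
    """Return source with deprecated tags unwrapped and XHTML-style markup."""
    parser = XhtmlCleaner()
    parser.feed(source)
    parser.close()  # flush any buffered text at the end of the input
    return "".join(parser.out)


if __name__ == "__main__":
    print(clean_html('<CENTER><FONT SIZE=2>Hi &amp; bye</FONT></CENTER><BR>'))
```

Running it on the sample string prints `Hi &amp; bye<br />`. Keep in mind that unwrapping u and strike also removes underline and strikethrough, so you may want to replace those with CSS instead of dropping them.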

I tried it on a fairly complex document with tables and lots of text styles, and it did a very good job.
