epic_rants

People! Stop using Word to convert to HTML!

Mar. 16th, 2010 | 01:29 pm
posted by: jane_elliot in epic_rants

For everyone out there who uses Word to automatically convert to HTML -- STOP.  Seriously.  More than half of the Word-to-HTML files I encounter are nearly unusable due to tables (guess what?  They don't work if your reader is using a different screen size), pictures (which invariably end up over the text), single line spacing between paragraphs (I can't even figure out where Word screwed up there), or a half-dozen other issues.  Worse still, even the readable files have at least one problem (usually the text coming out way too large).  It may look great on your computer, but trust me -- it's not going to work on *everyone's* computer.  And how stupid is it to lose readers just because HTML 4.0 came out and your Word-created HTML isn't 100% compatible, or because Microsoft apparently makes a special effort to have Word produce HTML files that look like shit in Firefox?

Plus, adding injury to injury, Word-converted HTML files are practically impossible to tweak at the code level, which means that when I find a file with 14 pt. font, I can't just go into the code and fix it.  I either have to copy and paste it into Word (is it irony or tragedy?) or do what I usually do -- hit the back button and find a story that doesn't require me to physically manipulate it to be read.

Instead of doing the automatic conversion, here's an easy way to format your story in Word so that pretty much everyone on earth can at least read it.  It won't be pretty, it won't be fancy, and it won't have any bells and whistles.  What it *will* offer, however, is the ability to have your story be readable on every single computer/cell phone/eReader on the planet that is capable of displaying HTML files. 

-Before you start typing, *turn off smart quotes*.  You can do this under Tools -> AutoCorrect Options, on the AutoFormat *and* AutoFormat As You Type tabs (uncheck the option that replaces straight quotes with smart quotes on both).  Frankly, if you intend for people to be able to read your stories on archives, cell phones, and/or eReaders, it's best to uncheck nearly everything on those two tabs.

-If you've already finished typing your story, you can still turn off smart quotes and then do a find-replace (Control+H) to swap the curly opening and closing double quotes for straight ones (do them separately), plus another find-replace for apostrophes.

-Once the story is done, type <HTML><BODY> at the very top.

-Put <b> in front of every word/phrase/section you want bolded and </b> at the end.  Ditto for <i></i> (italics), <u></u> (underline), <center></center> (center).

-Do a find-replace (control+H) and click on 'More'.  Then click on 'Special'.  Then on 'Paragraph Mark'.  This will put a ^p in the 'Find' field.  In the 'Replace' field, put: <BR><BR>

-Click 'Replace all'.  This should add lots and lots of <BR>s into your file.  If this is not the case, do the previous step again, only instead of 'Paragraph Mark', click on 'Manual Line Break' (^l).

-At the very end of your file, type </BODY></HTML>

-Copy and paste the entire file into Notepad and save it under the file name you want to use when uploading.

-Find the file where you saved it and change the extension from .txt to .html (if you can't see your extensions, open up any folder, go to Tools -> Folder Options -> View, and in that massive list uncheck 'Hide extensions for known file types').

-Upload the file.

Easy as that!  Trust me, your readers (especially your readers with eReaders) will thank you!
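For anyone comfortable running a small script, the manual steps above can also be sketched in Python. This is a rough sketch, not part of the original method: it assumes you've already saved the story from Word as a UTF-8 plain-text file with one paragraph per line, and the file and script names (story.txt, story.html, story2html.py, convert_file) are hypothetical. It straightens smart quotes, ellipses, and dashes (the dash mapping is an assumption), ends each paragraph with <BR><BR>, wraps everything in <HTML><BODY> ... </BODY></HTML>, and writes any remaining non-ASCII characters (like é) as numeric references such as &#233;. It does not escape < or &, so <b> or <i> tags you typed by hand pass straight through, but so would a stray < in your prose.

```python
import sys
from pathlib import Path

# Characters Word's AutoFormat tends to insert, mapped to plain ASCII.
# The dash mappings are an assumption; adjust them to taste.
SMART_TO_STRAIGHT = {
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u2026": "...",  # single-character ellipsis
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
}
STRAIGHTEN = str.maketrans(SMART_TO_STRAIGHT)


def to_simple_html(text: str) -> str:
    """Straighten quotes, end each paragraph with <BR><BR>, and wrap the
    result in <HTML><BODY> ... </BODY></HTML>."""
    lines = [line.strip() for line in text.translate(STRAIGHTEN).splitlines()]
    body = "\n".join(line + "<BR><BR>" for line in lines if line)  # skip blank lines
    page = "<HTML><BODY>\n" + body + "\n</BODY></HTML>\n"
    # Anything still outside ASCII (e.g. accented letters) becomes &#NNN;
    return page.encode("ascii", "xmlcharrefreplace").decode("ascii")


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-8 text file (byte-order mark optional) and write the HTML version."""
    text = Path(src).read_text(encoding="utf-8-sig")
    Path(dst).write_text(to_simple_html(text), encoding="ascii")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python story2html.py story.txt story.html
    convert_file(sys.argv[1], sys.argv[2])
```

You'd still add any bold or italic tags in Word first, as described in the steps.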

ETA:  Apparently there are other programs that also use Word's incredibly shitty convert-to-HTML programming (CoffeeCup Editor, I'm looking at you).  Avoid those as well, since they result in the exact same problems.

Comments {70}

Isis

(no subject)

from: isiscolo
date: Mar. 16th, 2010 07:48 pm (UTC)

There is a nice piece of software that converts much more cleanly, which you can get from http://word-to-html.com. It is not free, though, which may dissuade some people; it's currently $47, although I bought it when it was $35 or so some years ago.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 16th, 2010 07:54 pm (UTC)

I have to admit that I've been coding my files by hand for so long that I can't imagine doing anything else (especially now that I've learned you can actually do a find replace on *formatting*), but I'm all in favor of programs that produce a clean end product. Thanks!

boogieshoes

(no subject)

from: boogieshoes
date: Mar. 16th, 2010 08:24 pm (UTC)

what i do is save as .html, then look at the source file and tend to delete everything i don't recognize. i started in web-space hand-coding my pages, which i think is a perfect pain - but it does allow me to recognize sluff i don't need.

it's not a perfect method, but it gets rid of 99% of the issues inherent in the (M$Word) system.

-bs

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 16th, 2010 09:23 pm (UTC)

I've tried that with Word-to-HTML files for stories I really, really wanted to read, but in the end there was just too much code to deal with (especially because I read really long stories). *sigh*

valiha

(no subject)

from: valiha
date: Mar. 16th, 2010 09:10 pm (UTC)

There's also a website called Word to Clean HTML I've sometimes used to convert files. It's useful for those times you need a quick conversion.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 16th, 2010 09:24 pm (UTC)

Sweet! We need to spread the word:)

busaikko

(no subject)

from: busaikko
date: Mar. 16th, 2010 09:59 pm (UTC)

I use Astolat's fanfic conversion macros, which are free and here: http://astolat.livejournal.com/157513.html

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 16th, 2010 10:13 pm (UTC)

Very cool, thanks!

Kuro Tenshi

Also..

from: kuro_tenshi13
date: Mar. 17th, 2010 12:12 am (UTC)

Small note on the [b] and [i] tags. They work fine for visual browsers, but if you want screen readers to pick up the same cues, use [strong] instead of [b] and [em] instead of [i]. Visually you'll see no difference (strong looks bold and em looks italicized), but screen readers will give strong a stronger inflection and em an emphasis.
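For a story that's already been tagged with <b> and <i>, the swap can be done in one pass. A minimal Python sketch (not from the comment itself; the function name is hypothetical), assuming the tags are bare, with no attributes, as in the hand-coded files described in the post. Tags like <b class="x"> are left alone, and <br> and <body> are not touched:

```python
import re

# Presentational tag -> semantic equivalent.
SEMANTIC = {"b": "strong", "i": "em"}
# Matches bare <b>, </b>, <i>, </i> in any case, but not <br> or <body>.
TAG = re.compile(r"<(/?)([bi])>", re.IGNORECASE)


def semantic_emphasis(html: str) -> str:
    """Rewrite <b>/<i> as <strong>/<em>, keeping opening vs. closing tags."""
    return TAG.sub(lambda m: f"<{m.group(1)}{SEMANTIC[m.group(2).lower()]}>", html)


print(semantic_emphasis("<b>Stop</b> using <I>Word</I>!<BR>"))
# <strong>Stop</strong> using <em>Word</em>!<BR>
```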

jane_elliot

Re: Also..

from: jane_elliot
date: Mar. 17th, 2010 12:37 am (UTC)

I didn't know that, thanks!

Amalthia

repost to fix tags

from: amothea
date: Mar. 17th, 2010 03:21 am (UTC)

I've been using WordPerfect for my conversions recently, and so far so good. It gives relatively clean HTML from a .doc file and puts the code in the correct order; it also uses <EM> and <STRONG> tags instead of <I> and <B>. I personally can't tell the difference between the two on my Sony PRS, but I hear from others that there is a slight difference.

Is it okay if I link to this post from fanfic_ebooks?

Amalthia

Re: repost to fix tags

from: amothea
date: Mar. 17th, 2010 03:22 am (UTC)

That should be fanfic_ebooks. I hate not being able to edit my comments...

Elspeth

(no subject)

from: elspethdixon
date: Mar. 17th, 2010 06:25 am (UTC)

There are... people who don't code their html files by hand? (granted, they're probably people who use much fancier web design than I ever did). I've been doing it that way since geocities. I still hand-code everything I post to Tales of Suspense, and I've gotten so used to [i] [/i] to italicize things that these days I automatically type it in while writing instead of hitting Ctrl+I.

OwlRigh

(no subject)

from: owlrigh
date: Mar. 17th, 2010 02:15 pm (UTC)

Me too. It's so easy to write in Notepad and code as you want; not only are you using a program that's very light on resources, you also don't have annoying squiggles under everything you write (or, like me, you don't have Word installed in the first place).

Rat Creature

(no subject)

from: ratcreature
date: Mar. 17th, 2010 02:08 pm (UTC)

Some years ago, when more people still posted to their own websites, I wrote myself a shell script to strip junk HTML from Word-converted files.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 17th, 2010 03:29 pm (UTC)

Sounds very cool -- I've seen a lot of nifty resources in response to this post:)

Don't call me Shirley

Netscape Composer

from: dorothy1901
date: Mar. 17th, 2010 02:24 pm (UTC)

Another possibility is to write your story in Netscape Composer instead of Microsoft Word. Composer is a WYSIWYG (What You See Is What You Get) HTML editor. It produces much cleaner HTML code. Although no longer supported, it can still be downloaded as part of Netscape Navigator 7.2 (which is also no longer supported).

I've been using Composer for years, and I love it to bits.

jane_elliot

Re: Netscape Composer

from: jane_elliot
date: Mar. 17th, 2010 03:12 pm (UTC)

Cool, thanks!

beatrice_otter

(no subject)

from: beatrice_otter
date: Mar. 17th, 2010 03:02 pm (UTC)

For me, I write in Word with blank lines between paragraphs. When I'm ready to post a story, I save it and then do a find-replace to add <strong> and <em> and any other HTML tags I might want. (I've tried adding them as I go along, but I find it throws me off my writing; also, when I'm editing and revising, it distracts me from the story. So HTML is always the very last thing added.) Then I c&p the whole thing into Notepad to strip out things like smart quotes. Then I close the Word doc without saving the HTML stuff, and save the Notepad doc. I use the Word file to upload to the Pit of Voles, which does a good clean conversion automatically, and I use the .txt file from Notepad to upload everywhere else.

If I need to edit something major later, I go back and fix it in the Word doc (without the HTML tags) and redo the c&p to Notepad as necessary.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 17th, 2010 03:31 pm (UTC)

Yay! I like this process:)

(Though, I should warn -- I've noticed that Notepad no longer converts smart quotes to straight quotes. Apparently MS is so attached to the smart quotes that even Notepad has been made to recognize them. *sigh*)

vtn

(no subject)

from: cosmicdancer
date: Mar. 18th, 2010 04:02 pm (UTC)

(here from metafandom)

OMG, you are awesome! I never knew how to do the find and replace for paragraph marks! Was getting very annoyed at having to manually put in <br> or <p> at the beginning of every paragraph for stuff on my website.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 18th, 2010 04:48 pm (UTC)

I spent years adding <br>s and <p>s by hand, and when you write stories as epic as the ones I write, it's an utter pain in the ass. I was so happy when I finally figured out what the special characters part of the Find-Replace function meant:)

Hambel

Here via metafandom

from: hambelandjemima
date: Mar. 18th, 2010 04:08 pm (UTC)

There's a lot of useful info here. Thanks for sharing it :)

jane_elliot

Re: Here via metafandom

from: jane_elliot
date: Mar. 18th, 2010 04:47 pm (UTC)

No problem! I have to admit, I've found some great information in the comments -- many of their systems sound easier than mine:)

Starkindler

(no subject)

from: nufaciel
date: Mar. 18th, 2010 06:06 pm (UTC)

Nice! Very useful. I didn't realize that formatting could be edited like that. Usually I just hand code.

Note on saving HTML files in Notepad:

You don't have to change the extension afterwards. When you first save, type the file name like this:

"nameofhtmlfile.html" (and don't worry about the extension box below it)

And it will automatically save as an HTML file. :D I do this all the time when making/editing web pages. Then if you want/need to edit, just open it with Notepad. If you don't want to bother changing your folder settings so you can see extensions, this is much easier.

Starkindler

(no subject)

from: nufaciel
date: Mar. 18th, 2010 06:07 pm (UTC)

Oh, and be sure you have the quotes when saving too. That's how it knows it's supposed to be saved as an html file.

mjj

(no subject)

from: flemmings
date: Mar. 18th, 2010 06:12 pm (UTC)

Here from metafandom--

What about paragraph indents? I'm attached to mine, hand-coding those is a PITA, and I'm not up to the css level of coding yet. Word's excess of coding bumpf is annoying, but they do keep the indents for you.

Stephanie C. Leary

(no subject)

from: sleary
date: Mar. 18th, 2010 06:48 pm (UTC)

It's supremely easy in CSS:

p { text-indent: 1em; }

Make it .5em if that's too big.
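One catch: that rule only styles text inside <p> elements, so it does nothing for paragraphs separated with <BR><BR> as in the post above. A minimal Python sketch, assuming plain text with one paragraph per line (the function name is hypothetical), that wraps each paragraph in <p> and puts the rule in a <style> block in the page head:

```python
def to_indented_html(text: str, indent: str = "1em") -> str:
    """Wrap each non-blank line in <p>...</p> and indent paragraphs via CSS.

    Assumes one paragraph per line; `indent` is any CSS length, e.g. ".5em".
    """
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    body = "\n".join(f"<p>{p}</p>" for p in paragraphs)
    return (
        "<html><head><style>\n"
        f"p {{ text-indent: {indent}; }}\n"  # doubled braces are literal { }
        "</style></head><body>\n"
        f"{body}\n"
        "</body></html>\n"
    )


print(to_indented_html("First paragraph.\n\nSecond paragraph.", indent=".5em"))
```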

Stephanie C. Leary

(no subject)

from: sleary
date: Mar. 18th, 2010 06:47 pm (UTC)

wordoff.org is my special friend. I also have a great Python script that handles multiple documents and an Automator script (also there) that makes it easier to use on Macs.

I absolutely will not deal with Word HTML without running it through one of those two applications.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 18th, 2010 07:28 pm (UTC)

Cool, thanks!

Kate

(no subject)

from: kate_nepveu
date: Mar. 18th, 2010 08:02 pm (UTC)

Please use <p> instead of <br><br>? It's semantically correct and makes applying custom stylesheets so much easier.

(I have no idea how I got here now, sorry.)

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 18th, 2010 08:11 pm (UTC)

I find <p> misfires a *lot*, both with my eReader and with different browsers. <br> is clunky and out of date, but I've never, ever encountered an instance where <br> did not produce a hard return. Many, many times, on the other hand, I've found that <p> doesn't actually produce a break at the end of a paragraph, and/or produces the break but no extra space before the next paragraph. If I have to choose between clunky but reliable and more elegant but more prone to error, I'm going to go with the former.

Lisa

(no subject)

from: meridian_rose
date: Mar. 18th, 2010 08:27 pm (UTC)

Thank you :) Some of this I knew, some I didn't. The smart quotes thing I found out the hard way; whenever I prepared what I thought was a working href link and pasted it into a comment or post, it didn't work. The line break thing is new; I've been using find and replace to swap each paragraph break for two paragraph breaks to get the extra lines.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 18th, 2010 09:07 pm (UTC)

Smart quotes are evil. Unfortunately, most new authors either don't notice them or don't think anything of using them. The end result is that there are a lot of long stories that I want to upload to my eReader and can't (because my eReader translates smart quotes as question marks, and it's impossible for me to read a long story with all the dialogue prefaced and concluded with question marks).

girl, you're a dandelion

(no subject)

from: sarken
date: Mar. 18th, 2010 10:31 pm (UTC)

Before you start typing, *turn off smart quotes*.

Word -- if you'll excuse my pun. Ugh. Not only do they break things, but sometimes they just look like hell in certain fonts.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 12:31 am (UTC)

Exactly!

sad lizard jackson just wants a friend

(no subject)

from: queenitsy
date: Mar. 19th, 2010 12:36 am (UTC)

If I could convince my entire office to turn off smart quotes forever, my world would be a much happier place.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 12:45 am (UTC)

I'd love to get rid of smart quotes entirely. The very first thing I do when installing Word is to turn off smart quotes, both on personal computers and office computers. No one's complained yet.

Amalthia

smart quotes question

from: amothea
date: Mar. 19th, 2010 03:25 am (UTC)

What exactly do smart quotes do, and why do they mess up formatting?

I know for a while I had to replace them with regular quotes, because otherwise anything made with smart quotes would not convert over to LRF or EPUB, so I'd have missing quotation marks and apostrophes. But I assumed that was a formatting issue?

jane_elliot

Re: smart quotes question

from: jane_elliot
date: Mar. 19th, 2010 03:32 am (UTC)

Smart quotes are those c-shaped quotation marks that curve in on either side of a sentence. They are not part of the standard, basic ASCII character set, so if a computer or piece of software can only read basic ASCII (which is true of a lot of new and still-developing hardware, like eReaders), it cannot interpret the characters and displays something other than quotation marks.

In short, they are eeeevil:)

More info here:
http://en.wikipedia.org/wiki/Ascii
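The difference is easy to see by code point: ASCII only covers 0 to 127, and the curly quotes sit far above that. A small Python sketch (not from the original thread; the function name is hypothetical) that prints the code points and lists every non-ASCII character in a piece of text, so you can hunt them down before uploading. Whether a given eReader shows them as question marks depends on how it handles encodings, which this doesn't check:

```python
import unicodedata


def find_non_ascii(text: str):
    """Return (index, character, Unicode name) for each character above 127."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


print(ord('"'), ord("\u201c"), ord("\u201d"))  # 34 8220 8221
for index, char, name in find_non_ascii("He said \u201chi\u201d."):
    print(index, name)
# 8 LEFT DOUBLE QUOTATION MARK
# 11 RIGHT DOUBLE QUOTATION MARK
```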

sansreads

(no subject)

from: sansreads
date: Mar. 19th, 2010 03:56 am (UTC)

here via metafandom
I'm another "STOP using Word to HTML" person. Pardon the screaming, but really, MS Word with its extra codes and all drives me nuts when I have to code for the ebook archive I run. Hand coding is the way to go for me.

Have you tried OpenOffice? Although it has its own quirks, it produces HTML with pretty standard code, and the extras can be stripped off easily compared to Word.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 04:23 am (UTC)

I have to admit, I was not impressed with Open Office. Fortunately, I've always hand-coded my HTML so it's not a huge deal for me.

Sarah K

(no subject)

from: tears_of_nienna
date: Mar. 19th, 2010 07:11 am (UTC)

(here from metafandom)

The first thing I do when I open up a new copy of Word is turn off smart quotes and the autoformatting that makes an ellipsis into one character. I spent way too long when I was 14 or 15 editing the hell out of my wonky ff.n uploads because the characters did not translate.

Actually, now, I just pull up the "AutoFormat As You Type" page and kill everything as soon as I start. No, I do not want automatic paragraph indents. Okay, you may capitalize my sentences when I forget, but that's it! :)

(In Word 2007, turning it off is a little more complicated than on previous versions. You have to click on the Word symbol on the upper left--I guess Microsoft is too cool to use "File"--and click Word Options at the bottom, then go to Proofing. At the top of that page is the button that will open the AutoCorrect window, with tabs for AutoFormat and AutoFormat As You Type. If I didn't know better, I'd say they were trying to make this difficult...)

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 06:15 pm (UTC)

Ditto! Honestly, no one even notices that smart quotes is turned off and it makes my life so much easier.

(I've played around in Word 2007, but I'm not thrilled with the way that MS is pushing their styles -- i.e., the one aspect of their application that is 100% non-compatible with nearly every other word processing program available. I never have and never will use styles (for that very reason) so I'm pissed that a full third of the limited menu space is irrevocably allocated to them.)

Nic

(no subject)

from: jedinic
date: Mar. 19th, 2010 07:53 am (UTC)

I'm a huge advocate of saving in RTF (Rich Text) format.

- You can still use Word
- It's a MUCH easier file-format to convert
- It's a more universally-used file format
- It keeps your italics

There are heaps of free converter programs out there, Google should help.

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 06:15 pm (UTC)

Great idea, thanks!

Ender

(no subject)

from: enderwiggin24
date: Mar. 19th, 2010 12:08 pm (UTC)

just wanted to say thanks for this rant! I don't understand all the formatting issues, and I have no ebook reader yet, but I certainly plan on buying one, and then this will be handy!

you should rant about this every second month or so, maybe people will remember it then :D

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 06:15 pm (UTC)

Heh. I'm not sure I'm willing to put that kind of time into the issue, but it's getting a lot of press, so hopefully the word will spread.

snorkackcatcher

(no subject)

from: snorkackcatcher
date: Mar. 19th, 2010 07:31 pm (UTC)

Agreed, although Word is so very useful for editing. (Smart quotes are a Good Thing if your target is printed output, but they really screw up otherwise.)

For fic/post conversions I tend to use my set of macros to take the tedium out of hand-coding. For stuff they don't handle, well, a dozen or so Perl regexes clear out most of the crap and produce sensible if badly formatted HTML source.
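The regex approach can be sketched in Python rather than Perl. This is not the commenter's actual macro set or regex list (neither appears here); the patterns below, and the function name, are assumptions based on the kind of markup Word's "Save as Web Page" usually produces (Mso classes, mso- style attributes, <o:p> tags, conditional comments). Like any regex cleanup of HTML, it can miss or over-strip on unusual files, so check the result in a browser:

```python
import re

# Ordered (pattern, replacement) pairs. The patterns are assumptions based on
# typical Word "Save as Web Page" output, not a definitive list. Order matters:
# <style> blocks go first because Word wraps its CSS inside <!-- -->.
WORD_JUNK = [
    (r"<style[^>]*>.*?</style>", ""),        # embedded Word CSS
    (r"<!--.*?-->", ""),                     # comments, incl. <!--[if gte mso 9]> blocks
    (r"<!\[(?:if[^\]]*|endif)\]>", ""),      # <![if !supportLists]> / <![endif]>
    (r"</?\w+:\w+[^>]*>", ""),               # Office-namespaced tags like <o:p>
    (r"<(?:meta|link)[^>]*>", ""),           # head metadata
    (r"</?(?:span|font)\b[^>]*>", ""),       # presentational wrappers (text kept)
    (r"\s+(?:class|style|lang)=(?:\"[^\"]*\"|'[^']*'|[^\s>]+)", ""),  # Mso classes, mso- styles
]


def strip_word_html(html: str) -> str:
    """Apply each cleanup regex in turn; tags like <p>, <b>, <i> survive."""
    for pattern, replacement in WORD_JUNK:
        html = re.sub(pattern, replacement, html, flags=re.IGNORECASE | re.DOTALL)
    return html


sample = ("<p class=MsoNormal style='margin:0in'>"
          "<span lang=EN-US style='font-size:14.0pt'>Hello<o:p></o:p></span></p>")
print(strip_word_html(sample))  # <p>Hello</p>
```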

jane_elliot

(no subject)

from: jane_elliot
date: Mar. 19th, 2010 07:38 pm (UTC)

Ooo, more macros! (Honestly, the best thing about posting this was how many awesome resources folks have commented with:) Thanks!
