I never figured that the solution would be so easy. Those bothersome question marks within black diamonds that were appearing occasionally on some of my older blog entries were driving me nuts.
All that I had to do was change the following:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
into this:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
See the difference? Perhaps this is not the most elegant way to fix it, nor is it internationally compatible with every known character set on the planet, but it works.
UTF-8 is a standard way to store characters outside the ISO-8859-1 specification, using one to four 8-bit bytes per character. Another way to do this is to use ISO-8859-2 for Eastern Europe, but since the ISO-8859 standards store every character in a single 8-bit byte, the ISO-8859-2 spec is not fully compatible with the ISO-8859-1 spec, and you will lose characters like the copyright symbol.
The reason some of the entries were not submitted correctly is that some browsers follow their own language setting when submitting the fields of a form. This results in an ISO-8859-1 submission to a UTF-8 site if no character set is specified in the form itself.
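To make the byte sizes and the missing copyright symbol concrete, here is a small Python sketch. The sample characters and the helper name `fits` are just illustrative picks of mine, not anything from the blog software.

```python
# Sketch: how many bytes UTF-8 needs per character, and which
# single-byte ISO-8859 sets can store the copyright sign.
samples = {
    "A": "\u0041",               # plain ASCII letter
    "copyright sign": "\u00a9",  # in ISO-8859-1, not in ISO-8859-2
    "euro sign": "\u20ac",
    "emoji": "\U0001f600",
}

# Length of each character once encoded as UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(utf8_lengths)


def fits(ch, charset):
    """Return True if the character can be stored in the given charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


copyright_in_latin1 = fits("\u00a9", "iso-8859-1")
copyright_in_latin2 = fits("\u00a9", "iso-8859-2")
print(copyright_in_latin1, copyright_in_latin2)
```

The lengths come out between one and four bytes, and the copyright sign fits in ISO-8859-1 but not in ISO-8859-2.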
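The mismatch described here is easy to reproduce. The following Python sketch (the function name `submit_and_read` is made up for illustration) encodes a word the way an ISO-8859-1 browser would send it and then reads it back as UTF-8, which is roughly what a UTF-8 page ends up doing.

```python
# Sketch: text submitted in one charset and displayed in another.
REPLACEMENT = "\ufffd"  # the question mark inside a black diamond


def submit_and_read(text, submitted_as, read_as):
    """Encode text as the browser would, then decode it as the site would."""
    raw = text.encode(submitted_as)
    return raw.decode(read_as, errors="replace")


word = "caf\u00e9"  # "cafe" with an acute accent on the e

garbled = submit_and_read(word, "iso-8859-1", "utf-8")
clean = submit_and_read(word, "utf-8", "utf-8")

print(REPLACEMENT in garbled)  # the accented letter was lost
print(clean == word)           # matching charsets round-trip fine
```

The single byte that ISO-8859-1 uses for the accented letter is not valid UTF-8 on its own, so the decoder substitutes the replacement character.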
Another, and better, way to solve the UTF-8 problem is by making all forms submit in UTF-8.
Thanks for the tip, Art. Now I've figured out an easy way to search and replace those characters. First, I switch over to ISO-8859-1 so that the search field accepts the correct characters. Second, each entry containing such unwanted characters can be scanned and zapped accordingly. Finally, when I am all done, I switch back to the more acceptable UTF-8 and it's square one all over again. Piece of cake, really.
Oh yeah, about the copyright symbol. The limitation you mention should not be a problem as long as you use the &copy; thingie instead.
The &copy; thingie is not part of any character set. It's a workaround introduced in HTML 3, along with some other character entities. If you go that way, you can use &#number; as well for any character that is not in your current character set.
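For anyone who wants to generate those numeric references instead of typing them by hand, here is a minimal Python sketch. The helper name `to_ascii_html` and the sample string are my own inventions; the `xmlcharrefreplace` error handler and `html.unescape` are part of the standard library.

```python
import html


def to_ascii_html(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


original = "\u00a9 \u017e"  # copyright sign, space, z with caron
escaped = to_ascii_html(original)
print(escaped)

# A browser (or html.unescape) turns the references back into characters,
# whatever charset the page itself declares.
restored = html.unescape(escaped)
named = html.unescape("&copy;")
print(restored == original, named == "\u00a9")
```

Since the output is pure ASCII, it reads the same under ISO-8859-1, ISO-8859-2, and UTF-8.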