Monday 30 May 2011

HTML: special entity codes

   


How would you write β in your web page? Can you do it? And this ?

If your web page is mainly in one language, you might sometimes need to add special characters. To do so, you can use the special entity codes. The main reason to use them is that, as usual, target browsers may get confused when they find a special character, typically because the page's character encoding is missing or detected incorrectly. That is why it is safer to write entities than to write the symbols directly. Keeping your HTML source in plain ASCII avoids strange results in your text.


How to use entity codes
Before we use entity codes, we need something very important: the content-type meta tag. We already discussed this in the past, but let me remind you that it is good practice to declare the charset in our web pages. To do so, we just need to put the following meta tag in the head of the document:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
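That is a general recommendation, even if you don't use entity codes. In any case, whenever you need to insert something like œ, the above meta tag is definitely needed: a literal œ typed into the page depends on the declared charset, while its entity form is resolved by the browser regardless of the encoding.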
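As a minimal sketch of where that tag goes, here is a bare page skeleton. The title and the paragraph text are placeholders made up for illustration, not taken from any real page:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Declare the charset early in the head, before any text content -->
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <title>Example page</title>
</head>
<body>
  <!-- &#339; is the numeric entity for the oe ligature -->
  <p>c&#339;ur</p>
</body>
</html>
```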
In HTML, we write a numeric entity by enclosing its code between "&#" at the beginning and ";" at the end:
&#code;
where code is the number of the character. For example:
® is &#174;
As you can see, we need to know the code number to insert a special character. That is not always true, though. In fact, many characters also have a named entity, written with "&" and a name instead of "&#" and a number:
fran&ccedil;ais is français
That leads us to potentially long lists of entity codes to use in our web pages. The W3C site has a page with references to the HTML 4 entity codes, for those of you who need to find the code for a special character.
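To make the two notations concrete, the fragment below writes the characters mentioned in this post both ways. It is only an illustrative sketch: the codes are the standard HTML 4 ones, while the list markup around them is an assumption chosen for the example.

```html
<ul>
  <!-- Numeric form first, named form second; each pair renders the same character -->
  <li>&#174; and &reg; (registered sign)</li>
  <li>&#946; and &beta; (Greek letter beta)</li>
  <li>&#339; and &oelig; (oe ligature)</li>
  <li>fran&#231;ais and fran&ccedil;ais (c with cedilla)</li>
</ul>
```

The named form is easier to read in the source, but not every character has a name, so the numeric form remains the general fallback.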

Another reason to use them
There is actually another reason why we should use entity codes: query strings in URLs. If we pass parameters in the URL (and with ASP we do, don't we?), there might be trouble because the query string contains an ampersand. In those cases we need to write the ampersand in the HTML source not as "&" but as "&amp;", because a URL like:
www.somedomain.com/example.asp?a=1&b=2
might be misread by the browser, which can take the text after "&" as the start of an entity instead of a second parameter.
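In practice this matters when the URL is written inside the HTML source, for example in a link. Here is a minimal sketch reusing the example address above; the "http://" prefix and the link texts are assumptions added for illustration:

```html
<!-- Not escaped: the parser may try to read "&b" as the start of an entity -->
<a href="http://www.somedomain.com/example.asp?a=1&b=2">risky link</a>

<!-- Escaped: the browser turns &amp; back into & before requesting the page -->
<a href="http://www.somedomain.com/example.asp?a=1&amp;b=2">safe link</a>
```

In both cases the server is meant to receive a=1 and b=2. The escaping only protects the ampersand while the HTML is being parsed.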

Summary
The above reasoning leads us to say:
1) always define the charset in your pages;
2) use entity codes whenever needed;
3) always write "&amp;" instead of "&" in URLs inside your HTML.

Let me know what you think about it.

3 comments:

  1. Some entities are not supported by the user's browser. Have a little pity!

  2. For a nice reference, visit http://www.html-entities.org/

