

Converting HTML entities (special characters)


HTML entities are the formal names for special characters in HTML, such as

&Eacute;

which stands for É (upper-case E with acute accent), or

&aelig;

which stands for æ (the ae ligature).

HTML entities are required for correct results on web pages, but the normal Windows (ANSI) characters are required in other Windows programs.

Concordance gives you full control over conversion from HTML entities to ANSI characters when a source text is read, and over conversion from ANSI characters to HTML entities when a concordance is exported as HTML.
  • In the Preferences dialog (on the Tools menu) you can choose Convert HTML entities found in input and Convert to HTML entities during output.  

    The default state of Convert HTML entities found in input is true, meaning that unless you have un-ticked this option, HTML entities found in your input texts will be converted to their ANSI equivalents, provided the HTML entity and its ANSI translation are found in the translation files (see below).

    The default state of Convert to HTML entities during output is also true, meaning that unless you have un-ticked this option, ANSI characters will be converted to their HTML equivalents when you save a concordance as HTML, provided the ANSI character and its HTML entity translation are found in the translation files (see below).
    Note: If you are using a non-Western language, you will probably want to turn off this option, or else define your own translations (as explained next).

    Installed with the program are two files, Latin1toHTML.ini and HTMLtoLatin1.ini, which define the translations between ANSI and HTML characters.  You can edit these files with a plain text editor such as the Multiple Document Editor, which is part of Concordance.

    You can add extra HTML entities and their corresponding ANSI characters, or you can change the translation of existing HTML entities.  Although you cannot at present change the names of these files, they need not contain translations to and from the Latin-1 character set.

    If these translation files are moved, deleted, or renamed, the conversions will not take place.
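
The two-way conversion described above can be illustrated with a minimal Python sketch. It is not Concordance's own code, and it does not read the .ini files, whose internal format is not shown here. The small dictionary HTML_TO_CHAR and the function names entities_to_chars and chars_to_entities are hypothetical stand-ins for the translation files and the two Preferences options. As in the program, anything without an entry in the table is left unchanged.

```python
import re

# Hypothetical stand-in for the HTML-to-character translation file:
# entity name -> character. Only the two examples from this page are listed.
HTML_TO_CHAR = {
    "Eacute": "\u00c9",  # É, upper-case E with acute accent
    "aelig": "\u00e6",   # æ, the ae ligature
}

# Hypothetical stand-in for the character-to-HTML translation file,
# built here by inverting the table above.
CHAR_TO_HTML = {char: name for name, char in HTML_TO_CHAR.items()}

# Matches named entities of the form &name;
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_chars(text, table=HTML_TO_CHAR):
    """Replace each &name; that has a table entry; leave the rest as-is."""
    def replace(match):
        return table.get(match.group(1), match.group(0))
    return ENTITY_PATTERN.sub(replace, text)


def chars_to_entities(text, table=CHAR_TO_HTML):
    """Replace each character that has a table entry with its &name; form."""
    return "".join(f"&{table[ch]};" if ch in table else ch for ch in text)


print(entities_to_chars("&Eacute;cole &amp; &aelig;"))  # École &amp; æ
print(chars_to_entities("École æ ñ"))                   # &Eacute;cole &aelig; ñ
```

In this sketch &amp; and ñ pass through untouched because the table has no entry for them, which mirrors the rule that a conversion happens only when a translation is defined. Adding an entry to the dictionary corresponds to adding a line to a translation file.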