Alphabet


The alphabet is user-definable using the edit box in the Alphabet Dialog on the Text Menu.  The Alphabet controls the way that Concordance recognises words: characters included in the Alphabet are part of a word and all other characters are not.  The Alphabet has no effect on sorting: see below for more on sorting.

Choosing the right alphabet is a powerful way to get exactly the results you want, though it often takes a little trial and error.
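The effect of the Alphabet on word recognition can be sketched in a few lines of Python. This is only an illustration, not Concordance's actual code: the function name split_words is invented, and the sketch assumes that any character outside the alphabet simply ends the current word.

```python
import string

def split_words(text, alphabet):
    """Return the runs of alphabet characters found in text."""
    words, current = [], []
    for ch in text:
        if ch in alphabet:
            current.append(ch)              # character is part of a word
        elif current:
            words.append("".join(current))  # assumption: any other character ends the word
            current = []
    if current:
        words.append("".join(current))
    return words

# Two candidate alphabets: letters, hyphen and apostrophe, with and without numerals.
letters_only = set(string.ascii_letters + "-'")
with_digits = letters_only | set(string.digits)

text = "Don't re-read page 42!"
print(split_words(text, letters_only))  # ["Don't", 're-read', 'page']
print(split_words(text, with_digits))   # ["Don't", 're-read', 'page', '42']
```

With numerals left out of the alphabet, "42" disappears from the word list altogether, which is the kind of result that a little trial and error with the alphabet will reveal.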

Edit Alphabet

Use the Alphabet Editor to define which characters you want in your alphabet.

Note that in Concordance Version 2.0.0 and later, the numerals 0..9 are included in the default alphabet. In earlier versions they were not part of the default alphabet and you had to add them if your text or your references contained numerals which you wanted to keep.

You can copy and paste characters from the Character Set display in the Language and Font Control using the right mouse button or standard Windows keys.

Characters not in Alphabet, when found in text

This option determines what happens when Concordance reads your text and finds a character which is not in your Alphabet. If you select 'Add to Alphabet', the character is automatically added to the Alphabet and is therefore treated as part of a word. If you select 'Ignore', characters not in the Alphabet are removed entirely from your concordance.  (They are removed even if they occur inside reference markers.)

Either way, Concordance will tell you which characters it is adding to the Alphabet or ignoring. This information appears in the Progress dialog as a concordance is being made.

Choosing 'Add to Alphabet' may seem like an easy option - it means, for example, that you can start with an empty Alphabet and still have all words recognised. In practice it may create as many problems as it solves. If every undefined character is treated as part of a word, you tend to end up with a lot of words like "and/or" because the slash has been added to the alphabet. 

If you choose 'Add to Alphabet', then, you may need to make your concordance again after removing unwanted characters that were added during the first attempt. If you choose 'Ignore', you may have to do the opposite: add to the Alphabet, one at a time, the characters that were reported as ignored.
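The difference between the two settings can be illustrated with a rough Python sketch. The function make_words and its on_unknown parameter are hypothetical names, not part of Concordance. The sketch also assumes that an ignored character ends the current word; in the program itself this depends on the split-words setting described further down.

```python
def make_words(text, alphabet, on_unknown="ignore"):
    """Return (words, reported), where reported holds the characters
    that were either added to the alphabet or ignored."""
    alphabet = set(alphabet)   # work on a copy of the caller's alphabet
    reported = set()
    words, current = [], []

    def flush():
        # Close the word being built, if there is one.
        if current:
            words.append("".join(current))
            current.clear()

    for ch in text:
        if ch.isspace():
            flush()                 # a space always separates words
        elif ch in alphabet:
            current.append(ch)
        elif on_unknown == "add":
            alphabet.add(ch)        # 'Add to Alphabet': now part of a word
            reported.add(ch)
            current.append(ch)
        else:
            reported.add(ch)        # 'Ignore': character is dropped
            flush()                 # assumption: it also ends the word
    flush()
    return words, reported

letters = "abcdefghijklmnopqrstuvwxyz"
added = make_words("yes and/or no", letters, on_unknown="add")
ignored = make_words("yes and/or no", letters, on_unknown="ignore")
print(added)    # (['yes', 'and/or', 'no'], {'/'})
print(ignored)  # (['yes', 'and', 'or', 'no'], {'/'})
```

In both cases the slash is reported, but only 'Add to Alphabet' produces the single word "and/or".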

Tip: Avoid adding lots of characters to your alphabet that you don't really need for the text you are working with.  It's best to start with the simplest possible alphabet and add characters as required.  'Add to Alphabet' is a better option for new users; 'Ignore' for experienced users.

Each concordance you make has its user-defined alphabet saved with it. Using the Options dialog, you can choose a setting so that reloading a concordance from disk will restore the alphabet which was in force when it was made.  This setting is off by default.

You should open the Alphabet Editor from time to time and check what alphabet is currently in force, as it may not be what you expect.

Tip: If you choose 'Add to Alphabet' and make a concordance, lots of characters you don't really want may be added to your Alphabet.  If your Alphabet is one you have worked carefully to refine, you might like to save it separately in case you need to revert to it later. Copy and paste all the characters from your Alphabet to a new file in the Multiple Document Editor and save the file.

Sort Alphabet

You can sort the characters in your alphabet at any time by pressing the Sort Alphabet button. 

Sorting here is for ease of reading only.  You cannot alter the order in which Concordance sorts words by moving letters around in your Alphabet.   All that counts in the Alphabet dialog is the presence or absence of characters, not their order. The order of the letters is defined by Windows, depending on the language you have chosen and the current character set.  

This sort is not case-sensitive: hence a lower-case letter may sometimes appear before its upper-case equivalent, and sometimes after.  This makes no difference.
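As a small illustration of why the position of upper and lower case can vary, here is a Python sketch of a sort that is not case-sensitive. It is an approximation only: the real order is defined by Windows for the chosen language, whereas this sketch compares characters by their lower-case form, and the name sort_alphabet is invented.

```python
def sort_alphabet(chars):
    """Sort characters while treating upper and lower case as equal."""
    # Characters that compare equal keep the order they had before sorting.
    return "".join(sorted(chars, key=str.lower))

print(sort_alphabet("bBaA"))  # aAbB
print(sort_alphabet("BbAa"))  # AaBb
```

Because "a" and "A" compare as equal, whichever came first in the unsorted alphabet stays first afterwards.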

If you are trying to change the way Concordance sorts your finished concordance, see Sorting.

Restore Defaults

This button replaces the contents of the Alphabet Editor with the default alphabet. This is as follows: the letters a..z and A..Z, the hyphen, the apostrophe, and (starting in Version 2.0.0) the numerals 0..9.

Languages

This button opens the Language and Font Control which provides support for languages other than English.

Word separators

In the Alphabet Dialog you can also specify word separators. These are characters which will be treated as marking a division between words.  A space is always treated as a word separator.

By using different combinations of alphabet and word separators you can gain extensive control over the way your source text is processed.

Overlap between Alphabet, Word Separators, and Reference Markers

If a character is in your Alphabet, it cannot also be a Word Separator or a Reference Marker.  If you have any duplication of this kind, the program will warn you (repeatedly!), and will not make a concordance until you remove the duplication.  To alter Reference Markers, go to the References dialog.
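The duplication check amounts to looking for characters shared between sets. The Python sketch below shows the idea; find_overlaps is a hypothetical helper, and the separator and marker characters in the example are invented for illustration.

```python
def find_overlaps(alphabet, separators, markers):
    """List alphabet characters that are also separators or reference markers."""
    alphabet = set(alphabet)
    return {
        "word separators": sorted(alphabet & set(separators)),
        "reference markers": sorted(alphabet & set(markers)),
    }

# Example values only: a hyphen used both in the alphabet and as a separator.
overlaps = find_overlaps(alphabet="abc-'", separators="-/", markers="<>")
print(overlaps)  # {'word separators': ['-'], 'reference markers': []}
```

Any non-empty list corresponds to a duplication that has to be removed before a concordance can be made.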

Handle split words as in Version 2.0.0

This checkbox allows you to choose between two alternative behaviours for splitting words when non-word characters are encountered in the middle of a word.  The default  behaviour matches that of Version 3.0; the alternative matches that of Version 2.0.0.  For full details see this topic.

Translate OEM (DOS) Characters to ANSI (Windows)

If your source text was prepared using an OEM character set, Concordance can automatically convert what it reads to the Windows (ANSI) character set as the source text is read. Try this option if characters such as accented characters and special symbols in your source text are being changed into unrelated characters in the concordance. You will still need to add the characters to your alphabet for them to be included in the concordance. Since translation of OEM to ANSI characters takes place before the removal of all characters not declared in your alphabet, you should add the translated, not the untranslated, characters to your alphabet.

If this still doesn't give the results you want, your source text may contain some characters that bear no direct relation to the Windows characters you want, perhaps because the file was prepared on a different computer system. In that case you need to edit the source text!  And if translation gets confusing, give up and edit the source text instead.

Bear in mind, too, that translating OEM characters to ANSI characters is done for the concordance, but your source text is not altered - Concordance never alters your source text unless you edit it yourself. So viewing the text in the File Viewer and File Editor will show untranslated characters.  If you want a permanently translated version of your source text, use the stand-alone tool OEM to ANSI File Converter.
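To see what translation does to a single accented character, consider this Python sketch. It is not how Concordance performs the conversion. It assumes the source text was prepared under DOS code page 850 and that the ANSI character set is Windows code page 1252, and the name oem_to_ansi is invented for the example.

```python
OEM_CODE_PAGE = "cp850"    # assumption: code page the source text was prepared under
ANSI_CODE_PAGE = "cp1252"  # assumption: Western European Windows character set

def oem_to_ansi(raw):
    """Re-encode bytes from the OEM code page to the ANSI code page."""
    return raw.decode(OEM_CODE_PAGE).encode(ANSI_CODE_PAGE)

oem_bytes = b"caf\x82"  # the word "cafe" with an acute accent, as stored under code page 850

# Read as ANSI without translation: byte 0x82 shows up as a low quotation mark.
untranslated = oem_bytes.decode(ANSI_CODE_PAGE)

# Translated first: byte 0x82 becomes byte 0xE9, the ANSI e-acute.
translated = oem_to_ansi(oem_bytes).decode(ANSI_CODE_PAGE)

print(repr(untranslated))
print(repr(translated))
```

The character to put in the Alphabet is the translated one (the accented e), not the stray quotation mark seen in the untranslated text. Some OEM characters, such as the DOS box-drawing symbols, have no ANSI equivalent at all; in this sketch they would raise an encoding error, and that is the situation in which editing the source text is the only remedy.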

Definitions:

An OEM character set is one specified by the code page currently in force on the computer.

The Windows operating system in the US and most of Europe uses the ANSI character set.  DOS uses the ASCII character set.

Both ASCII and ANSI characters can be entered at the keyboard even if they do not have a dedicated key. To enter an ASCII character, hold down the Alt key and type the character's three-digit code on the numeric keypad (e.g. Alt+155).  To enter an ANSI character, use the same method but add a zero in front of the three-digit code (e.g. Alt+0248).

Early versions of DOS were limited to a single version of the (extended) ASCII character set. With later versions of DOS, Microsoft and IBM introduced the concept of the code page. The original US character set, for example, is code page 437, while the multilingual (Latin 1) set widely used in Western Europe is code page 850. Code pages offer some support for different languages by assigning special characters to certain numeric values and keys.
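The way one numeric code maps to different characters under different code pages can be checked with a short Python sketch. The helper name char_for_code is invented; the code page names are those of Python's standard codecs, with Windows code page 1252 standing in for the ANSI character set.

```python
def char_for_code(code, code_page):
    """Return the character that a one-byte code stands for in a code page."""
    return bytes([code]).decode(code_page)

alt_155_cp850 = char_for_code(155, "cp850")   # o with stroke under code page 850
alt_155_cp437 = char_for_code(155, "cp437")   # cent sign under code page 437
alt_0248_ansi = char_for_code(248, "cp1252")  # o with stroke in the ANSI set

print(alt_155_cp850, alt_155_cp437, alt_0248_ansi)
```

So the two keyboard examples given above, Alt+155 and Alt+0248, yield the same letter only when the OEM code page in force is 850; under code page 437, code 155 is a different character.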

If you need to know more, see the very detailed reference information on character sets.



See also:
Preparing text
Saving and restoring settings
Character table
Language and Font Control

Other related topics