Lemmatisation
What is it?
Lemmatising means grouping related words together under a single headword. Concordance includes a Lemmatiser which allows you to define groups of related words and then apply your groupings to words displayed in the Wordlist.
For example, you could choose to gather the words am, was, are, is, were, and been together under the word be. To use linguistic terminology, be is the lemma (the headword), and the variants taken together are the forms of the lexeme be.
You can choose to lemmatise any groups of words which interest you, not just ones which are linguistically or grammatically related. For example, if you are investigating some aspect of democracy, you could choose to gather vote, population, election, etc., under the word democracy. You can also use the Lemmatiser to group alternative spellings of the same word, or plural forms with singular, or anything else you like.
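The idea of gathering words under a headword can be pictured as a simple lookup table. The following Python sketch only illustrates the concept; it is not how Concordance itself is written, and the names GROUPS and headword_for are invented for the example.

```python
# Hypothetical groupings: each headword maps to the words gathered under it.
GROUPS = {
    "be": ["am", "was", "are", "is", "were", "been"],
    "democracy": ["vote", "population", "election"],
}


def headword_for(word, groups):
    """Return the headword a word is gathered under, or the word itself."""
    for headword, variants in groups.items():
        if word == headword or word in variants:
            return headword
    return word  # not part of any group, so it stands on its own


# Words from an imaginary text, each labelled with its headword.
labelled = [(w, headword_for(w, GROUPS)) for w in ["was", "vote", "cat"]]
# labelled == [("was", "be"), ("vote", "democracy"), ("cat", "cat")]
```

Note that the groups need not be grammatical: the democracy group is purely thematic.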
Concordance will of course allow you to move words around in the wordlist 'by hand', by drag and drop or cut and paste. But using the Lemmatiser makes it much quicker to achieve complex re-arrangements of words.
Choosing your words to group
You define the words you want to gather together in the Lemmatiser. Open the Lemmatiser by choosing it on the Headwords menu or by pressing Shift+Control+L.
Each entry in the Lemmatiser should normally be a single word. That is because the Headword list, which is to be lemmatised, normally contains single words; consequently, if an entry in the Lemmatiser has more than one word, it will never match any entry in the Headword list. The exception to this is when you make a Fast Concordance using phrases: you can then use phrases in the Lemmatiser too.
The Lemmatiser works like an outliner or tree view. The controls let you add words and indent or outdent them. A word at the first level of indentation, with a book icon beside it, is one under which you want to gather other words, which you add at the second level of indentation. Second-level words have a plus-sign icon.
You can also indent words to the third level or more. These have a minus-sign icon and take no part in the lemmatisation. This allows you to disable sections of your lemmatisation scheme without actually deleting the words.
The Sort button sorts words in the Lemmatiser (not the Wordlist).
When you have defined and arranged some words, choose Save or Save As on the Lemmatiser's File menu to save your lemma file.
Making it happen
Whenever a concordance is displayed, you can lemmatise its Wordlist by pressing the Lemmatise button.
Lemmatising the Wordlist is just a special case of sorting it. You are telling the program that some words are not to be treated alphabetically but placed where you wish. Consequently, switching to any other sort order in the Wordlist undoes the lemmatisation.
As lemmatising a list is slower than ordinary sorting, it is best to apply any basic sorts which you wish first, and then to lemmatise your Wordlist last. Speed is more influenced by the number of headwords (i.e. the size of the concordance) than by the number of words in the Lemmatiser.
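To see why lemmatising behaves like a sort, it may help to look at a rough sketch in Python. This is an assumption about the general idea only, not Concordance's actual code: the Wordlist is modelled as a plain list, the lemma groups as a dictionary, and the function name lemmatise is invented for the example.

```python
def lemmatise(wordlist, groups):
    """Return a copy of wordlist with each group's words moved to sit
    directly after their headword. A group whose headword is not in the
    wordlist is skipped, so its words stay where they were."""
    result = list(wordlist)
    for headword, variants in groups.items():
        if headword not in result:
            continue  # nothing to gather the words under
        # Keep only the variants that actually occur in the wordlist.
        present = [v for v in variants if v in result and v != headword]
        for v in present:
            result.remove(v)
        # Re-insert them immediately after the headword.
        pos = result.index(headword) + 1
        result[pos:pos] = present
    return result


wordlist = sorted(["am", "be", "cat", "is", "was", "zebra"])
arranged = lemmatise(wordlist, {"be": ["am", "is", "was"]})
# arranged == ["be", "am", "is", "was", "cat", "zebra"]
```

Applying sorted() to the arranged list gives back the plain alphabetical order, which is the sense in which switching to another sort order undoes the lemmatisation.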
Other features
You can click on any word in the Lemmatiser to make the Wordlist scroll to show it, if it is there.
There is an incremental search in the Lemmatiser. Start typing any word to make the Lemmatiser scroll to find it.
Avoiding duplicates
A word must not appear again as a child of itself.
A word may appear again if it is not a child of itself. But this can lead to unwanted results. Consider the following. In the Lemmatiser you might quite reasonably define
lay
--- + laid
lie
--- + lay
since 'laid' is the past tense of the verb to lay, and 'lay' is the past tense of the verb to lie. This example actually appears in the default lemma file. Having 'lay' appear twice causes no problems in the Lemmatiser, since it is not a child of itself. But it can appear only once in the Wordlist.
How is that occurrence of lay in the Wordlist to be handled - is it to have laid placed below it, or is it to be placed below lie? You can't have both. What Concordance will actually do, if the words are in alphabetical order, is first to move laid after lay, then move lay after lie, leaving laid orphaned in the position where lay was. This is probably undesirable.
Since human language is not a wholly rational construct, there is no general remedy for this issue except vigilance. You might wish to edit your text to distinguish between lay1 and lay2.
Note too that if you want to lemmatise am, was, and were under be, but be is not present in the text, then am, was, and were will not be moved.
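Vigilance can be helped along by a small check run outside Concordance. The Python sketch below is not a Concordance feature; it assumes the lemma groups have already been read into a dictionary, and the function names are invented for the example. It reports words that occur more than once in the scheme, and headwords that are absent from a given wordlist.

```python
from collections import Counter


def find_repeated_words(groups):
    """Return, in alphabetical order, the words that occur more than once
    across all headwords and the words gathered under them."""
    counts = Counter()
    for headword, variants in groups.items():
        counts[headword] += 1
        counts.update(variants)
    return sorted(word for word, n in counts.items() if n > 1)


def find_missing_headwords(groups, wordlist):
    """Return the headwords that do not occur in the wordlist."""
    present = set(wordlist)
    return sorted(h for h in groups if h not in present)


groups = {"lay": ["laid"], "lie": ["lay"], "be": ["am", "was", "were"]}
wordlist = ["am", "laid", "lay", "lie", "was", "were"]

repeated = find_repeated_words(groups)               # ["lay"]
missing = find_missing_headwords(groups, wordlist)   # ["be"]
```

A repeated word is not necessarily a mistake, but each one is worth a second look before you lemmatise.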
Manipulating the tree
You can drag and drop a single word or a parent with all its children. You can't drop a parent onto one of its own children.
You can cut, copy, and paste words using the standard Windows shortcut keys: Control+X, Control+C, Control+V.
If you hold the cursor very near the top or bottom of the lemma list, it will scroll automatically. This is particularly useful when dragging words to a place far away in the list.
If you drag a word to a parent which is not expanded and wait a moment, the parent will automatically expand to make its children visible.
You can cancel a drag operation by moving the cursor out of the lemma list before releasing the mouse button.
As you drag a word, the status line at the foot of the Lemmatiser changes to show where the word would go if you dropped it. You can get different results by dropping on another word, or on the indentation lines beside a word.
Managing lemma files
When you first open the Lemmatiser, it opens a lemma file called Default.lemma if that file exists in the same folder as the program itself. (It should do, because it was placed there by the installation.) Default.lemma is a short sample file. You can create any number of your own lemma files using the Lemmatiser, and they can be called whatever you like. If you want a different lemma file to be loaded automatically when you open the Lemmatiser, use Choose Lemma File on the Options menu and also tick the Auto-load Lemma File option.
On a shared network installation, your system administrator should give you your own copy of Default.lemma.
Sharing lemma lists
If you prepare a lemma list for a special purpose, please consider sharing it with other users of Concordance. Send it to me and I will make it available on the Concordance website.
Advanced use
As well as dragging and dropping words to re-arrange them within the Lemmatiser, you can drag words from the Wordlist or the Scratchpad and drop them into the Lemmatiser.
You can prepare a lemma list by making a concordance to a text, saving the headword list (without contexts or frequencies) as a text file, opening that file with the Lemmatiser, and adding indentation as you wish. To save a headword list without contexts or frequencies, just turn off the display of both before saving (on the Headwords menu, un-tick 'Show Frequencies', and on the Contexts menu, choose 'None').
A lemma file can also be edited with the supplied Multiple Document Editor or any editor able to handle plain text. The file format is simply one word on each line, with a tab to indicate indentation. You could prepare a lemma file with another program such as Word if you wanted (save it as plain text).
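If you want to process a lemma file with a script of your own, the format is easy to read. The Python sketch below makes an assumption which you should check against your own files: a line with no leading tab is taken to be a first-level headword, one leading tab a second-level word, and two or more tabs a disabled entry. The function name parse_lemma_text is invented for the example, and it works on text already read into a string, so it makes no assumption about the file's encoding.

```python
def parse_lemma_text(text):
    """Parse tab-indented lemma text into {headword: [words under it]}.

    Assumed layout: no tab = headword, one tab = word gathered under the
    most recent headword, two or more tabs = disabled entry (ignored)."""
    groups = {}
    current = None
    for line in text.splitlines():
        word = line.strip()
        if not word:
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip("\t"))  # count leading tabs
        if depth == 0:
            current = word
            groups.setdefault(current, [])
        elif depth == 1 and current is not None:
            groups[current].append(word)
        # depth >= 2: takes no part in the lemmatisation
    return groups


sample = "be\n\tam\n\twas\n\t\tbeen\nlie\n\tlay\n"
parsed = parse_lemma_text(sample)
# parsed == {"be": ["am", "was"], "lie": ["lay"]}
```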
If you wanted, you could prepare a file with the Pick or Stop List Manager and then use it in the Lemmatiser, or vice versa. But the Pick and Stop List Managers ignore indentation. Hence a file prepared with them and then used in the Lemmatiser will need indentation added, and a file made with the Lemmatiser and then opened in the Pick or Stop List Manager and saved there will have its indentation removed.
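The loss of indentation is easy to picture. The Python sketch below imitates the effect described above by stripping leading tabs from every line; it is only an illustration, with an invented function name, and says nothing about how the Pick and Stop List Managers work internally.

```python
def strip_indentation(text):
    """Return the text with leading tabs removed from every non-blank
    line, leaving a flat list of one word per line."""
    lines = [line.lstrip("\t") for line in text.splitlines() if line.strip()]
    return "\n".join(lines)


lemma_text = "be\n\tam\n\t\tbeen\n"
flat = strip_indentation(lemma_text)
# flat == "be\nam\nbeen"
```

Once the tabs are gone, nothing in the file records which words were headwords, which is why the indentation has to be added again by hand.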
Finally, there is nothing to stop you opening an ordinary text file with the Lemmatiser if you wish. It would then have multiple words on each line instead of one, and you could use the Lemmatiser as an outliner or 'ideas processor'. (The resulting file wouldn't be much good for lemmatising headwords, though.)