Input Text File Format
Concordance expects your input text to be in plain text files. They should be divided into lines of human-readable length.
Line lengths
The program imposes no limit on the length of a line, but in practice it works much better if line lengths are kept within limits convenient for humans to read - around 60 to 100 characters.
Many word processors treat each paragraph as a single long line so that they can wrap text to your margins. With a program such as Word you can overcome this by saving your file as "Text Only with Line Breaks". (In Word 2002, no option to save a file as text with line breaks appears when you choose 'Save As...'. Instead, choose to save as Plain Text; an additional dialog will then appear that allows you to select the 'Insert line breaks' option.)
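If Word isn't to hand, a short script can do a similar job. The sketch below is a hypothetical helper, not a Concordance feature: it uses Python's standard textwrap module to break every line longer than a chosen width at word boundaries and writes Windows/DOS (CR+LF) line endings. The width of 72 is an arbitrary choice within the 60 to 100 character range suggested above, and the cp1252 ("ANSI") file encoding is an assumption you may need to change.

```python
import textwrap


def rewrap(text: str, width: int = 72) -> str:
    """Break every line longer than `width` at word boundaries.

    Returns the text with CR+LF (Windows/DOS) line endings.
    """
    out_lines = []
    # splitlines() accepts CR+LF, CR or LF (among others) as line breaks.
    for line in text.splitlines():
        if len(line) <= width:
            out_lines.append(line)  # short and blank lines pass through unchanged
        else:
            # textwrap.wrap splits at spaces; words longer than `width` are broken.
            out_lines.extend(textwrap.wrap(line, width=width))
    return "".join(l + "\r\n" for l in out_lines)


def rewrap_file(src: str, dst: str, width: int = 72, encoding: str = "cp1252") -> None:
    """Rewrap a whole file. The cp1252 encoding is an assumption; change it to suit."""
    # newline="" stops Python translating line endings on read and write.
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(rewrap(text, width))
```

For example, `rewrap_file("essay.txt", "essay-wrapped.txt")` would write a copy of the (hypothetical) file essay.txt with no line longer than 72 characters.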
Some other common files, notably SGML files, mark line endings with their own notation instead of using a carriage return (CR) and/or line feed (LF) as used in plain text files. (For more detail on line endings, see below.) Such files, which lack CR/LF line endings, consist of a single 'line' as far as most programs are concerned, including this one.
Although this program should still produce a correct concordance from a file with long lines, it will take longer than usual, and there will be other undesirable effects.
For example, contexts based on the actual line in which a word occurs are of little use if the whole input is on a single line: each word will have the entire text as its context, and the concordance will consequently be n times as long as the original, where n is the number of words in the original. This is to be avoided at all costs! A further problem is that Concordance uses line numbers as the default reference system, and it isn't helpful if all words are referenced to the same line.
If you make a concordance and find everything is referenced to line 1, check whether your input file has suitable line endings (CR+LF or CR; see Line endings below).
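To get an idea of what the program will make of a file before concordancing it, you can count its lines by the same rule. The sketch below is an illustrative Python function, not part of Concordance. It assumes the splitting rule given in the line-endings table in this topic (CR+LF or a lone CR ends a line; a lone LF does not), so a Unix file, or an SGML file without CR/LF endings, comes out as a single line. The 100-character threshold comes from the guidance above, and 255 is the Context View limit described in the next paragraph.

```python
import re

# Assumed splitting rule: CR+LF or a lone CR ends a line; a lone LF does not.
LINE_BREAK = re.compile(r"\r\n|\r")


def line_report(text: str, comfortable: int = 100, context_view: int = 255) -> dict:
    """Summarise line lengths as the program would presumably see them."""
    lines = LINE_BREAK.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # a final line ending does not start a new line
    lengths = [len(l) for l in lines]
    return {
        "lines": len(lines),
        "longest": max(lengths, default=0),
        "over_comfortable": sum(n > comfortable for n in lengths),
        "over_context_view": sum(n > context_view for n in lengths),
    }
```

A report showing one line of many thousands of characters is the signature of the "everything on line 1" problem.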
The Context View in Concordance will not scroll to show contexts longer than 255 characters, although any further text is still there and will be correctly preserved when you save, print, or export the concordance.
File sizes
Most counting functions in the program are limited to numbers no larger than 2,147,483,647 (2^31 - 1). This is unlikely to cause a problem.
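The figure 2,147,483,647 is the largest value a signed 32-bit integer can hold, which suggests (though this topic does not say so) that the counts are stored as 32-bit integers. The Python sketch below checks the arithmetic and, using the standard ctypes module, shows what happens to a generic 32-bit counter that goes one past the limit. It illustrates 32-bit arithmetic in general, not the program's actual behaviour.

```python
import ctypes

LIMIT = 2**31 - 1  # largest signed 32-bit integer

print(f"{LIMIT:,}")  # 2,147,483,647

# ctypes performs no overflow checking, so adding one to a 32-bit
# counter at the limit wraps round to the most negative value.
counter = ctypes.c_int32(LIMIT)
counter.value += 1
print(counter.value)  # -2147483648
```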
The technical details below can be skipped unless you encounter problems - for example, if you find that everything in your concordance is referenced to line 1.
Line endings
The file should have line endings that follow either the Windows/DOS convention or the classic Macintosh (Mac OS 9 and earlier) convention. The program can read both of these formats directly. Unix files, however, should be converted first.
Technically speaking, this means that line endings should be marked with a carriage-return and line-feed (CR+LF) pair, or with a carriage return alone, but not with a line feed alone. This table summarises the differences between systems:
System        Line ending   Conversion needed
Windows/DOS   CR+LF         No
Mac           CR            No
Unix          LF            Yes
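The table can be turned into a quick check. The following is a hypothetical Python helper, not something supplied with Concordance: it reads a file's raw bytes, counts CR+LF pairs, lone CRs and lone LFs, and names the convention the file follows, or reports that it mixes them.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count each kind of line ending in raw file bytes."""
    crlf = data.count(b"\r\n")
    # A lone CR is one not followed by LF; a lone LF is one not preceded by CR.
    lone_cr = len(re.findall(rb"\r(?!\n)", data))
    lone_lf = len(re.findall(rb"(?<!\r)\n", data))
    return {"CR+LF": crlf, "CR": lone_cr, "LF": lone_lf}


def convention(data: bytes) -> str:
    """Name the system from the table whose convention the bytes follow."""
    counts = count_line_endings(data)
    found = [kind for kind, n in counts.items() if n]
    if not found:
        return "no line endings (the whole file is one line)"
    if len(found) > 1:
        return "mixed: " + ", ".join(found)
    return {"CR+LF": "Windows/DOS", "CR": "Mac", "LF": "Unix (needs conversion)"}[found[0]]
```

To check a file, open it in binary mode so Python doesn't alter the line endings: `with open("input.txt", "rb") as f: print(convention(f.read()))`, where input.txt is a placeholder name.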
In the ASCII and ANSI character sets, CR is character 13 (hexadecimal 0D), or Ctrl+M, and LF is character 10 (hexadecimal 0A), or Ctrl+J.
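The Ctrl-key names follow a simple ASCII rule: Ctrl plus a capital letter gives that letter's code minus 64 (hexadecimal 40). The few lines below confirm the numbers quoted above using Python's built-in ord() and hexadecimal formatting.

```python
# Ctrl+letter gives the letter's ASCII code minus 64 (0x40).
for name, char, letter in [("CR", "\r", "M"), ("LF", "\n", "J")]:
    code = ord(char)
    assert code == ord(letter) - 64
    print(f"{name}: {code} decimal, {code:02X} hex, Ctrl+{letter}")
# CR: 13 decimal, 0D hex, Ctrl+M
# LF: 10 decimal, 0A hex, Ctrl+J
```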
You can check which line-end characters a file uses by loading it into the File Viewer, then choosing Hex Mode on the Options Menu. Carriage returns will be displayed as 0D and line feeds as 0A.
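Outside the program, a few lines of Python give a comparable view. This sketch is only an illustration and does not reproduce the File Viewer's own layout: it prints a file's bytes in hexadecimal, 16 to a row with the offset first, so the 0D and 0A codes stand out.

```python
def hex_dump(data: bytes, width: int = 16):
    """Return rows of upper-case hex byte pairs, each prefixed by its offset."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # bytes.hex(sep) needs Python 3.8 or later.
        rows.append(f"{offset:08X}  {chunk.hex(' ').upper()}")
    return rows


for row in hex_dump(b"one\r\ntwo\n"):
    print(row)
# 00000000  6F 6E 65 0D 0A 74 77 6F 0A
```

In this sample the first line ends with 0D 0A (Windows/DOS) and the second with 0A alone (Unix).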
Files produced on Unix systems can easily be converted to Windows format. A conversion program is provided on the Tools Menu. Some file transfer (FTP) programs can do the conversion automatically when fetching a file from a Unix system to a DOS/Windows system, and some text editors can also do the conversion.
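If you would rather script the conversion, the idea is simply to put a CR in front of every LF that doesn't already have one. The sketch below is a hypothetical stand-alone helper, not the converter on the Tools Menu. It works on raw bytes, so it doesn't depend on the file's character encoding, and its lookbehind leaves existing CR+LF pairs alone, so running it twice on the same file is harmless.

```python
import re


def unix_to_windows(data: bytes) -> bytes:
    """Turn each lone LF into CR+LF; existing CR+LF pairs are left unchanged."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_file(src: str, dst: str) -> None:
    """Read `src` as bytes, convert its line endings and write the result to `dst`."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(unix_to_windows(data))
```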
If you select several input files for a concordance, Concordance automatically concatenates them into a single new file before making the concordance. If you prefer to do this yourself, you can use any text editor, including the Multiple Document Editor supplied with Concordance. The command-prompt (DOS) Copy command can also concatenate files.
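Doing the concatenation yourself has one pitfall: if a file doesn't end with a line ending, its last line runs straight into the first line of the next file. The Python sketch below is a hypothetical helper and makes no claim about what Concordance does internally. It copies each file's bytes in order and adds a CR+LF after any non-empty file that doesn't already end with a CR or LF.

```python
def join_contents(contents):
    """Join the raw bytes of several files, ending each with a line break if needed."""
    parts = []
    for data in contents:
        parts.append(data)
        # Add a Windows/DOS line ending if the file doesn't already end with one.
        if data and not data.endswith((b"\r", b"\n")):
            parts.append(b"\r\n")
    return b"".join(parts)


def concatenate(paths, dst):
    """Read each file in `paths` as bytes and write their joined contents to `dst`."""
    contents = []
    for path in paths:
        with open(path, "rb") as f:
            contents.append(f.read())
    with open(dst, "wb") as out:
        out.write(join_contents(contents))
```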