How These Files Were Made
The source for the poems was a set of plain text (.txt) files downloaded from sites such as the Internet Archive. The files had originally been created from scanned material. Every line ended in a paragraph break, and there was no formatting at all. There were also a great many artefacts from the scanning process: for example, most of the em-dashes in the original were lost altogether, and there were often stray spaces between a word and the semicolon that followed it.
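To give a feel for the spacing artefacts, here is a minimal Python sketch of one such fix. It is an illustration only: the real cleanup was done with Word's Find-and-Replace, and the function name `tidy_scanned_line` and the list of punctuation marks are assumptions made for this example. Lost em-dashes cannot be recovered this way; those need a human eye.

```python
import re


def tidy_scanned_line(line: str) -> str:
    """Fix a couple of common spacing artefacts in one line of scanned text.

    Hypothetical helper for illustration; not the tool used for the books.
    """
    # Remove whitespace in front of punctuation: "down ;" becomes "down;"
    line = re.sub(r"\s+([;:,.!?])", r"\1", line)
    # Collapse runs of spaces or tabs into a single space
    line = re.sub(r"[ \t]{2,}", " ", line)
    # Drop leading and trailing whitespace (assumes indentation carries no meaning)
    return line.strip()


print(tidy_scanned_line("The sun went down ; and  night came ."))
```

Note that the sketch also strips indentation, which is only safe because the scanned text had no meaningful formatting to begin with.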
The challenge was to get from this raw text to a properly formatted PDF volume.
The process went roughly like this:
- Import the text into Microsoft Word.
- Using Word’s Find-and-Replace tools, fix scanning anomalies.
- Use Word’s built-in Visual Basic for Applications (VBA) to apply paragraph styles to different paragraph types (poem title, first line of poem, first line of verse, etc.).
- Import the Word document into FrameMaker.
- Design page layouts, paragraph formats, etc. for titles, body pages, Contents, Index, and so on.
- Convert the unstructured text to structured text (XML-capable) using a FrameMaker conversion table. At this point the text is much more manageable and formatting is a formality.
- Polish the output to cover all formatting scenarios.
- Assemble the Title, ToC, body, and Index for each volume.
- Create the PDF from FrameMaker, check it, and fix any problems.
- Proofread, proofread, proofread!
If that sounds like a lot of work, it was! But it was also a labour of love.
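For the curious, the paragraph-tagging step can be sketched in a few lines of Python. This is not the VBA that was actually used, only a rough illustration of the idea. It assumes that poem titles are in capitals and that verses are separated by blank lines; the style names (`PoemTitle`, `PoemFirstLine`, `VerseFirstLine`, `VerseLine`) and the sample lines are invented for the example.

```python
def classify_lines(lines):
    """Return (style, text) pairs for every non-blank line.

    Assumed rules (hypothetical, for illustration only):
      - an all-capitals line is a poem title
      - the first text line after a title is the first line of the poem
      - the first text line after a blank line starts a new verse
      - anything else is an ordinary verse line
    """
    tagged = []
    after_title = False
    after_blank = True  # treat the start of the file like a blank line
    for raw in lines:
        text = raw.strip()
        if not text:
            after_blank = True
            continue
        if text.isupper():
            style = "PoemTitle"
        elif after_title:
            style = "PoemFirstLine"
        elif after_blank:
            style = "VerseFirstLine"
        else:
            style = "VerseLine"
        tagged.append((style, text))
        after_title = style == "PoemTitle"
        after_blank = False
    return tagged


sample = [
    "A SAMPLE TITLE",
    "",
    "First line of the poem,",
    "second line of the verse;",
    "",
    "A new verse begins here,",
    "and carries on.",
]
for style, text in classify_lines(sample):
    print(f"{style:15} {text}")
```

Once every paragraph carries a style name like this, later steps such as the FrameMaker conversion table have something consistent to work from.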

