Collation Software

These are the instructions for installing and running the collation software, also known as the collation script.  A number of LatinStudy and GreekStudy coordinators use the collation software to assemble side-by-side listings of group members' translations or exercise answers.  If you are not planning on coordinating a group, you need not read further.

See also the collation FAQ for additional troubleshooting information.

Downloads

First, download one of the following two zipped files containing the collation software:

If you are interested in upgrading an older version of the collation software (that is, something is broken and you want to fix it), consult the change log for more information.

Now download the following sample header.txt file.  This file contains the standard, boilerplate text that is copied to the beginning of the collation output file.  Modify it for your own use.  Note that it is not necessary to credit the author of the collation software.  Please.

Then download one of the following files and rename it to names.txt.  A names.txt file is the configuration file for the collation software.

Installing Perl on Windows

The Perl program is required by the collation software.  Mac OS X and most Linux or Unix systems already have Perl installed.  On Windows, however, you need to download a copy of Strawberry Perl for Windows and install it.

Once at the Strawberry Perl web site, download one of the Perl installation files. The particular version doesn't matter. After you double-click the installation file, the installer will put Perl in C:\strawberry, taking around 120 MB of disk space. You will not need to reboot your PC.

The very first time you double-click on a Perl script (a file with a .pl file type), Windows may ask you for the program with which to open that file. Browse to C:\strawberry\perl\bin\perl.exe and select that program. Check the box indicating that all files with a .pl extension should be opened by perl.exe.

Types of Collations

Before you can configure the collation software, you must know what type of collation you wish to produce.

A Wheelock-style collation assumes a section name, an exercise number, and initials starting each sentence.  In addition to the example below, there is a more detailed description suitable for group members doing the assignments.  If you're using a textbook other than Wheelock's Latin, you can tailor the section names for your own use.

TR 1 GJC Gaul as a whole is divided into three parts.
GM 3 GJC These actions having been taken, the army approached the city.
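
The Wheelock-style layout above can be illustrated with a short Python sketch.  It is not taken from collate.pl; the pattern below (a section keyword of up to five uppercase characters, a number, and two to four uppercase initials) is an assumption based on the description on this page, and parse_wheelock_line is a hypothetical name.

```python
import re

# Assumed layout: SECTION NUMBER INITIALS sentence...
# The real collation script may be more lenient than this pattern.
WHEELOCK_LINE = re.compile(r"^([A-Z0-9]{1,5})\s+(\d+)\s+([A-Z]{2,4})\s+(.*)$")

def parse_wheelock_line(line):
    """Return (section, number, initials, text), or None if the line does not fit."""
    match = WHEELOCK_LINE.match(line.strip())
    if match is None:
        return None
    section, number, initials, text = match.groups()
    return section, int(number), initials, text

print(parse_wheelock_line("TR 1 GJC Gaul as a whole is divided into three parts."))
```

A line that does not begin with all three pieces, such as an ordinary line of email text, yields None in this sketch.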

A translation-style collation assumes that a sentence to be collated begins with initials, optionally followed by a period or colon, and a space.  For example:

GJC   Gaul as a whole is divided into three parts.
GJC.  Gaul as a whole is divided into three parts.
GJC:  Gaul as a whole is divided into three parts.
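
As a rough illustration of the translation-style convention, the following Python sketch accepts all three forms shown above.  It assumes that initials are uppercase letters and that a line counts only when its initials belong to a known contributor; the function name and the known_initials parameter are hypothetical, not part of the collation software.

```python
import re

# Initials, an optional period or colon, whitespace, then the sentence.
TRANSLATION_LINE = re.compile(r"^([A-Z]+)[.:]?\s+(.*)$")

def parse_translation_line(line, known_initials):
    """Return (initials, text) if the line starts with known initials, else None."""
    match = TRANSLATION_LINE.match(line)
    if match and match.group(1) in known_initials:
        return match.group(1), match.group(2)
    return None

for sample in ("GJC   Gaul as a whole is divided into three parts.",
               "GJC.  Gaul as a whole is divided into three parts.",
               "GJC:  Gaul as a whole is divided into three parts."):
    print(parse_translation_line(sample, {"GJC"}))
```

Checking against the set of known initials keeps a sentence that merely starts with a capitalized word, such as "A soldier...", from being mistaken for a contribution.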

Configuring the Collation Software

The collation software's configuration commands are stored in a file called names.txt.  Each line starts with a command name, followed by a colon, followed by some text.  Lines starting with a # character are ignored.
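
To make the file format concrete, here is a minimal Python sketch of how lines of this shape could be read.  It is an illustration only, not the parsing code used by collate.pl; the function name parse_config is hypothetical, and skipping blank lines or lines without a colon is an assumption.

```python
def parse_config(text):
    """Return a list of (command, argument) pairs from names.txt-style text."""
    commands = []
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # comment lines are ignored; blank lines are assumed to be too
        command, colon, argument = raw.partition(":")
        if not colon:
            continue  # assumption: a line without a colon is not a command
        commands.append((command.strip(), argument.strip()))
    return commands

sample = """\
# sample names.txt
person:GJC  Julius Caesar <imperator@roma.org>
section:RDA
end:--
"""

for command, argument in parse_config(sample):
    print(command, "->", argument)
```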

Caveat Collator:  the collation software assumes all its input files, including header.txt, names.txt, and any email, are in plain text.  If you use Word to edit any files used by the collation software, make sure you save the result as plain text.

A “person” command is a set of initials followed by the person's name and email address.  This information is printed at the top of the collation file, right after the contents of the header.txt file and just before the collated text.  This information is also used to recognize a contributor's initials when the collation software is operating in translation mode.

person:GJC  Julius Caesar <imperator@roma.org>
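
The three parts of a “person” entry can be picked apart as in the Python sketch below.  The pattern assumes the email address is always enclosed in angle brackets, as in the example above; parse_person is a hypothetical helper, not a function of the collation software.

```python
import re

# Initials, whitespace, the name, then the address in angle brackets.
PERSON = re.compile(r"^(\S+)\s+(.*?)\s*<([^<>]+)>\s*$")

def parse_person(argument):
    """Split the text after 'person:' into initials, name, and email."""
    match = PERSON.match(argument.strip())
    if match is None:
        return None
    initials, name, email = match.groups()
    return {"initials": initials, "name": name, "email": email}

print(parse_person("GJC  Julius Caesar <imperator@roma.org>"))
```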

A “section” command specifies a word that starts a new line, and signals the start of some text to be collated.  If you are using the collation script for an introductory Wheelock group, you need not do anything, since the keywords SA, PR, TR and GM are built-in.  A section keyword may be up to five alphanumeric characters long.  No lowercase letters may be used.

section:RDA
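
If you want to check a keyword before adding it to names.txt, the stated rules (one to five alphanumeric characters, no lowercase letters) can be expressed as a pattern.  The Python sketch below is only a reading of those rules; how collate.pl itself reacts to an invalid keyword is not covered here, and is_valid_section is a hypothetical name.

```python
import re

# One to five characters, each an uppercase letter or a digit.
SECTION_KEYWORD = re.compile(r"[A-Z0-9]{1,5}")

# Keywords this page describes as built in for Wheelock groups.
BUILT_IN_SECTIONS = {"SA", "PR", "TR", "GM"}

def is_valid_section(keyword):
    """True if the keyword satisfies the length and character rules."""
    return SECTION_KEYWORD.fullmatch(keyword) is not None

for keyword in ("RDA", "rda", "CHAPTER"):
    print(keyword, is_valid_section(keyword))
```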

An “end” command tells the collation software to quit gathering text for collation when a line starts with a particular set of characters.  Examples include the word END, the hyphens starting internal mail sections, and the underscore line in Juno-originated email.  When the collation software doesn't correctly detect the end of an assignment, it may append signatures and other junk to the last portion of legitimate text.  This source of unhappiness can be fixed quickly by editing the offending file and putting END at the start of the appropriate line.

end:--
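
The effect of an end marker can be pictured with the following Python sketch, which stops collecting lines as soon as one begins with a marker.  It only illustrates the idea; the function name and the default marker list are assumptions, not the behavior of collate.pl in every detail.

```python
def collect_until_end(lines, end_markers=("END", "--")):
    """Keep lines up to, but not including, the first line that starts with a marker."""
    kept = []
    for line in lines:
        if line.startswith(tuple(end_markers)):
            break  # everything from here on is treated as signature or other junk
        kept.append(line)
    return kept

email_lines = [
    "GJC Gaul as a whole is divided into three parts.",
    "-- ",
    "Julius Caesar, Proconsul",
]
print(collect_until_end(email_lines))
```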

The “wheelock:always” command forces the collation software to treat all input as Wheelock-style.  Without this command, the collation software tries to figure out which formatting convention is being used.

The “unicode” command indicates you are collating Unicode text, for example for a GreekStudy group.  If you omit this command when collating Unicode text, the collation will be garbled.

Running the Collation Software

Windows

Once you have installed Perl and downloaded the collate.pl file as well as the sample header.txt and names.txt files, create a folder called, say, latin.  You may choose a different name if you desire, and it doesn't matter where you locate the folder.  Put the three files (collate.pl, header.txt, and names.txt) into the latin folder.

Create a second folder called lesson within the latin folder.  You don't have a choice about this new folder's name.  Put people's email into lesson by saving email into that folder from your mail reader.  It is a good practice to name the files with people's initials.  When you want to run a collation, double-click on the file collate.pl.  It will configure itself from names.txt, use the contents of header.txt as a preamble, and extract sentences from the email files stored in lesson.  If everything is successful, the collation software will produce a file output.txt containing the collation.  Any errors will be recorded in a file called errors.txt.

Mac OS X

After downloading the collate application, you may put it anywhere convenient.

Create a folder named lesson and put in it your names.txt and header.txt files. Add to your lesson folder the emails with translations (these will typically have .eml extensions), as well as the file containing the original text of the assignment. Then use your mouse to drag the lesson folder and drop it on top of the collate application icon. After you release the mouse button the collate application will run. The collation will be written into the file output.txt on your Desktop. Any errors will be recorded in the file errors.txt on your Desktop.

Unix or Linux

The following discussion is focused on Mac OS X and its Terminal application (aka shell window), but is generally applicable to any Unix system.

Perl is already installed on Mac OS X.  It can only be used via the command shell, which is accessed using the Terminal program.

Create a folder called Collation.  You may use a different name if you wish.  Copy the collate.pl file into this folder.  Give the shell command chmod +x collate.pl to turn collate.pl into an executable file.  Within the Collation folder, create a second folder called lesson.  Unlike the Windows case, this second folder can actually have any name you desire.

You now have two choices where to put your names.txt and header.txt files.  If you are collating for only one group, you can put them into the Collation folder.  If you are collating for multiple groups, you may put the different names.txt and header.txt files into the appropriate subfolder.  That would be the lesson folder in our example.

Collect email into the lesson folder by copying from your mail reader into individual files within lesson.  It is a good practice to name the files with people's initials.

To run the collation software over the files in lesson, change to the Collation folder (cd Collation) and type the shell command collate.pl lesson. If your shell reports that the command cannot be found, type ./collate.pl lesson instead. The collation software will configure itself from names.txt, use the contents of header.txt as a boilerplate preamble, and extract sentences from the email files stored in lesson. If everything is successful, the collation software will produce a file called output.txt containing the collation.

Correcting Errors

If there are any errors, the collation software will write the error messages to a file called errors.txt, or to the screen when using a Unix shell.  Examine the errors to determine what the software is complaining about, then go and fix the problem in the offending piece of email.  Rerun the collation software.  When there are no longer any errors, the errors.txt file or screen messages will disappear.

There are a few errors that the software knows how to automatically correct.  It will still complain about those problems in the errors.txt file.

You should also examine output.txt for collation problems that the software didn't catch.  Again, the problems should be fixed in the offending pieces of email.  Repeat the cycle of running the collation software, examining output.txt, and fixing email until you're happy with the results.

Including the translation assignment

If you wish to include the text that is being translated, you should start each line of that text with the “initials” of two dots, that is, “..”.  The collation software understands this convention and will place such lines at the top of a collation section. You do not need to add anything to your names.txt file.

Similarly, a file whose lines begin with numbers such as 6.77 or 2.18.3 is assumed to contain text that is being translated, which will be placed at the head of each section.
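
The two conventions for marking the original text can be summarized in a small Python sketch.  It tests individual lines, whereas the collation software is described as deciding per file for the numbered case; the pattern for numbers (digits separated by dots, then a space) is an assumption drawn from the examples 6.77 and 2.18.3, and is_assignment_line is a hypothetical name.

```python
import re

# Dotted numbers such as 6.77 or 2.18.3, followed by whitespace.
NUMBERED_LINE = re.compile(r"^\d+(\.\d+)+\s")

def is_assignment_line(line):
    """True if the line looks like original text rather than a translation."""
    return line.startswith("..") or NUMBERED_LINE.match(line) is not None

for sample in (".. Gallia est omnis divisa in partes tres.",
               "2.18.3 Gallia est omnis divisa in partes tres.",
               "GJC Gaul as a whole is divided into three parts."):
    print(is_assignment_line(sample), sample)
```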

Changing the order of the collation sections

The collation software normally sorts a collation alphabetically by the section name (in Wheelock mode) and then by initials.  This is good enough for most purposes, but you may want to change that default ordering.

For section names (such as SA, PR, TR, and GM) or initials (such as GJC, MC, or PM) you can assign a sorting key, a number between 0 and 9.  If you don't assign a sorting key, the default value is 5.

A couple of examples should explain the idea.  Let us say that you wish the section names mentioned above to be sorted in the order PR, SA, TR, and GM, instead of the alphabetic order GM, PR, SA, and TR.  You would add the following commands in the names.txt file:

sort:PR  1
sort:SA  2
sort:TR  3
sort:GM  4

Now let us imagine you have three people, GJC, MC, and PM, whose contributions you wish to segregate at the end of each exercise.  Recall that the default sorting key value is 5.  If we give those three people a sorting key of 6, then their contributions will appear in a group after all the others, but alphabetically within that group.  The commands in the names.txt file would be:

sort:GJC 6
sort:MC  6
sort:PM  6
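
The combined effect of both sets of commands can be sketched in Python as a sort on a compound key.  This is an illustration of the idea rather than the code in collate.pl: the function name is hypothetical, and placing the exercise number between the section and the initials is an assumption.

```python
DEFAULT_KEY = 5  # used for any section name or initials without a sort command

def sort_entries(entries, sort_keys):
    """Order (section, number, initials) tuples by sorting key, then alphabetically."""
    def order(entry):
        section, number, initials = entry
        return (sort_keys.get(section, DEFAULT_KEY), section, number,
                sort_keys.get(initials, DEFAULT_KEY), initials)
    return sorted(entries, key=order)

# Sorting keys from the two examples above.
sort_keys = {"PR": 1, "SA": 2, "TR": 3, "GM": 4, "GJC": 6, "MC": 6, "PM": 6}

entries = [("GM", 1, "AB"), ("TR", 1, "GJC"), ("TR", 1, "ZZ"),
           ("PR", 1, "MC"), ("TR", 1, "AB"), ("SA", 1, "AB")]

for entry in sort_entries(entries, sort_keys):
    print(entry)
```

In this sketch the sections come out in the order PR, SA, TR, GM, and within TR 1 the contribution from GJC follows those of AB and ZZ because its key of 6 is larger than the default of 5.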