Software page

Maxent grammars for the metrics of Shakespeare and Milton

Bruce Hayes (UCLA), Colin Wilson (Johns Hopkins University) and Anne Shisko (UCLA)

This is the software page for the web site accompanying the article "Maxent grammars for the metrics of Shakespeare and Milton".


We used quite a bit of home-brewed software to do our project.  We are happy to share it with you, but since it is not industrial-strength software, you will have to be quite patient to get it to work. (We know the software works -- but on our machines and with the commands that we characteristically give it.)  If you are serious about doing metrics yourself with this software and run into problems, please direct your queries to Bruce Hayes at bhayes@humnet.ucla.edu.

To understand the software you need to understand the formalisms we used to express lines of verse and metrical constraints.  You can read about them in this document.

There are four programs.


I. Software for prosodic annotation I:  SyllabifyLines.exe

The input file is simply a sequence of lines of verse, each on its own line in a plain text file.

The output file is a tab-delimited text file, best read with a spreadsheet program, that for each line gives the sequence of syllables together with their stresses (word level) and a preliminary guess about phonological phrasing.  Here is an example of an output file.

The key idea of this program is to break a word into syllables and assign it a stress pattern just once; the program then automatically assigns this syllabification and stress to future occurrences.  It also guesses the phrasing on the basis of word breaks, hyphens, and other punctuation.
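The "annotate once, reuse afterwards" idea can be sketched in Python as follows.  This is only an illustration, not the Visual Basic code of the program: the names lexicon, annotate, and guess_breaks, and the labels 'word', 'hyphen', and 'phrase', are our own assumptions, and the real format of Dictionary.txt is not shown here.

```python
import re

# Hypothetical cache: word -> (syllables, stresses).  The real program keeps
# its entries in Dictionary.txt, whose format is not reproduced here.
lexicon = {}

def annotate(word, ask):
    """Return (syllables, stresses) for word, calling `ask` only once per word."""
    key = word.lower()
    if key not in lexicon:
        lexicon[key] = ask(key)  # e.g. prompt the user; stored for later reuse
    return lexicon[key]

def guess_breaks(line):
    """Guess the boundary after each word from spelling alone.

    Assumed labels (not the program's own): 'phrase' after punctuation,
    'hyphen' inside a hyphenated compound, 'word' otherwise.
    """
    tokens = re.findall(r"[A-Za-z']+|[-,.;:!?]", line)
    result = []
    for tok in tokens:
        if re.fullmatch(r"[A-Za-z']+", tok):
            result.append([tok, "word"])
        elif result:
            # Punctuation upgrades the boundary after the preceding word.
            result[-1][1] = "hyphen" if tok == "-" else "phrase"
    return [tuple(pair) for pair in result]
```

Here `ask` stands for whatever supplies the syllabification the first time a word is met (in the real program, the user); on later occurrences the stored answer is returned without asking again.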

Windows software, written by Bruce Hayes in Visual Basic 6.  All that you need is in the zipped folder.  Unzip it and click on Syllabify.exe.

Download.

The file Dictionary.txt, sitting inside the zipped folder, has all the words we used for Shakespeare and for Milton.


II. Software for prosodic annotation II:   AdjustScansions.exe

This program is an editor for files in the format created by SyllabifyLines.exe.

Once you've used Syllabify Lines to get a basic draft of prosodic annotation, you need to modify it in three ways.  First, there are adjustments of syllable count.  For instance, "many a" is sometimes counted by poets as two syllables, sometimes as three.  Experienced scanners of poetry can tell which pretty much right away.  The program lets you merge the syllables "-y" and "a" into one with a single click, leaving all other information intact.
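As a rough picture of what such a merge does to the data, here is a minimal Python sketch.  The record layout (text, stress, break_after) is hypothetical, as are the choices to keep the stronger of the two stresses and the break that followed the second syllable; the program's actual behavior on these points is not documented here.

```python
def merge_syllables(sylls, i):
    """Merge syllable i with syllable i+1; all other syllables are left untouched.

    Each syllable is a dict with hypothetical keys 'text', 'stress', 'break_after'.
    Returns a new list; the input list is not modified.
    """
    a, b = sylls[i], sylls[i + 1]
    merged = {
        "text": a["text"] + "_" + b["text"],
        "stress": max(a["stress"], b["stress"]),  # assumption: keep the stronger stress
        "break_after": b["break_after"],           # assumption: keep the break after the second
    }
    return sylls[:i] + [merged] + sylls[i + 2:]
```

Applied to the three syllables of "many a" at position 1, this yields two syllables, "ma" and "ny_a", which is the two-syllable reading.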

The other adjustments are of stress and phrasing, following the rules of English.  You need to adjust the "rough draft" patterns assigned by SyllabifyLines.exe.  For this purpose you can use this resource.

Windows software, written by Bruce Hayes in Visual Basic 6.  All that you need is in the zipped folder.

Download

To run Adjust Scansions, take a file created by Syllabify Lines and put it in the folder for Adjust Scansions.  Look in the folder and find the file RememberEarlierRun.txt, and enter your file name.  Then click on AdjustScansions.exe.


III. Software for symbol conversion:  ConvertStressesAndJuncturesToSingleSymbols.exe

When you are done with AdjustScansions you will have a simple tab-delimited spreadsheet, with lines for syllables, stress, and juncture.  Examples (our own codings) are given here. These files have to be recompiled into coded strings, one per line, that use single symbols to express all information about a syllable (scansion, stress, phrasing).  The format of such files (example) is discussed later on. The purpose of ConvertStressesAndJuncturesToSingleSymbols.exe is to create files in this format.

Windows software, written by Bruce Hayes in Visual Basic 6.  All you need is in the zipped folder.

Download

Put the output of AdjustScansions.exe into the input folder for ConvertStressesAndJuncturesToSingleSymbols.  Then click on ConvertStressesAndJuncturesToSingleSymbols.exe and specify your input file name. The program will create two output files, called Training.txt and Testing.txt.  These get sent on to the next program ...


IV. Software for constraint weighting and grammar application:  Maxent.jar

Maxent.jar inputs

(1) a metrical data file (like this one), which will normally be created by ConvertStressesAndJuncturesToSingleSymbols above;
(2) a grammar file (either borrowed, like this one, or painstakingly constructed by the analyst; see below).

It digests the data file, selects the best constraints from the grammar file, and finds the best-fit maxent weights for those constraints, thus forming a complete grammar.  It then outputs the grammar (grammar.txt; example here) and also a tableau (tableaux.txt; example here) covering all the lines of a file of user-provided testing data (which can be the same as the training data).
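For readers new to maxent, the role of the weights can be sketched in a few lines of Python.  In a maxent grammar of the usual kind, each form gets a penalty score equal to the weighted sum of its constraint violations, and its probability is proportional to e raised to minus that score.  The sketch below is not the code of Maxent.jar: the constraint names and the tiny candidate set are invented for illustration, and the real program searches for the weights rather than taking them as given.

```python
import math

def harmony(violations, weights):
    """Penalty score: weighted sum of constraint violations."""
    return sum(weights[c] * n for c, n in violations.items())

def maxent_probabilities(candidates, weights):
    """Probability of each candidate, proportional to exp(-harmony)."""
    scores = {name: math.exp(-harmony(v, weights)) for name, v in candidates.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical constraints and violation counts, for illustration only.
weights = {"C1": 2.0, "C2": 1.0}
candidates = {
    "line_a": {"C1": 0, "C2": 0},
    "line_b": {"C1": 1, "C2": 0},
}
probs = maxent_probabilities(candidates, weights)
```

With these made-up numbers, line_b violates C1 once and so is less probable than line_a by a factor of e to the power 2, the weight of C1.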

This is Java software (multi-platform), written by Colin Wilson. The software includes special routines programmed by scientists in other disciplines and downloaded from internet sources. You must respect the open-source licenses of these routines. We are releasing this software for the purpose of replicating and extending our work in metrics. If you are interested in using the software for other purposes (e.g., phonotactic learning), please contact Colin Wilson (colin@cogsci.jhu.edu).

Download

A bit of advice on running Java programs in Windows.  First, be sure you have an up-to-date version of the Java Runtime Environment on your computer; to obtain this, Google the phrase "Java Runtime Environment".  Also, be sure your Path environment variable includes the folder containing the Java program (so your operating system can find it).  You can learn how to do this by Googling the phrase "Java change path".
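If you happen to have Python installed, a quick way to check whether your operating system can find Java is the following small sketch (the function name find_java is ours; shutil.which is part of the Python standard library):

```python
import shutil

def find_java():
    """Return the full path of the `java` launcher found via the Path, or None."""
    return shutil.which("java")
```

Calling find_java() returns a location if Java is reachable, and None if your Path still needs to be changed.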

To control the behavior of Maxent.jar, you need to alter the file Parameters.params, which is included.  In the version that comes with the program, the crucial fields are shown below. They are filled in with the "factory settings", which you can change if you like.
Field            Factory setting                     Explanation
-features        FeaturesMetrics.txt                 Feature system for the input file.  This file is included with the program.
-corpus          TrainingDataShakespeareBruce.txt    The metrical data on the basis of which the constraints are weighted.  Sample files are included with the program.
-sigma2          1.00E+10                            Making this lower discourages high weights.
-maxGramSize     3                                   The largest number of syllables that a constraint can consider.
-minWordLength   11                                  10 (iambic pentameter) plus one (for why there is an extra position, see below).
-maxWordLength   12                                  11 (iambic pentameter with extrametrical syllable) plus one.
-inviolable      InviolableConstraintsMetrics.txt    Sample file included.  These constraints do things like force every S to be followed by a W.
-train
-selectFromList  GrammarUG87Constraints.txt          The system of constraints.  Sample file included.
-test            TestingDataShakespeareBruce.txt     Lines from which to make the output tableau.  Can be the same as the training data.
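
If you want to inspect your settings before a long run, a small reader along the following lines may help.  It is a sketch under an assumption we have not verified against Maxent.jar itself: that Parameters.params holds one field per line, a flag beginning with a hyphen followed optionally by its value.  The function name read_params is ours.

```python
def read_params(text):
    """Collect '-flag value' lines into a dict (assumed layout, see above).

    Flags that appear without a value, such as -train, are stored as True.
    Lines that do not begin with a hyphen are ignored.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("-"):
            continue
        flag = parts[0].lstrip("-")
        settings[flag] = parts[1] if len(parts) > 1 else True
    return settings

sample = "-maxGramSize 3\n-sigma2 1.00E+10\n-train\n"
settings = read_params(sample)
```

Values are kept as text, so a number such as the -sigma2 setting has to be converted with float() before you compare it with anything.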

The training data and testing data files can be made by using the software described above.  The system of constraints can be the one that comes with the program (GrammarUG87Constraints.txt), or you can change or replace it.  If you want to do the latter, first read the document ExplanationOfFeaturesAndFormalism.pdf.

You can start Maxent.jar in Windows by clicking on one of the two RunMe.bat files.  One of them reports current progress to the screen, the other reports progress to the ProgramTrace.txt file.  Before you use these files, you must edit them.  Right now, they assume that everything is sitting in a particular folder on my computer; you need to change every instance of this folder to the one that is correct on your computer.
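
Changing every instance of the folder by hand is error-prone, so here is a minimal Python sketch that does the replacement for you.  The function name retarget_bat is ours, and both folder names are yours to supply: the old one is whatever folder you find written inside the RunMe.bat file, and the new one is where you unzipped the program.  Keep a copy of the original file before running it.

```python
from pathlib import Path

def retarget_bat(bat_path, old_folder, new_folder):
    """Replace every occurrence of old_folder in the batch file.

    Returns the number of replacements made, so you can confirm that
    the old folder name was actually found.
    """
    path = Path(bat_path)
    text = path.read_text()
    count = text.count(old_folder)
    path.write_text(text.replace(old_folder, new_folder))
    return count
```

A result of zero means the old folder name you typed does not match what is in the file (check the spelling and the direction of the slashes).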

The RunMe.bat files also have an option requesting more memory:  -Xmx1600m.  At the moment 1600 megabytes seems to be the maximum on our Windows machines; a ceiling of about this size is typical of 32-bit Java, and a 64-bit Java installation on 64-bit Windows should accept a larger number.  Memory really helps with this program.

On a Mac, the equivalent of a .bat file is a .sh file.  Here is the .sh file that Colin Wilson uses to run the program; you may have to augment it to access all those auxiliary .jar files, using the Windows .bat files as your model.

Maxent.jar will output two files:  grammar.txt (constraints and weights) and tableau.txt (assessment of all forms in the testing data file).  If you like, you can port these files to their corresponding "Paste Together" folders (which sit inside the main program folder) and run the Windows programs there.  These programs add annotations from a template file that you create, linking the program output with user-defined labels and thus making the grammar and tableau easier to read and interpret.


Back to main page for  "Maxent grammars for the metrics of Shakespeare and Milton"

Last modified July 11, 2012