BKAAQSUM.RVW   20020825

"Analysing for Authorship", Jill M. Farringdon, 1996, 0-7083-1324-8
%A   Jill M. Farringdon
%C   6 Gwennyth Street, Cardiff, Wales  CF2 4YD
%D   1996
%G   0-7083-1324-8
%I   University of Wales Press
%O   s.charles@press.wales.ac.uk
%P   324 p.
%T   "Analysing for Authorship: A Guide to the Cusum Technique"

Literary critics are quite used to talking about how an author like
Henry James would write enormously long sentences, sentences that
would, in more modern writings, be split into smaller, more digestible
chunks, but which were, in the days when it was considered acceptable
for someone like Marcel Proust to write an entire book that was one
long sentence, the norm that was to be emulated and adopted.  Others
wrote differently.  Hemingway, for example.  Short sentences. 
Sentence fragments.  So critics are quite used to making decisions
about authorship based upon numeric metrics.

Cusum (or QSUM, the two terms seem to be used interchangeably in the
book) is such a technique.  Instead of looking at meanings or
characteristic turns of phrase, the method looks at combinations of
statistical patterns in writing, patterns that the writer is probably
unaware of using.

Part one is an introduction and history.  Chapter one is a defence and
a rough idea of the process, which would be stronger if we were
presented with research indicating the likelihood of two separate
authors having homogeneous or indistinguishable patterns.  There is
also a history of statistical stylometry studies.  Details of the
technique are provided in chapter two, somewhat weakened by errors in
the arithmetic of the examples.  (Typographical errors are rife, such
as a reference to chapter two that should point to chapter three.)
The bases of comparison are generally sentence length in proportion to
the number of short words and words starting with vowels.  This may
sound strange, but an analysis of general word use in English
indicates that these counts reflect syntactic structure rather than
content.  As an example, chapter three looks at "The Back Road,"
suspected to be by D. H. Lawrence, in comparison with other works
known to be by Lawrence.  The reasons for the setup chosen for this
comparison are not always clear.
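
For readers who want a concrete picture of the arithmetic, here is a
minimal sketch of a cusum calculation in Python.  It is not taken from
the book.  It assumes the common description of the technique: for
each sentence, take the word count and a "habit" count, subtract the
mean for the sample, and keep a running total of the deviations.  The
function names, the naive sentence splitter, and the choice of "three
letters or fewer" as the definition of a short word are all
assumptions made for illustration.

```python
import re

VOWELS = "aeiou"


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? plus whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    # Alphabetic tokens, allowing one internal apostrophe (e.g. "don't").
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)


def is_habit_word(word, max_short_len=3):
    # Hypothetical "habit": a short word OR a word starting with a vowel.
    w = word.lower()
    return len(w) <= max_short_len or w[0] in VOWELS


def cusum(values):
    # Running total of each value's deviation from the sample mean.
    mean = sum(values) / len(values)
    running, out = 0.0, []
    for v in values:
        running += v - mean
        out.append(running)
    return out


def cusum_series(text):
    # Returns (sentence-length cusum, habit-count cusum) for one sample.
    sents = [words(s) for s in split_sentences(text)]
    sents = [s for s in sents if s]
    lengths = [len(s) for s in sents]
    habits = [sum(is_habit_word(w) for w in s) for s in sents]
    return cusum(lengths), cusum(habits)
```

The two series would then be plotted on one chart, with the scales
adjusted so that they overlay.  As the technique is usually described,
lines that track each other suggest a single author, and a sustained
divergence suggests mixed authorship.  By construction each series
returns to (approximately) zero at the final sentence, so it is the
shape along the way that is being compared.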

Part two examines a range of uses for cusum.  Chapter four considers
the statistical fingerprinting of authors even over a change of
literary "voice," and also notes that an editor's style can be
identified.  This is extended, in chapter five, to the ability to
identify a translator.  Amazingly, consistent patterns survive from
childhood into adult authors, as is shown with Helen Keller's writings
in chapter six.  Chapter seven discusses the applications of cusum to
a variety of writing forms, and notes that not even the use of dialect
and invented languages can hide an author's signature.

Part three looks into forensic applications.  Chapter eight lists
considerations for reports to be used in court.  Echoing the stability
seen in the childhood writings of chapter six, chapter nine
demonstrates that speakers and writers of English as a second language
are remarkably consistent over time, and does some analysis of the
authorship of confessions.  Chapter
ten answers criticisms of the method.  It raises good points, but has
a rather confused structure.  One issue raised with the cusum method
is that it provides a chart to be interpreted rather than a single
measure: the text notes that statistical measures are available, but
that the graphics were felt to be more acceptable to users.

The book finishes off with an explanation of the method from the
inventor, A. Q. Morton.

Cusum is a technique that deserves further study.  Despite its flaws,
the book provides valuable information.

copyright Robert M. Slade, 2002   BKAAQSUM.RVW   20020825