»OF WHAT IS PAST, OR PASSING, OR TO COME« ELECTRONIC ANALYSIS OF LITERARY TEXTS

Abstract

This essay looks at the way computer-assisted studies of literature have been received in the past. It analyses some of the fundamental assumptions about text and the way critics perceive text and textuality, and it discusses the way in which electronic procedures can be used for the study of literature. The example of ›author gender‹ is presented as a challenging new study that may bring mainstream and specialized scholarship together in literary studies. Finally, three possible developments in computer-assisted literary studies are outlined.

Every reader dreams of a reliable memory and unlimited time: »The Biblical three-score years and ten no longer suffice to read more than a selection of the great writers in what can be called the Western tradition, let alone in all the world's traditions. Who reads must choose, since there is literally not enough time to read everything, even if one does nothing but read.«[1] But in an ideal world the canon does not matter, and there is enough time to read – and even to re-read – all the texts deemed relevant from a given perspective. In addition, all these texts can subsequently be interlinked on various levels, and any intertextual connection between a virtually unlimited number of texts can be established. These texts are constantly present in the reader's mind and they form a vast archive accessible any time, with no limitations.[2] In this ideal world the reader remembers perfectly well every subplot, every character, even every single phrase of every text ever read.

Literary criticism, acutely aware of the problems of both canon and memory, therefore operates selectively. The limitation and the problem of exclusion are accepted as an integral aspect of traditional approaches to texts, and for this reason most literary critics deal with representative textual phenomena when they talk about surface features of a text. Human memory is extended and externalised through written notes and references, and the limitations of the human mind with respect to both time and capacity have become an accepted part of the conceptual and methodological framework of literary studies.

The frequent textual echoes that link chapters in books, the verbal subtleties of language on the stage, and the intricate sound patterns employed in poetry constitute only a small fraction of the literary phenomena that can be observed ›on the surface‹ of texts. The myriad details that come with every reading of a text are filtered by the reader, who has to prioritise – what to keep in mind, what to memorize, what to discard. Very often surface phenomena are dealt with in a cursory fashion, as scholars shy away from the tedious task of systematically collecting, analysing and interpreting all relevant passages. Conveniently, a substantial number of these features are deemed dispensable material that can be used if necessary, but that do not in their totality contribute to the understanding of a text. John Burrows criticises this type of eclectic analysis: »It is a truth not generally acknowledged that, in most discussions of works of English fiction, we proceed as if a third, two-fifths, a half of our material were not really there.«[3]

But literary works do depend on the totality of text, on every single textual item found in the work: »We in literature consider the text to be the result of the artistic intention of the author, not as a linguistic document. The text studied for its literary value contains nothing that can be ascribed to chance. The probability of finding a given word in a certain place in a text is thus one if it is there, and zero if it isn't.«[4] Stylistics, both traditional and computational, agrees, and it is in the field of textual exegesis based on textual evidence that the electronic analysis of literary texts is most dominant and successful.

When monks in the Middle Ages produced the first concordances of the Bible, they kept tabs manually, compiling endless word lists and indices that would allow the reader to locate passages where human memory proved inadequate. Computer-assisted analysis follows the same pattern and employs similar strategies. The difference, however, can be seen in the flexibility of sampling and testing that comes with electronic procedures. A search for specific textual phenomena can be refined, even changed, should it become necessary. Texts can be analysed in their entirety, and sufficient time and manpower for any given analysis are no longer a problem. While medieval monks spent years in isolation compiling word lists, the modern scholar can modify search patterns within seconds, or expand the corpus by adding more texts. While the methodology of this type of analysis has not changed over the centuries, the introduction of computer-assisted work has led to a fundamental change in the way data is produced.
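What the medieval compilers did by hand can be sketched in a few lines of code. The following keyword-in-context (KWIC) concordance is a minimal, illustrative example rather than a reconstruction of any tool discussed here; the file name and the search word are placeholders.

```python
import re

def concordance(text, keyword, context=40):
    """Return keyword-in-context (KWIC) lines for every occurrence of a word."""
    lines = []
    # \b marks word boundaries, so that e.g. 'art' does not match 'heart'
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - context):match.start()]
        right = text[match.end():match.end() + context]
        lines.append(f"{left:>{context}}[{match.group(0)}]{right}")
    return lines

# Illustrative usage; 'don_juan.txt' and 'memory' are placeholders only.
with open("don_juan.txt", encoding="utf-8") as f:
    for line in concordance(f.read(), "memory"):
        print(line)
```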

Computer-assisted studies, sometimes referred to as ›computer-based‹ depending on the amount of data processing involved, thus constitute a continuation of stylistic analysis that originates from textual exegesis. The procedures focus on a thorough combing of the text, and the tools and techniques required for this kind of work are basic, because

much can be achieved with judicious use of simple tools. The computer is best viewed as an aid to scholarship, a machine which can help with many repetitive tasks and which can assist with detailed investigations or help to provide an overall picture which would be impossible to obtain by other means. Many humanities electronic text projects which are more than simply putting material on the Web have been based, in one way or another, on word searching, frequency lists, and concordances. These have been used as a basis for further interpretation of textual material, for comparative work, for lexicography, for the preparation of scholarly editions, and for the analysis of different linguistic features.[5]
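The word searching, frequency lists, and concordances mentioned here require nothing beyond standard library tools. The frequency-list sketch below is purely illustrative; the file name is a placeholder and the tokenization is deliberately crude.

```python
import re
from collections import Counter

def word_frequencies(text, top=20):
    """Count word forms in a text; tokenization here is kept deliberately simple."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)

# Illustrative usage with a placeholder file name
with open("novel.txt", encoding="utf-8") as f:
    for word, count in word_frequencies(f.read()):
        print(f"{count:6d}  {word}")
```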

Close attention to surface features of a text provides the basis for the ensuing analysis, and with this focus on complete sets of data extracted from the text a number of theoretical issues need to be discussed. Sometimes this attention to the text in its entirety, with a particular emphasis on the minute analysis of isolated stylistic features, is described as a return to the position of New Criticism and its theoretical and methodological tenets. If this is indeed the case, then the computer-assisted analysis of texts does not in itself constitute a new ›method‹, but merely provides sophisticated tools that work within an existing set-up. In addition, as New Criticism is widely regarded as a dated, if not inadequate approach to texts and their location in a literary or cultural context, the continuation of such methods proves difficult and invites criticism. »One might argue that the computer is simply amplifying the critic's powers of perception and recall in concert with conventional perspectives. This is true, and some applications of the concept can be viewed as a lateral extension of Formalism, New Criticism, Structuralism, and so forth.«[6]

In the light of recent developments in computer-assisted studies of literary texts it remains to be seen to what extent this assessment is still adequate. It has become apparent, however, that most studies that use electronic means of text analysis are aware of the theoretical implications of their approach. It remains open to debate whether a truly naive, positivistic reading of a text in a computer-assisted study was ever published in peer-reviewed journals such as Literary and Linguistic Computing or Computers and the Humanities. If literary critics were happy to use the computer for its own sake, then the fault lies not with the tool, but with the methodology.

In a widely noted assessment of the field, published in 1991 in Literary and Linguistic Computing, Thomas Corns comments on the disappointing achievements of computer-assisted studies in mainstream literary studies. He writes that literary studies have split up into

increasingly aggressive and intolerant theoretical camps, for the most part mutually suspicious and marked by sharply differentiated critical vocabularies, idioms, objectives and values, though there have been elements of hybridization. We advocates of computer applications do not figure significantly within that complex configuration. In so far as we are regarded, traditionalists still observe us with suspicion – we murder to dissect. Post-structuralists regard us as engaged in an inherently foolish enterprise, mistaking the modality of the text, absurdly unaware of the inadequacy of our categories, of all categories; feminists regard us as involved in the fetishizing of the machine, the toys for the boys critique; marxists disclose the political implications of the seemingly apolitical nature of our analysis.[7]

The debate, however, continues, and at nearly every conference on humanities computing the failure of computer-assisted studies to gain recognition in mainstream literary criticism is commented on.[8] Given the high degree of critical awareness in humanities computing of its own methodological position, it seems unlikely that theoretical or methodological criteria are responsible for the low rate of acceptance. If one follows Hans-Walter Gabler's reasoning, the problem lies as much with the community of book-trained scholars as with those well versed in using electronic procedures:

The established present use of the computer in the humanities is to enhance the properties and quality of the book. With the book electronically stored, book contents and book knowledge can be accessed fast and very flexibly [...]. [...] In the face of the forces of habit, the question arises how clearly the book-conditioned and book-trained humanities scholar and researcher is capable of discerning the unique otherness of the electronic medium and both explore and exploit its potential.[9]

Gabler's argument is convincing, but the ›otherness of the electronic medium‹ needs to be communicated to the world of mainstream academia. The question of contextualization in particular, together with a critical re-evaluation of the seemingly obsolete techniques of close reading, stimulates discussions that question methodology and fundamental notions of the relationship between author, text, and reader. In this respect most contributions to literary and cultural analysis that originate from computer-assisted studies are highly aware of the theoretical implications of their approach. As a result one finds that in nearly all cases the question of the status of the text and related problems of textuality are dealt with in great detail.

In the most useful studies, researchers have used the computer to find features of interest and then examined these instances individually, discarding those that are not relevant, and perhaps refining the search terms in order to find more instances. They have also situated their project within the broader sphere of criticism on their author or texts, and reflected critically on the methodology used to interpret the results, avoiding the ›black-box‹ tendency of some projects to produce tables of numbers without any serious assessment of what those numbers might mean.[10]

It is from this theoretical awareness that sophisticated studies take their analytical strength, because »as hardware has become tremendously powerful, most people have come to realize that the limitations of computer-assisted textual analysis are methodological rather than technological. At the moment we have all the computing power we could possibly need [...].«[11] The authors of the best computer-assisted studies maintain that the computer can be considered useful only for the process of data collection. In studies of literature computers »are no more able to ›decode‹ rich imaginative texts than human beings are. What they can be made to do, however, is expose textual features that lie outside the usual purview of human readers.«[12]

Here the computer is seen as a tool that facilitates certain repeated procedures, and this tool greatly enhances the scope of texts or the range of sampling that provides data for the ensuing analysis. But it is in the nature of a tool to be guided by human intuition and experience. A tool is designed and constructed specifically to enhance human work. In this context the computer as a tool is regarded as the extension of human abilities and skills, and in the nature of this extension lies its greatest potential and, at the same time, its fundamental limitation.

If tedious procedures that require repeated, identical processes relying on precisely defined formal properties of text can be committed to the computer, then human resources, freed from the constraints of numbing work, can be used productively: the textual material compiled by the tool in a first step will, in a second step, be analyzed and contextualized, and finally, in a third step, be interpreted by the human critic. Seen in this light the computer constitutes a tool perfectly suited for some types of literary analysis. Every approach that depends on access to a limited but precisely defined set of textual features is greatly helped. If the criteria for sampling need to be re-defined, new search routines and sampling procedures can be implemented seamlessly on the basis of the formalisms initially established, and in this the scope and breadth of computer-assisted textual analysis is unprecedented.

As Susan Hockey writes in her book on Electronic Texts in the Humanities. Principles and Practice, the computer

is best at finding features or patterns within a literary work and counting occurrences of those features. If the features which interest a scholar can be identified by computer programs, the computer will provide an overall picture which would be impossible to derive accurately by manual methods. It can also pinpoint specific features within a text or collection of texts and lead the researcher to further areas of enquiry. It is often best treated as an adjunct to other research methods.[13]

The success of this type of analysis depends on the way the text is perceived by the critic. If textual features that can be formalized form the basis of the analysis, then precise and unambiguous definitions of the phenomena to be identified have to be provided. This precision in describing textual properties determines the quality of the analysis, and it is crucial that at this point decisions be made about which properties to include and which to exclude. It is from a thorough knowledge of the text that any assumption about some of its properties can be made, and at this initial stage of the analysis there exists no methodological difference between a critic who is planning to conduct a stylistic investigation into textual properties in a traditional way and a critic who is planning to use electronic procedures.

Ideally, even the process of sampling, i.e. of identifying, locating and extracting textual features, is identical in both types of studies. The crucial difference, however, is the fact that any ›manual‹ sampling will take much more time than the same process conducted with electronic means. If the criteria according to which the textual phenomena are identified remain unchanged throughout the entire process, then the computer – with unerring accuracy, super-human speed and an unfailing memory – will by far outperform any manual approach to the same text. Given that the criteria for searches can be changed and that multiple searches can be conducted on the same material without any time constraints, it becomes obvious why computer-assisted procedures are vastly superior to manual approaches. The central advantage can be seen in pattern-matching routines, i.e. the search for

character strings [Zeichenketten], that is, for arbitrary combinations of letters, digits or punctuation marks. In most cases wildcards standing for arbitrary characters can also be used. An abstract form of this use of wildcards is the employment of ›regular expressions‹, with which patterns of characters can be described. Individual strings can be combined into complex queries by means of Boolean operators.[14]

The complexity of search procedures and the possibility of virtually endless variations of the patterns that can be identified constitute a major advantage of computer-assisted studies.
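A minimal sketch of such pattern matching might look as follows; the patterns, the rough sentence-level segmentation and the file name are illustrative assumptions, not features of any particular tool discussed here.

```python
import re

def boolean_search(text, must_match, must_not_match=()):
    """Split a text into rough sentences and keep those that satisfy a Boolean
    combination of regular expressions: all patterns in must_match (AND),
    none of the patterns in must_not_match (NOT)."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    hits = []
    for sentence in sentences:
        if (all(re.search(p, sentence, re.IGNORECASE) for p in must_match)
                and not any(re.search(p, sentence, re.IGNORECASE) for p in must_not_match)):
            hits.append(sentence)
    return hits

# r'\blov\w*' uses \w* as a wildcard and matches love, loves, loved, lover ...;
# the query combines two patterns with AND and excludes a third with NOT.
with open("sample.txt", encoding="utf-8") as f:
    results = boolean_search(f.read(),
                             must_match=[r'\blov\w*', r'\bheart\b'],
                             must_not_match=[r'\bhate\b'])
```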

These striking advantages, however, remain limited to a rather narrow area of stylistic study, and within this field the »discussion of the history of literary computing shows that only a limited number of textual phenomena can be analysed profitably in the context of a qualitative, computer-assisted analysis of style. These phenomena have to have some surface features that can be identified by electronic means.«[15]

No hermeneutic procedures that change the reader's perception of the text find their way into an electronic analysis. Every modification of a search, every subtle re-arrangement of sampling procedures needs to be fed into the system at a stage when the data suggests a modification. This is typically the case after a complete scan of a text or a set of texts has been performed, and while a reader of a text will in the process of reading re-adjust his or her criteria, computer-generated data will – each and every time – provide precisely what has been defined as the result of a search.

This, in itself, is a great advantage, but it requires a more stringent and formalized approach to a text than is commonly preferred in mainstream literary criticism. Even stylistics, the discipline most interested in textual properties usually associated with surface features, does not always recognize the potential of a rigid, formalized approach. This is even more true of the general field of humanities education and scholarship, which »will not take the use of digital technology seriously until one demonstrates how its tools improve the ways we explore and explain aesthetic works – until, that is, they expand our interpretational procedures«.[16]

Some of the most rewarding computer-assisted studies of electronic texts focus on the identification of specific textual features. These features are usually repeated strings of characters – letters, syllables, individual words, word combinations and phrases – and their repeated occurrence can be traced by electronic means. Patterns of distribution can be generated, presences and absences can be mapped, and computer-assisted procedures yield a complete survey of all the phenomena found in the text.

Two principles and methodical procedures are characteristic of this kind of analysis: a precise definition of the features to be analysed has to be produced prior to the analysis, and stringent criteria then need to be established for the identification of patterns. The definition of features is itself based on an examination of the text with a view to the scope of features found in it, and the precise specification of criteria for inclusion in or exclusion from the analysis is one of the central requirements. Exceptions and possible variant readings need to be defined, and in this procedure of minute description a computer-assisted study far exceeds the rigour of a traditional stylistic analysis. The human reader will decide according to a set of rules whether to include or exclude phenomena, and these rules are applied stringently across the entire text with a view to the aim of the analysis, and »as error-prone manual sampling becomes obsolete, textual analysis as well as the ensuing interpretation of a text as a whole can be based on a complete survey of all passages meeting predefined patterns or criteria«.[17] The computer needs to rely on a complete set of highly specific rules for the analysis. These rules will have to accommodate all possible findings that are of relevance to the analysis, and they will have to be formulated in such a way as to identify rather more than fewer phenomena, because »you don't know what you are missing«, as Catherine Ball has it.[18]
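A sketch of what such an explicit rule set might look like is given below; the feature (variant spellings and forms of a single lexical item), the deliberately broad pattern and the division of the text into equal segments are illustrative assumptions only.

```python
import re

# Rule set: one feature, defined broadly enough to catch variant readings
# ("rather more than fewer"); doubtful hits are left for the critic to inspect.
FEATURE = re.compile(r"\bhonou?r(?:s|'d|ed)?\b", re.IGNORECASE)  # honour, honor, honours, honour'd, honoured

def distribution(text, segments=20):
    """Map presences and absences of the feature across equal-sized text segments."""
    size = max(1, len(text) // segments)
    blocks = [text[i:i + size] for i in range(0, len(text), size)]
    return [len(FEATURE.findall(block)) for block in blocks]

# Illustrative usage with a placeholder file name
with open("novel.txt", encoding="utf-8") as f:
    counts = distribution(f.read())
print(counts)   # e.g. [3, 0, 1, 5, ...] -> peaks and gaps across the text
```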

The fundamental difference between computer-assisted studies of literature and those that rely on a human reader only is that the sets of findings are complete and accurate when compiled by the computer. While a human reader may arrive at the same result, every manual sampling is more error-prone once factors such as memory, attention, and the stringent application of pre-defined criteria are taken into consideration. It is indeed possible for human readers to compile complete sets of data from literary works – the medieval monks who manually produced the first concordances of the Bible are perfect examples of dedicated work that continued for months, uninterrupted.

The notion that minute details in a text, such as repeated stylistic devices or function words that form the bulk of every text, do indeed influence the reader and reflect on the author of the text, is one of the fundamental assumptions of stylistics. In these cases, electronic procedures are most usefully employed. Stylometric analysis of authorship in attribution studies[19] has shown that some textual characteristics can be analysed fruitfully, and one of the most important computer-based studies of literature, John Burrows' Computation into Criticism, employs similar techniques.[20]
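A minimal sketch of the kind of function-word profile on which such stylometric work rests is given below; the short word list and the file names are placeholders, and the sketch makes no claim to reproduce Burrows' actual procedure.

```python
import re
from collections import Counter

# A handful of high-frequency function words; actual studies use much longer lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "not", "but"]

def function_word_profile(path):
    """Relative frequencies (per 1,000 tokens) of selected function words in a text."""
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in FUNCTION_WORDS}

# Compare the profiles of two texts (placeholder file names)
for path in ("emma.txt", "persuasion.txt"):
    print(path, function_word_profile(path))
```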

It is in this area of text analysis that a new study challenges established views and promises to engage both computer-assisted work and mainstream literary criticism in a new debate. In the summer of 2003 an inconspicuous headline caught the attention of literary critics: »Computer program detects author gender.«[21] The somewhat catchier subtitle, »Simple algorithm suggests words and syntax bear sex and genre stamp«, explains to the non-specialist that certain textual properties can be identified by electronic means, and that these textual properties can be used to identify some characteristics of the author. Interestingly, sex and gender are taken as synonymous descriptive terms by the author of Nature's ›scienceupdate‹.

The article on which these and similar news reports are based was published by Moshe Koppel et alii as »Automatically Categorizing Written Texts by Author Gender« in Literary and Linguistic Computing.[22]

Koppel uses automated text categorization techniques and, by focussing on a specific set of lexical and syntactic features, manages to infer the gender of the author with about 80% accuracy. His team of computer scientists used automated text classification, and by relying on relatively small numbers of content-independent textual features such as function words they could observe »a difference in male and female writing styles in modern English books and articles«.[23] For non-computing literary criticism the section »1.3 Gender« is most interesting. Here the strategies used in the analysis of English documents from the BNC are outlined:

The object of this paper is to explore the possibility of automatically classifying formal written texts according to author gender. This problem differs from the typical text categorization problem which focuses on categorization according to topic. It also differs from the typical stylometric problem which focuses on authorship attribution – individual authors are more likely to exhibit consistent habits of style than large classes of authors.[24]

The problems described here highlight why attempts at identifying male or female authorship by electronic means – and by focussing on de-contextualized text only – are difficult. And as there is little documented material to draw on, Koppel continues that »there has been scant evidence thus far that differences between male and female writing are pronounced enough that they could be parlayed into an algorithm for categorizing all unseen text as being authored by a male or by a female«.[25] In 1975 Robin Lakoff maintained that »›Women's language‹ shows up in all levels of the grammar of English. We find differences in the choice and frequency of lexical items; in the situations in which certain syntactic rules are performed; in intonational and other supersegmental patterns.«[26] Jennifer A. Simkins-Bullock and Beth G. Wildman are more reluctant to accept this view; in 1991 they state that there is an a priori »lack of agreement about whether males and females use language differently«.[27] But precisely this evidence of a noticeable (or measurable) difference is produced by the procedures of sampling and filtering described in the paper, and if the findings can be corroborated by others, then this paper will probably be considered a major contribution to humanities computing.
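For readers unfamiliar with automated text categorization, a generic sketch of such a pipeline is given below. It fits an off-the-shelf linear classifier on function-word counts and is emphatically not Koppel et al.'s own algorithm or feature set; the word list, the labels and the corpus are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Content-independent function words as features (illustrative short list only).
FUNCTION_WORDS = ["the", "a", "and", "of", "to", "in", "that", "it",
                  "not", "with", "for", "you", "she", "he", "her", "his"]

def train_gender_classifier(texts, labels):
    """Fit a linear classifier on function-word counts.

    texts: list of document strings; labels: list of 'male'/'female' tags.
    The pipeline is a generic stand-in for supervised text categorization,
    not a reconstruction of the method used by Koppel et al."""
    model = make_pipeline(
        # token_pattern is widened so that one-letter words like 'a' are counted
        CountVectorizer(vocabulary=FUNCTION_WORDS, lowercase=True,
                        token_pattern=r"(?u)\b\w+\b"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model

# Illustrative usage with a placeholder corpus:
# classifier = train_gender_classifier(training_docs, training_labels)
# classifier.predict([unseen_text])   # -> array(['female']) or array(['male'])
```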

The implications of this analysis are far-reaching and of particular relevance to mainstream literary criticism. Here the problems of authorship, of writing, of sex and gender, and of the difference between author and narrator are central concerns. In his essay »What is an Author?« Michel Foucault maintains that »in a novel narrated in the first person, neither the first person pronoun, nor the present indicative refer exactly either to the writer or to the moment in which he writes, but rather to an alter ego whose distance from the author varies, often changing in the course of the work«.[28] How can this statement be aligned with Koppel's findings that something of the author, some historical/biographical/personal information, can be detected in the text no matter how much the author tries to disguise it? If it can be shown that not every aspect of the text is under the control of the author, then the question arises how the artistic autonomy of the author is to be evaluated. And it would be most promising to see whether this is also detected by the reader.

Literary criticism maintains that everything that is in the text contributes to the overall impression of the text, that nothing is ›superfluous‹, that every textual feature in some way influences the reader. If the author's control over his or her text is limited in such a way as to reveal some important biographical facts about the author unintentionally, then some fundamental assumptions about control and textual features have to be questioned. If a text gives away the gender of its author, is it still possible for a female author to assume the persona of a male narrator (or vice versa) in a text? Can an author not get away from the tell-tale stylistic indicators that label him or her?[29] Does this not mean that the author has far less control over the text and how it is perceived by the outside world – be this man or machine – and does this not severely impinge on what is commonly perceived as a mark of competence, namely that an author can assume any identity without giving away his or her true self? What about some of the most interesting narrative procedures in literature, simulation and parody: is it not possible for an author to camouflage fundamentals about his/her language?[30] And, finally, are there no ›unmarked‹ texts, or would it not be possible to disguise the gender of an author? What about misclassified authors – why is Antonia S. Byatt's novel Possession the only text by a female author amongst the six misclassified fiction samples?

If what Koppel and his co-authors have found is true, then no author, no matter how hard he or she tries, can convincingly portray another person in a fictional text. No male author is then in a position to convey the views of a female character, no female author can assume the perspective and voice of a male character convincingly, because the language of the text will give away the gender of the author speaking through the narrator.[31] And if this can be shown by means of an analysis of textual properties, then, surely, it must have an effect on the reader. One may not be aware of gendered language right away, and in most cases readers do know something about the author anyway, because a look at the cover of the book one is reading quickly establishes the identity of the author – or the persona as whom he or she would like to be perceived.

Koppel's contribution to automated text categorization techniques raises a number of questions about fictional texts that aim at the very basis of modern concepts of reader, text, and author. It remains to be seen whether mainstream literary criticism perceives the potential of this study, and to what extent some of the implications of ›80% accuracy‹ will be dealt with by scholars not used to statistics.[32] »Automatically Categorizing Written Texts by Author Gender« is a paper that has the potential to once again engage the marginal discipline of computer-based literary studies on the one hand and mainstream scholarship on the other in a fruitful debate. It is telling, however, that the impulse for this engagement should come from computer science, from ›the other‹.

An evaluation of computer-assisted studies of literature today suggests a number of different developments that seem possible in the near future. It seems likely that with studies such as Koppel's on ›author gender‹ a controversial but fruitful debate between mainstream literary criticism, computer science, and computer-assisted literary criticism will evolve. Here a continuation of previous work will certainly contribute to a better understanding of what has already been achieved, and it is possible that through a re-evaluation of tested techniques the potential of computer-assisted work will become apparent to a wider audience.

Related to this is what David Robey sees as the interdisciplinary aspect of computer-related studies: »A decade ago we knew enough to relate common techniques to the various disciplines: we first suspected, then partly knew that humanities computing was concerned with a methodological common ground within which disciplinary boundaries did not apply.«[33] This view of the nature of humanities computing has to be extended in the present situation. Specialists from the different disciplines, and this does not apply to literary studies alone, are asked to utilize the potential of interdisciplinary work:

The emergence of this multidisciplinary digital library has served not to fragment the methodological common ground but to emphasize its centrality and extend its breadth. The future directions for humanities computing therefore involve systematic exploration of this common ground to ensure that developments are coherent, cohesive and responsible to its cultural inheritance. Humanities computing specialists thus have a vital role as interdisciplinary and interprofessional mediators. The old model of support services is no longer valid: research should be seen as a common enterprise between ›technologists‹ and ›scholars‹.[34]

And finally a different view of what can be done with text analysis tools and literary texts is presented by Geoffrey Rockwell and others, whose understanding of text and textuality opens up new possibilities in humanities computing, particularly in computer-assisted literary studies. Rockwell argues that tools for text analysis themselves produce new texts, generated through search processes. The idea is that the analysis of texts is by no means limited to the scanning for surface features, but that the potential of computer applications in the humanities, and more precisely in literary/textual studies, lies in opening up new views of text. According to Rockwell the concept of textuality itself, and what scholars can do with those ›new‹ texts, needs to be reconsidered.[35]

In this, humanities computing faces great challenges, but it promises to bring out in computer-assisted literary studies the potential of what is past, or passing, or to come.

Bibliography

Ball, Catherine N.: Automated Text Analysis. Cautionary Tales. In: Literary and Linguistic Computing 9 (1994), pp. 293-302.

Bloom, Harold: The Western Canon. The Books and Schools of the Ages. New York: Riverhead 1994.

Burrows, John F.: A Computation Into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press 1987.

Butler, Judith: Gender Trouble. Feminism and the Subversion of Identity. London: Routledge 1990.

Corns, Thomas: Computers in the Humanities: Methods and Applications in the Study of English Literature. In: Literary and Linguistic Computing 6/2 (1991), pp. 127-130.

Forsyth, Richard S./David Holmes: Feature-Finding for Text Classification. In: Literary and Linguistic Computing 11/4 (1996), pp. 163-174.

Fortier, Paul: Babies, Bathwater and the Study of Literature. In: Computers and the Humanities 27 (1993), pp. 375-385.

Foucault, Michel: What is an Author? In: David Lodge (Ed.): Modern Criticism and Theory. A Reader. London/New York: Longman 1988, pp. 197-210.

Gabler, Hans-Walter: There is Virtue in Virtuality. Future potentials of electronic humanities scholarship. In: ALLC/ACH 2002. New Directions in Humanities Computing. Conference Abstracts. Tübingen: Zentrum für Datenverarbeitung ZDV 2002, pp. 40-41.

Hockey, Susan: Electronic Texts in the Humanities. Principles and Practice. Oxford: Oxford University Press 2000.

Holmes, David I.: The Evolution of Stylometry in Humanities Scholarship. In: Literary and Linguistic Computing 13/3 (1998), pp. 111-117.

Jannidis, Fotis: Computerphilologie. In: Ansgar Nünning (Ed.): Metzler Lexikon Literatur- und Kulturtheorie. Stuttgart/Weimar: Metzler 1998, pp. 70-72.

Koppel, Moshe et al.: Automatically Categorizing Written Texts by Author Gender. In: Literary and Linguistic Computing 17/4 (2002), pp. 401-412.

Lakoff, Robin: Language and Woman's Place. New York/London: Harper Collins 1975.

McGann, Jerome J.: Radiant Textuality. Literature After the World Wide Web. New York: Palgrave 2001.

Ostriker, Alicia: The Thieves of Language. Women Poets and Revisionist Mythmaking. In: Elaine Showalter (Ed.): The New Feminist Criticism. Essays on Women, Literature, and Theory. New York: Pantheon 1985, pp. 314-338.

Preminger, Alex/Terry V. F. Brogan (Eds.): The New Princeton Encyclopedia of Poetry and Poetics. Princeton, NJ: Princeton University Press 1993.

Robey, David: Round Table on New Directions in Humanities Computing. In: ALLC/ACH 2002. New Directions in Humanities Computing. Conference Abstracts. Tübingen: ZDV 2002, pp. 106-109.

Rockwell, Geoffrey: What is Text Analysis, Really? In: Literary and Linguistic Computing 18/2 (2003), pp. 209-219.

Rommel, Thomas: »And trace it in this poem every line.« Methoden und Verfahren computerunterstützter Textanalyse am Beispiel von Lord Byrons Don Juan. (Tübinger Beiträge zur Anglistik; 15). Tübingen: Narr 1995.

Rommel, Thomas: The Internet Survey for English Studies. In: Doris Feldmann/Fritz-Wilhelm Neumann/Thomas Rommel (Eds.): Anglistik im Internet. Proceedings of the 1996 Erfurt Conference on Computing in the Humanities. Heidelberg: Carl Winter 1997, pp. 101-112.

Simkins-Bullock, Jennifer A./Wildman, Beth G.: An Investigation into the Relationship Between Gender and Language. In: Sex Roles 24/3-4 (1991), pp. 149-160.

Smith, John B.: Computer Criticism. In: Roseanne G. Potter (Ed.): Literary Computing and Literary Criticism. Theoretical and Practical Essays on Theme and Rhetoric. Philadelphia: University of Pennsylvania Press 1989, pp. 13-44.

Sutherland, Kathryn: Introduction. In: Kathryn Sutherland (Ed.): The Electronic Text. Investigations in Method and Theory. Oxford: Oxford University Press 1997, pp. 1-18.

Thomas Rommel (Bremen)

Prof. Dr. Thomas Rommel
De Montfort University
International University Bremen
PO Box 750561
28725 Bremen
t.rommel@iu-bremen.de

(24 March 2004)
[1] Harold Bloom: The Western Canon. The Books and Schools of the Ages. New York: Riverhead 1994, p. 15.
[2] On ›archive‹ cf. Kathryn Sutherland: Introduction. In: Kathryn Sutherland (Ed.): The Electronic Text. Investigations in Method and Theory. Oxford: Oxford University Press 1997, pp. 1-18. Here p. 9.
[3] John F. Burrows: A Computation Into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press 1987, p. 1.
[4] Paul Fortier: Babies, Bathwater and the Study of Literature. In: Computers and the Humanities 27 (1993), pp. 375-385. Here p. 376.
[5] Susan Hockey: Electronic Texts in the Humanities. Principles and Practice. Oxford: Oxford University Press 2000, p. 6.
[6] John B. Smith: Computer Criticism. In: Roseanne G. Potter (Ed.): Literary Computing and Literary Criticism. Theoretical and Practical Essays on Theme and Rhetoric. Philadelphia: University of Pennsylvania Press 1989, pp. 13-44. Here p. 14.
[7] Thomas Corns: Computers in the Humanities. Methods and Applications in the Study of English Literature. In: Literary and Linguistic Computing 6/2 (1991), pp. 127-130. Here p. 129.
[8] Compare, for instance, the essays in a recent issue of Literary and Linguistic Computing on text analysis and text analysis tools. Literary and Linguistic Computing 18/2 (2003).
[9] Hans-Walter Gabler: There is Virtue in Virtuality. Future potentials of electronic humanities scholarship. In: ALLC/ACH 2002. New Directions in Humanities Computing. Conference Abstracts. Tübingen: Zentrum für Datenverarbeitung ZDV 2002, pp. 40-41. Here p. 40.
[10] Susan Hockey: Electronic Texts in the Humanities, p. 84. (footnote 5).
[11] Thomas Rommel: The Internet Survey for English Studies. In: Doris Feldmann/Fritz-Wilhelm Neumann/Thomas Rommel (Eds.): Anglistik im Internet. Proceedings of the 1996 Erfurt Conference on Computing in the Humanities. Heidelberg: Carl Winter 1997, pp. 101-112. Here p. 112.
[12] Jerome J. McGann: Radiant Textuality. Literature After the World Wide Web. New York: Palgrave 2001, pp. 190-191.
[13] Susan Hockey: Electronic Texts in the Humanities, p. 66. (footnote 5).
[14] Fotis Jannidis: Computerphilologie. In: Ansgar Nünning (Ed.): Metzler Lexikon Literatur- und Kulturtheorie. Stuttgart/Weimar: Metzler 1998, pp. 70-72. Here p. 70.
[15] Thomas Rommel: »And trace it in this poem every line.« Methoden und Verfahren computerunterstützter Textanalyse am Beispiel von Lord Byrons Don Juan. (Tübinger Beiträge zur Anglistik; 15). Tübingen: Narr 1995, p. 384.
[16] Jerome J. McGann: Radiant Textuality, p. XII. (footnote 12).
[17] Thomas Rommel: »And trace it in this poem every line.« Methoden und Verfahren computerunterstützter Textanalyse, p. 384. (footnote 15).
[18] Cf. Catherine N. Ball: Automated Text Analysis. Cautionary Tales. In: Literary and Linguistic Computing 9 (1994), pp. 293-302.
[19] Cf. David I. Holmes: The Evolution of Stylometry in Humanities Scholarship. In: Literary and Linguistic Computing 13/3 (1998), pp. 111-117.
[20] Cf. John F. Burrows: A Computation Into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press 1987.
[21] Nature <http://www.nature.com/nsu/030714/030714-13.html> (27.1.2004).
[22] Moshe Koppel et al.: Automatically Categorizing Written Texts by Author Gender. In: Literary and Linguistic Computing 17/4 (2002), pp. 401-412. Also <http://www.cs.biu.ac.il/~koppel/male-female-llc-final.pdf> (27.1.2004).
[23] M. Koppel et al.: Automatically Categorizing Written Texts by Author Gender. (footnote 22). <http://www.cs.biu.ac.il/~koppel/male-female-llc-final.pdf> (27.1.2004) »8. Conclusions«.
[24] Ibid., »1.3 Gender«.
[25] Ibid.
[26] Robin Lakoff: Language and Woman’s Place. New York/London: Harper Collins 1975, p. 8.
[27] Jennifer A. Simkins-Bullock/Beth G. Wildman: An Investigation into the Relationship Between Gender and Language. In: Sex Roles, 24, 3/4 (1991), pp. 149-160. Here p. 149.
[28] Michel Foucault: What is an Author? In: David Lodge (Ed.): Modern Criticism and Theory. A Reader. London/New York: Longman 1988, pp. 197-210. Here p. 205.
[29] See the entry »Feminist Poetics«. In: Alex Preminger/Terry V. F. Brogan (Eds.): The New Princeton Encyclopedia of Poetry and Poetics. Princeton, NJ: Princeton University Press 1993, pp. 404-407.
[30] Compare in this context the notion of »parodic practices« that »disrupt the categories of the body, sex, gender and sexuality«. Judith Butler: Gender Trouble. Feminism and the Subversion of Identity. London: Routledge 1990, p. XII.
[31] The related question of who uses whose language in the context of debates on sex and gender is discussed in Alicia Ostriker: The Thieves of Language. Women Poets and Revisionist Mythmaking. In: Elaine Showalter (Ed.): The New Feminist Criticism. Essays on Women, Literature, and Theory. New York: Pantheon 1985, pp. 314-338.
[32] A reduced list of features and/or criteria for example is central in this respect; cf. »10. Discussion« in Richard S. Forsyth/David Holmes: Feature-Finding for Text Classification. In: Literary and Linguistic Computing 11/4 (1996), pp. 163-174. Here p. 170 ff.
[33] David Robey: Round Table on New Directions in Humanities Computing. In: ALLC/ACH 2002: New Directions in Humanities Computing. Conference Abstracts. Tübingen: ZDV 2002, pp. 106-109. Here p. 109.
[34] Ibid.
[35] Cf. Geoffrey Rockwell: What is Text Analysis, Really? In: Literary and Linguistic Computing 18/2 (2003), pp. 209-219.