Kegel/Elsacker: Digital edition of the Volledige Werken (Complete Works) of Willem Frederik Hermans

Abstract

A large edition project is nowadays bound to investigate means to bridge the gap between computer science and traditional scholarly research. In this paper we give an overview of ways in which the use of structural markup and computer based collation, analysis and presentation has informed the edition of the complete works of the Dutch writer W.F. Hermans.

[1]

Het grote medelijden originally appeared in Randstad (February 1962) and was republished in the short story collection Een wonderkind of een total loss (1967).

[2]

Nooteboom (2005: 18).

[3]

Brunskill (2006).

[4]

The Volledige Werken are edited by Jan Gielkens and Peter Kegel, supported by Bert Van Elsacker (specialist in IT applications for the Humanities), Jolanda Hamers (research assistant), Marjo Eijgenraam and Connie Klützow (technical support). The project leader is Annemarie Kets-Vree.

[5]

Hermans (1996: 13).

[6]

Myers (1986); see also [6] for an introduction.

[7]

An analytical bibliography for all primary sources, Het bibliografische universum van Willem Frederik Hermans, enabled us to make a selection of relevant textual witnesses.

[8]

The websites of the Electronic Text Center of the University of Virginia [9] and the Leeds Electronic Text Centre [10] provide excellent information about the digitisation of print material and related problems.

[9]

See Peter Kegel and Bert Van Elsacker 2003 [11].

[10]

In theory it is possible that one of the other versions contains OCR errors which make its text identical to the digital reference version. This would cause real variants to be omitted. However, an alternative approach would be very time-consuming, and in practice errors of this kind are unlikely to occur.

[11]

For example The House of Refuge (New York: 1966); recently published as La Casa Vuota (Milan: 2006). A Chinese translation of the novel is forthcoming.

[12]

Mulisch (1952).

[13]

Pam (1983: 338). Other remarks by Hermans on his rewriting of the novel can be found in De Vree (1983: 265-281).

[14]

A basic search on all variants with the witness sigil of the magazine publication (T) produces over three thousand variants. This number does not correspond exactly to the number of textual revisions, for example because Collate sometimes treats larger textual variants (such as sentences) as a series of smaller variants. However, even then the number of differences between the two successive publications remains significant.

[15]

Frans A. Janssen (1980: 30).

[16]

Frans A. Janssen (1980: 19).

[17]

For a more detailed survey of alterations, see Jan Gielkens and Peter Kegel (forthcoming).

[18]

Theo A.J.M. Janssen (1994: 30–39).

[19]

Multatuli, Max Havelaar and De raadselachtige Multatuli, both to appear in Volledige Werken 17 (autumn 2012).

[20]

»Een mensenleven is een verzameling, een enorme opeenhoping bewegingen en denkbeelden«. (Hermans 2006,1: 216).


[1]	Introduction
[2]	In November 2005 the first volume of the Complete Works of Willem Frederik Hermans appeared. After several years of preparation this publication marked the official beginning of the largest Dutch edition project ever undertaken in the field of modern literature. This project is exceptional not only because of its size, but also because right from the beginning it has been set up as an experimental digital research project.
[3]	The initial impetus came from the need for automated text comparison. An academic edition cannot do without careful comparison of the different versions of a text, and the volume of research material in this project (over 50,000 pages) implied that manual collation was not feasible. At a later stage we formulated additional requirements for the digital collation data: in the first place we wanted to incorporate the textual research for the edition and the creation of the edition text into a digital working environment which was in line with international XML-TEI standards; secondly, this full-text research environment was not only to constitute the basis of the edition, but also to be used for new research by the editors or third parties.
[4]	In this article we will first give a short account of W.F. Hermans’s oeuvre and of this edition. Then we will discuss the possibilities available for automated comparison of texts and the use of XML for this edition. After a short section about the website that will be set up as a companion to the edition (and that thereby provides significant added value), we will give examples of new analytical research and new forms of presentation which stem from practical experience in digital research.
[5]	Oeuvre and edition
[6]	Willem Frederik Hermans is widely regarded as the most important Dutch author of the second half of the twentieth century. In addition to novels, Hermans (1921–1995) wrote short stories, plays, poetry and essays; he also translated several texts, including Ludwig Wittgenstein’s Tractatus Logico-Philosophicus. Hermans was also a fierce polemicist – and in that capacity was feared and admired in the Netherlands for decades. Hermans’s work is often characterized by the phrase ›creative nihilism, aggressive compassion, total misanthropy‹, the final sentence of the autobiographical short story Het grote medelijden (The Great Compassion) published in the 1960s. [1] At the presentation of the first volume of the present edition fellow author Cees Nooteboom, with whom Hermans maintained a lasting friendship, summarized the essence of Hermans’s work as follows:
[7]	Riddles which remain unsolved, inevitable fate, surreal intrigues, and again and again people in their helpless smallness, a prey to ›malice and misunderstanding‹ – another title by Hermans – with no foreseeable catharsis, that was the quintessence of his oeuvre. [2]
[8]	The work of Willem Frederik Hermans is receiving more and more international attention. Over the past few years Waltraud Hüsmert has translated three of Hermans’s major novels into German: Die Tränen der Akazien (2004, originally published as De tranen der acacia’s in 1949), Die Dunkelkammer des Damokles (2001, originally published as De donkere kamer van Damocles, 1958) and Au Pair (2003, originally published with the same title in 1989), which were praised by critics. The second of these novels recently also appeared in a French translation by Daniel Cunin, La Chambre noire de Damoclès, published by Gallimard. Another important novel by Hermans, Nooit meer slapen, appeared in England this summer. Beyond Sleep, a translation by Ina Rilke, was hailed in the English press as a forgotten masterpiece of post-war European literature, or in the words of the Times »a welcome if belated introduction to an original and challenging voice in modern European literature«. [3]
[9]	Hermans was a very prolific author, as is shown clearly by the publication schedule of the Volledige Werken, which will consist of a total of twenty-four volumes. Two volumes of the edition will be published annually until 2016, each with an average of about eight hundred pages. To the present day, with the publication of the essayistic works Boze Brieven van Bijkaart and Houten leeuwen en leeuwen van goud (Volledige Werken 12) in December 2006, three volumes have appeared. The Volledige Werken is a collaborative venture of the Willem Frederik Hermans Institute[1], which – also on behalf of Hermans’s heirs – aims to draw enduring national and international attention to his oeuvre, of Hermans’s regular publishing house De Bezige Bij[2] and of the Huygens Institute[3] (a research institute within the Royal Netherlands Academy of Arts and Sciences), which is responsible for the scholarly work on the text edition. [4]
[10]	The Volledige Werken are being published as a critical edition for a broad general public; the point of departure is the last version of the text authorized by Hermans. In view of Hermans’s working method an ultima manus edition was an obvious choice. Even after the first publication in print Hermans continued to work on his texts. Hermans himself would have preferred only the last editions of his works to be available to readers: »I wish that all old editions of books which have been reprinted in an improved version would crumble to dust as if by magic, even if the change only involves a comma.« [5]
[11]	In the printed volumes of this series the definitive texts are accompanied by commentaries which discuss the publication history at length, on the basis of numerous letters and other documents from the extensive Hermans archive. We also examine the reception of the works in newspapers and magazines. In order to bridge any knowledge gap there may be for readers in 2006 and later, explanatory notes will be added to Hermans’s essayistic work, which is more dated than the other genres. Any volume containing essayistic work will also include indexes of persons and titles.
[12]	There is also a website available as a companion to the edition [4]. This website contains the scholarly documentation of the separate volumes and the series as a whole, and also serves as a platform for ancillary digital publications and supplementary information. Because the scholarly documentation is published digitally, the book can remain first and foremost a book to read for pleasure, and therefore reach a wide public of interested readers.
[13]	Automated text comparison
[14]	Traditionally the meticulous comparison of selected versions of a text has been one of the most important tasks of an editor. Until a few decades ago manual comparison – which is extremely time-consuming – was the only possibility. However, in computer science theoretical and practical research into automated text comparison has been taking place since the 1970s. A major application is the analysis of successive versions of source code, for instance to trace the introduction of bugs. In computer science, the general task of ›text comparison‹ has to be expressed as a formal procedure in order to make the development of an algorithm possible. More precisely, ›text comparison‹ has been understood as a procedure which results in a list of changes between two versions of a text. The application of all changes to version A transforms version A into version B. The algorithm should try to keep the list as short as possible. This approach has led to a basic algorithm by Eugene Myers, on which some variations have been developed, and various implementations, of which the Unix/Linux tool ›diff‹ is the most widely used. [6] Source code implementing this algorithm is readily available for all major programming languages (C, C++, Java, Python, Perl, ...). One interesting application is to be found in wikis, websites which can be dynamically adapted by their users (see for example Wikipedia, the well-known online encyclopaedia [5]). Each time a user modifies the content of a page, the differences between the new version and the previous one are calculated, so that all versions remain permanently available.
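The notion of a shortest list of changes can be made concrete with a minimal sketch in Python. It is not Myers's algorithm and not the implementation used by diff: it relies on a plain dynamic-programming table for the longest common subsequence, which yields an equally short list of deletions and insertions but needs more time and memory. The function names edit_script and apply_script are our own illustrative choices, and the two sample sentences are adapted from the Paranoia passage quoted later in this article.

```python
def edit_script(a, b):
    """Return a shortest list of ('keep' | 'del' | 'ins', token) steps turning a into b."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("del", a[i]))
            i += 1
        else:
            ops.append(("ins", b[j]))
            j += 1
    # Whatever is left over in either version is deleted or inserted wholesale.
    ops.extend(("del", tok) for tok in a[i:])
    ops.extend(("ins", tok) for tok in b[j:])
    return ops


def apply_script(ops):
    """Replay the script: kept and inserted tokens together form version B."""
    return [tok for op, tok in ops if op in ("keep", "ins")]


version_a = "Na de mobilisatie heeft Wester hem geregeld opgezocht".split()
version_b = "Na de demobilisatie heeft Wester hem geregeld opgezocht".split()
script = edit_script(version_a, version_b)
changes = [(op, tok) for op, tok in script if op != "keep"]
print(changes)
```

Applying the script to version A reproduces version B, which is exactly the property required of a list of changes in the sense described above.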
[15]	Somewhat apart from these developments in computer science, in the world of scholarly editing there have been initiatives to use computers for text comparison and, by extension, for the production of editions. The best-known examples are Peter Robinson’s Collate and Wilhelm Ott’s TUSTEP. Collate, the program in use by the Hermans project, is only available for the Macintosh Classic platform, an operating system which has now been superseded by OS X. The program is particularly suitable for older texts which have been divided into relatively short passages beforehand and in which there are not too many long or complex variants. The algorithm used to extract variants remains undocumented. Peter Robinson, who is the director of the Institute for Textual Scholarship and Electronic Editing (founded in 2005), has announced a successor to Collate called EDITION [7]. TUSTEP is actually a comprehensive environment for textual research and the production of editions. It is therefore a rather complex instrument to work with and sometimes seems like a programming language in itself. Another impediment is the absence of a graphical user interface (GUI).
[16]	On the one hand automatic text comparison has enormous advantages: the comparison is based on a formally defined algorithm, free of errors and in principle reproducible by others. Moreover, the use of computers saves a huge amount of time, which may be of crucial importance as often resources are lacking to collate texts manually.
[17]	On the other hand, there is no ready-made solution for the average user; whichever option is chosen, some extra training of the prospective user is mandatory, and experience with scripting may come in handy. Of course this learning process also takes time and energy. For small projects so much effort may outweigh the advantages, but considering the size of the Hermans project, in this case the investment did seem worthwhile. To date we have made extensive use of Collate, but we have also done satisfactory experiments using software tools such as diff.
[18]	The production of reliable digital data of the material to be compared should not be underestimated. If the transcription of the sources is carried out by a specialized firm a warranty of quality will usually be provided. If the transcription is done internally (manually or by OCR) a checking system is needed. In preparing the Volledige Werken most of the texts needed for research were digitized in two stages. [7] First, partly because some of the material to be digitized came from a private collection and was fragile, all the research material was put on microfilm. This was done in collaboration with the National Library of the Netherlands [8]. High standards of quality were set for the filming, since any shortcomings or errors in the films would lead to major problems later. After filming, the microfilms were scanned and converted into computer-readable text with the help of Optical Character Recognition (OCR). Although as a rule the digitization of twentieth-century material produces good results, [8] numerous problems of detail still had to be solved. [9]
[19]	In spite of the high quality of the digital copies, at least one manual check of the OCR result is still required. Even the most advanced text recognition programs are not accurate enough for their output to be used directly for text comparison. In order to attain an acceptable level of reliability for the Hermans project, we therefore introduced a separate phase with the specific aim of checking the accuracy of the data produced so far, before proceeding to the actual text comparison. For each title the digital version of the base text is compared extremely thoroughly with the source by a professional proof reader. Then we compare this digital reference version with all other digitized sources in a text editor which displays the differences. An assistant can then easily trace any false variants and correct them immediately after checking them with the printed source. In this way the text comparison which follows will remain free of corruption. [10]
[20]
[21]	Ill. 1: Screenshot of Araxis Merge, a visual file comparison application
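The checking step described above, in which the proofread reference version is compared with another digitized source so that an assistant can inspect every difference, can be approximated in a few lines of Python. This is only a sketch under our own assumptions: the project used a visual comparison tool, whereas the code below uses the standard difflib module (whose matching method differs from that of diff), and both the function name candidate_variants and the OCR misreading ›rnet‹ are invented for illustration.

```python
import difflib


def candidate_variants(reference, witness):
    """List word-level differences between the proofread reference and another digitized text."""
    ref_words, wit_words = reference.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=wit_words, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Each hit must still be checked against the printed source by a human reader.
            found.append((" ".join(ref_words[i1:i2]), " ".join(wit_words[j1:j2])))
    return found


reference = "Mensenschuw. Hij ging met niemand om."
raw_ocr = "Mensenschuw. Hij ging rnet niemand om."  # invented OCR misreading of "met"
differences = candidate_variants(reference, raw_ocr)
print(differences)
```

Every pair in the result is either a false variant to be corrected in the digital text or a real variant to be left alone; the code cannot tell which, so the decision remains with the assistant.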
[22]	Thanks to the intensive preparation carried out in the field of automated text comparison it is possible to produce reliable and detailed overviews of all the differences between up to a dozen versions of a text. This means that a complex text history can be examined down to the tiniest detail in a manageable way. In the Hermans project such an instrument for the analysis of the extensive and complex material is of paramount importance.
[23]	XML and textual research
[24]	The final result of the automated text comparison produces a wealth of information which can be registered in a list, a synoptic presentation, a database, et cetera. However, in the production of an edition there are also considerable advantages to integrating this information into an XML document containing the edition text itself, and then to continue editing this document.
[25]	The basic idea of XML (and other markup languages) is to regard a text as a structure rather than as a series of characters. Typical examples of structural components are sections and chapters. By allocating a special meaning to certain characters a distinction can be made between the text itself and the markup, which makes the structure explicit and computer-readable. In XML the characters ›<‹ and ›>‹ designate markup codes. For example, a word from a foreign language can be encoded as <foreign>joie de vivre</foreign>. Because an XML document consists of plain unformatted text, it is not linked to specific programs and remains readable, even in the long term. XML encoding conventions have now been established for many different types of text. These models offer a package of codes for a certain type of text. The Text Encoding Initiative (TEI) [12] has prepared very comprehensive encoding conventions for applications in the humanities, with specific description models not only for prose, drama and poetry, but also for example for transcriptions of speech, dictionaries and critical editions.
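The distinction between text and markup can be illustrated with a short Python sketch based on the standard xml.etree.ElementTree module. The Dutch sentence wrapped around the <foreign> element is invented for this illustration and does not come from Hermans's work.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence; only the <foreign> element is taken from the example above.
sample = "<p>Een zekere <foreign>joie de vivre</foreign> ontbrak hem.</p>"
paragraph = ET.fromstring(sample)

# The running text, with all markup stripped away.
plain_text = "".join(paragraph.itertext())

# The structure: every phrase that the markup identifies as foreign.
foreign_phrases = [element.text for element in paragraph.iter("foreign")]

print(plain_text)
print(foreign_phrases)
```

Because the markup is explicit, a program can retrieve either the reading text or the marked-up components without any knowledge of the software in which the document was made.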
[26]	We not only use the TEI tag sets to encode the base texts for the edition and the variants, but also document the textual research directly in the digital research environment. The main focus is on information about textual history. Editorial interventions are described and accounted for in a standardized way, if necessary with explanatory notes. Variants illustrative of the revision or production process, for example, or variants which are important from a more text analytical point of view are accompanied by a note with a classified attribute value. With the production of the texts in mind, fragments with special typography are explicitly singled out. Similarly, we add notes with fixed attribute values to places in the text with unusual spelling and punctuation, to stylistic, orthographic or grammatical peculiarities which are typical of the author, and to unclear passages in the text.
[27]	The example below, a paragraph from the short story Paranoia from the volume of short stories also titled Paranoia, shows what a base text with critical apparatus encoded in XML-TEI looks like. Three fragments of the encoded text have been highlighted in bold character.
[28]	<p id="p13.0"><milestone n="4" unit="block" /> <q id="q16.0">Mijnheer Wester vertelde mij dat Cleever op de <app><lem ed="PK" edRat="KRIT">HBS</lem><rdg wit="BT D16 D15 D14 D12 D11 D7 D1 DJ138">H. B. S.</rdg></app><note resp="PK" type="typo">kleinkapitaal</note> al<lb n="12" /> een heel vreemde jongen was. Mensenschuw. Hij ging met<lb n="13" /> niemand om. Het enige wat hij in zijn vrije tijd thuis deed, was<lb n="14" /> figuurzagen. Soms bracht hij lampen en brievenhangers mee<lb n="15" /> die<app id="A42"><lem cause="AC" id="l25" ref="D15M1 D15M2">hij</lem><rdg type="WV" wit="D15 D14 D12 D11 D7"><note resp="JH" type="textinfo">sic</note>bij</rdg></app> <anchor id="A43" /> gemaakt had en werd dan uitgelachen. In de oorlog, in<app id="A44"><lem cause="AC" id="l26" ref="D6M1"><lb n="16" /> mei</lem><rdg type="ZV" wit="D1 DJ138">Mei</rdg></app> '40, was Wester reserve<seg subtype="kop" type="teken"></seg> kapitein en Cleever diende in zijn<lb n="17" /> compagnie. Ze lagen in Zuid<seg subtype="kop" type="teken"></seg> Limburg en<app id="A47"><lem id="l28">kregen bevel</lem><rdg type="ZV" wit="DJ138">moesten al gauw</rdg></app> <anchor id="A48" /> de<lb n="18" /> benen<app id="A49"><lem id="l29">te</lem><rdg type="WW" wit="DJ138"><note resp="auto" type="textinfo">weggelaten</note></rdg></app> <anchor id="A50" /> nemen<app id="A51"><lem id="l30">voor ze één Duitser hadden gezien</lem><rdg type="ZW" wit="DJ138"><note resp="auto" type="textinfo">weggelaten</note></rdg></app> <anchor id="A52" />. Daar is<lb n="19" /> Cleever nooit overheen gekomen. Na de<app id="A53"><lem ed="PK" edRat="VAR" id="l31">demobilisatie</lem><rdg wit="BT D15 D14 D12 D11 D7 D1">mobilisatie</rdg><rdg type="WV" wit="DJ138">demobilisatie</rdg></app><note resp="PK" type="tc kl">Tijdens de mobilisatie hoeft Wester hem niet op te zoeken, dan zitten ze bij elkaar in een compagnie. 
Oorlog - Dienst in compagnie Wester- demobilisatie.</note> <anchor id="A54" /> heeft<lb n="20" /> Wester hem geregeld opgezocht. Tijdens de hele bezetting
[29]	Ill. 2: View of a passage from Paranoia in XML-TEI source encoding
[30]	The first example illustrates the structure of the variant apparatus. The apparatus element (<app>) always consists of the lemma (<lem>) of the base text or approved text, followed by the readings (<rdg>) of previous text versions, which are briefly encoded by means of sigla. In the second example shown here the attribute ›cause‹ with the value ›AC‹ inside the lemma element indicates that the alteration is based on a revision by the author, while the source of the revision – in this case a proof copy – is also indicated by fixed sigil encoding. In the third example the attribute ›edRat‹ in the lemma indicates a clarification of the editorial intervention, which is to be found in a note directly after the apparatus.
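The structure of an apparatus entry described above can be read out mechanically. The following Python sketch parses the third example (the entry with id ›A53‹, copied from the passage shown in Ill. 2) with the standard xml.etree.ElementTree module. The function name read_apparatus is our own, and the sketch makes a simplifying assumption: it only looks at the text that directly follows the opening <rdg> tag, so a reading that begins with a <note> element would need fuller handling.

```python
import xml.etree.ElementTree as ET

APP = (
    '<app id="A53">'
    '<lem ed="PK" edRat="VAR" id="l31">demobilisatie</lem>'
    '<rdg wit="BT D15 D14 D12 D11 D7 D1">mobilisatie</rdg>'
    '<rdg type="WV" wit="DJ138">demobilisatie</rdg>'
    '</app>'
)


def read_apparatus(app):
    """Return the lemma text, the lemma attributes and a siglum -> reading mapping."""
    lemma = app.find("lem")
    readings = {}
    for rdg in app.findall("rdg"):
        # @wit holds a space-separated list of witness sigla sharing this reading.
        for siglum in rdg.get("wit", "").split():
            readings[siglum] = rdg.text or ""
    return lemma.text, dict(lemma.attrib), readings


lemma_text, lemma_attrs, readings = read_apparatus(ET.fromstring(APP))
print(lemma_text, lemma_attrs["edRat"])
for siglum, reading in readings.items():
    print(siglum, reading)
```

The result is the kind of per-witness overview that a synoptic presentation builds on: the lemma, the editorial rationale recorded in ›edRat‹, and one reading for each siglum.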
[31]	The number of encodings in this small piece of text from the short story Paranoia is still fairly limited. Nevertheless, the XML-TEI presentation of source encoding is not immediately comprehensible: it is impossible for the human eye to examine the history of the text quickly and systematically in an encoded document like this.
[32]	To facilitate the editorial work we therefore looked for a software program that could display the XML documents in a more transparent way. For this purpose we thoroughly tested the functionality of a large number of XML editors. Eventually we chose a program called XMLmind XML Editor [13] (XXE). This product is ›document-oriented‹: it resembles an ordinary word processor, works in much the same way and is therefore user-friendly. XXE is a much-used, stable program; moreover, it is free and the technical documentation of the program (manuals and online support) is satisfactory. In XXE the same text fragment, with identical XML-TEI encoding, is much easier to comprehend: the base text is now accompanied by a synoptic presentation of the variants. The main XML codes are shown graphically.
[33]
[34]	Ill. 3: View of fragment from Paranoia in XXE
[35]	The availability of an easily surveyable XML-TEI version of the base text with an inline variant apparatus is indispensable in view of the publication schedule – which has been laid down in a contract – according to which the editors of the Volledige Werken have to work. Most of the research work needed for the constitution of the text can be done within the digital research environment. As we have seen in the examples shown above, in order to facilitate the analysis of the variant material all corrections deriving from primary sources (in the case of Hermans mainly correction copies and proofs) have been added to the variant documentation by means of unambiguous encoding. As a means of checking the digital texts, the editor also has digital images of all the texts involved in the project. Typescripts and manuscripts, of which there are very few in the case of Hermans because much of this material has been destroyed over the years, are not systematically incorporated into the digital documentation, but are, of course, consistently taken into account in the research. Relevant information deriving from these is also systematically added to the digital documentation in the form of editorial notes. Mainly thanks to the digital research environment it was possible to carry out the textual research needed for a volume of 800 to 1000 pages (with two to as many as six separate texts) within a few months.
[36]	Website
[37]	The website which accompanies the edition and which will be expanded with each new volume of the series enables us to devote more attention to various aspects of the history of the text than we can in the commentary alone. For example, we give a description of the kind of corrections Hermans made in all the surviving correction copies and proofs. On the website we show representative reproductions of primary sources from the Hermans archive. These descriptions – which are provided for each source but because of the form of presentation chosen can also be read as a more or less continuous text – offer more insight into the patterns of revision over many years, in the first instance per text, but also for all the texts published in the edition as a whole.
[38]	All surviving typescripts and manuscripts are described on the basis of a number of fixed categories such as date (if possible), size and completeness, storage location and identification marker, closer characterization (rough copy or fair copy typescript etc.), the writing materials used, and references to any striking similarities or differences compared with earlier or later versions of the text. Often these typescripts contain interesting information about the genesis of the text, for example in relation to narrative technique or interpretation. The novella Het behouden huis (The House of Refuge) is a striking example of this. On the website we describe three surviving typescripts of this short story, a key text in Willem Frederik Hermans’s oeuvre which has been translated several times. Recently, agreement was reached about the production of new translations. [11] The many differences between the first and later typescripts make it clear that Hermans attained the »extraordinarily effective style« [12] of this short story only very gradually. More information about these textual witnesses can be found through the witness list, which is also published on the website.
[39]	Digital publications to supplement the edition will also appear on the website. In the autumn of 2007, when the second volume of essayistic work by Hermans is to appear, we will present on the website a cumulative digital index to the essayistic work covering at least all the volumes containing essayistic works which have appeared so far. This digital index will be based on the XML encodings which were added to the XML data when the editorial index for the publication in book form was prepared, partly by means of specially developed software. At present experiments are being carried out with scripts to further automate this tagging. At some point – hopefully long before the publication of the last volume of the edition – we intend to provide access to all of the essayistic work (including all articles by Hermans which appeared in newspapers and magazines but never in book form) via this comprehensive index. The example below shows a short encoded fragment from Hermans’s collection of columns from Paris, Boze Brieven van Bijkaart (Angry Letters by Bijkaart). All historical names of individuals in the text have been given a <name> element with the attribute ›reg‹ which contains the standardized form of that name.
[40]	Van tijd tot tijd breken ze een lans voor de nagedachtenis <lb TEIform="lb" n="9"/>van de nobele <name reg="Allende, Salvador" resp="auto">Allende.</name></p> <p TEIform="p" id="p313.0"><name reg="Allende, Salvador" resp="auto"> <app> <lem id="d1e10581" type="var-DJ"><note resp="auto" type="textinfo">weggelaten</note></lem> <rdg wit="DJ"><seg subtype="wit" type="teken"/></rdg> </app>Allende</name>, volkomen wettig gekozen, net als <name reg="Hitler, Adolf" resp="auto">Hitler</name> indertijd. Of <lb TEIform="lb" n="10"/>als <name reg="Nixon, Richard" resp="auto">Nixon</name> die zelfs met de grootste meerderheid uit de Amerikaanse <lb TEIform="lb" n="11"/>geschiedenis gekozen is.</p> <p TEIform="p" id="p314.0"><app> <lem id="d1e10602" type="var-DJ"><note resp="auto" type="textinfo">weggelaten</note></lem> <rdg wit="DJ"><seg subtype="wit" type="teken"/></rdg> </app>De laatste foto van <name reg="Allende, Salvador" resp="auto">Allende</name> toont de vlezige socialist met een <lb TEIform="lb" n="12"/>vergiet op zijn kop en onder z'n arm z'n eigen, hoogstpersoonlijke <lb TEIform="lb" n="13"/>machinegeweer. <name reg="Brezjnev, Leonid">Brezjnev</name> had hem dat cadeau gegeven, of <lb TEIform="lb" n="14"/>was het <name reg="Castro, Fidel" resp="auto">Fidel Castro?</name></p> <p TEIform="p" id="p315.0"><app> <lem id="d1e10627" type="var-DJ"><note resp="auto" type="textinfo">weggelaten</note></lem> <rdg wit="DJ"><seg subtype="wit" type="teken"/></rdg> </app>
[41]	Ill. 4: Fragment of Boze Brieven van Bijkaart in XML-TEI encoding
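How an index can be derived from the <name> encoding may be sketched as follows in Python. This is not the specially developed software mentioned above but a hypothetical illustration: the sample is an abridged version of the fragment in Ill. 4 (apparatus entries and line breaks removed), the enclosing <div> element is our own addition to obtain a well-formed document, and the index refers to paragraph ids, whereas a printed index would refer to pages.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged from the fragment above; the <div> wrapper is an assumption of this sketch.
SAMPLE = """<div>
<p id="p313.0"><name reg="Allende, Salvador" resp="auto">Allende</name>, volkomen wettig gekozen, net als <name reg="Hitler, Adolf" resp="auto">Hitler</name> indertijd.</p>
<p id="p314.0">De laatste foto van <name reg="Allende, Salvador" resp="auto">Allende</name> toont de vlezige socialist. <name reg="Brezjnev, Leonid">Brezjnev</name> had hem dat cadeau gegeven, of was het <name reg="Castro, Fidel" resp="auto">Fidel Castro?</name></p>
</div>"""


def build_name_index(root):
    """Map each standardized name (the 'reg' attribute) to the ids of the paragraphs citing it."""
    index = defaultdict(list)
    for para in root.iter("p"):
        for name in para.iter("name"):
            reg = name.get("reg")
            if reg and para.get("id") not in index[reg]:
                index[reg].append(para.get("id"))
    # Sort on the standardized form so that the result reads like a printed index.
    return dict(sorted(index.items()))


name_index = build_name_index(ET.fromstring(SAMPLE))
for entry, paragraphs in name_index.items():
    print(entry, paragraphs)
```

Because the index is keyed on the standardized form in ›reg‹, the spellings ›Allende.‹, ›Allende‹ and ›Fidel Castro?‹ in the running text all end up under a single, consistent heading.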
[42]	Analytical research and presentation
[43]	The insights into Hermans’s texts which result from the analysis of his working methods and the editorial work on the text for the edition in the digital working environment may serve very well as a point of departure for further research by scholarly editors and researchers in literature and other fields. The editorial work involved in the first two volumes of Hermans’s works has already brought to light a number of interesting findings which raise questions about commonly accepted views on the development of his work. The fortune of De tranen der acacia’s (The Tears of the Acacias) is a good illustration of this.
[44]	The textual history of De tranen der acacia’s as described on the website shows that in fact Hermans continually revised this text, from its first serial magazine publication in 1946 and 1947 to the ultima manus edition of 1993. These conclusions, based on the study of the primary sources, tell a different story from the one repeatedly presented by Hermans in interviews. Hermans stressed that De tranen der acacia’s was a spontaneously written novel in which initially only minor corrections were made by the author. According to Hermans, it was only later that he began to regret his former working method. This was one of the reasons why he published a revised, ›definitive version‹ of the novel in 1971:
[45]	Writing had to be spontaneous. It was wrong to devise or construct something. Correction was regarded as redundant, because you had to write as it came to you. Therefore there were never fair copies made of my first books, Conserve and De tranen der acacia’s. They were typed, and some corrections were made in ink, but eventually they went to the printer just like that. In retrospect this was an error, but at the time I knew no better. Later I rewrote Conserve and drastically improved De tranen der acacia’s. [13]
[46]	However, it was only close study of details in the digital research material that made it really clear how different this self-constructed picture was from the actual course of events. The XML-TEI data show that for the first publication in book form of De tranen der acacia’s Hermans thoroughly revised the earlier version. A simple search in the XML data for all points of variation in the text with the magazine publication as a variant produces several thousand differences. [14] The revisions are partly editorial and stylistic (punctuation and word variants), but are often also substantial. There are a few striking revisions which are directly related to the main theme of the book, identified by later researchers as »the unknowability of man«. [15] In fact, in later analyses of the novel one of these fragments, which was added only when the work was published in book form, was repeatedly referred to as a crucial passage. [16] When Hermans was preparing the text for publication as a book, he also made sizeable narrative alterations which strengthened the construction of the novel. It also became clear that the later revision of the novel for the 1971 edition was certainly not a one-off. The changes correspond with alterations Hermans made in previous reprints of the novel. At least four recurring categories of these alterations may be distinguished: 1) spelling alterations, 2) modernization of language and style to keep the text optimally readable, 3) explanations of passages relating to historical events and 4) strengthening of the thematic unity of the novel. [17]
[47]	It is precisely the technical environment in which the edition is being prepared that enables the editor to observe these patterns, to categorize findings, and to examine them in greater detail. Because the material is systematically encoded, the accumulated data can be searched accurately and checked for correlations, and working hypotheses can be continually tested (for example with XML query languages such as XPath and XQuery) and, if necessary, modified.
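A search of the kind described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it assumes the variants are encoded with the standard TEI apparatus elements `app`, `lem` and `rdg` and a `wit` attribute. The project's actual encoding may differ, the witness sigla (`#M1948`, `#B1953`, `#D1971`) are hypothetical, and the sample sentence is invented rather than taken from Hermans. It uses the limited XPath support in Python's standard library instead of a full XPath or XQuery processor.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, not taken from the edition. The witness sigla are
# hypothetical: "#M1948" (magazine), "#B1953" (book), "#D1971" (later edition).
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0"><body><p>
He <app><lem wit="#B1953">walked</lem><rdg wit="#M1948">went</rdg></app>
to the <app><lem wit="#B1953 #M1948">attic</lem><rdg wit="#D1971">loft</rdg></app>.
</p></body></text>"""


def variants_for(root, siglum):
    """Return (lemma, reading) pairs for every <app> in which `siglum`
    is attested by a variant reading (<rdg>)."""
    hits = []
    for app in root.iterfind(".//tei:app", TEI_NS):
        lem = app.find("tei:lem", TEI_NS)
        lem_text = (lem.text or "") if lem is not None else ""
        for rdg in app.iterfind("tei:rdg", TEI_NS):
            # @wit may list several witnesses separated by whitespace
            if siglum in (rdg.get("wit") or "").split():
                hits.append((lem_text, rdg.text or ""))
    return hits


root = ET.fromstring(SAMPLE)
magazine_variants = variants_for(root, "#M1948")
print(len(magazine_variants), magazine_variants)
```

Counting the returned pairs gives the number of points of variation for one witness. Changing the siglum repeats the same test against another version, which is how a working hypothesis could be checked against each witness in turn.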
[48]	At present we are examining the possibilities of presenting the text and the conclusions reached in research in a more dynamic way, partly on the basis of the digital research documentation and the findings to date of the analysis of De tranen der acacia’s. The short story Paranoia will serve as a pilot. The online version of this story contains the reading text of the printed edition and all the text versions of the short story which were of importance in the editorial research. We intend to place a short introductory section before the full-text documentation in which a few important revisions in the short story are discussed, as a sort of reader’s guide. For example, in the version of this story in the first publication in book form in 1953, Hermans puts more emphasis on the theme of the housing shortage, which was a key concern in the Netherlands after the war, especially in Amsterdam, where the story is set. As a result of adjustments in substance and narrative, the events in the book version of Paranoia are described more from the perspective of the character Cleever than in the magazine publication (which appeared in 1948), a significant revision in a story about someone who suffers from persecution mania.
[49]	In the digital publication we would like to integrate observations and analyses of this kind dynamically into the texts to which they refer, as a form of empirical evidence. Relevant text passages will therefore be tagged in the online presentation so that they can be seen both separately and in context. We are also examining other presentation options. Overviews of the alterations Hermans made in correction copies and proofs can be generated on the basis of the encodings added to the text. These overviews show again that in his unremitting efforts to perfect his texts Hermans not only corrected printer’s errors but also adapted the spelling of his texts and modernized their style. Orthographic inconsistencies and peculiarities, or inconsistencies of substance, which were caused by the author’s revisions and will therefore not be corrected in the edition, can also be made clearly visible in a digital presentation.
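How such an overview might be generated from the encoding can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes that each point of variation is a TEI `app` element carrying a `type` attribute that names the category of alteration. The category labels, the witness sigla (`#P1`, `#P2`) and the sample readings are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample. The @type values and the sigla "#P1" and "#P2"
# (standing for proofs or correction copies) are hypothetical.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0"><body><p>
<app type="spelling"><lem>old form</lem><rdg wit="#P1">new form</rdg></app>
<app type="style"><lem>archaic phrase</lem><rdg wit="#P1">modern phrase</rdg></app>
<app type="spelling"><lem>older form</lem><rdg wit="#P2">newer form</rdg></app>
</p></body></text>"""


def overview(root):
    """Group every variant reading by the category recorded on its <app>."""
    groups = defaultdict(list)
    for app in root.iterfind(".//tei:app", NS):
        category = app.get("type", "uncategorized")
        lemma = app.findtext("tei:lem", default="", namespaces=NS)
        for rdg in app.iterfind("tei:rdg", NS):
            groups[category].append((lemma, rdg.text or "", rdg.get("wit", "")))
    return dict(groups)


table = overview(ET.fromstring(SAMPLE))
for category in sorted(table):
    print(f"{category}: {len(table[category])} alteration(s)")
    for lemma, reading, witness in table[category]:
        print(f"  {lemma!r} -> {reading!r} [{witness}]")
```

Keeping the category on the encoded element, rather than in a separate list, means the overview is always regenerated from the same data that underlies the reading text.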
[50]	Conclusion
[51]	When the preparations for the Volledige Werken began, there were few examples or guidelines for the use of automated text comparison and XML in textual research. At that point the experiment consisted of setting up a functional system and establishing conventions and procedures. That stage is now behind us. We have a research environment based on XML-TEI which not only makes it possible to publish two volumes of the edition annually but also offers many new possibilities. As more and more publications in book form of the Volledige Werken appear, the new challenges will lie mainly in the area of digital research.
[52]	As far as we are concerned, the examples shown in the previous section are just the beginning of more extensive research into one or more texts by Hermans, to be conducted by the editors themselves or by other literary scholars, on the basis of the XML data. Research in a digital environment in the fields of poetics, literary history or narratology may well lead to surprising new insights. There are also possibilities for linguistic research. When in 1992 a critical edition (in book form) appeared of Multatuli’s Max Havelaar (the most important nineteenth-century Dutch novel by the most important nineteenth-century Dutch author), linguist Theo Janssen saw the variant apparatus accompanying this edition as a goldmine for research in fields such as sociolinguistic trends, and syntactic and lexical developments. [18] Hermans, who produced an edition of Max Havelaar himself and wrote a biography of Multatuli, [19] published and revised his oeuvre over a period of fifty years, a period during which the Dutch language was in a state of flux, if only because of the constantly recurring discussions about and changes in the spelling of the language. Digital availability of the entire oeuvre of possibly the most important Dutch author of the twentieth century would presumably also be a potential goldmine from a linguistic perspective.
[53]	The publication in digital form of new digital research, and by extension of the edition itself, is a second domain in which we want to continue experimenting. The XML document containing a text and its variants represents not just one fixed text but several versions at once. Were this to be published, we think a minimum requirement would be for this multiformity to be made visible. Another aspect is the integration of edition and analysis. On the one hand, when passages of text have been construed in a certain way by researchers, the digital edition can be used, as we have seen, to look up these passages in their original context, in other words as empirical evidence. On the other hand, other analyses are also conceivable, such as a narratological study of narrative structure, which in the form of hypertext might serve as a point of access or a guide to Hermans’s work. Ideally, in future digital text presentations text and research will constitute an integrated collection of data which can constantly be consulted, modified and expanded. Or as Hermans himself once put it in his Preambule (›Preamble‹) to the volume of short stories Paranoia: ›…a collection, an enormous accumulation of movements and ideas.‹ [20]

»A collection, an enormous accumulation of movements and ideas«.
Research documentation for the digital edition of
the Volledige Werken (Complete Works)
of Willem Frederik Hermans

Abstract

Introduction

Oeuvre and edition

Automated text comparison

XML and textual research

Website

Analytical research and presentation

Conclusion

Works cited:

Websites:
