Word Sequence Repetition in Danish Legend Tradition

This article, co-written with two colleagues from UCLA (Pete Broadwell and Tim Tangherlini), applied software developed at the Yale DHLab to a large corpus of Danish folklore. Broadwell extended the underlying text reuse engine, Intertext (written by the DHLab’s Doug Duhaime), to produce new heatmap visualizations showing how reused phrases are distributed across the entire corpus. You can read our article online here, and see the entire issue of the journal here.


The interrelationship between individual story repertoire and overall legend tradition, particularly in the context of performance, can now be more readily addressed as large collections of legends are made accessible in machine-actionable form. An important open question in the study of legend is whether tradition participants exhibit expressive consistencies in language within and across their individual story repertoires. These consistencies may be indicative of certain tradition-based phenomena, such as the crystallization of story elements into a specific linguistic form, or the use of formulas as an integral part of the story itself. The recurrence of word sequences may reflect stylistic aspects of an individual’s storytelling (idiolect) or, on a broader scale, language use in a specific region (regiolect). The discovery of consistent word sequences across multiple stories also supports investigations into similarities between stories that are not topically related. Such discoveries might in turn facilitate comparative studies across the repertoires of individual storytellers, classes of storytellers (e.g. grouped by gender), and regions, thus augmenting more standard comparisons of genre and topic. Finally, the investigation of repeated word sequences in a large folklore corpus may provide insight into the practices of folklore collectors, including how they created fair copy out of their sometimes chaotic field notes, and how they edited their collections for publication.
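To make the idea of a repeated word sequence concrete, here is a minimal sketch of one simple way to find sequences shared by more than one story: split each text into overlapping runs of n words and record which stories contain each run. It is not the method used by Intertext or in the article. The function names, the choice of n = 3, and the English sample sentences are all invented for illustration and are not drawn from the Danish corpus.

```python
from collections import defaultdict


def word_ngrams(text, n):
    """Return all overlapping n-word sequences in a text.

    Tokenization here is a naive lowercase whitespace split (an assumption);
    real legend texts would need punctuation and spelling normalization.
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_sequences(stories, n=3):
    """Map each n-word sequence to the ids of the stories containing it,
    keeping only sequences that occur in at least two different stories."""
    seen = defaultdict(set)
    for story_id, text in stories.items():
        for gram in word_ngrams(text, n):
            seen[gram].add(story_id)
    return {gram: ids for gram, ids in seen.items() if len(ids) > 1}


# Toy stand-ins for legend texts (hypothetical, not from the corpus).
stories = {
    "a": "there was once a man who lived by the sea",
    "b": "they say there was once a man in the village",
    "c": "the church bell rang at midnight",
}

repeats = shared_sequences(stories, n=3)
for gram, ids in sorted(repeats.items()):
    print(" ".join(gram), sorted(ids))
```

Grouping the story ids by storyteller or by region before comparing them would be one way to begin separating individual habits from regional ones, though exact matching of this kind misses near-identical phrasings that differ by a single word or spelling.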