Sarah Schulz

Dr. Sarah Schulz

Phone +49 711 685-81394
Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung
Pfaffenwaldring 5 b
70569 Stuttgart

I am a computational linguist with a humanities background in Theater and Media Studies and German. The Digital Humanities give me the opportunity to combine my interest in computer science with my curiosity about the humanities. I mainly work on non-standard text processing, with a focus on the automatic processing of historical languages.

I am a postdoc at the Centre for Reflected Text Analytics (CRETA); see the CRETA website.

I am also on GitHub

  • non-standard text processing
  • Digital Humanities
  • historical language processing

Computational Linguistic Methods for the Digital Humanities (winter semester 2015/16)

Project seminar: Subtitles in Film (summer semester 2016)

Computational Linguistic Methods for the Digital Humanities (winter semester 2016/17)

Project seminar: Entities in Focus (summer semester 2017)

Computational Linguistic Methods for the Digital Humanities (winter semester 2017/18)

To appear: 
  • Janis Pagel, Nils Reiter, Ina Rösiger, Sarah Schulz. A Unified Annotation Workflow for Diverse Goals. In Sandra Kübler, Heike Zinsmeister (eds.): Proceedings of the Workshop: Annotation in Digital Humanities (annDH), August 2018.
  • Ina Rösiger, Sarah Schulz, Nils Reiter. Towards Coreference for Literary Text: Analyzing Domain-Specific Phenomena. In Proceedings of LaTeCH-CLfL, August 2018.



  • Sarah Schulz: The Taming of the Shrew - Non-Standard Text Processing in the Digital Humanities. Doctoral thesis. University of Stuttgart. 2018.


  • Sarah Schulz and Jonas Kuhn. Multi-modular domain-tailored OCR post-correction. Empirical Methods for Natural Language Processing (EMNLP) 2017. Copenhagen, 2017.
  • Nora Echelmeyer, Nils Reiter, Sarah Schulz. 2017. PoS-Tagger für „das“ Mittelhochdeutsche. In Book of Abstracts of DHd 2017, Bern, Switzerland.
  • Nils Reiter, Sarah Schulz, Gerhard Kremer, Roman Klinger, Gabriel Viehhauser, Jonas Kuhn. 2017. Teaching Computational Aspects in the Digital Humanities Program at University of Stuttgart – Intentions and Experiences.  Teaching NLP for Digital Humanities, Workshop at GSCL, 43-48.
  • Derek Doran, Sarah Schulz, and Tarek R. Besold. 2017. What Does Explainable AI Really Mean? A New Conceptualization of Perspectives. Comprehensibility and Explanation in AI and ML, Workshop at AI*IA.


  • Sarah Schulz and Jonas Kuhn. Learning from Within? Comparing PoS Tagging Approaches for Historical Text. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).
  • Sarah Schulz, Guy De Pauw, Orphée De Clercq, Bart Desmet, Véronique Hoste, Walter Daelemans, and Lieve Macken. 2016. Multimodular Text Normalization of Dutch User-Generated Content. ACM Trans. Intell. Syst. Technol. 7, 4, Article 61 (July 2016), 22 pages. DOI:
  • Schulz, S. & Reiter, N. (2016). Authorship Attribution of Mediaeval German Text: Style and Contents in Apollonius von Tyrland. Proceedings of Digital Humanities 2016 (pp. 883-885), July, Kraków.
  • Schulz, S. & Keller, M. (2016). Code-Switching Ubique Est - Language Identification and Part-of-Speech Tagging for Historical Mixed Text. LaTeCH@ACL, August, Berlin: Association for Computational Linguistics.
  • Çetinoğlu, Ö., Schulz, S. & Vu, N. T. (2016). Challenges of Computational Processing of Code-Switching. Proceedings of EMNLP Workshop on Computational Approaches to Linguistic Code Switching (CALCS 2016) @EMNLP, November, Austin, Texas, USA.


  • Orphée De Clercq, Sarah Schulz, Bart Desmet, and Véronique Hoste. Towards Shared Datasets for Normalization Research. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA).
  • Bart Desmet, Orphée De Clercq, Marjan Van de Kauter, Sarah Schulz, Cynthia Van Hee, and Véronique Hoste. Taaltechnologie 2.0: sentimentanalyse en normalisatie, pages 157–161. Beschouwingen uit een talenhuis: opstellen over onderwijs en onderzoek in de vakgroep Vertalen, Tolken en Communicatie aangeboden aan Rita Godyns. Academia Press, 2014.
  • Sarah Schulz. Named-Entity Recognition for User-Generated Content. In Proceedings of European Summer School in Logic Language and Computation 2014 Student Session. Springer, 2014.


  •  Sarah Schulz, Verena Lyding, and Lionel Nicolas. Compiling a diverse web corpus for South Tyrolean German - STirWaC. In Proceedings of the 8th Web as Corpus Workshop, pages 37–45, Lancaster, UK, 2013.
  • Orphée De Clercq, Sarah Schulz, Bart Desmet, Els Lefever, and Véronique Hoste. Normalization of Dutch User-Generated Content. In Proceedings of the 9th International Conference on Recent Advances in Natural Language Processing, Hissar, Bulgaria, 2013.


  • Marisa Delz, Benjamin Layer, Sarah Schulz, and Johannes Wahle. Overgeneralization of verbs: The change of the German verb system. In Proceedings of the 9th International Conference on the Evolution of Language, Evolang IX, pages 96–103, Kyoto, Japan, March 2012.
  • Middle High German POS Tagger Model: MHG Pos
  • Language Identification and POS Tagging for Mixed Middle English-Latin Text: web application
  • OCR and OCR post-correction: web application (login required; please contact me for access)
Curriculum Vitae