Readme: Dataset of conceptual terminology in Carolingian royal advice literature

This dataset was gathered as part of my doctoral research, 'Ideal and moral rulership in the Carolingian world'. It was funded by the AHRC NWCDTP, grant number 2636070, and supervised by Dr Marios Costambeys and Dr Ingrid Rembold. Some work on gathering this data was done at the University of Georgia, Athens, USA, in the DigiLab.

Data creation and observation conditions: I gathered this data using quanteda (quanteda.io) (Benoit et al. 2018). Cleaning and processing of the data involved the Classical Language Toolkit (CLTK) (Johnson et al. 2014-2021) and my own RegEx scripts, alongside tidyverse tools in R. Keyness data was produced with quanteda.textstats, a supplementary R package to quanteda. Where both lemmatised and unlemmatised data were used in the PhD thesis, both sets of data are provided. Where individual authors were compared to each other, primarily in the discussion of the terms in chapter 6, each author's keyness dataset is provided alongside the set for the corpus as a whole.

Code label and field descriptors: Relative frequency data is classified by text. The number for each word is the number of occurrences of that word per 100 words of the text.
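
As an illustration of what a "per 100 words" figure means, the following is a minimal Python sketch. It is not the code used to produce the dataset (that was done with quanteda in R); the function name and the sample tokens are hypothetical, and it assumes the text has already been tokenised.

```python
from collections import Counter


def relative_frequency_per_100(tokens):
    """Return each word's occurrences per 100 words of the text.

    Hypothetical helper for illustration only; it does not reproduce
    quanteda's tokenisation or any lemmatisation step.
    """
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: 100 * n / total for word, n in counts.items()}


# Invented ten-word sample, not taken from the corpus.
sample_tokens = "rex iustus rex pius populus rex ecclesia pax rex lex".split()
freqs = relative_frequency_per_100(sample_tokens)
print(freqs["rex"])
```

Under this reading, a value of 0.5 for a word would mean that the word accounts for one in every 200 words of that text.
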
Keyness data is classified in the same way. The number for each word is the chi^2 keyness of that word within a 10-gram of the stem named in the CSV file name (for example, ecclesia_keyness.csv contains keyness data for the ecclesia stem). The occurrences in the sample (target) set and the reference set are given as n_target and n_reference, followed by the chi^2 keyness and its p-value.
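
For readers unfamiliar with keyness, the sketch below shows one common way a signed chi^2 score can be computed from a 2x2 table of counts. It is a rough illustration in Python, not the quanteda.textstats implementation, which may apply a continuity correction, so the figures in the CSV files may not match it exactly. The function names and example counts are hypothetical, and the two total-size arguments are assumptions: the CSV files record n_target and n_reference but not the set sizes.

```python
import math


def chi2_keyness(n_target, n_reference, total_target, total_reference):
    """Signed chi^2 for one word from a 2x2 table (no continuity correction).

    total_target and total_reference are assumed set sizes in words;
    they are not columns in the keyness CSV files.
    """
    a = n_target                        # word in target set
    b = n_reference                     # word in reference set
    c = total_target - n_target         # other words in target set
    d = total_reference - n_reference   # other words in reference set
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    expected_a = (a + b) * (a + c) / n
    # Negative sign marks words that are rarer in the target than expected.
    return chi2 if a >= expected_a else -chi2


def chi2_p_value(chi2):
    """Upper-tail p-value for a chi^2 statistic with one degree of freedom."""
    return math.erfc(math.sqrt(abs(chi2) / 2))


# Invented counts for illustration only.
score = chi2_keyness(30, 20, 1000, 4000)
print(score, chi2_p_value(score))
```

A positive score indicates a word that is more frequent near the stem than in the reference set, and a negative score the reverse.
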
Cosine data is classified in a similar way. The number for each text is the cosine similarity of that text to the text named in the CSV file name (for example, DeXIIabusivis_cosine.csv contains the cosine similarity of each text to De XII abusivis).
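
As a rough guide to interpreting these values, the following Python sketch computes cosine similarity between two texts represented as raw word counts. It is an illustration only, not the quanteda code used for the dataset; the function name and sample tokens are hypothetical, and any weighting or trimming applied before the original comparison is not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two texts as word-count vectors.

    Hypothetical helper: returns 1.0 for identical word distributions
    and 0.0 when the texts share no words.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented token lists for illustration only.
text_a = "rex iustus rex".split()
text_b = "rex pius".split()
similarity = cosine_similarity(text_a, text_b)
print(similarity)
```

With non-negative word counts the value lies between 0 and 1, and values closer to 1 indicate more similar vocabulary profiles.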

All data is as output by quanteda.

Both lemmatised and unlemmatised original texts are available on request (as some critical editions remain within copyright).

Contact: e.meehan@liverpool.ac.uk (or, if this e-mail address does not work, my ORCID iD is 0000-0003-3138-1048).

Works Cited
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018). “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774.
Johnson, Kyle P., Patrick Burns, John Stewart, and Todd Cook. (2014-2021). CLTK: The Classical Language Toolkit. https://github.com/cltk/cltk.