The Corpus of Early Modern English Trials (1650-1700): Building of the Corpus and Hypotheses of Normalization
DOI: https://doi.org/10.13133/2239-1983/18569
Abstract
The present paper discusses the stages involved in building the Corpus of Early Modern English Trials (1650-1700), henceforth EMET, a highly specialized 1.8-million-word historical corpus of trial proceedings. The main purpose of the corpus is to shed light on the pragmatic aspects of Early Modern spoken English, since trial proceedings are considered records of authentic dialogues (Culpeper and Kytö 2010, 17). More specifically, the EMET was created to investigate the pragmatic influences on both the choice of second person pronoun, which coexisted in the forms thou and you, and the use of any T- or Y-form attested during the Restoration: thee, prithee, prethee, prethy, pray thee, thy, thy self, thyself, thine, you, ye, your, your self, yourself, yours and pray you.
The first part of the essay briefly explores the archive consultation phase, the criteria behind the selection of the trials, and the technical steps needed to upload a corpus to #LancsBox and study it. The EMET itself is then presented (number of documents, total number of tokens, average number of tokens per text, and types of charges involved).
The essay then focuses on editing, normalization and POS tagging. More specifically, it illustrates how trials, and historical documents in general, should be edited so that they can be analysed successfully with corpus linguistics tools. Different hypotheses of normalization of the EMET are then compared and discussed in detail. Once the normalization parameters that best suit the corpus have been determined, the advantages of this process are highlighted. Lastly, the issues arising from the normalization process, mainly related to proper nouns, badly preserved documents (i.e., noisy texts), and Latin (and other foreign) terms, are examined.
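As an illustration of the kind of query the listed T- and Y-forms imply, the following is a minimal Python sketch that counts those forms in a piece of text. It is not the procedure used in the paper and does not reflect how #LancsBox works; the function names (`count_forms`, `t_y_totals`), the grouping of the forms into two lists, and the sample sentence are assumptions made for illustration only.

```python
import re
from collections import Counter

# Forms taken from the abstract, plus the base pronouns "thou" and "you".
# The split into two lists is an assumption made for this sketch.
T_FORMS = ["thou", "thee", "prithee", "prethee", "prethy", "pray thee",
           "thy", "thy self", "thyself", "thine"]
Y_FORMS = ["you", "ye", "your", "your self", "yourself", "yours", "pray you"]


def _build_pattern(forms):
    # Longest forms first, so "thy self" is tried before "thy"
    # and "pray you" before "you".
    ordered = sorted(forms, key=len, reverse=True)
    # Allow any run of whitespace between the parts of multi-word forms.
    alternatives = [r"\s+".join(re.escape(part) for part in form.split())
                    for form in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b",
                      re.IGNORECASE)


PATTERN = _build_pattern(T_FORMS + Y_FORMS)


def count_forms(text):
    """Count each T-/Y-form in text (case-insensitive, non-overlapping)."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        # Lowercase and collapse internal whitespace to a single space.
        form = " ".join(match.group(0).lower().split())
        counts[form] += 1
    return counts


def t_y_totals(counts):
    """Return (total T-forms, total Y-forms) from a Counter of forms."""
    t_total = sum(n for form, n in counts.items() if form in T_FORMS)
    y_total = sum(n for form, n in counts.items() if form in Y_FORMS)
    return t_total, y_total


if __name__ == "__main__":
    # Invented sample sentence, not taken from the corpus.
    sample = ("Prithee tell me, didst thou see thy self there? "
              "I pray you, Sir, is this your Hand? Ye know it is yours.")
    counts = count_forms(sample)
    print(counts)
    print(t_y_totals(counts))
```

Because matches do not overlap, a multi-word form such as pray you is counted once under that form and not again under you. A sketch like this only works reliably on text whose spelling variants are either listed explicitly or have already been normalized.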
License
Authors who publish in this journal accept the following conditions:
- Authors retain the rights to their work and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of its authorship and of its first publication in this journal.
- Authors may enter into other non-exclusive licensing agreements for the distribution of the published version of the work (e.g., depositing it in an institutional archive or publishing it in a monograph), provided that its first publication in this journal is acknowledged.
- Authors may disseminate their work online (e.g., in institutional repositories or on their website) before and during the submission process, as this can lead to productive exchanges and increase citations of the published work (see The Effect of Open Access).