Here is an overview of the methodology behind ArchBio.

  • Infrastructure

    The site is built on the Drupal 8 CMS, using a customized template built largely from scratch. Drupal was chosen for its suitability for scalable applications, its use of web standards, and its speed. For the moment the general content of the site is written only in English, but a Spanish version will be offered soon. All images used are open access and free of copyright.

  • Database

    The variety and complexity of this biographical production call for a database. Translating this literary phenomenon into computational data will furnish a new method of exploration and enable the recovery of structured data, which will inform a variety of research questions: Which authors wrote biographies? How many biographical texts are preserved? Who were the biographees (were they historical or fictional characters)? What occupations did they hold (noblemen, kings or queens, rulers, prelates, etc.)? How is gender represented? How many manuscripts and editions are available and where are they located? Which authors were translated? What languages were used (in originals and translations)? The database also gathers chronological and geographical data: dates of birth and death, and places of birth, residence, and death.

    The database is built within the CMS on PHP and MySQL, and it organizes all the materials and data: authors, editions, biographies, historical characters, gender, place of birth and death, noble title, occupation, and geographical coordinates, among many other variables. Here are more details on the information being gathered:

    • For now, the database features only Iberian authors and their literary works, such as collective and single biographies, or memoirs.

    • Each author and biographee, and where possible each work, is linked to authority files and semantic data (BNE Datos, VIAF, Wikidata).

    • Biographical works have an additional section, still under construction, that aims to compile the existing manuscripts and the old and modern editions of each text, with their years and places of publication. For the manuscripts, most of the information is gathered manually from PhiloBiblon.
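
    The data model described above can be illustrated with a minimal sketch. It is not the actual ArchBio schema: the site stores its data in MySQL through Drupal, whereas this example uses Python's built-in sqlite3 module so that it runs on its own. All table names, column names, records, and the identifier are hypothetical placeholders.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the structure actually used by ArchBio.
SCHEMA = """
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE biographee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    gender TEXT,
    occupation TEXT,
    is_fictional INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE work (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    language TEXT,
    author_id INTEGER NOT NULL REFERENCES author(id)
);
CREATE TABLE work_biographee (
    work_id INTEGER NOT NULL REFERENCES work(id),
    biographee_id INTEGER NOT NULL REFERENCES biographee(id),
    PRIMARY KEY (work_id, biographee_id)
);
CREATE TABLE authority_link (
    entity_type TEXT NOT NULL,  -- 'author', 'biographee' or 'work'
    entity_id INTEGER NOT NULL,
    source TEXT NOT NULL,       -- 'BNE Datos', 'VIAF' or 'Wikidata'
    identifier TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Placeholder Author')")
    conn.executemany(
        "INSERT INTO biographee (id, name, gender, occupation) VALUES (?, ?, ?, ?)",
        [
            (1, "Placeholder King", "male", "king"),
            (2, "Placeholder Queen", "female", "queen"),
            (3, "Placeholder Prelate", "male", "prelate"),
        ],
    )
    conn.execute(
        "INSERT INTO work (id, title, language, author_id) "
        "VALUES (1, 'Placeholder Collective Biography', 'es', 1)"
    )
    conn.executemany(
        "INSERT INTO work_biographee (work_id, biographee_id) VALUES (1, ?)",
        [(1,), (2,), (3,)],
    )
    # The identifier below is a dummy value, not a real Wikidata ID.
    conn.execute(
        "INSERT INTO authority_link VALUES ('author', 1, 'Wikidata', 'Q-PLACEHOLDER')"
    )
    conn.commit()
    return conn


def biographees_by_gender(conn):
    """Count biographees per gender, one of the research questions listed above."""
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM biographee GROUP BY gender ORDER BY gender"
    )
    return dict(rows.fetchall())
```

    Calling biographees_by_gender(build_demo_db()) returns a dictionary of counts keyed by gender. The same pattern would extend to the other questions, for example counting works per language or biographees per occupation.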

  • Works and Texts

    The other main goal of ArchBio is to create a digital library of Iberian biographical writings, some of which are not currently accessible in digital form.

    • For now, the texts are taken from the most recent edition that is out of copyright, in most cases from the beginning of the 20th century. This is not ideal, but so far it is the fastest way to recover these texts. Generally, texts are converted into plain text (UTF-8) through OCR and then corrected for errors. Texts without a modern print edition will need to be transcribed by hand, since they exist only in manuscript.

    • Then, the texts are enriched and encoded with XML markup, following the Text Encoding Initiative guidelines (https://tei-c.org/). This XML-TEI markup is minimal: it encodes only structural information (chapters, paragraphs, notes, etc.) and semantic elements (e.g. place names, person names, dates, geographical places, etc.). The encoded corpus will soon be available on GitHub, together with its documentation.

    • In the future, the texts should be offered both in diplomatic editions of their manuscripts and in modernized versions.
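
    The minimal XML-TEI markup described above can be sketched as follows. This is an illustration under stated assumptions, not the project's encoding pipeline: the helper functions, the restricted set of elements, and the sample sentence are hypothetical, and a complete TEI document would also need a teiHeader, which is omitted here. Only the TEI namespace and the element names div, p, persName, placeName, and date come from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements in the default namespace (no prefix).
ET.register_namespace("", TEI_NS)

# Semantic elements accepted by this sketch (an assumed, deliberately small set).
ALLOWED_TAGS = {"persName", "placeName", "date"}


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def encode_chapter(number, paragraphs):
    """Build a <div type="chapter"> from paragraphs made of segments.

    Each paragraph is a list of segments. A segment is either a plain
    string or a (tag, text) tuple for a semantic element.
    """
    div = ET.Element(tei("div"), {"type": "chapter", "n": str(number)})
    for segments in paragraphs:
        p = ET.SubElement(div, tei("p"))
        last = None  # most recent child element of this paragraph
        for segment in segments:
            if isinstance(segment, str):
                # Plain text goes before the first child or after the last one.
                if last is None:
                    p.text = (p.text or "") + segment
                else:
                    last.tail = (last.tail or "") + segment
            else:
                tag, content = segment
                if tag not in ALLOWED_TAGS:
                    raise ValueError(f"unsupported element: {tag}")
                last = ET.SubElement(p, tei(tag))
                last.text = content
    return div


# Placeholder sentence, not taken from any text in the corpus.
sample = [[
    "In ", ("date", "1450"), ", ",
    ("persName", "a placeholder knight"), " travelled to ",
    ("placeName", "a placeholder town"), ".",
]]
chapter = encode_chapter(1, sample)
xml_string = ET.tostring(chapter, encoding="unicode")
```

    Printing xml_string shows the chapter as a single div element in the TEI namespace, with the date, person name, and place name wrapped in their own elements inside the paragraph. Notes and further structural elements could be added in the same way.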