Abstract
WP7 aims to develop expert-augmented machine learning algorithms that reconnect archival documents to their regesta (scholarly summaries), with a focus on ecclesiastical sources. The algorithms will handle multiple languages and domain-specific categories, and will be trained to work in both directions (from full texts to summaries and vice versa) through a positive feedback mechanism. The project will also develop visualisation software that presents the connections between regesta and full documents and automatically hyperlinks each summary to its source. The work will centre on the Regesta Pontificum Romanorum, a collection of regesta of papal documents from the origins of the Latin Church to the 12th century.
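To make the bidirectional linking task concrete, here is a minimal sketch that links regesta to full texts, and full texts back to regesta, with a generic TF-IDF cosine-similarity baseline. This technique is swapped in purely for illustration and is not the expert-augmented WP7 algorithm: it treats texts as bags of words and has no positive feedback loop. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `link`) and the toy strings, which are invented English stand-ins rather than RPR entries, are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:()[]"


def tokenize(text: str) -> list[str]:
    # Naive lowercasing tokenizer (hypothetical); real Latin sources would need
    # lemmatisation and normalisation of spelling variants.
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(counts)
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF: terms that occur in every text still get a weight of 1.
    idf = {term: math.log((1 + n) / (1 + k)) + 1 for term, k in df.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(regesta: list[str], documents: list[str]) -> tuple[list[int], list[int]]:
    # Fit IDF on both collections jointly so that all vectors share one vocabulary.
    vectors = tfidf_vectors(regesta + documents)
    reg_vecs, doc_vecs = vectors[: len(regesta)], vectors[len(regesta):]
    scores = [[cosine(r, d) for d in doc_vecs] for r in reg_vecs]
    # Summary -> full text: the best-scoring document for each regestum.
    reg_to_doc = [max(range(len(documents)), key=lambda j: row[j]) for row in scores]
    # Full text -> summary: the best-scoring regestum for each document.
    doc_to_reg = [max(range(len(regesta)), key=lambda i: scores[i][j]) for j in range(len(documents))]
    return reg_to_doc, doc_to_reg


# Invented toy data, not drawn from the Regesta Pontificum Romanorum.
regesta = [
    "Pope confirms the possessions of the monastery of Cluny",
    "Pope grants the bishop of Lucca the pallium",
]
documents = [
    "We grant to you, bishop of Lucca, the use of the pallium",
    "We confirm to the monastery of Cluny all its possessions",
]
print(link(regesta, documents))  # → ([1, 0], [1, 0])
```

A lexical baseline like this is only a starting point. Real regesta and full texts differ in language, register, and spelling, and the full texts may carry OCR or handwriting-recognition noise, which is why the project relies on domain-specific categories and expert feedback rather than word overlap alone.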
Staffing
The WP7 leader is Alberto MELLONI, who oversees the overall direction and management of the work package. The product owner is Laura RIGHI, who is the main point of contact for defining and prioritising the project's features and requirements.
Steps
- Analysis and prototyping: This activity involves analysing the existing REVER material, including data and software, and determining the technical, scientific, and user requirements for running REVER on the corpus of the Regesta Pontificum Romanorum.
- Scientific preparation: This includes defining the IT requirements and the expertise needed, collecting the Regesta Pontificum Romanorum and the corresponding full-text documents, pre-processing the corpus, and providing scientific supervision and coordination.
- Development: This activity involves creating a repository of common architectural and design components, designing a performant, secure, robust, and stable architecture for REVER, developing tools for handwritten text recognition (HTR) and OCR, and optimising an algorithm that links summaries to the documents they summarise.
- Testing and validation: This includes testing REVER on the corpus of the Regesta Pontificum Romanorum, verifying that it meets the agreed KPIs, and carrying out user acceptance testing.
- Dissemination and exploitation: This involves disseminating the results of REVER through publications and presentations, as well as exploring potential exploitation opportunities for the developed tools and algorithms.
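Checking REVER against its KPIs during testing and validation requires measuring linking quality against links verified by experts. The following is a minimal sketch assuming two common retrieval metrics, top-1 accuracy and mean reciprocal rank (MRR). The actual WP7 KPIs are not specified in this plan, so the metrics, the function names, the toy data, and the 0.9 threshold are all hypothetical.

```python
def top1_accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of regesta whose predicted document index equals the expert-verified one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def mean_reciprocal_rank(rankings: list[list[int]], gold: list[int]) -> float:
    """Average of 1/rank of the correct document in each best-first candidate list (0 if absent)."""
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not gold:
        return 0.0
    total = 0.0
    for ranking, g in zip(rankings, gold):
        if g in ranking:
            total += 1.0 / (ranking.index(g) + 1)
    return total / len(gold)


def meets_kpi(score: float, threshold: float = 0.9) -> bool:
    # Hypothetical KPI target; the real threshold would be set by the project.
    return score >= threshold


# Toy example: four regesta with expert-verified gold document indices.
gold = [1, 0, 2, 3]
predicted = [1, 0, 2, 2]
rankings = [[1, 0, 2], [0, 3, 1], [2, 1, 0], [2, 3, 0]]
print(top1_accuracy(predicted, gold))             # → 0.75
print(mean_reciprocal_rank(rankings, gold))       # → 0.875
print(meets_kpi(top1_accuracy(predicted, gold)))  # → False
```

MRR rewards a system that places the correct document near the top of its candidate list even when it is not ranked first, which suits a workflow where an expert confirms or corrects the proposed links.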
Outcomes
- Develop and optimise an algorithm that automatically links domain-specific summaries with a standard structure (regesta) to the documents they summarise
- Develop accompanying software for searching and visualising the algorithm's results
- Provide tools that work both online and offline through an accessible user interface
- Ensure that the algorithm and software add value to the summarised text by applying semantic and domain-specific principles
- Integrate technology with palaeographic and diplomatic methodology to fuel a machine learning process
- Produce domain-specific summaries (regesta) according to criteria that differ from those of generic text summarisation
- Hyperlink each regestum to the full document it summarises
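The hyperlinking outcome can be illustrated with a minimal sketch that renders each regestum as an HTML anchor pointing to the full document matched by the linking algorithm. It assumes that documents are addressable by URL and that the matches arrive as a list of document indices; the function names and the `example.org` URLs are hypothetical placeholders, not part of the REVER design.

```python
from html import escape


def hyperlink_regestum(regestum_text: str, document_url: str) -> str:
    """Render one regestum as an HTML anchor pointing to its full document."""
    # escape() also escapes quotes by default, keeping the href attribute well-formed.
    return f'<a href="{escape(document_url)}">{escape(regestum_text)}</a>'


def render_links(regesta: list[str], document_urls: list[str], reg_to_doc: list[int]) -> str:
    # reg_to_doc[i] is the index of the document the linking algorithm matched to regesta[i].
    items = [
        f"<li>{hyperlink_regestum(text, document_urls[j])}</li>"
        for text, j in zip(regesta, reg_to_doc)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical placeholder URLs and invented regesta.
urls = ["https://example.org/docs/0", "https://example.org/docs/1"]
print(hyperlink_regestum("Pope grants the bishop of Lucca the pallium", urls[0]))
# → <a href="https://example.org/docs/0">Pope grants the bishop of Lucca the pallium</a>
print(render_links(["Regestum A", "Regestum B"], urls, [1, 0]))
```

Escaping both the link text and the URL matters because regesta and document identifiers can contain characters such as `&` or quotes that would otherwise break the generated markup.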