Beyond human imagination: The art of creating prompt-driven 3D scenes with Generative AI

AUTHORS: Giulio Federico, Fabio Carrara, Giuseppe Amato, Marco Di Benedetto

WORK PACKAGE: WP 10 – Retina

URL:

Keywords: Generative AI, Computer Graphics, Denoising Diffusion Probabilistic Model, Gaussian Splatting, NeRF, Signed Distance Field, Video Reconstruction, Deep Learning, Machine Learning, Artificial Intelligence, Text-to-3D, Image-to-3D, Urban Environment, Score Distillation Sampling

Abstract
The reconstruction of large-scale real outdoor environments is crucial for promoting the adoption of Extended Reality (XR) in industrial and entertainment sectors. This task often requires significant resources such as depth cameras, LiDAR sensors, drones, and others, alongside traditional data processing pipelines like Structure-from-Motion (SfM), which demand extensive computational resources, thus preventing real-time processing. Additional constraints arise from the limited accessibility of the aforementioned resources. While 3D laser scanners (e.g., LiDAR) are precise and fast, they are expensive, often bulky (especially the high-quality models), and their effectiveness is contingent on the type of environment being scanned. Depth sensors offer a more affordable and compact alternative; however, due to their limited range, they are ideal only for indoor settings. Photogrammetry, while capable of producing high-quality results at a lower cost, can be time-consuming and computationally intensive. It also suffers from limited accuracy, a strong dependence on lighting conditions, and the need for numerous photos from various angles that are not always easily accessible. (…)




Spatio-Temporal 3D Reconstruction from Frame Sequences and Feature Points

AUTHORS: Giulio Federico, Fabio Carrara, Giuseppe Amato, Marco Di Benedetto

WORK PACKAGE: WP 10 – Retina

URL: https://dl.acm.org/doi/10.1145/3672406.3672415

Keywords:

Abstract
Reconstructing a large real environment is a fundamental task to promote eXtended Reality adoption in industrial and entertainment fields. However, the short range of depth cameras, the sparsity of LiDAR sensors, and the huge computational cost of Structure-from-Motion pipelines prevent scene replication in near real time. To overcome these limitations, we introduce a spatio-temporal diffusion neural architecture, a generative AI technique that fuses temporal information (i.e., a short temporally-ordered list of color photographs, like sparse frames of a video stream) with an approximate spatial resemblance of the explored environment. Our aim is to modify an existing 3D diffusion neural model to produce a Signed Distance Field volume from which a 3D mesh representation can be extracted. Our results show that the hallucination approach of diffusion models is an effective methodology where a fast reconstruction is a crucial target.
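As a minimal illustration of the final step described above (turning a Signed Distance Field volume into a mesh), the sketch below extracts the zero level set of a toy SDF with marching cubes, a common choice for this conversion though not necessarily the one used in the paper. The spherical SDF and the scikit-image call are illustrative assumptions; the paper's diffusion model would supply the actual volume.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

# Hypothetical SDF volume: a sphere of radius 0.3 inside a unit cube,
# standing in for the volume a diffusion model would generate.
res = 64
grid = np.linspace(-0.5, 0.5, res)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.3   # negative inside, positive outside

# Marching cubes extracts the zero level set of the SDF as a triangle mesh.
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
print(verts.shape, faces.shape)
```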




Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

AUTHORS: Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Alessio Miaschi, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli, Roberto Navigli

WORK PACKAGE: WP 8 – UbiQuity

URL:

Keywords:

Abstract
An increasing number of pretrained Large Language Models (LLMs) are being released, though the majority are predominantly designed for English. While they can often handle other languages due to contamination or some degree of multilingual pretraining data, English-centric LLMs are not optimized for non-English languages. This leads to inefficient encoding (high token ‘fertility’) and slower inference times for those languages. In this work, we explore various vocabulary adaptation techniques to tailor English LLMs for the Italian language. We introduce Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that learns a neural mapping to accomplish vocabulary substitution and achieves state-of-the-art performance on several downstream tasks. We adapted two LLMs: Mistral-7b-v0.1, reducing token fertility by 25%, and Llama-3.1-8b, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, after the adaptation of the vocabulary, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the adapted models’ capabilities on several multiple-choice and generative tasks.
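Token fertility, the quantity SAVA reduces, is simply the average number of sub-word tokens produced per word. The sketch below measures it for an Italian sentence, assuming Hugging Face tokenizers are available; the checkpoints and the sentence are placeholders, and this only illustrates the metric, not the SAVA adaptation itself.

```python
from transformers import AutoTokenizer  # pip install transformers

def fertility(tokenizer, text: str) -> float:
    """Average number of sub-word tokens produced per whitespace-separated word."""
    words = text.split()
    return len(tokenizer.tokenize(text)) / len(words)

text = "La volpe salta rapidamente sopra il cane pigro."  # sample Italian sentence
# Example checkpoints (some require access approval on the Hugging Face Hub).
for name in ["mistralai/Mistral-7B-v0.1", "meta-llama/Llama-3.1-8B"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, round(fertility(tok, text), 2))
```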




Wordnet and Word Ladders: Climbing the abstraction taxonomy with LLMs

AUTHORS: Giovanni Puccetti, Andrea Esuli, Marianna Bolognesi

WORK PACKAGE: WP 8 – UbiQuity

URL: https://github.com/unipv-larl/GWC2025/releases/download/papers/GWC2025_paper_18.pdf

Keywords:

Abstract
WordNet has long served as a benchmark for approximating the mechanisms of semantic categorization in the human mind, particularly through its hierarchical structure of word synsets, most notably the IS-A relation. However, these semantic relations have traditionally been curated manually by expert lexicographers, relying on external resources like dictionaries and corpora. In this paper, we explore whether large language models (LLMs) can be leveraged to approximate these hierarchical semantic relations, potentially offering a scalable and more dynamic alternative for maintaining and updating the WordNet taxonomy. This investigation addresses the feasibility and implications of automating this process with LLMs by testing a set of prompts encoding different sociodemographic traits. We find that adding age and job information to the prompt affects the model’s ability to generate text in agreement with hierarchical semantic relations, while gender does not have a statistically significant impact.




The Invalsi Benchmarks: measuring the Linguistic and Mathematical understanding of Large Language Models in Italian

AUTHORS: Giovanni Puccetti, Maria Cassese, Andrea Esuli

WORK PACKAGE: WP 8 – UbiQuity

URL: https://aclanthology.org/2025.coling-main.453/

Keywords:

Abstract
While Italian is a high-resource language, there are few Italian-native benchmarks to evaluate generative Large Language Models (LLMs) in this language. This work presents three new benchmarks: Invalsi MATE to evaluate models’ performance on mathematical understanding in Italian, Invalsi ITA to evaluate language understanding in Italian, and Olimpiadi MATE for more complex mathematical understanding. The first two benchmarks are based on the Invalsi tests, which are administered to students aged between 6 and 18 within the Italian school system and have been validated by several experts in teaching and pedagogy; the third comes from the Italian high-school math Olympics. We evaluate 10 powerful language models on these benchmarks and find that they are bound by 71% accuracy on Invalsi MATE, achieved by Llama 3.1 70b instruct, and by 88% on Invalsi ITA. For both Invalsi MATE and Invalsi ITA we compare LLMs with the average performance of Italian students, showing that Llama 3.1 is the only one to outperform them on Invalsi MATE, while most models do so on Invalsi ITA. We then show that Olimpiadi MATE is more challenging than Invalsi MATE, with the highest accuracy, achieved by Llama 3.1 405b instruct, being 45%.




ABRICOT – ABstRactness and Inclusiveness in COntexT: A CALAMITA Challenge

AUTHORS: Giovanni Puccetti, Claudia Collacciani, Andrea Amelio Ravelli, Andrea Esuli, Marianna Bolognesi

WORK PACKAGE:

URL: ABRICOT – ABstRactness and Inclusiveness in COntexT: A CALAMITA Challenge

Keywords: Abstraction, Inclusiveness, Context, LLM evaluation, Italian Language Models

Abstract
The ABRICOT Task is designed to evaluate Italian language models on their ability to understand and assess the abstractness and inclusiveness of language, two nuanced features that humans naturally convey in everyday communication. Unlike binary categorizations such as abstract/concrete or inclusive/exclusive, these features exist on a continuous spectrum with varying degrees of intensity. The task is based on a manual collection of sentences that present the same noun phrase (NP) in different contexts, allowing its interpretation to vary between the extremes of abstractness and inclusiveness. This challenge aims to verify how LLMs perceive subtle linguistic variations and their implications in natural language.




INVALSI – Mathematical and Language Understanding in Italian: A CALAMITA Challenge

AUTHORS: Giovanni Puccetti, Maria Cassese, Andrea Esuli

WORK PACKAGE:

URL: INVALSI – Mathematical and Language Understanding in Italian: A CALAMITA Challenge

Keywords: Mathematical Understanding, Language Understanding, Invalsi, Large Language Models, Italian Language Models

Abstract
While Italian is a high-resource language, there are few Italian-native benchmarks to evaluate the generative abilities of Language Models (LMs) in this language. This work presents two new benchmarks: Invalsi MATE to evaluate models’ performance on mathematical understanding in Italian and Invalsi ITA to evaluate language understanding in Italian.
These benchmarks are based on the Invalsi tests, which are administered to students of age between 6 and 18 within the Italian school system. These tests are prepared by expert pedagogists and have the explicit goal of testing average students’ performance over time across Italy. Therefore, the questions are well written, appropriate for the age of the students, and are developed with the goal of assessing students’ skills that are essential in the learning process, ensuring that the benchmark proposed here measures key knowledge for undergraduate students.
Invalsi MATE is composed of 420 questions about mathematical understanding; these questions range from simple money-counting problems to Cartesian geometry questions, e.g., determining whether a point belongs to a given line. They are divided into 4 different types: scelta multipla (multiple choice), vero/falso (true/false), numero (number), completa frase (fill the gap).
Invalsi ITA is composed of 1279 questions regarding language understanding; these questions involve both the ability to extract information and answer questions about a text passage, as well as questions about grammatical knowledge. They are divided into 4 different types: scelta multipla (multiple choice), binaria (binary), domanda aperta (open question), altro (other).
We evaluate 4 powerful language models, both English-first and tuned for Italian, and find that the best accuracy on Invalsi MATE is 55% while the best accuracy on Invalsi ITA is 80%.




AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian

AUTHORS: Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell’Orletta, Andrea Esuli

WORK PACKAGE:

URL: AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian – ACL Anthology

Keywords:

Abstract
Large Language Models (LLMs) are increasingly used as ‘content farm’ models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles, is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic. We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real ‘content farm’. We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge. Our results suggest that there are currently no practical methods for detecting synthetic news-like texts ‘in the wild’, while generating them is too easy. We highlight the urgency of more NLP research on this problem.
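Of the three detectors compared, the log-likelihood one is the simplest to illustrate: score a text by its average per-token log-likelihood under a language model and flag high-scoring texts as likely synthetic. The sketch below uses GPT-2 purely because it is small and openly available; the scoring model, example sentence, and threshold are assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder scoring model; the paper works with Llama-based models, GPT-2 is
# used here only because it is small and openly available.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

@torch.no_grad()
def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the scoring model."""
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)   # cross-entropy over shifted tokens
    return -out.loss.item()        # higher = more "expected" by the LM

score = mean_log_likelihood("Il governo ha approvato oggi una nuova legge.")
# Texts scoring above a threshold tuned on held-out data would be flagged as
# likely synthetic; the value below is illustrative, not from the paper.
is_synthetic = score > -3.0
```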




Is CLIP the main roadblock for fine-grained open-world perception?

AUTHORS: Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Fabrizio Falchi

WORK PACKAGE:

URL: Is CLIP the main roadblock for fine-grained open-world perception?

Keywords: fine-grained understanding, open-vocabulary object detection, image-text matching, evaluation study

Abstract
Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training. This necessity is pivotal in emerging domains like extended reality, robotics, and autonomous driving, which require the ability to respond to open-world stimuli. A key ingredient is the ability to identify objects based on free-form textual queries defined at inference time – a task known as open-vocabulary object detection. Multimodal backbones like CLIP are the main enabling technology for current open-world perception solutions. Despite performing well on generic queries, recent studies highlighted limitations on the fine-grained recognition capabilities in open-vocabulary settings – i.e., for distinguishing subtle object features like color, shape, and material. In this paper, we perform a detailed examination of these open-vocabulary object recognition limitations to find the root cause. We evaluate the performance of CLIP, the most commonly used vision-language backbone, against a fine-grained object-matching benchmark, revealing interesting analogies between the limitations of open-vocabulary object detectors and their backbones. Experiments suggest that the lack of fine-grained understanding is caused by the poor separability of object characteristics in the CLIP latent space. Therefore, we try to understand whether fine-grained knowledge is present in CLIP embeddings but not exploited at inference time due, for example, to the unsuitability of the cosine similarity matching function, which may discard important object characteristics. Our preliminary experiments show that simple CLIP latent-space re-projections help separate fine-grained concepts, paving the way towards the development of backbones inherently able to process fine-grained details. The code for reproducing these experiments is available at https://github.com/lorebianchi98/FG-CLIP.
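The matching function under scrutiny is the cosine similarity between CLIP image and text embeddings. The sketch below shows how fine-grained attribute queries would be scored against an image with the Hugging Face CLIP implementation; the checkpoint, image path, and queries are placeholders, and the point is only to show where the (possibly too coarse) cosine matching enters.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Fine-grained queries differing only in a single attribute.
texts = ["a red wooden chair", "a blue wooden chair", "a red plastic chair"]
image = Image.open("chair.jpg")  # placeholder image path

with torch.no_grad():
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    cosine = img @ txt.T   # the cosine matching function questioned in the paper

print(cosine)  # small gaps between these scores indicate poor attribute separability
```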




Same or Different? Diff-Vectors for Authorship Analysis

AUTHORS: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani

WORK PACKAGE: WP 8 – UbiQuity

URL: https://dl.acm.org/doi/10.1145/3609226

Keywords: deep learning, machine learning, information retrieval, computer science, data mining, support vector, logistic regression, artificial intelligence, supervised learning

Abstract
In this article, we investigate the effects on authorship identification tasks (including authorship verification, closed-set authorship attribution, and closed-set and open-set same-author verification) of a fundamental shift in how to conceive the vectorial representations of documents that are given as input to a supervised learner. In “classic” authorship analysis, a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document. We instead investigate the situation in which a feature vector represents an unordered pair of documents, the value of a feature represents the absolute difference in the relative frequencies (or increasing functions thereof) of the feature in the two documents, and the class label indicates whether the two documents are from the same author or not. This latter (learner-independent) type of representation has been occasionally used before, but has never been studied systematically. We argue that it is advantageous, and that, in some cases (e.g., authorship verification), it provides a much larger quantity of information to the training process than the standard representation. The experiments that we carry out on several publicly available datasets (among which one that we here make available for the first time) show that feature vectors representing pairs of documents (that we here call Diff-Vectors) bring about systematic improvements in the effectiveness of authorship identification tasks, and especially so when training data are scarce (as is often the case in real-life authorship identification scenarios). Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd that use a solver for the 1st as a building block. The code to reproduce our experiments is open-source and available online.
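A minimal sketch of the Diff-Vector construction described above, assuming simple bag-of-words relative frequencies: each unordered pair of documents becomes one vector of absolute feature-wise differences, labeled by whether the two documents share an author. The toy documents and labels are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the cat lay on the rug",
        "stock prices rallied sharply today"]
authors = [0, 0, 1]

# Relative frequencies per document (an increasing function thereof would also do).
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)
rel = counts / counts.sum(axis=1, keepdims=True)

def diff_vector(i: int, j: int) -> np.ndarray:
    """Feature vector for the unordered pair (i, j): |f_i - f_j| per feature."""
    return np.abs(rel[i] - rel[j])

# Pairs labeled 1 if same author, 0 otherwise; a standard binary classifier
# (e.g., an SVM or logistic regression) is then trained on these vectors.
X = np.stack([diff_vector(0, 1), diff_vector(0, 2), diff_vector(1, 2)])
y = np.array([1, 0, 0])
```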




A Simple Method for Classifier Accuracy Prediction Under Prior Probability Shift

AUTHORS: Lorenzo Volpi, Alejandro Moreo, Fabrizio Sebastiani

WORK PACKAGE: WP 8 – UbiQuity

URL: A Simple Method for Classifier Accuracy Prediction Under Prior Probability Shift

Keywords: Classifier accuracy prediction, Prior probability shift, Label shift, Quantification

Abstract
The standard technique for predicting the accuracy that a classifier will have on unseen data (classifier accuracy prediction – CAP) is cross-validation (CV). However, CV relies on the assumption that the training data and the test data are sampled from the same distribution, an assumption that is often violated in many real-world scenarios. When such violations occur (i.e., in the presence of dataset shift), the estimates returned by CV are unreliable. In this paper we propose a CAP method specifically designed to address prior probability shift (PPS), an instance of dataset shift in which the training and test distributions are characterized by different class priors. By solving a system of n² independent linear equations, with n the number of classes, our method estimates the n² entries of the contingency table of the test data, and thus allows estimating any specific evaluation measure. Since a key step in this method involves predicting the class priors of the test data, we further observe a connection between our method and the field of “learning to quantify”. Our experiments show that, when combined with state-of-the-art quantification techniques, under PPS our method tends to outperform existing CAP methods.
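As a hedged illustration of the idea (not necessarily the authors' exact system of equations): under prior probability shift the class-conditional behaviour of the classifier, p(ŷ|y), can be assumed stable, so once the test priors are estimated (e.g., via quantification) each of the n² contingency-table entries is fixed by one linear equation.

```python
import numpy as np

n = 3          # number of classes
N_test = 1000  # size of the unlabelled test set

# Estimated on validation data: p(ŷ = i | y = j), assumed stable under PPS.
cond = np.array([[0.90, 0.10, 0.05],
                 [0.07, 0.85, 0.10],
                 [0.03, 0.05, 0.85]])

# Test-set class priors p(y = j), e.g. obtained via a quantification method.
priors = np.array([0.2, 0.5, 0.3])

# The n^2 unknown contingency-table entries C[i, j] = #(predicted i, true j)
# then satisfy one linear equation each: C[i, j] = p(ŷ=i|y=j) * p(y=j) * N.
C = cond * priors[None, :] * N_test

accuracy = np.trace(C) / N_test   # any evaluation measure can be read off the table
print(C.round(1), accuracy)
```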




The Questio de aqua et terra: A Computational Authorship Verification Study

AUTHORS: Martina Leocata, Alejandro Moreo, Fabrizio Sebastiani

WORK PACKAGE: WP 3 – T-Res

URL: The Questio de aqua et terra: A Computational Authorship Verification Study

Keywords:

Abstract
The Questio de aqua et terra is a cosmological treatise traditionally attributed to Dante Alighieri. However, the authenticity of this text is controversial, due to discrepancies with Dante’s established works and to the absence of contemporary references. This study investigates the authenticity of the Questio via computational authorship verification (AV), a class of techniques which combine supervised machine learning and stylometry. We build a family of AV systems and assemble a corpus of 330 13th- and 14th-century Latin texts, which we use to comparatively evaluate the AV systems through leave-one-out cross-validation. Our best-performing system achieves high verification accuracy (F1=0.970) despite the heterogeneity of the corpus in terms of textual genre. The key contribution to the accuracy of this system is shown to come from Distributional Random Oversampling (DRO), a technique specially tailored to text classification which is here used for the first time in AV.
The application of the AV system to the Questio returns a highly confident prediction concerning its authenticity. These findings contribute to the debate on the authorship of the Questio, and highlight DRO’s potential in the application of AV to cultural heritage.




«Vi fui presente e vidi». Contributi per uno studio dei diari di pellegrinaggio a Roma tra genere letterario e documento storico

AUTHORS: Ilaria Sabbatini

WORK PACKAGE: WP 7 – REVER

URL: Pubblicazione | ILARIA SABBATINI | Università degli Studi di Palermo

Keywords: Pilgrimage, Diaries, Rome, Literature, History

Abstract
The document analyzes pilgrimage diaries to Rome, focusing on their value as both a literary genre and historical documents. By examining various texts, including medieval itineraries and travelers’ accounts, the author explores the unique characteristics of these diaries, highlighting how they combine elements of personal narrative with geographical and historical descriptions. The concept of pilgrimage is examined not only as a physical journey but also as a spiritual path, and the importance of religious destinations such as Rome and Jerusalem is discussed. The work raises questions about the nature and evolution of pilgrimage diaries, seeking to understand the motivations and experiences of pilgrims through the centuries.




ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words

AUTHORS: Irfan Ali, Liliana Lo Presti, Igor Spanò, Marco La Cascia

WORK PACKAGE: WP 4 – DamySim

URL: ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words – ICAART 2025

Keywords: Word Segmentation, Sanskrit Language, Sandhi Rule, Bi-Encoders, Attention.

Abstract

Sanskrit is a highly composite language, morphologically and phonetically complex. One of the major challenges in processing Sanskrit is the splitting of compound words that are merged phonetically. Recognizing the exact location of splits in a compound word is difficult since several possible splits can be found, but only a few of them are semantically meaningful. This paper proposes a novel deep learning method that uses two bi-encoders and a multi-head attention module to predict the valid split location in Sanskrit compound words. The two bi-encoders process the input sequence in direct and reverse order respectively. The model learns the character-level context in which the splitting occurs by exploiting the correlation between the direct and reverse dynamics of the character sequence. The results of the proposed model are compared with a state-of-the-art technique that adopts a bidirectional recurrent network to solve the same task. Experimental results show that the proposed model correctly identifies where the compound word should be split into its components in 89.27% of cases, outperforming the state-of-the-art technique. The paper also proposes a dataset developed from the repository of the Digital Corpus of Sanskrit (DCS) and the University of Hyderabad (UoH) corpus.
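A minimal PyTorch sketch loosely inspired by the description above: two recurrent encoders read the character sequence in direct and reverse order, a multi-head attention layer correlates the two views, and a per-position score marks candidate split points. The layer choices and sizes are assumptions made for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SplitPointTagger(nn.Module):
    """Toy bi-encoder tagger: scores each character position as a split point."""
    def __init__(self, vocab_size: int, dim: int = 64, heads: int = 4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.fwd = nn.GRU(dim, dim, batch_first=True)   # direct order
        self.bwd = nn.GRU(dim, dim, batch_first=True)   # reverse order
        self.attn = nn.MultiheadAttention(2 * dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, chars: torch.Tensor) -> torch.Tensor:
        x = self.emb(chars)                              # (B, L, dim)
        h_fwd, _ = self.fwd(x)
        h_bwd, _ = self.bwd(torch.flip(x, dims=[1]))
        h = torch.cat([h_fwd, torch.flip(h_bwd, dims=[1])], dim=-1)
        h, _ = self.attn(h, h, h)                        # correlate the two views
        return torch.sigmoid(self.head(h)).squeeze(-1)   # (B, L) split probabilities

model = SplitPointTagger(vocab_size=60)
probs = model(torch.randint(0, 60, (2, 12)))             # 2 toy words, 12 characters each
split_positions = probs.argmax(dim=1)                    # most likely split index per word
```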




Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature

AUTHORS: Davide Caffagni, Federico Cocchi, Anna Mambelli, Fabio Tutrone, Marco Zanella, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP 8 – UbiQuity

URL: Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature

Keywords:  

Abstract
Transformer-based language models like BERT have revolutionized Natural Language Processing (NLP) research, but their application to historical languages remains underexplored. This paper investigates the adaptation of BERT-based embedding models for Latin, a language central to the study of the sacred texts of Christianity. Focusing on Jerome’s Vulgate, pre-Vulgate Latin translations of the Bible, and patristic commentaries such as Augustine’s De Genesi ad litteram, we address the challenges posed by Latin’s complex syntax, specialized vocabulary, and historical variations at the orthographic, morphological, and semantic levels. In particular, we propose fine-tuning existing BERT-based embedding models on annotated Latin corpora, using self-generated hard negatives to improve performance in detecting biblical references in early Christian literature in Latin. Experimental results demonstrate the ability of BERT-based models to identify citations of and allusions to the Bible(s) in ancient Christian commentaries while highlighting the complexities and challenges of this field. By integrating NLP techniques with humanistic expertise, this work provides a case study on intertextual analysis in Latin patristic works. It underscores the transformative potential of interdisciplinary approaches, advancing computational tools for sacred text studies and bridging the gap between philology and computational analysis.




Data extraction from 3D scanning: post-processing filtering for analytic and informative models of small archaeological finds

AUTHORS: Filippo Diara

URL: Data extraction from 3D scanning: post-processing filtering for analytic and informative models of small archaeological finds | Archeologia e Calcolatori

Work Package: WP 9 – Taurus

Abstract
Current 3D scanners based on the structured-light principle open up possibilities for creating detailed models (polygon populations) with micrometric resolution. Consequently, highly detailed models allow specific investigations. This work focuses on 3D scanning and post-processing analysis/filtering of Ancient Near East finds, especially seals and cuneiform clay tablets, fragile artefacts that can hold a lot of semantic information beyond transliteration: e.g., seal impressions (figurative and textual sealings), fingerprint evidence, retracing and erased text. Behind the ease of use of portable structured-light scanners hides enormous potential for feature extraction and processing. Metric analysis (e.g., deviation analysis) coupled with the application of the MSII (Multi-Scale Integral Invariant) filter enhances data extraction, changing the overall perception of the details of the archaeological artefact.




Medieval Sanctuaries and Miraculous Images and Relics: Tracing the Gaze through Eye Trackers

AUTHORS: Federico Ruozzi, Marco Papasidero

WORK PACKAGE: WP 6 – YASMINE

URL: Medieval Sanctuaries and Miraculous Images and Relics: Tracing the Gaze through Eye Trackers

Keywords:  Devotion, Gaze Studies, Sanctuaries

Abstract
This article is part of the research activities of the PNRR ITSERR project, which seeks to apply new digital technologies to religious studies. Specifically focusing on gaze studies, we utilised Aria eye trackers provided by Meta to the team of computer engineers at the University of Modena and Reggio Emilia (Italy), with whom this study is being carried out. These devices can record the gaze of users who wear them, as well as identify the objects or spatial elements being observed, the user’s location, and the duration of their focus. Adopting an interdisciplinary approach, the article explores the application of this technology to Catholic sacred spaces, specifically two sanctuaries in the Tuscan-Emilian Apennines: the Sanctuary of Our Lady of Bismantova (Reggio Emilia) and that of Saints Pellegrino and Bianco in Alpe (Modena). By observing and analysing the gaze patterns of ten users – varying in age, profession, and religious orientation – the study examines how individuals engage with these sacred contexts, with particular attention to the Marian image and the relics of the saints.




The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding

AUTHORS: Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

URL: https://openaccess.thecvf.com/content/CVPR2024/html/Bianchi_The_Devil_is_in_the_Fine-Grained_Details_Evaluating_Open-Vocabulary_Object_CVPR_2024_paper.html

Work Package : WP 5 – Digital Maktaba

Keywords: open-vocabulary detection, fine-grained understanding, benchmark

Abstract
Recent advancements in large vision-language models enabled visual object detection in open-vocabulary scenarios, where object classes are defined in free-text formats during inference. In this paper, we aim to probe the state-of-the-art methods for open-vocabulary object detection to determine to what extent they understand fine-grained properties of objects and their parts. To this end, we introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects in the presence of hard-negative classes. We contribute with a benchmark suite of increasing difficulty and probing different properties like color, pattern, and material. We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol and find that most existing solutions, which shine in standard open-vocabulary benchmarks, struggle to accurately capture and distinguish finer object details. We conclude the paper by highlighting the limitations of current methodologies and exploring promising research directions to overcome the discovered drawbacks. Data and code are available at https://lorebianchi98.github.io/FG-OVD.




Ubiquity. Il design della comunicazione nel progetto ITSERR

AUTHORS: Fabrizio D’Avenia, Cinzia Ferrara, Marcello Costa, Chiara Palillo

URL: http://www.societaitalianadesign.it/2024/10/29/design-per-la-diversita-2/

Work Package : WP 8 – UbiQuity

Keywords:

Abstract
Within the Italian Strengthening of ESFRI RI Resilience (ITSERR) project, Ubiquity is a research platform developed for detecting literal and non-literal quotations of the Bible and the Quran in later exegetic Greek, Latin, and Arabic commentaries. The objective of Ubiquity’s team, which is made up of humanists, computer scientists, and designers, is to study and visualize data from sacred texts and to interact with them through visual components belonging to analogue and digital infographic systems. This widespread availability of skills for designing material and immaterial artefacts could be a great support for religious studies and scientific research.




μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context

AUTHORS: Fabio Quattrini, Carmine Zaccagnino, Silvia Cascianelli, Laura Righi, Rita Cucchiara

URL: https://arxiv.org/abs/2408.15646

Work Package : WP 6 – YASMINE / WP 7 – REVER

Keywords:

Abstract
Regesta are catalogs of summaries of other documents and, in some cases, are the only source of information about the content of such full-length documents. For this reason, they are of great interest to scholars in many social and humanities fields. In this work, we focus on Regesta Pontificum Romanum, a large collection of papal registers. Regesta are visually rich documents, where the layout is as important as the text content to convey the contained information through the structure, and are inherently multi-page documents. Among Digital Humanities techniques that can help scholars efficiently exploit regesta and other documental sources in the form of scanned documents, Document Parsing has emerged as a task to process document images and convert them into machine-readable structured representations, usually markup language. However, current models focus on scientific and business documents, and most of them consider only single-paged documents. To overcome this limitation, in this work, we propose μgat, an extension of the recently proposed Document parsing Nougat architecture, which can handle elements spanning over the single page limits. Specifically, we adapt Nougat to process a larger, multi-page context, consisting of the previous and the following page, while parsing the current page. Experimental results, both qualitative and quantitative, demonstrate the effectiveness of our proposed approach also in the case of the challenging Regesta Pontificum Romanorum.




Alfie: Democratising RGBA Image Generation With No $$$

AUTHORS: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

URL: https://arxiv.org/abs/2408.14826

Work Package : WP 6 – YASMINE

Keywords:

Abstract
Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling. Automating the generation of such visual elements improves graphic designers’ productivity, democratizes and innovates the creative industry, and helps generate more realistic synthetic data for related tasks. These illustration elements are mostly RGBA images with irregular shapes and cutouts, facilitating blending and scene composition. However, most image generation models are incapable of generating such images and achieving this capability requires expensive computational resources, specific training recipes, or post-processing solutions. In this work, we propose a fully-automated approach for obtaining RGBA illustrations by modifying the inference-time behavior of a pre-trained Diffusion Transformer model, exploiting the prompt-guided controllability and visual quality offered by such models with no additional computational cost. We force the generation of entire subjects without sharp croppings, whose background is easily removed for seamless integration into design projects or artistic scenes. We show with a user study that, in most cases, users prefer our solution over generating and then matting an image, and we show that our generated illustrations yield good results when used as inputs for composite scene generation pipelines. We release the code at this https URL.




Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

AUTHORS: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

URL: https://link.springer.com/chapter/10.1007/978-3-031-72986-7_14

Work Package : WP 6 – YASMINE

Keywords: Image Generation, Diffusion Models, Text-to-Image

Abstract
Diffusion models have become the State-of-the-Art for text-to-image generation, and increasing research effort has been dedicated to adapting the inference process of pretrained diffusion models to achieve zero-shot capabilities. An example is the generation of panorama images, which has been tackled in recent works by combining independent diffusion paths over overlapping latent features, which is referred to as joint diffusion, obtaining perceptually aligned panoramas. However, these methods often yield semantically incoherent outputs and trade-off diversity for uniformity. To overcome this limitation, we propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting to improve the perceptual and semantical coherence of the generated panorama images. Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space. Extensive quantitative and qualitative experimental analysis, together with a user study, demonstrate that our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence. We release the code at https://github.com/aimagelab/MAD.




Binarizing Documents by Leveraging both Space and Frequency

AUTHORS: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

URL: https://dl.acm.org/doi/10.1007/978-3-031-70543-4_1

Work Package : All ITSERR WPs using Artificial Intelligence

Keywords: Document Enhancement, Document Image Binarization, Fast Fourier Convolution

Abstract
Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
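The ingredient that gives Fast Fourier Convolutions their global receptive field is a spectral branch that applies a pointwise convolution in the frequency domain, so every output location depends on the whole input. The sketch below shows only such a spectral branch in PyTorch, as an illustration under simplified assumptions; the full FFC block and the authors' binarization network are more elaborate.

```python
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    """Pointwise convolution in the Fourier domain: a global receptive field."""
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")                 # (B, C, H, W//2+1) complex
        spec = torch.cat([spec.real, spec.imag], dim=1)         # to real channels
        spec = self.conv(spec)                                  # mix each frequency pointwise
        real, imag = spec.chunk(2, dim=1)
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")   # back to pixel space

patch = torch.randn(1, 16, 64, 64)       # feature map of a document patch
out = SpectralBranch(16)(patch)           # same shape, globally informed
```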




VATr++: Choose Your Words Wisely for Handwritten Text Generation

AUTHORS: Bram Vanherle, Vittorio Pippi, Silvia Cascianelli, Nick Michiels, Frank Van Reeth, Rita Cucchiara

URL: https://ieeexplore.ieee.org/document/10716806

Work Package : All ITSERR WPs using Artificial Intelligence

Keywords: Handwritten text generation, handwritten text generation evaluation, synthetic data

Abstract
Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect – the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This work extends the VATr (Pippi et al. 2023) Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models. In particular, we propose generally applicable strategies for input preparation and training regularization that allow the model to achieve better performance and generalization capabilities. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research – the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.




Dreams, Texts, and Truths: Augustine on Hermeneutics and Oneirocriticism

AUTHORS: Fabio Tutrone

URL: http://hdl.handle.net/10447/667817

Work Package : WP 8 – UbiQuity

Keywords: Augustine, dreams, oneirocriticism, oneirology, Bible, biblical exegesis, allegory, Tertullian, Origen, Philo of Alexandria, Passio Perpetuae et Felicitatis, early Christian literature, Artemidorus of Daldis

Abstract
In the Greek and Roman worlds, oneirocriticism is hermeneutics and presupposes an epistemology – these and other cognate fields of inquiry being involved in a continuous process of social, political, and religious change. The present paper explores the relationship between dreams and hermeneutics in a meaningful passage of Augustine’s twelve-book commentary On the Literal Meaning of Genesis (De Genesi ad litteram) – a work rightly considered the most important testimony to the Christian cosmology of antiquity and the Middle Ages – in which the greatest of the Latin Church Fathers establishes a parallel between the interpretation of dreams and that of sacred texts. By elucidating the cultural background of Augustine’s understanding of dream images as cognitive phenomena that underlie both crucial passages of the Bible and the common experience of humans – both the soul and the body, both natural and supernatural powers – this paper sheds new light upon Augustine’s reaction to the materialism and literalism of Tertullian and early Christian communities, his reception of the allegorical method of Origen and the Alexandrian school, and his mystical embracing of Neoplatonic theories of knowledge. Indeed, Augustine turns out to be perfectly aware of many Greco-Roman and early Christian debates on oneirology and hermeneutical methods, and while he fiercely warns against the belief that the revelation of the Bible can be superseded or contradicted by the individual revelations of dreams, he strives to put together an original paradigm of natural philosophy, cognitive psychology, and symbolic interpretation, in an attempt to give dreams a definite place in the order of things.




Aëriae Animae: Souls and Elements from the Roman Cosmos to the Christian Afterworld

AUTHORS: Fabio Tutrone

URL: http://hdl.handle.net/10447/667820

Work Package : WP 8 – UbiQuity

Keywords: Augustine, early Christian psychology, soul, body, four elements, M. Terentius Varro, Antiochus of Ascalon, Middle Platonism, Neoplatonism, Bible, Stoicism, demonology, philosophy of nature, theology

Abstract
It has been widely recognized that until the fourth century AD Christians discussed freely about the source and the nature of the soul – the cases of Origen and Tertullian being emblematic of this situation in the East and in the West, respectively. It was only in the fourth century AD – after the so-called conversion of Constantine, with the Church’s increasing entanglement with political and social power and the emergence of a new generation of Platonizing intellectuals from the ranks of the upper class – that Christian bishops and theologians inaugurated a new discourse on the soul, its transcendent origin, immaterial constitution, and immortal destiny, which entailed the banishment and repression of earlier alternative visions. In the present paper, I shall be exploring an episode in this crucial historical transition, which, though limited in scope, can shed light upon the long-standing interactions between Greco-Roman theories of matter, elements, and principles, on the one hand, and Christian ideas of the soul and the afterworld, on the other. I am going to focus on the treatise On the City of God (De Civitate Dei) by Augustine of Hippo, who is usually regarded as one of the most decisive and influential figures in what can be called the Neoplatonic turn of fourth-century AD Christian eschatology. It is too often forgotten that throughout his long engagement with the issue of the nature and origin of the soul Augustine maintained an agnostic position, which is faithfully mirrored in all his writings. Indeed, I shall attempt to show that Augustine’s troubled reflection on the soul – on what he repeatedly terms as the ‘extremely obscure question of the soul’ (obscurissimam de anima quaestionem) – includes a meaningful dialogue with Book 16 of Varro’s Divine Antiquities (Antiquitates Rerum Divinarum) and its theory that the four elements of the cosmos host four different kinds of souls. I will investigate the philosophical pedigree of Varro’s cosmological-cum-psychological doctrine, with its recognizable mixture of Platonic and Stoic notions, arguing that Varro’s teacher, the Middle Platonist philosopher Antiochus of Ascalon, is its most likely source. However, far from restricting myself to an exercise in Quellenforschung, I shall claim that the Varronian theory reported in Book 7 of Augustine’s City of God should be read in light of Augustine’s sustained reception of the Platonic tradition in Book 8 of the same work, where the view that the body of demons is made up of air is endorsed by Augustine and attests to his serious pondering of the role of the natural elements in the emergence of a creature’s essence.




Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

AUTHORS: Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

Keywords:  

Abstract
The conventional training approach for image captioning involves pre-training a network using teacher forcing and subsequent fine-tuning with Self-Critical Sequence Training to maximize hand-crafted captioning metrics. However, when attempting to optimize modern and higher-quality metrics like CLIP-Score and PAC-Score, this training method often encounters instability and fails to acquire the genuine descriptive capabilities needed to produce fluent and informative captions. In this paper, we propose a new training paradigm termed Direct CLIP-Based Optimization (DiCO). Our approach jointly learns and optimizes a reward model that is distilled from a learnable captioning evaluator with high human correlation. This is done by solving a weighted classification problem directly inside the captioner. At the same time, DiCO prevents divergence from the original model, ensuring that fluency is maintained. DiCO not only exhibits improved stability and enhanced quality in the generated captions but also aligns more closely with human preferences compared to existing methods, especially in modern metrics. Additionally, it maintains competitive performance in traditional metrics.




The Revolution of Multimodal Large Language Models: A Survey

AUTHORS: Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Luca Barsellotti, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: https://aclanthology.org/2024.findings-acl.807/

Keywords:  

Abstract
Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities, while providing a dialogue-based interface and instruction-following capabilities. In this paper, we provide a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. We also conduct a detailed analysis of these models across a wide range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. Additionally, we compile and describe training datasets and evaluation benchmarks, conducting comparisons among existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art, laying the groundwork for future MLLMs.




Pixels of Faith: Exploiting Visual Saliency to Detect Religious Image Manipulation

AUTHORS: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Marco Papasidero, Federico Ruozzi, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: 2024_ECCVW_Gaze_ITSERR.pdf

Keywords: Gaze-assisted AI, Human Attention, Deepfake Detection, Religious Studies

Abstract
The proliferation of generative models has revolutionized various aspects of daily life, bringing both opportunities and challenges. This paper tackles a critical problem in the field of religious studies: the automatic detection of partially manipulated religious images. We address the discrepancy between human and algorithmic capabilities in identifying fake images, particularly those visually obvious to humans but challenging for current algorithms. Our study introduces a new testing dataset for religious imagery and incorporates human-derived saliency maps to guide deep learning models toward perceptually relevant regions for fake detection. Experiments demonstrate that integrating visual attention information into the training process significantly improves model performance, even with limited eye-tracking data. This human-in-the-loop approach represents a significant advancement in deepfake detection, particularly for preserving the integrity of religious and cultural content. This work contributes to the development of more robust and human-aligned deepfake detection systems, addressing critical challenges in the era of widespread generative AI technologies.




Isometric Sets of Words and Generalizations of the Fibonacci Cubes

AUTHORS: Marcella Anselmo, Giuseppa Castiglione, Manuela Flores, Dora Giammarresi, Maria Madonia, Sabrina Mantaci

WORK PACKAGE: WP 7 – REVER

URL: https://link.springer.com/chapter/10.1007/978-3-031-64309-5_35

Keywords: Isometric sets of words, Hamming distance, Hypercubes, Generalized Fibonacci Cubes

Abstract
The hypercube Qn is a graph whose 2^n vertices can be associated to all binary words of length n in a way that adjacent vertices get words that differ only in one symbol. Given a word f, the subgraph Qn(f) is defined by selecting all vertices not containing f as a factor. A word f is said to be isometric if Qn(f) is an isometric subgraph of Qn, i.e., keeping the distances between the remaining nodes. Graphs Qn(f) were defined and studied as a generalization of Fibonacci cubes Qn(11). Isometric words have been completely characterized using combinatorial methods for strings.

We introduce the notion of isometric sets of words with the aim of capturing further interesting cases in the scenario of isometric subgraphs of the hypercubes. We prove some combinatorial properties and study special interesting cases.
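A small brute-force sketch of the objects involved: build Qn(f) as the subgraph of the hypercube induced by the binary words avoiding the factor f, and test whether its internal distances still equal the Hamming distances. The check is exponential in n and covers a single n, whereas isometricity is required for every n, so this only illustrates the definitions.

```python
from itertools import product
import networkx as nx

def avoiding_subcube(n: int, f: str) -> nx.Graph:
    """Qn(f): subgraph of the n-cube induced by binary words avoiding the factor f."""
    words = [w for w in ("".join(b) for b in product("01", repeat=n)) if f not in w]
    g = nx.Graph()
    g.add_nodes_from(words)
    g.add_edges_from((u, v) for u in words for v in words
                     if sum(a != b for a, b in zip(u, v)) == 1)
    return g

def is_isometric(n: int, f: str) -> bool:
    """True iff distances inside Qn(f) equal Hamming distances (checked for this n only)."""
    g = avoiding_subcube(n, f)
    dist = dict(nx.all_pairs_shortest_path_length(g))
    return all(dist[u].get(v) == sum(a != b for a, b in zip(u, v))
               for u in g for v in g)

print(is_isometric(6, "11"))    # the Fibonacci cube Q6(11) is isometric: True
print(is_isometric(6, "1100"))  # "1100" has a 2-error overlap (prefix 11 vs suffix 00)
```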




Density of Ham- and Lee- non-isometric k-ary Words

AUTHORS: Marcella Anselmo, Manuela Flores, Maria Serafina Madonia

WORK PACKAGE: WP 7 – REVER

URL: https://ceur-ws.org/Vol-3587/3914.pdf

Keywords: Isometric words, Overlap with errors, Hamming and Lee distance, Density

Abstract
Isometric k-ary words have been defined referring to the Hamming and the Lee distances. A word is non-isometric if and only if it has a prefix at distance 2 from the suffix of the same length; such a prefix is called a 2-error overlap. The limit density of isometric binary words based on the Hamming distance has been evaluated by Klavžar and Shpectorov, obtaining that about 8% of all binary words are isometric. In this paper, the issue is addressed for k-ary words, referring to the Hamming and the Lee distances. Actually, the only meaningful case of Lee-isometric k-ary words is when k = 4. It is proved that, when the length of words increases, the limit density of quaternary Ham-isometric words is around 17%, while the limit density of quaternary Lee-isometric words is even higher, at about 30%. The results are obtained using combinatorial methods and algorithms for counting the number of k-ary isometric words.
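The characterization recalled above (a word is non-isometric iff some prefix is at Hamming distance 2 from the suffix of the same length) turns directly into a check, and counting the words without such a 2-error overlap gives the densities discussed in the paper. A brute-force sketch for the Hamming case follows; the percentages quoted in the abstract are limits for growing word length, so small-length counts only approximate them.

```python
from itertools import product

def hamming(u: str, v: str) -> int:
    return sum(a != b for a, b in zip(u, v))

def has_two_error_overlap(f: str) -> bool:
    """True iff some proper prefix of f is at Hamming distance 2 from the
    suffix of the same length (the condition for non-isometric words)."""
    return any(hamming(f[:k], f[-k:]) == 2 for k in range(2, len(f)))

def isometric_density(alphabet: str, n: int) -> float:
    """Fraction of length-n words over `alphabet` that are Ham-isometric."""
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    return sum(not has_two_error_overlap(w) for w in words) / len(words)

print(isometric_density("01", 10))    # binary case, approaches ~8% as the length grows
print(isometric_density("0123", 8))   # quaternary Hamming case, approaches ~17%
```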




Using large language models to create narrative events

AUTHORS: Valentina Bartalesi, Emanuele Lenzi, Claudio De Martino

WORK PACKAGE:

URL: https://peerj.com/articles/cs-2242/

Keywords:


Abstract
Narratives play a crucial role in human communication, serving as a means to convey experiences, perspectives, and meanings across various domains. They are particularly significant in scientific communities, where narratives are often utilized to explain complex phenomena and share knowledge. This article explores the possibility of integrating large language models (LLMs) into a workflow that, exploiting the Semantic Web technologies, transforms raw textual data gathered by scientific communities into narratives. In particular, we focus on using LLMs to automatically create narrative events, maintaining the reliability of the generated texts. The study provides a conceptual definition of narrative events and evaluates the performance of different smaller LLMs compared to the requirements we identified. A key aspect of the experiment is the emphasis on maintaining the integrity of the original narratives in the LLM outputs, as experts often review texts produced by scientific communities to ensure their accuracy and reliability. We first perform an evaluation on a corpus of five narratives and then on a larger dataset comprising 124 narratives. LLaMA 2 is identified as the most suitable model for generating narrative events that closely align with the input texts, demonstrating its ability to generate high-quality narrative events. Prompt engineering techniques are then employed to enhance the performance of the selected model, leading to further improvements in the quality of the generated texts.




Sensitive Topics Retrieval in Digital Libraries: A Case Study of ḥadīṯ collections

AUTHORS: Giovanni Sullutrone, Riccardo Amerigo Vigliermo, Luca Sala, Sonia Bergamaschi

WORK PACKAGE: WP 5 – Digital Maktaba

URL: https://link.springer.com/chapter/10.1007/978-3-031-72440-4_5

Keywords: Retrieval-Augmented Generation, Bias, Digital Libraries, Sensitive Topics, Islamic studies, ḥadīṯ collections

Abstract
The advent of Large Language Models (LLMs) has led to the development of new Question-Answering (QA) systems based on Retrieval-Augmented Generation (RAG) to incorporate query-specific knowledge at inference time. In this paper, the trustworthiness of RAG systems is investigated, particularly focusing on the performance of their retrieval phase when dealing with sensitive topics. This issue is particularly relevant as it could hinder a user’s ability to analyze sections of the available corpora, effectively biasing any following research. To mimic a specialised library possibly containing sensitive topics, a ḥadīṯ dataset has been curated using an ad-hoc framework called Question-Classify-Retrieve (QCR), which automatically assesses the performance of document retrieval by operating in three main steps: Question Generation, Passage Classification, and Passage Retrieval. Different sentence embedding models for document retrieval were tested, showing a significant performance gap between sensitive and non-sensitive topics compared to the baseline. In real-world applications, this would mean relevant documents being placed lower in the retrieval list, leading to the presence of irrelevant information or the absence of relevant information in the case of a lower cut-off.
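The retrieval-evaluation step of a QCR-style pipeline can be sketched as follows: embed the generated questions and the corpus passages with a sentence-embedding model, rank passages by cosine similarity, and compare the rank of the relevant passage for sensitive versus non-sensitive questions. The model name, passages, and questions below are placeholders, not the curated ḥadīṯ dataset or the embedding models tested in the paper.

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

passages = ["passage A ...", "passage B ...", "passage C ..."]   # corpus documents
questions = {                                                    # one generated question per passage
    "non_sensitive": ("What does passage A say about travel?", 0),
    "sensitive":     ("What does passage B say about punishment?", 1),
}

p_emb = model.encode(passages, convert_to_tensor=True)
for topic, (question, relevant_idx) in questions.items():
    q_emb = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, p_emb)[0]                  # cosine similarity to each passage
    rank = int((scores > scores[relevant_idx]).sum()) + 1   # rank of the relevant passage
    print(topic, "rank of relevant passage:", rank)
# A consistently worse rank for sensitive questions is the bias the paper measures.
```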




Text-to-SQL with Large Language Models: Exploring the Promise and Pitfalls

AUTHORS: Luca Sala, Giovanni Sullutrone, Sonia Bergamaschi

WORK PACKAGE: WP 5 – Digital Maktaba

URL: https://ceur-ws.org/Vol-3741/paper65.pdf

Keywords: Large Language Models, Text-to-SQL, Relational Databases, SQL

Abstract
The emergence of Large Language Models (LLMs) represents a fundamental change in the ever-evolving field of natural language processing (NLP). Over the past few years, the enhanced capabilities of these models have led to their widespread use across various fields, in both practical applications and research contexts. In particular, as data science intersects with LLMs, new research opportunities and insights emerge, notably in translating text into Structured Query Language (Text-to-SQL). The application of this technology to such a task poses a unique set of opportunities and related issues that have significant implications for information retrieval. This discussion paper delves into these intricacies and limitations, focusing on challenges that jeopardise efficacy and reliability. This research investigates scalability, accuracy, and the concerning issue of hallucinated responses, questioning the trustworthiness of LLMs. Furthermore, we point out the limits of the current usage of test datasets created for research purposes in capturing real-world complexities. Finally, we consider the performance of Text-to-SQL with LLMs from different perspectives. Our investigation identifies the key challenges faced by LLMs and proposes viable solutions to facilitate the exploitation of these models to advance data retrieval, bridging the gap between academic research and real-world application scenarios.




Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach

AUTHORS: Usman Nawaz, Liliana Lo Presti, Marianna Napolitano, Marco La Cascia

WORK PACKAGE: WP 4 – DamySim

URL: https://link.springer.com/chapter/10.1007/978-3-031-70442-0_25

Keywords: Old Church Slavonic, Lemmatization, Ancient Language, Natural Language Processing

Abstract
Old Church Slavonic (OCS) is an ancient language, and it presents unique challenges and hurdles for natural language processing. Currently, there is a lack of Python libraries devised for the analysis of OCS texts. This research not only fills a crucial gap in the computational treatment of the OCS language but also produces valuable resources for scholars in historical linguistics, cultural studies, and the humanities, supporting further research in the field of ancient language processing. The main contribution of this work is the development of an algorithm for the lemmatization of OCS texts based on a learned dictionary. The approach can deal with ancient languages without the need for prior linguistic knowledge. Using a dataset of more than 330K OCS words and their corresponding lemmas, the approach integrates the algorithm and the dictionary efficiently to achieve accurate lemmatization on test data.
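The lookup-with-fallback structure at the heart of a dictionary-based lemmatizer can be sketched in a few lines; the toy entries and the longest-prefix fallback below are illustrative assumptions and do not reproduce the learned OCS dictionary or the paper's algorithm.

```python
# Toy learned dictionary: inflected OCS form -> lemma (entries are illustrative).
lemma_dict = {
    "словеса": "слово",
    "словесе": "слово",
    "рабомъ": "рабъ",
}

def lemmatize(word: str) -> str:
    """Exact lookup first; otherwise back off to the longest shared-prefix match."""
    if word in lemma_dict:
        return lemma_dict[word]
    for cut in range(len(word) - 1, 2, -1):           # progressively shorter prefixes
        matches = [l for w, l in lemma_dict.items() if w.startswith(word[:cut])]
        if matches:
            return matches[0]
    return word                                        # unknown form: return it unchanged

print(lemmatize("словеса"))    # exact hit -> слово
print(lemmatize("словесемъ"))  # unseen form, resolved via a shared prefix -> слово
```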




Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images

AUTHORS: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: https://ieeexplore.ieee.org/document/10465604

Keywords: Deepfakes, gaze tracking, human in the loop, visual perception

Abstract
Creating high-quality and realistic images is now possible thanks to the impressive advancements in image generation. A description in natural language of your desired output is all you need to obtain breathtaking results. However, as the use of generative models grows, so do concerns about the propagation of malicious content and misinformation. Consequently, the research community is actively working on the development of novel fake detection techniques, primarily focusing on low-level features and possible fingerprints left by generative models during the image generation process. In a different vein, in our work we leverage human semantic knowledge to investigate whether it can be incorporated into fake image detection frameworks. To achieve this, we collect a novel dataset of partially manipulated images using diffusion models and conduct an eye-tracking experiment to record the eye movements of different observers while viewing real and fake stimuli. A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images. Statistical findings reveal that, when perceiving counterfeit samples, humans tend to focus on more confined regions of the image, in contrast to the more dispersed observational pattern observed when viewing genuine images.




RoBERT2VecTM: A Novel Approach for Topic Extraction in Islamic Studies

AUTHORS: Sania Aftar, Luca Gagliardelli, Amina El Ganadi, Federico Ruozzi, Sonia Bergamaschi

WORK PACKAGE: WP 5 – DIGITAL MAKTABA

URL: https://aclanthology.org/2024.findings-emnlp.534/

Keywords: 

Abstract
Investigating “Hadith” texts, crucial for theological studies and Islamic jurisprudence, presents challenges due to the linguistic complexity of Arabic, such as its complex morphology. In this paper, we propose an innovative approach to address the challenges of topic modeling in Hadith studies by utilizing the Contextualized Topic Model (CTM). Our study introduces RoBERT2VecTM, a novel neural-based approach that combines the RoBERTa transformer model with Doc2Vec, specifically targeting the semantic analysis of “Matn” (the actual content). The methodology outperforms many traditional state-of-the-art NLP models by generating more coherent and diverse Arabic topics. The diversity of the generated topics allows for further categorization, deepening the understanding of discussed concepts. Notably, our research highlights the critical impact of lemmatization and stopwords in enhancing topic modeling. This breakthrough marks a significant stride in applying NLP to non-Latin languages and opens new avenues for the nuanced analysis of complex religious texts.
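
The abstract does not detail how the RoBERTa and Doc2Vec representations are combined. The sketch below shows one plausible reading (concatenating mean-pooled transformer embeddings with Doc2Vec vectors and clustering them into topics); the model names, toy corpus, and cluster count are assumptions, not the authors' code.

    # Illustrative combination of contextual and Doc2Vec document vectors for topic
    # clustering (one plausible reading of the abstract, not the authors' method).
    import numpy as np
    import torch
    from transformers import AutoTokenizer, AutoModel
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans

    docs = ["first hadith matn text ...", "second hadith matn text ..."]  # placeholder corpus

    # Contextual embeddings: mean-pooled last hidden states of a multilingual RoBERTa.
    tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
    enc = AutoModel.from_pretrained("xlm-roberta-base")
    with torch.no_grad():
        batch = tok(docs, padding=True, truncation=True, return_tensors="pt")
        ctx = enc(**batch).last_hidden_state.mean(dim=1).numpy()

    # Doc2Vec embeddings trained on the same (toy) corpus.
    tagged = [TaggedDocument(d.split(), [i]) for i, d in enumerate(docs)]
    d2v = Doc2Vec(tagged, vector_size=32, min_count=1, epochs=20)
    d2v_vecs = np.stack([d2v.dv[i] for i in range(len(docs))])

    # Concatenate both views and cluster documents into topics.
    features = np.concatenate([ctx, d2v_vecs], axis=1)
    topics = KMeans(n_clusters=2, n_init=10).fit_predict(features)
    print(topics)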




The Impact of Generative AI on Islamic Studies: Case Analysis of “Digital Muhammad ibn Isma’il Al-Bukharī”

AUTHORS: Amina El Ganadi, Sania Aftar, Luca Gagliardelli, Sonia Bergamaschi, Federico Ruozzi

WORK PACKAGE: WP 5 – DIGITAL MAKTABA

URL: https://ieeexplore.ieee.org/document/10852480

Keywords: Analytical models, Accuracy, Text analysis, Large language models, Collaboration, Training data, Chatbots, Reliability engineering, Prompt engineering, Artificial intelligence

Abstract

The emergence of large language models (LLMs) such as ChatGPT, LLaMA, Gemini, and Claude has transformed natural language processing (NLP) tasks by demonstrating remarkable capabilities in generating fluent and contextually appropriate responses. This paper examines the current state of LLMs, their applications, inherent challenges, and potential future directions necessitating multidisciplinary collaboration. A key focus is the application of generative AI in Islamic studies, particularly in managing sensitive content such as the Ahadith (corpus of sayings, actions, and approvals attributed to the Prophet Muḥammad). We detail the customization and refinement of the AI model, “Digital Muḥammad ibn Ismail Al-Bukhari,” designed to provide accurate responses based on the Sahih Al-Bukhari collection. Our methodology includes rigorous dataset curation, preprocessing, model customization, and evaluation to ensure the model’s reliability. Strategies to mitigate hallucinations involve implementing context-aware constraints, regular audits, and continuous feedback loops to maintain adherence to authoritative texts and correct biases. Findings indicate a significant reduction in hallucinations, though challenges such as residual biases and handling ambiguous queries persist. This research underscores the importance of recognizing LLMs’ limitations and highlights the need for collaborative efforts in fine-tuning these models with authoritative texts. It offers a framework for the cautious application of generative AI in Islamic studies, emphasizing continuous improvements to enhance AI reliability.




A Novel Methodology for Topic Identification in Hadith

AUTHORS: Sania Aftar, Luca Gagliardelli, Amina El Ganadi, Federico Ruozzi, Sonia Bergamaschi

WORK PACKAGE: WP 5 – DIGITAL MAKTABA

URL: https://ceur-ws.org/Vol-3643/paper12.pdf

Keywords: Topic Modeling, Hadith, Neural Topic Model

Abstract

In this paper, we present our preliminary work on developing a novel neural-based approach named
RoBERT2VecTM, aimed at identifying topics within the “Matn” of “Hadith”. This approach focuses on
semantic analysis, showing potential to outperform current state-of-the-art models. Despite the availability of various models for topic identification, many struggle with multilingual datasets. Furthermore, some models have limitations in discerning deep semantic meanings, as they are not trained for languages such as Arabic. Considering the sensitive nature of Hadith texts, where topics are often complexly interleaved,
careful handling is imperative. We anticipate that RoBERT2VecTM will offer substantial improvements
in understanding contextual relationships within texts, a crucial aspect for accurately identifying topics
in such intricate religious documents.




Trends, Applications, and Challenges in Human Attention Modelling

AUTHORS: Giuseppe Cartella, Marcella Cornia, Vittorio Cuculo, Alessandro D’Amelio, Dario Zanca, Giuseppe Boccignone, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: https://www.ijcai.org/proceedings/2024/882

Keywords:  Humans and AI: General, Humans and AI: HAI: Applications

Abstract
Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges. For a comprehensive overview of the ongoing research, refer to our dedicated repository available at https://github.com/aimagelab/awesome-human-visual-attention.




«Verranno giorni…» nel “Vangelo di Luca”: l’influenza di “Geremia LXX” sulle profezie di Gesù riguardanti la distruzione di Gerusalemme

AUTHORS: ANNA MAMBELLI

WORK PACKAGE: WP 8 – uBIQUity

URL: https://www.rivisteweb.it/issn/1120-4001

Keywords: Gospel of Luke, Septuagint, Jeremiah, Lamentations, Destruction of Jerusalem, Prophetic, Language and Literature, Intertextuality

Abstract
This study investigates the prophecies of Jesus on the destruction of Jerusalem and the Temple as they appear in the Gospel of Luke (13:34–35, and especially 19:41–44; 21:5–6, 20–24; 23:28–31) in light of their intertextual relationship with passages or texts from Scripture. The analysis focuses on how certain terms or expressions of the prophetic language of Jeremiah, and to a lesser extent of Lamentations, are borrowed through the Septuagint version (e.g., ἡμέραι ἔρχονται), recombined, and modified by Luke. This research, however, is not only lexical and comparative but also enters the exegetical field. It explores the reasons for and meaning of the use of LXX Jeremiah in these particular passages of the Gospel of Luke, where Jesus himself is speaking in the midst of the impending catastrophe.




Intertestualità tra Bibbie e antichi commentari cristiani: l’esempio di simul nel De Genesi ad litteram di Agostino

AUTHORS: ANNA MAMBELLI; DAVIDE DAINESE

WORK PACKAGE: WP 8 – uBIQUity

URL: https://lexicon.cnr.it/ojs/index.php/LP/article/view/872

Keywords: Intertextuality; Biblical Quotations; Augustine; De Genesi ad litteram; Genesis (OT Book); Patristic Exegesis

Abstract
This contribution presents a case study that, on the basis of some occurrences of the adverb simul in Augustine’s De Genesi ad litteram, allows us to illustrate the classification system we adopt to map the intertextual relationships between known Greek and Latin versions of the Bible and some patristic texts. This taxonomy has been set up within the framework of two research projects, joined together within the European research infrastructure for Religious Studies “Resilience-RI”. After a methodological introduction based on the state of the art, the workflow is explained and, finally, the concrete example of the adverb simul is presented, focusing on the use of some passages from Genesis 1 and Sirach 18:1 in Augustine’s commentary.




Digital Dark Ages: The Role of Medieval Corpora in the Context of the Digital Humanities and Religious Studies

AUTHORS: LAURA RIGHI

WORK PACKAGE: WP 7 – REVER

URL: https://www.rivisteweb.it/doi/10.17395/112876

KEYWORDS: Middle Ages, Digital Humanities, Religious Studies

Abstract
In recent years, the debate on the role and methodologies of the digital humanities has seen considerable development, including in the specific – but disciplinarily vast – domain of Religious Studies. Even if it is a recent debate, its premises rest on epistemological questions and assumptions whose history it is important to outline. In this context, a great contribution can be provided by research conducted on medieval textual corpora. Through the study of several cases, from Roberto Busa's Index Thomisticus up to ongoing research projects, this contribution presents some trends and specificities of the analysis and publication of medieval sources in the digital environment, aiming to discuss the innovations and limits of this research field and its possible contribution to the ongoing debate on digital religious studies.




Moving beyond the Content: 3D Scanning and Post-Processing Analysis of the Cuneiform Tablets of the Turin Collection

AUTHORS: FILIPPO DIARA; FRANCESCO GIUSEPPE BARSACCHI; STEFANO DE MARTINO

URL: https://www.mdpi.com/2076-3417/14/11/4492

WORK PACKAGE: WP 9 – TAURUS

KEY WORDS: 3D scanning; cuneiform tablets; digital imaging; fingerprints; MSII; sealings

Abstract

This work focuses on how 3D scanning methodologies and post-processing analyses may help us gain a deeper understanding of cuneiform tablets beyond their written content. The dataset proposed herein is a key part of the archaeological collection preserved in the Musei Reali of Turin in Italy; these archaeological artefacts enclose further important semantic information extractable through detailed 3D documentation and 3D model filtering. In fact, this scanning process is a fundamental tool for a better reading of sealing impressions beneath the cuneiform text, as well as for understanding micrometric evidence of the fingerprints of scribes. Most of the seal impressions were made before the writing (like a watermark), and thus they are not detectable to the naked eye due to the cuneiform signs above them as well as the state of preservation. In this regard, 3D scanning and post-processing analysis can help in the analysis of these nearly invisible features impressed on tablets. For this reason, this work also shows how 3D analyses may support the identification of unperceived and almost invisible features concealed in clay tablets. Analysis of fingerprints and the depths of the signs can tell us about the workers' strategies and the people behind the artefacts. Three-dimensional models generated inside the Artec 3D ecosystem via the Space Spider scanner and Artec Studio software were further investigated by applying specific filters and shaders. Digital light manipulation can reveal, through the dynamic displacement of light and shadows, particular details that can be deeply analysed with specific post-processing operations: for example, the MSII (multi-scale integral invariant) filter is a powerful tool for revealing hidden and unperceived features such as fingerprints and sealing impressions (stratigraphically below the cuneiform signs). Finally, the collected data will be handled in two ways: in an open-access repository and through a common data environment (CDE) to aid the data exchange process for project collaborators and common users.




Isometric Words and Edit Distance: Main Notions and New Variations

AUTHORS: G. Castiglione, M. Flores, D. Giammarresi

URL: https://link.springer.com/chapter/10.1007/978-3-031-42250-8_1

Work Package: WP 7 – REVER

Keywords: Isometric words, Edit distance, Generalized Fibonacci cubes

Abstract
Isometric words combine the notion of edit distance together with properties of words not appearing as factors in other words. An edit distance is a metric between words that quantifies how two words differ by counting the number of edit operations needed to transform one word into the other one. A word f is said isometric with respect to an edit distance if, for any pair of f-free words u and v, there exists a transformation of minimal length from u into v via the related edit operations such that all the intermediate words are also f-free. The adjective “isometric” comes from the fact that, if the Hamming distance is considered (i.e., only replacement operations are used), then isometric words are connected with the definitions of isometric subgraphs of hypercubes. We discuss known results and some interesting generalizations and open problems.
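
To make the two central definitions concrete, the short sketch below computes a classic edit distance (Levenshtein, with insertions, deletions, and replacements) and tests whether a word is f-free; the example words are placeholders, and this is an illustration of the definitions rather than any construction from the paper.

    # Concrete versions of the two notions used throughout: an edit distance
    # (here, classic Levenshtein) and the "f-free" test (f never occurs as a factor).
    def edit_distance(u: str, v: str) -> int:
        """Minimum number of insertions, deletions, and replacements turning u into v."""
        dp = list(range(len(v) + 1))
        for i, cu in enumerate(u, 1):
            prev, dp[0] = dp[0], i
            for j, cv in enumerate(v, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (cu != cv))
        return dp[-1]

    def is_f_free(word: str, f: str) -> bool:
        """A word is f-free if f never appears as a (contiguous) factor."""
        return f not in word

    print(edit_distance("010", "111"), is_f_free("0100", "11"))   # 2 True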




Hypercubes and Isometric Words Based on Swap and Mismatch Distance

AUTHORS: M. Anselmo, G. Castiglione, M. Flores, D. Giammarresi, M. Madonia, S. Mantaci

URL: https://link.springer.com/chapter/10.1007/978-3-031-34326-1_2

Work Package: WP 7 – REVER

Keywords: Swap and mismatch distance, Isometric words, Hypercube

Abstract
The hypercube of dimension n is the graph whose vertices are the 2^n binary words of length n, and there is an edge between two of them if they have Hamming distance 1. We consider an edit distance based on swaps and mismatches, to which we refer as tilde-distance, and define the tilde-hypercube with edges linking words at tilde-distance 1. Then, we introduce and study some isometric subgraphs of the tilde-hypercube obtained by using special words called tilde-isometric words. The subgraphs keep only the vertices that avoid a given tilde-isometric word as a factor. An infinite family of tilde-isometric words is described; they are isometric with respect to the tilde-distance, but not to the Hamming distance. In the case of the word 11, the subgraph is called the tilde-Fibonacci cube, as a generalization of the classical Fibonacci cube. The tilde-hypercube and the tilde-Fibonacci cube can be recursively defined; the same holds for the number of their edges. This allows an asymptotic estimation of the number of edges in the tilde-Fibonacci cube, in comparison to the total number in the tilde-hypercube.
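
As an illustration of the tilde-distance between equal-length binary words, the sketch below uses a simple dynamic program in which each step either replaces one symbol or swaps one adjacent transposed pair. It assumes (for binary words) that an optimal transformation can always be chosen with non-overlapping operations; it is a didactic sketch, not the authors' algorithm.

    # Sketch of the swap-and-mismatch ("tilde") distance between equal-length binary
    # words: each operation replaces one symbol or swaps one adjacent transposed pair.
    # Illustrative dynamic program over the definition, not the authors' algorithm.
    def tilde_distance(u: str, v: str) -> int:
        assert len(u) == len(v)
        n = len(u)
        dp = [0] * (n + 1)                       # dp[i] = cost to equalise the first i symbols
        for i in range(1, n + 1):
            dp[i] = dp[i - 1] + (u[i - 1] != v[i - 1])            # replacement (or match)
            if (i >= 2 and u[i - 1] != v[i - 1]
                    and u[i - 2] == v[i - 1] and u[i - 1] == v[i - 2]):
                dp[i] = min(dp[i], dp[i - 2] + 1)                 # one swap fixes two positions
        return dp[n]

    print(tilde_distance("10", "01"), tilde_distance("1100", "0110"))   # 1 2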




Isometric Words Based on Swap and Mismatch Distance

AUTHORS: M. Anselmo, G. Castiglione, M. Flores, D. Giammarresi, M. Madonia, S. Mantaci

URL: https://link.springer.com/chapter/10.1007/978-3-031-33264-7_3

Work Package: WP 7 – REVER

Keywords: Swap and mismatch distance, Isometric words, Overlap with errors

Abstract
An edit distance is a metric between words that quantifies how two words differ by counting the number of edit operations needed to transform one word into the other one. A word f is said isometric with respect to an edit distance if, for any pair of f-free words u and v, there exists a transformation of minimal length from u to v via the related edit operations such that all the intermediate words are also f-free. The adjective “isometric” comes from the fact that, if the Hamming distance is considered (i.e., only mismatches), then isometric words define some isometric subgraphs of hypercubes. We consider the case of edit distance with swap and mismatch. We compare it with the case of mismatch only and prove some properties of isometric words that are related to particular features of their overlaps.




Measuring fairness under unawareness of sensitive attributes: A quantification-based approach

AUTHORS: A. Fabris, A. Esuli, A. Moreo, F. Sebastiani

URL: https://doi.org/10.1613/jair.1.14033

Work Package: All ITSERR WPs using FAIR data

Keywords: Algorithms, Models, Decision Making, Group Fairness, Demographic Attributes, Data Minimisation, Privacy, Fairness Measurement, Sensitive Attributes, Quantification, Supervised Learning, Prevalence Estimates, Distribution Shifts, Demographic Parity, Classifier Fairness

Abstract
Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To achieve this goal, the availability (awareness) of these demographic attributes to those evaluating the impact of these models is fundamental. Unfortunately, collecting and storing these attributes is often in conflict with industry practices and legislation on data minimisation and privacy. For this reason, it can be hard to measure the group fairness of trained models, even from within the companies developing them. In this work, we tackle the problem of measuring group fairness under unawareness of sensitive attributes, by using techniques from quantification, a supervised learning task concerned with directly providing group-level prevalence estimates (rather than individual-level class labels). We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem, as they are robust to inevitable distribution shifts while at the same time decoupling the (desirable) objective of measuring group fairness from the (undesirable) side effect of allowing the inference of sensitive attributes of individuals. More in detail, we show that fairness under unawareness can be cast as a quantification problem and solved with proven methods from the quantification literature. We show that these methods outperform previous approaches to measure demographic parity in five experimental protocols, corresponding to important challenges that complicate the estimation of classifier fairness under unawareness.
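
As one example of the kind of quantification machinery the paper builds on, the well-known Adjusted Classify & Count (ACC) estimator corrects a classifier's raw positive rate using its true and false positive rates to obtain a group-level prevalence estimate. The numbers below are invented for illustration and are not results from the paper.

    # Adjusted Classify & Count (ACC), a standard quantification estimator: correct
    # the raw predicted-positive rate with the classifier's tpr/fpr to estimate the
    # prevalence of a (here, sensitive) attribute in a group. Numbers are invented.
    def acc_prevalence(pred_positive_rate: float, tpr: float, fpr: float) -> float:
        """p_hat = (observed positive rate - fpr) / (tpr - fpr), clipped to [0, 1]."""
        p = (pred_positive_rate - fpr) / (tpr - fpr)
        return min(1.0, max(0.0, p))

    # Example: a sensitive-attribute classifier with tpr=0.85 and fpr=0.10 labels 40%
    # of the approved applicants and 25% of the rejected ones as group members.
    prev_approved = acc_prevalence(0.40, tpr=0.85, fpr=0.10)
    prev_rejected = acc_prevalence(0.25, tpr=0.85, fpr=0.10)
    print(round(prev_approved, 3), round(prev_rejected, 3))   # 0.4 0.2

Comparing such prevalence estimates across decision groups is how a measure like demographic parity can be approximated without observing the sensitive attribute at the individual level.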




Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

AUTHORS: Fabio Quattrini, R. Cucchiara, S. Cascianelli, V. Pippi

URL: https://openaccess.thecvf.com/content/ICCV2023W/e-Heritage/papers/Quattrini_Volumetric_Fast_Fourier_Convolution_for_Detecting_Ink_on_the_Carbonized_ICCVW_2023_paper.pdf

Work Package: All ITSERR WPs using Artificial Intelligence

Keywords: Digital Document Restoration, Virtual Unwrapping, Herculaneum Papyri, Ink Detection, Computer Vision, X-ray Micro-Computed Tomography, Artificial Intelligence, Volumetric Data, Fast Fourier Convolutions, Carbon-based Ink

Abstract
Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts. Among those, there has been an increasing interest in applying Artificial Intelligence techniques for virtually unwrapping and automatically detecting ink on the Herculaneum papyri collection. This collection consists of carbonized scrolls and fragments of documents, which have been digitized via X-ray tomography to allow the development of ad-hoc deep learning-based DDR solutions. In this work, we propose a modification of the Fast Fourier Convolution operator for volumetric data and apply it in a segmentation architecture for ink detection on the challenging Herculaneum papyri, demonstrating its suitability via deep experimental analysis. To encourage the research on this task and the application of the proposed operator to other tasks involving volumetric data, we will release our implementation (https://github.com/aimagelab/vffc).
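
The abstract does not spell out the operator's exact formulation; the sketch below only illustrates the general idea of the spectral branch of a Fast Fourier Convolution extended to 3D volumes (transform, pointwise convolution in the frequency domain, inverse transform). It is an assumption-laden sketch, not the released implementation, which is available at the repository above.

    # Minimal sketch of the spectral branch of a Fast Fourier Convolution extended to
    # volumetric (3D) data: FFT, 1x1x1 convolution in the frequency domain, inverse FFT.
    import torch
    import torch.nn as nn

    class SpectralBlock3D(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # Real and imaginary parts are stacked along the channel axis.
            self.conv = nn.Conv3d(channels * 2, channels * 2, kernel_size=1)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, D, H, W)
            spec = torch.fft.rfftn(x, dim=(-3, -2, -1), norm="ortho")
            y = torch.cat([spec.real, spec.imag], dim=1)       # (B, 2C, D, H, W//2+1)
            y = self.act(self.conv(y))
            real, imag = torch.chunk(y, 2, dim=1)
            spec = torch.complex(real, imag)
            return torch.fft.irfftn(spec, s=x.shape[-3:], dim=(-3, -2, -1), norm="ortho")

    vol = torch.randn(1, 8, 16, 32, 32)        # a toy CT sub-volume
    print(SpectralBlock3D(8)(vol).shape)       # torch.Size([1, 8, 16, 32, 32])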




How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

AUTHORS: Vittorio Pippi, Silvia Cascianelli, Christopher Kermorvant, Rita Cucchiara

URL: https://link.springer.com/chapter/10.1007/978-3-031-41679-8_19

Work Package: All ITSERR WPs using Artificial Intelligence

Keywords: Document synthesis, Historical document analysis, Handwriting recognition, Synthetic data

Abstract
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.




Handwritten Text Generation from Visual Archetypes

AUTHORS: R. Cucchiara, S. Cascianelli, V. Pippi

URL: https://ceur-ws.org/Vol-3536/03_paper.pdf

Work Package: All ITSERR WPs using Artificial Intelligence

Keywords: HTG, Text Generation, Characters, Visual Archetypes, Transformer, Calligraphic, GANs, Encoding, Training, Synthetic

Abstract
Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training. While emulating a writer’s style has been recently addressed by generative models, the generalization towards rare characters has been disregarded. In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols written as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is more suitable for generating characters that, despite having been seen rarely during training, possibly share visual details with the frequently observed ones. As for the style, we obtain a robust representation of unseen writers’ calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate the effectiveness of our proposal in generating words in unseen styles and with rare characters more faithfully than existing approaches relying on independent one-hot encodings of the characters.
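
The snippet below sketches the general idea of representing characters by rendered glyph bitmaps (flattened to dense vectors) instead of one-hot indices. The font path and rendering details are placeholders and not the paper's pipeline; the paper uses GNU Unifont glyphs.

    # Sketch of the "visual archetype" idea: represent each character by a small
    # rendered glyph bitmap, flattened to a dense vector, instead of a one-hot index.
    # The font path is a placeholder; the paper relies on GNU Unifont glyphs.
    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    def char_to_archetype(ch: str, font_path: str = "unifont.ttf", size: int = 16) -> np.ndarray:
        font = ImageFont.truetype(font_path, size)
        img = Image.new("L", (size, size), color=0)
        ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=font)
        return np.asarray(img, dtype=np.float32).flatten() / 255.0   # dense vector in [0, 1]

    # A word becomes a sequence of dense vectors, one per character.
    word = "ciao"
    content = np.stack([char_to_archetype(c) for c in word])          # shape (4, 256)
    print(content.shape)

Because visually similar characters yield similar vectors, a rarely seen character can still share representation details with frequently observed ones, which is the motivation stated in the abstract.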




Bridging Islamic Knowledge and AI: Inquiring ChatGPT on Possible Categorizations for an Islamic Digital Library (full paper)

AUTHORS: A. El Ganadi, R. A. Vigliermo, L. Sala, M. Vanzini, F. Ruozzi, S. Bergamaschi

URL: https://ceur-ws.org/Vol-3536/03_paper.pdf

Work Package: WP5

Keywords: Libraries and Archives in CH, Digital Libraries and Religious Archives, ChatGPT, Islamic studies, Arabic script languages, Islamic knowledge classification, Islamic subjects

Abstract
This research evaluates the capabilities of ChatGPT in assisting with the categorization of an Islamic digital library, exploiting incremental Machine Learning and Transfer Learning techniques. Noticeably, ChatGPT showcased a remarkable familiarity with Islamic knowledge, evident in its ability to classify subjects hierarchically based on their importance, from Qur’anic Studies to Modern Islamic Thought. The library aimed to cater to a diverse Arabic Islamic audience with collections sourced from varied digital donations. Despite ChatGPT’s commendable proficiency, several challenges arose, with interpretability, generalization, and the hallucination issue standing out as the most critical obstacles.




Knowledge extraction, management and long-term preservation of non-Latin cultural heritages – Digital Maktaba project presentation

AUTHORS: S. Bergamaschi, R. Martoglia, F. Ruozzi, R. A. Vigliermo, L. Sala, M. Vanzini

URL: https://ceur-ws.org/Vol-3365/short11.pdf

Work Package: WP5

Keywords: Cultural heritages, Non-Latin alphabets, Knowledge extraction, Machine Learning, Natural Language Processing, Big data management, Long-term preservation, Big data integration, Named Entity Recognition

Abstract
The services provided by today’s cutting-edge digital library systems may benefit from new technologies that can improve cataloguing efficiency and cultural heritage preservation and accessibility. Below, we introduce the recently started Digital Maktaba (DM) project, which suggests a new model for the knowledge extraction and semi-automatic cataloguing task in the context of digital libraries that contain documents in non-Latin scripts (e.g. Arabic). Since DM involves a large amount of unorganized data from several sources, particular emphasis will be placed on topics such as big data integration, big data analysis and long-term preservation. This project aims to create an innovative workflow for the automatic extraction of information and metadata and for a semi-automated cataloguing process by exploiting Machine Learning, Natural Language Processing, Artificial Intelligence and data management techniques to provide a system that is capable of speeding up, enhancing and supporting the librarian’s work. We also report on some promising results that we obtained through a preliminary proof-of-concept experimentation. (Short paper, discussion paper)




Knowledge Extraction and Cross-Language Data Integration in Digital Libraries

AUTHORS: L. Sala

URL: https://ceur-ws.org/Vol-3478/paper17.pdf

Work Package: WP5

Keywords: Data Integration, Cross-Language Record Linkage, Knowledge Extraction, Long-term Preservation

Abstract
Digital Humanities (DH) is an interdisciplinary field that has grown rapidly in recent years, requiring the creation of an efficient and uniform platform capable of managing various types of data in several languages. This paper presents the research objectives and methodologies of my PhD project: the creation of a novel framework for Knowledge Extraction and Multilingual Data Integration in the context of digital libraries in non-Latin languages, in particular Arabic, Persian and Azerbaijani. The research began with the Digital Maktaba (DM) project and continued within the PNRR ITSERR infrastructure, in which the DBGroup participates. The project aims to develop a two-component framework consisting of a Knowledge Extraction Subsystem and a Data Integration Subsystem. The case study is based on the DM project, which seeks to create a flexible and efficient digital library for preserving and analyzing multicultural heritage documents by exploiting the available and ad-hoc created datasets, Explainable Machine Learning, Natural Language Processing (NLP) technologies and Data Integration approaches. Key challenges and future developments in Knowledge Extraction and Data Integration are examined, which involve leveraging the MOMIS system for Data Integration tasks and adopting a microservices-based architecture for the effective implementation of the system. The goal is to provide a versatile platform for organizing and integrating various data sources and languages, thereby fostering a more inclusive and accessible global perspective on cultural and historical artefacts that encourage collaboration in building an expanding knowledge base.




A tool for semiautomatic cataloguing of an islamic digital library: a use case from the Digital Maktaba project (short paper)

AUTHORS: L. Sala, R. Martoglia, M. Vanzini, R. A. Vigliermo

URL: https://ceur-ws.org/Vol-3234/paper1.pdf

Work Package: WP5

Keywords: Cultural heritage, Digital Library, Islamic sciences, Arabic script OCR, Information extraction, Output alignment, Page layout analysis, Semiautomatic cataloguing, Software tool usage demo.

Abstract
Digital Maktaba (DM) is an interdisciplinary project to create a digital library of texts in non-Latin
alphabets (Arabic, Persian, Azerbaijani). The dataset is made available by the digital library heritage
of the “La Pira” library in the history and doctrines of Islam based in Palermo, which is the hub of the
Foundation for Religious Sciences (FSCIRE, Bologna). Establishing protocols for the creation, maintenance
and cataloguing of historical content in non-Latin alphabets is the long-term goal of DM. The first step of
this project was to create an innovative workflow for automatic extraction of information and metadata
from title pages of Arabic script texts. The Optical Character Recognition (OCR) tool uses various
recognition systems, text processing techniques and corpora in order to provide accurate extraction and
metadata of document content. In this paper we address the ongoing development of this novel tool
and, for the first time, we present a demo of the current version that we have designed for the extraction
and cataloguing process by showing a use case on an Arabic book frontispiece. In particular, we delve
into the details of the tool workflow for automatically converting and uploading PDFs from the digital
library, for the automatic extraction of cataloguing metadata and the semiautomatic (at the current stage)
process of cataloguing. We also shortly discuss future prospects and the many additional features that
we are planning to develop.
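
As a rough analogue of the extraction step (not the DM tool itself), an off-the-shelf OCR engine can be run on a scanned frontispiece to obtain raw text, from which cataloguing fields would then be parsed. The file name, OCR engine, and downstream parsing are assumptions for illustration only.

    # Rough analogue of the text-extraction step (not the DM tool): run an off-the-shelf
    # OCR engine on a scanned Arabic frontispiece and keep the raw text from which
    # cataloguing metadata would later be parsed. The file name is a placeholder.
    from PIL import Image
    import pytesseract

    page = Image.open("frontispiece.png")                       # placeholder scan
    raw_text = pytesseract.image_to_string(page, lang="ara")    # requires Arabic traineddata

    # A downstream step would map lines of raw_text to fields such as title, author,
    # and publisher; here only the raw extraction is shown.
    print(raw_text[:200])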




Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach

AUTHORS: S. Bergamaschi, R. Martoglia, F. Ruozzi, R. A. Vigliermo, L. Sala, M. Vanzini

URL: https://www.mdpi.com/1424-8220/22/11/3995

Work Package: WP5

Keywords: digital libraries; minority languages; humanistic informatics; computer archiving; intercultural communication

Abstract
The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need for systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state-of-the-art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view.




Structured-Light Scanning and Metrological Analysis for Archaeology: Quality Assessment of Artec 3D Solutions for Cuneiform Tablets

AUTHORS: Filippo DIARA

URL: https://www.mdpi.com/2571-9408/6/9/317

Work Package: WP 9 – Taurus

Abstract
This paper deals with a metrological and qualitative evaluation of the Artec 3D structured-light scanners: Micro and Space Spider. As part of a larger European project called ITSERR, these scanners are tested to reconstruct small archaeological artefacts, in particular cuneiform tablets with different dimensions. For this reason, Micro and Space Spider are compared in terms of the entire workflow, from preparatory work to post-processing. In this context, three cuneiform replica tablets will serve as examples on which the Artec scanners will have to prove their worth. Metric analyses based on distance maps, RMSe calculations and density analyses will be carried out to understand metrological differences between these tools. The creation of 3D models of cuneiform tablets is the first step in developing a virtual environment suitable for sharing the archaeological collection with collaborators and other users. The inclusion of semantic information through specific ontologies will be the next step in this important project.
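
A compact sketch of the kind of metric comparison described above (cloud-to-cloud nearest-neighbour distances, i.e. a distance map, and their RMSE) is given below; the two point clouds are synthetic stand-ins, not the Micro and Space Spider reconstructions evaluated in the paper.

    # Sketch of the metric comparison described in the abstract: nearest-neighbour
    # cloud-to-cloud distances ("distance map") and their RMSE. The two point clouds
    # are synthetic stand-ins for the Micro and Space Spider reconstructions.
    import numpy as np
    from scipy.spatial import cKDTree

    reference = np.random.rand(5000, 3)                                 # placeholder reference scan (metres)
    test = reference + np.random.normal(0, 0.0005, reference.shape)     # slightly perturbed second scan

    dists, _ = cKDTree(reference).query(test)                           # per-point distance map
    rmse = float(np.sqrt(np.mean(dists ** 2)))
    print(f"mean deviation: {dists.mean():.6f} m, RMSE: {rmse:.6f} m")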




Preserving and conserving culture: first steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages

AUTHORS: S. Bergamaschi, R. Martoglia, F. Ruozzi, R. A. Vigliermo, L. Sala, M. Vanzini

URL: https://dl.acm.org/doi/abs/10.1145/3462203.3475927

Abstract
Managing and sharing cultural heritages also in supranational and multi-literate contexts is a very hot research topic. In this paper we discuss the research we are conducting in the DigitalMaktaba project, presenting the first steps for designing an innovative workflow and tool for the automatic extraction of knowledge from documents written in multiple non-Latin languages (Arabic, Persian and Azerbaijani). The tool leverages different OCR, text processing techniques and linguistic corpora in order to provide both a highly accurate extracted text and a rich metadata content, overcoming typical limitations of current state-of-the-art systems; this will enable in the near future the development of an automatic cataloguer which we hope will ultimately help in better preserving and conserving culture in such a demanding scenario.