Call for Applications: RESILIENCE Transnational Access Fellowships 2025-2026

RESILIENCE, the European cross-disciplinary research infrastructure serving the study of religion, launches its fifth call for applications for Transnational Access Fellowships. The call will be open from March 13 to May 1, 2025. Scholars at all career stages can apply for a short research visit to the host of their choice, under the conditions set by that host.

The RESILIENCE TNA program aims to provide scholars with direct, fast, and effective access to research materials. Host institutions grant access to manuscripts, rare books, and documents, while also offering a unique match-making service that connects researchers with experts in their field.

TNA users can visit institutions in a country other than their own, benefiting from free access to collections, tailored institutional support, and guidance from local scholars. This enables faster resource access and more efficient research.

A typical TNA visit lasts two weeks, allowing researchers to meet experts such as curators, conservators, and restorers.

Benefits for TNA Users @ITSERR

The five ITSERR consortium partners, which act as TNA hosts within RESILIENCE, offer attractive conditions for a TNA fellowship, including coverage of flight and accommodation costs with full or half board. Read more about ITSERR’s offering here.

Go here to read the fifth call for applications.




Beyond human imagination: The art of creating prompt-driven 3D scenes with Generative AI

AUTHORS: Giulio Federico, Fabio Carrara, Giuseppe Amato, Marco Di Benedetto

WORK PACKAGE: WP 10 – Retina

URL:

Keywords: Generative AI, Computer Graphics, Denoising Diffusion Probabilistic Model, Gaussian Splatting, NeRF, Signed Distance Field, Video Reconstruction, Deep Learning, Machine Learning, Artificial Intelligence, Text-to-3D, Image-to-3D, Urban Environment, Score Distillation Sampling

Abstract
The reconstruction of large-scale real outdoor environments is crucial for promoting the adoption of Extended Reality (XR) in industrial and entertainment sectors. This task often requires significant resources such as depth cameras, LiDAR sensors, drones, and others, alongside traditional data processing pipelines like Structure-from-Motion (SfM), which demand extensive computational resources and thus prevent real-time processing. Additional constraints arise from the limited accessibility of these resources. While 3D laser scanners (e.g., LiDAR) are precise and fast, they are expensive, often bulky (especially the high-quality models), and their effectiveness is contingent on the type of environment being scanned. Depth sensors offer a more affordable and compact alternative; however, due to their limited range, they are ideal only for indoor settings. Photogrammetry, while capable of producing high-quality results at a lower cost, can be time-consuming and computationally intensive. It also suffers from limited accuracy, strong dependence on lighting conditions, and the need for numerous photos from various angles that are not always easy to obtain. (…)




Spatio-Temporal 3D Reconstruction from Frame Sequences and Feature Points

AUTHORS: Giulio Federico, Fabio Carrara, Giuseppe Amato, Marco Di Benedetto

WORK PACKAGE: WP 10 – Retina

URL: https://dl.acm.org/doi/10.1145/3672406.3672415

Keywords:

Abstract
Reconstructing a large real environment is a fundamental task to promote eXtended Reality adoption in industrial and entertainment fields. However, the short range of depth cameras, the sparsity of LiDAR sensors, and the huge computational cost of Structure-from-Motion pipelines prevent scene replication in near real time. To overcome these limitations, we introduce a spatio-temporal diffusion neural architecture, a generative AI technique that fuses temporal information (i.e., a short temporally-ordered list of color photographs, like sparse frames of a video stream) with an approximate spatial resemblance of the explored environment. Our aim is to modify an existing 3D diffusion neural model to produce a Signed Distance Field volume from which a 3D mesh representation can be extracted. Our results show that the hallucination approach of diffusion models is an effective methodology where a fast reconstruction is a crucial target.
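The pipeline the abstract describes ends by extracting a 3D mesh representation from the predicted Signed Distance Field volume. Below is a minimal sketch of that final step only, not the authors' code: it runs marching cubes from scikit-image on a toy sphere SDF stored as a NumPy array (all names here are illustrative).

```python
# Minimal sketch: mesh extraction from an SDF volume via marching cubes.
# Assumes `sdf` is a dense NumPy array where negative values lie inside the surface.
import numpy as np
from skimage import measure

def sdf_to_mesh(sdf: np.ndarray, voxel_size: float = 1.0):
    """Extract the zero level set of an SDF volume as a triangle mesh."""
    verts, faces, normals, _ = measure.marching_cubes(
        sdf, level=0.0, spacing=(voxel_size,) * 3
    )
    return verts, faces, normals

# Toy example: a sphere of radius 20 voxels centered in a 64^3 grid.
grid = np.mgrid[:64, :64, :64].astype(np.float32)
sdf = np.sqrt(((grid - 32.0) ** 2).sum(axis=0)) - 20.0
verts, faces, _ = sdf_to_mesh(sdf)
print(verts.shape, faces.shape)
```

In the paper's setting the SDF volume would come from the diffusion model rather than from an analytic sphere; the extraction step is the same.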




Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

AUTHORS: Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Alessio Miaschi, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli, Roberto Navigli

WORK PACKAGE: WP 8 – UbiQuity

URL:

Keywords:

Abstract
An increasing number of pretrained Large Language Models (LLMs) are being released, though the majority are predominantly designed for English. While they can often handle other languages due to contamination or some degree of multilingual pretraining data, English-centric LLMs are not optimized for non-English languages. This leads to inefficient encoding (high token ‘fertility’) and slower inference times for those languages. In this work, we explore various vocabulary adaptation techniques to tailor English LLMs for the Italian language. We introduce Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that learns a neural mapping to accomplish vocabulary substitution and achieves state-of-the-art performance on several downstream tasks. We adapted two LLMs: Mistral-7b-v0.1, reducing token fertility by 25%, and Llama-3.1-8b, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, after vocabulary adaptation, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the adapted models’ capabilities on several multiple-choice and generative tasks.
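Token fertility, the quantity the abstract reports reducing by 25%, is simply the average number of subword tokens a tokenizer produces per word. A minimal sketch of how it can be measured with a Hugging Face tokenizer (whitespace word splitting and the model id are assumptions, not the paper's exact protocol):

```python
# Minimal sketch: measuring token fertility (tokens per word) on Italian text.
from transformers import AutoTokenizer

def token_fertility(tokenizer, texts):
    """Average number of subword tokens per whitespace-separated word."""
    n_tokens = sum(len(tokenizer.tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / max(n_words, 1)

texts = ["Il gatto dorme sul divano tutto il pomeriggio."]
# Assumed to correspond to the Mistral-7b-v0.1 model mentioned in the abstract.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(f"fertility: {token_fertility(tok, texts):.2f}")
```

Lower fertility means fewer tokens per word, hence cheaper encoding and faster inference for the target language.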




Wordnet and Word Ladders: Climbing the abstraction taxonomy with LLMs

AUTHORS: Giovanni Puccetti, Andrea Esuli, Marianna Bolognesi

WORK PACKAGE: WP 8 – UbiQuity

URL: https://github.com/unipv-larl/GWC2025/releases/download/papers/GWC2025_paper_18.pdf

Keywords:

Abstract
WordNet has long served as a benchmark for approximating the mechanisms of semantic categorization in the human mind, particularly through its hierarchical structure of word synsets, most notably the IS-A relation. However, these semantic relations have traditionally been curated manually by expert lexicographers, relying on external resources like dictionaries and corpora. In this paper, we explore whether large language models (LLMs) can be leveraged to approximate these hierarchical semantic relations, potentially offering a scalable and more dynamic alternative for maintaining and updating the WordNet taxonomy.
This investigation addresses the feasibility and implications of automating this process with LLMs by testing a set of prompts encoding different sociodemographic traits. We find that adding age and job information to the prompt affects the model’s ability to generate text in agreement with hierarchical semantic relations, while gender does not have a statistically significant impact.
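For readers unfamiliar with the IS-A relation the abstract refers to, here is a minimal sketch (not from the paper) that reads hypernym paths directly from WordNet via NLTK; an LLM-based approach would try to reproduce these judgements from a prompt instead of looking them up.

```python
# Minimal sketch: checking an IS-A (hypernym) relation in WordNet with NLTK.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def is_a(hyponym: str, hypernym: str) -> bool:
    """True if any noun sense of `hyponym` has `hypernym` on one of its hypernym paths."""
    for syn in wn.synsets(hyponym, pos=wn.NOUN):
        for path in syn.hypernym_paths():
            if any(hypernym in s.lemma_names() for s in path):
                return True
    return False

print(is_a("dog", "animal"))      # True: dog IS-A animal
print(is_a("dog", "furniture"))   # False
```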




The Invalsi Benchmarks: measuring the Linguistic and Mathematical understanding of Large Language Models in Italian

AUTHORS: Giovanni Puccetti, Maria Cassese, Andrea Esuli

WORK PACKAGE: WP 8 – UbiQuity

URL: https://aclanthology.org/2025.coling-main.453/

Keywords:

Abstract
While Italian is a high-resource language, there are few Italian-native benchmarks to evaluate generative Large Language Models (LLMs) in this language. This work presents three new benchmarks: Invalsi MATE to evaluate models’ performance on mathematical understanding in Italian, Invalsi ITA to evaluate language understanding in Italian, and Olimpiadi MATE for more complex mathematical understanding. The first two benchmarks are based on the Invalsi tests, which are administered to students between the ages of 6 and 18 within the Italian school system and have been validated by several experts in teaching and pedagogy; the third comes from the Italian high school math Olympics. We evaluate 10 powerful language models on these benchmarks and find that their accuracy is bounded at 71% on Invalsi MATE, achieved by Llama 3.1 70b instruct, and at 88% on Invalsi ITA. For both Invalsi MATE and Invalsi ITA we compare LLMs with the average performance of Italian students, showing that Llama 3.1 is the only one to outperform them on Invalsi MATE, while most models do so on Invalsi ITA. We then show that Olimpiadi MATE is more challenging than Invalsi MATE: the highest accuracy, achieved by Llama 3.1 405b instruct, is 45%.




ABRICOT – ABstRactness and Inclusiveness in COntexT: A CALAMITA Challenge

AUTHORS: Giovanni Puccetti, Claudia Collacciani, Andrea Amelio Ravelli, Andrea Esuli, Marianna Bolognesi

WORK PACKAGE:

URL: ABRICOT – ABstRactness and Inclusiveness in COntexT: A CALAMITA Challenge

Keywords: Abstraction, Inclusiveness, Context, LLM evaluation, Italian Language Models

Abstract
The ABRICOT Task is designed to evaluate Italian language models on their ability to understand and assess the abstractness and inclusiveness of language, two nuanced features that humans naturally convey in everyday communication. Unlike binary categorizations such as abstract/concrete or inclusive/exclusive, these features exist on a continuous spectrum with varying degrees of intensity. The task is based on a manual collection of sentences that present the same noun phrase (NP) in different contexts, allowing its interpretation to vary between the extremes of abstractness and inclusiveness. This challenge aims to verify how LLMs perceive subtle linguistic variations and their implications in natural language.




INVALSI – Mathematical and Language Understanding in Italian: A CALAMITA Challenge

AUTHORS: Giovanni Puccetti, Maria Cassese, Andrea Esuli

WORK PACKAGE:

URL: INVALSI – Mathematical and Language Understanding in Italian: A CALAMITA Challenge

Keywords: Mathematical Understanding, Language Understanding, Invalsi, Large Language Models, Italian Language Models

Abstract
While Italian is a high-resource language, there are few Italian-native benchmarks to evaluate the generative abilities of Language Models (LMs) in this language. This work presents two new benchmarks: Invalsi MATE to evaluate models’ performance on mathematical understanding in Italian and Invalsi ITA to evaluate language understanding in Italian.
These benchmarks are based on the Invalsi tests, which are administered to students of age between 6 and 18 within the Italian school system. These tests are prepared by expert pedagogists and have the explicit goal of testing average students’ performance over time across Italy. Therefore, the questions are well written, appropriate for the age of the students, and are developed with the goal of assessing students’ skills that are essential in the learning process, ensuring that the benchmark proposed here measures key knowledge for undergraduate students.
Invalsi MATE is composed of 420 questions about mathematical understanding; these questions range from simple money-counting problems to Cartesian geometry questions, e.g. determining whether a point belongs to a given line. They are divided into 4 different types: scelta multipla (multiple choice), vero/falso (true/false), numero (number), completa frase (fill the gap).
Invalsi ITA is composed of 1279 questions regarding language understanding; these questions involve both the ability to extract information from and answer questions about a text passage, as well as questions about grammatical knowledge. They are divided into 4 different types: scelta multipla (multiple choice), binaria (binary), domanda aperta (open question), altro (other).
We evaluate 4 powerful language models, both English-first and tuned for Italian, and find that the best accuracy on Invalsi MATE is 55%, while the best accuracy on Invalsi ITA is 80%.
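To make the evaluation setup concrete, here is a minimal sketch of how a "scelta multipla" item could be scored against a model. The field names and the single-letter answer format are illustrative assumptions, not the benchmark's actual schema.

```python
# Minimal sketch: exact-match accuracy over multiple-choice items.
from dataclasses import dataclass

@dataclass
class MCItem:
    question: str
    options: dict[str, str]   # e.g. {"A": "...", "B": "..."}
    gold: str                 # gold option letter

def accuracy(items: list[MCItem], answer_fn) -> float:
    """`answer_fn` maps a formatted prompt string to a predicted option letter."""
    correct = 0
    for it in items:
        prompt = it.question + "\n" + "\n".join(f"{k}) {v}" for k, v in it.options.items())
        if answer_fn(prompt).strip().upper().startswith(it.gold):
            correct += 1
    return correct / len(items)

# Usage with a dummy "model" that always answers "A".
items = [MCItem("Quanto fa 2 + 3?", {"A": "5", "B": "6"}, "A")]
print(accuracy(items, lambda prompt: "A"))  # 1.0
```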




AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian

AUTHORS: Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell’Orletta, Andrea Esuli

WORK PACKAGE:

URL: AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian – ACL Anthology

Keywords:

Abstract
Large Language Models (LLMs) are increasingly used as ‘content farm’ models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic. We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real ‘content farm’. We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge. Our results suggest that there are currently no practical methods for detecting synthetic news-like texts ‘in the wild’, while generating them is too easy. We highlight the urgency of more NLP research on this problem.
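Of the three detection methods the abstract lists, the log-likelihood one is the simplest to illustrate: score a text by its average per-token log-probability under a language model and apply a threshold. The sketch below is not the paper's code; GPT-2 and the threshold value are placeholders chosen only to keep the example small, where the paper works with Italian-capable LLMs and tuned thresholds.

```python
# Minimal sketch: log-likelihood-based synthetic-text scoring with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def avg_log_likelihood(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)   # mean cross-entropy over next-token predictions
    return -out.loss.item()        # higher = more "expected" by the scoring model

score = avg_log_likelihood("Questo è un articolo di prova.")
is_flagged = score > -3.0          # illustrative threshold; in practice tuned on held-out data
print(score, is_flagged)
```

Note that this detector needs access to token-likelihood information from a model, which is exactly the practical limitation the abstract points out.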




Is CLIP the main roadblock for fine-grained open-world perception?

AUTHORS: Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Fabrizio Falchi

WORK PACKAGE:

URL: Is CLIP the main roadblock for fine-grained open-world perception?

Keywords: fine-grained understanding, open-vocabulary object detection, image-text matching, evaluation study

Abstract
Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training. This necessity is pivotal in emerging domains like extended reality, robotics, and autonomous driving, which require the ability to respond to open-world stimuli. A key ingredient is the ability to identify objects based on free-form textual queries defined at inference time – a task known as open-vocabulary object detection. Multimodal backbones like CLIP are the main enabling technology for current open-world perception solutions. Despite performing well on generic queries, recent studies highlighted limitations on the fine-grained recognition capabilities in open-vocabulary settings – i.e., for distinguishing subtle object features like color, shape, and material. In this paper, we perform a detailed examination of these open-vocabulary object recognition limitations to find the root cause. We evaluate the performance of CLIP, the most commonly used vision-language backbone, against a fine-grained object-matching benchmark, revealing interesting analogies between the limitations of open-vocabulary object detectors and their backbones. Experiments suggest that the lack of fine-grained understanding is caused by the poor separability of object characteristics in the CLIP latent space. Therefore, we try to understand whether fine-grained knowledge is present in CLIP embeddings but not exploited at inference time due, for example, to the unsuitability of the cosine similarity matching function, which may discard important object characteristics. Our preliminary experiments show that simple CLIP latent-space re-projections help separate fine-grained concepts, paving the way towards the development of backbones inherently able to process fine-grained details. The code for reproducing these experiments is available at https://github.com/lorebianchi98/FG-CLIP.
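The cosine-similarity matching the abstract questions looks like the following. This is a minimal sketch, not the paper's experiments: it embeds an image and two fine-grained captions with an off-the-shelf CLIP checkpoint and compares cosine similarities; the paper's re-projection idea would insert a learned linear map on the embeddings before this comparison.

```python
# Minimal sketch: CLIP image-text cosine similarity on fine-grained captions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def similarities(image: Image.Image, captions: list[str]) -> torch.Tensor:
    inputs = proc(text=captions, images=image, return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return img @ txt.T   # cosine similarities, one row per image

# Toy example: a solid red image vs. two attribute-level captions.
img = Image.new("RGB", (224, 224), color=(200, 30, 30))
print(similarities(img, ["a red object", "a blue object"]))
```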