BOUQUET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

Feb 6, 2025·
The Omnilingual MT Team
,
Pierre Andrews
,
Mikel Artetxe
,
Mariano Coria Meglioli
,
Marta R. Costa-Jussà
,
Joe Chuang
,
David Dale
,
Cynthia Gao
,
Jean Maillard
,
Alex Mourachko
,
Christophe Ropers
,
Safiyyah Saleem
,
Eduardo Sánchez
,
Ioannis Tsiamas
,
Arina Turkatenko
,
Albert Ventayol-Boada
,
Shireen Yates
· 0 min read
Abstract
This paper introduces BOUQUET, a multi-way, multicentric, and multi-domain dataset and benchmark designed as a collaborative initiative for universal quality evaluation in translation. The dataset is handcrafted in 8 non-English languages, chosen for their potential as pivot languages to enable more accurate translations globally. BOUQUET is structured in paragraphs of varying lengths to move beyond sentence-level evaluation and is designed to represent a broader range of domains than existing datasets. Its simplified format makes it particularly suitable for crowd-sourced expansion, and we are launching a call to collect a multi-way parallel corpus covering any written language.
Type
Publication
arXiv