BOUQUET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
Feb 6, 2025·
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
·
0 min read
The Omnilingual MT Team
Pierre Andrews
Mikel Artetxe
Mariano Coria Meglioli
Marta R. Costa-Jussà
Joe Chuang
David Dale
Cynthia Gao
Jean Maillard
Alex Mourachko
Christophe Ropers
Safiyyah Saleem
Eduardo Sánchez
Ioannis Tsiamas
Arina Turkatenko
Albert Ventayol-Boada
Shireen Yates

Abstract
This paper introduces BOUQUET, a multi-way, multicentric, and multi-domain dataset and benchmark designed as a collaborative initiative for universal quality evaluation in translation. The dataset is handcrafted in 8 non-English languages, chosen for their potential as pivot languages to enable more accurate translations globally. BOUQUET is structured in paragraphs of varying lengths to move beyond sentence-level evaluation and is designed to represent a broader range of domains than existing datasets. Its simplified format makes it particularly suitable for crowd-sourced expansion, and we are launching a call to collect a multi-way parallel corpus covering any written language.
Type
Publication
arXiv