The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The
texts contain the monolingual sentences from parallel corpora for the following
pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-
English. The English corpus consists of the English side of the Spanish-
English pair. Basque is not in Europarl. In addition, the texts include the
Basque and English sides of the GNOME corpus (Tiedemann, 2012).
The texts have been automatically annotated with NLP tools, including Word
Sense Disambiguation, Named Entity Disambiguation and Coreference