About parallel corpora

Parallel corpora: what are they?

A parallel corpus is a type of linguistic corpus, one of the main tools used by linguists in the 21st century. Like most linguistic corpora, a parallel corpus is usually provided with metainformation (information about each text: who created it and when, how long it is, etc.), as well as linguistic annotation (each word is assigned its dictionary form, grammatical information, etc.).

A parallel corpus is a collection of texts and their translations into another language. An important element of parallel corpus annotation is alignment: each sentence (or paragraph) in language X is linked to the corresponding sentence (or paragraph) in language Y. Thanks to alignment, a parallel corpus becomes a useful tool for several categories of users.
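
To make the idea of alignment more concrete, here is a minimal sketch in Python of how an aligned text might be represented and searched. It is an illustration only, not the format of any particular corpus: the names AlignedPair, ParallelText and find_translations are hypothetical, and the English and French sentences are invented toy data.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedPair:
    """One alignment unit: a sentence and its translation (hypothetical structure)."""
    source: str  # sentence in language X
    target: str  # corresponding sentence in language Y


@dataclass
class ParallelText:
    """A text with its translation, plus a little metainformation."""
    title: str
    source_lang: str
    target_lang: str
    pairs: List[AlignedPair]


def find_translations(text: ParallelText, word: str) -> List[AlignedPair]:
    """Return every aligned pair whose source sentence contains `word`.

    Matching is case-insensitive and works on whole words only.
    """
    needle = word.lower()
    return [
        pair
        for pair in text.pairs
        if needle in re.findall(r"\w+", pair.source.lower())
    ]


# Invented toy data, for illustration only.
sample = ParallelText(
    title="Toy example",
    source_lang="en",
    target_lang="fr",
    pairs=[
        AlignedPair("The cat sleeps.", "Le chat dort."),
        AlignedPair("I read a book.", "Je lis un livre."),
        AlignedPair("The cat eats.", "Le chat mange."),
    ],
)

# Look up a word on the source side and show each sentence with its translation.
for pair in find_translations(sample, "cat"):
    print(pair.source, "|", pair.target)
```

Because every source sentence is stored together with its translation, a search on one side immediately returns the matching sentences on the other side. This is what alignment makes possible.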

Here are the best-known examples of parallel corpora: