The Russian National Corpus (RNC) is one of the largest and highest-quality families of corpora for the Russian language. It includes a large number of so-called subcorpora: smaller databases, each dedicated to a specific area of language research (syntax, stress, etc.). One of these is the parallel corpus, which is itself divided into twenty corpora pairing Russian with a foreign language.
You can learn more about parallel corpora here.
Our corpus was created within the RNC project in 2016. Since 2019, it has been available on two pages:
In 2020 and 2021, we received support from HSE University for three projects: first, enhancing the corpus infrastructure; second, adding linguistic annotation to the Chinese texts; and third, developing corpus-assisted language learning programs based on the Corpus.
The Corpus contains over 3.5 million words in more than 1,000 texts. Most of these are Russian and Chinese fiction from the 19th to the 21st centuries, news, and official texts.
Today, the Corpus has Russian, English, and Chinese interfaces (on the HSE corpus projects site).
To learn about the functions of our Corpus, see the instructions on the search page: click the orange question-mark icon at the top of the page.
To date, ours is the only parallel corpus under development in Russia that combines four crucial features:
We know of only one analogue of our project, which is currently being developed in Beijing.
Our project involves students, teachers, and researchers from the following institutions:
Dozens of people work on the corpus, but many tasks remain unresolved because we do not have enough active and committed participants. If you are interested in our project, be sure to look at our vacancies!