About the project

The development of sub-corpora is aimed at building a new technological platform for the national corpus of the Kazakh language. Digitalization makes it possible to systematize and use modern information. The main purposes of creating the corpus are to collect natural-language resources, to normalize and systematize the language, and to make the results available to a wide audience for effective use.

"Development of the publicistic sub-corpus of the national corpus of the Kazakh language" is an important project that continues on the basis of the national project "Ulttyq Rukhani Zhangyru". The project was carried out by the National Scientific and Practical Center "Til-Qazyna" named after Shaysultan Shayakhmetov under a state order of the Language Policy Committee of the Ministry of Education and Science of the Republic of Kazakhstan. The project manager is N. Aitova, Candidate of Philological Sciences, Associate Professor. The head of IT support is M. Bakytkyzy. Philologists and industry specialists from the following domestic universities and organizations took part in the project:

  • Baitursynov Institute of Linguistics;
  • Al-Farabi Kazakh National University;
  • L. N. Gumilyov Eurasian National University;
  • Kazakh National Women's Pedagogical University;
  • Aktobe Regional University named after K. Zhubanov;
  • Baishev University;
  • “Minialgo” LLP;
  • “Qazkitap Publishing House” LLP.

The sub-corpus of publicistic texts contains digitized republican print publications with meta-markup and linguistic markup. In particular, the texts collected in the first year of the project were taken from the newspapers "Egemen Kazakhstan", "Ana tili", "Kazakh Adebieti", "Turkistan", "Zan", etc. In the future, the sub-corpus is planned to cover other genres within the publicistic style, a wider range of publication periods, and additional sources.

Meta-markup with 12 to 20 parameters (author of the text, title of the text, text style, genre, text type, chronotope, source, publication date, etc.) has been compiled for a text base exceeding 2 million words.

The database of the publicistic sub-corpus is updated with new content annually. In addition, one new type of sub-corpus will be added and made available each year within the framework of the project. The text base of the project is planned to grow to 40 million words, and the work will continue as "Five sub-corpora of the national corpus of the Kazakh language".

The national corpus is not only a base of all Kazakh-language materials integrated into a single system, but also a mechanism for expanding the functioning and semantic space of the state language in the virtual space, increasing the dissemination of information, and providing mass access to language resources. This open information and reference system, which presents a database of Kazakh-language texts in digitized form, accumulates and provides users with all styles of the literary language and examples of language use at a given stage (or stages) of the national language's existence.


  • Nurlykhan Nurullayevna Aitova – Candidate of Philological Sciences, Associate Professor
  • Bekzat Begalykyzy Dinayeva – Candidate of Philological Sciences, Associate Professor
  • Tynyshtyk Nurdauletovna Ermekova – Doctor of Philological Sciences, Professor
  • Orynai Sagyngalievna Zhubaeva – Doctor of Philological Sciences, Professor
  • Kuralai Tolegenkyzy Mukhamadi – Candidate of Philological Sciences, Associate Professor
  • Kulzat Kanievna Sadirova – Doctor of Philological Sciences, Professor
  • Sabira Minataevna Sapina – Candidate of Philological Sciences, Associate Professor
  • Dzhuldyzai Abdumanapova – Master of Arts
  • Ainur Tolybaevna Bayekeeva – doctoral student
  • Moldir Bakytkyzy – Master of Philological Sciences
  • Almazhan Kadyrkhankyzy Kadyrkhan – Master of Arts
  • Akbota Kamash – Master of Philological Sciences, interpreter
  • Aigerim Kozhakhmet – doctoral student
  • Nurgul Kaldygulovna Kultanbayeva – doctoral student
  • Merey Yersayynuly Kupzhanov – doctoral student
  • Dana Ospanova – doctoral student
  • Aktoty Malikovna Khabasheva – Master of Arts
  • Aibanu Zhanibekovna Shamisheva – Master of Linguistics
  • Daniyar Yerbolovich Izbasarov – doctoral student


  • Anar Bekmyrzakyzy Salkynbay – Doctor of Philological Sciences, Professor
  • Elmira Nurlanovna Orazalieva – Doctor of Philological Sciences, Professor
  • Banu Zhantugankyzy Ergesh – PhD, A. Baitursynuly Institute of Linguistics