Saturday, April 26, 2025

UP mathematicians develop Baybayin AI translator

Filipino mathematicians have developed a computerized system that can convert entire paragraphs, and even full documents, written in the ancient Filipino Baybayin writing system into Latin text that readers unfamiliar with the script can understand. The team is now working on a full two-way translator.

Combining mathematics and technology, scientists from the University of the Philippines Diliman College of Science Institute of Mathematics (UPD-CS IM) have developed what is likely the world’s first paragraph-level optical character recognition (OCR) system capable of distinguishing complete blocks of Baybayin characters from Latin characters within a text image.

In their paper titled “Block-level Optical Character Recognition System for Automatic Transliterations of Baybayin Texts Using Support Vector Machine,” master’s student Rodney Pino, along with associate professors Dr. Renier Mendoza and Dr. Rachelle Sambayan, introduced an algorithm that transforms a photograph of a text into binary data. This data is then processed by a support vector machine (SVM) character classifier, which automatically determines whether the characters belong to the Baybayin or Latin script.
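The article does not describe how the photograph is turned into binary data, so the following is only a minimal sketch of one common approach, assuming a grayscale image and a fixed threshold. The function name `binarize`, the threshold value, and the sample pixel values are all illustrative assumptions, not details from the paper.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 for dark ink or 0 for background.

    The fixed threshold is an assumption for illustration; a real OCR
    pipeline may choose it adaptively.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 3x4 grayscale patch: low values are ink, high values are paper.
patch = [
    [250, 30, 40, 245],
    [240, 20, 235, 250],
    [245, 25, 35, 240],
]

binary = binarize(patch)
for row in binary:
    print(row)
```

Each pixel ends up as a single bit, which is the kind of compact input a character classifier can work with.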

“SVM is a machine learning algorithm used to solve regression or classification problems. We have a dataset for Baybayin characters, let’s say character A and then character BA. SVM uses techniques or mathematical methods that can separate the two datasets to determine characters BA and A,” Pino explained.
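To make the idea of separating two character classes concrete, here is a small sketch of a linear SVM trained by subgradient descent on the hinge loss. It is not the team's implementation: the two-dimensional feature vectors are made-up stand-ins for real image features, the linear kernel is an assumption, and the names `train_linear_svm` and `predict` are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by subgradient descent on lam/2*|w|^2 + hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: move the boundary away from it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularizer.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (character A) or -1 (character BA) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, centered 2-D features; real inputs would be image-derived.
samples_a = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -1.0)]   # label +1: "A"
samples_ba = [(1.0, 0.9), (0.8, 1.1), (1.1, 1.0)]        # label -1: "BA"
points = samples_a + samples_ba
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
names = {1: "A", -1: "BA"}
print(names[predict(w, b, (-1.0, -1.0))], names[predict(w, b, (1.0, 1.0))])
```

The learned weights `w` and offset `b` define the line that separates the two groups; a new sample is labeled according to which side of that line it falls on.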

The team spent more than three months gathering over a thousand images for each Baybayin character, along with 110 paragraphs sourced from various websites that contain handwritten or typewritten Baybayin, Latin, or mixed Baybayin and Latin text. “Adding more character images improves the recognition rate of SVM,” Pino said.

Currently, the OCR system outputs the Latin equivalent of each Baybayin character, producing a transliterated version of the text. The researchers hope to extend its capabilities considerably.

The mathematicians aim to improve the OCR system’s contextual understanding of Baybayin words and phrases, which could pave the way for a fully functional translator. They are also working to enable the system to convert Latin words containing foreign sounds into Baybayin.

An example of the translation process using the Baybayin OCR system

“We’re refining the software we developed to enhance user navigation. Additionally, we dream of creating a mobile application that can automatically and accurately translate Baybayin characters by simply hovering over the phone,” Dr. Mendoza stated.

Challenges remain, however. Dr. Mendoza noted that accurately translating Baybayin words and sentences has proved difficult. “Currently, the system cannot differentiate between some visually similar Baybayin characters, such as E and I, or O and U. We also encounter numerous words with different Latin equivalents,” he explained. “The algorithm we used presents all possible translations of the Baybayin words.”
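One way to picture how a system could present every possible reading of an ambiguous word is to enumerate all combinations of the candidate readings for each character. The sketch below does this with Python's standard `itertools.product`; the function name `candidate_transliterations` and the sample readings are illustrative assumptions, and the article does not say how the team's algorithm actually generates its candidates.

```python
from itertools import product


def candidate_transliterations(readings):
    """Combine per-character reading options into every possible word.

    `readings` holds one list per recognized character, containing the
    Latin syllables that character could stand for.
    """
    return ["".join(combo) for combo in product(*readings)]


# Hypothetical two-character word: the first character may read "bu" or
# "bo" (O/U ambiguity), the second "ti" or "te" (E/I ambiguity).
readings = [["bu", "bo"], ["ti", "te"]]

candidates = candidate_transliterations(readings)
print(candidates)  # ['buti', 'bute', 'boti', 'bote']
```

Picking the intended word from such a list is where the contextual understanding the researchers mention would come in.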

Although interest in and research on Baybayin remain limited, the mathematicians hope their work will encourage more Filipinos to help preserve Baybayin through research. The team published their data to encourage further studies on Baybayin and OCR. “We meticulously cleaned the data to enable researchers to analyze Baybayin using alternative algorithms,” shared Dr. Mendoza. “We made the data readily available to alleviate the difficulties we faced in data collection.”

Baybayin, along with other Philippine traditional writing systems, serves as a representation of Filipino heritage and national identity. Consequently, government officials introduced the “Philippine Indigenous and Traditional Writing Systems Act,” which aims to promote, protect, and preserve Baybayin and other traditional writing systems.

According to the scientists, Baybayin serves as a testament to the technical sophistication of Filipino traditions. While they do not propose making Baybayin the primary writing system of the Philippines, the team believes that further research on Baybayin will contribute to the preservation of this heritage. “This can be forgotten,” warned Dr. Sambayan. “It is crucial to have a record of each Baybayin character, even in digital form.”

Dr. Sambayan expressed concern over the declining number of Filipinos who can read and write Baybayin, underscoring the importance of identifying and translating Baybayin characters into the Latin script. “Through this OCR system, we hope to preserve and transmit the knowledge of understanding Baybayin to future generations of Filipinos,” she emphasized.

Baybayin, together with other traditional writing systems, constitutes an integral part of the Philippines’ rich history. Numerous ancient Filipino documents are written in Baybayin, a treasure trove of information about Filipino culture. The scientists actively encourage fellow Filipinos to join their efforts in expanding the knowledge base on Baybayin. “If no one does this work, who will? Although the implications might seem niche, I believe this is a crucial research undertaking,” Dr. Mendoza asserted. (With Eunice Jean Patron)
