Overview | A text translation dataset with Hokkien, English, and Mandarin. |
Size | 202 sentences |
COLUMNS | |
English | ★ Source text optimized for Machine Translation, with sentences specifically written to cover diverse general-domain concepts (embeddings) and to exhibit diverse grammatical features. |
Hokkien (Traditional Chinese Script) | ★ High Quality Human Translations, using the prestigious Taiwanese Hokkien variant. |
Hokkien (POJ Latin Script) | ★ High Quality Human Translations in Pe̍h-ōe-jī, written in the spoken register (i.e. Text-to-Speech compatible). |
Mandarin (Traditional Chinese Script) | Machine translations of the English texts, produced with Google Neural Machine Translation. |
This dataset is available for creating commercial and non-commercial applications, such as Machine Translation models and educational apps, without requiring attribution. Buyers and external parties are also permitted to publish derivative datasets that contain IDs from the "id" column; no other original data may be published in derivative datasets. This dataset cannot be resold or published as is.
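As an illustration of the derivative-dataset rule, the following minimal sketch builds records that carry only the original "id" values plus data you created yourself. The function name `id_only_derivative`, the `difficulty` annotation, and the in-memory row layout are all hypothetical, and this is one reading of the terms, not legal guidance.

```python
def id_only_derivative(rows, annotations):
    """Build derivative records holding only the original id plus your own data.

    rows: original dataset rows as dicts with an "id" key (assumed layout).
    annotations: dict mapping id -> dict of fields you authored yourself.
    """
    derived = []
    for row in rows:
        row_id = row["id"]
        # Copy nothing from the original row except its id.
        derived.append({"id": row_id, **annotations.get(row_id, {})})
    return derived


# Hypothetical example: the original text column is dropped, only the id survives.
original = [{"id": "D150", "English": "(original text, not republished)"}]
my_labels = {"D150": {"difficulty": "easy"}}  # hypothetical annotation field
derived = id_only_derivative(original, my_labels)
```

Records built this way can be joined back to the original sentences by anyone who holds their own copy of the dataset.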
id | English | Hokkien (Traditional Chinese) | Hokkien (POJ Latin) | Mandarin (Traditional Chinese) |
---|---|---|---|---|
D50 | The intricate puzzle kept him entertained for hours on end. | 彼个精細的拼圖佮伊耗幾若點鐘的時間。 | Hiān-khak ê bí-á hō͘ i pīn-chò lâu-lâu bô-khùn. | 這個複雜的謎題讓他連續幾個小時都樂此不疲。 |
D100 | Is it ethical for companies to use personal data to influence consumer behavior? | 公司使用個人資料來影響消費者行為,這樣做合乎倫理嗎? | Kompaniyānnā vyaktigat mahiti vaprun grahakānchī vārtanuk prabhavit karṇe nāitik āhe kā? | 公司利用個人資料來影響消費者行為是否合乎道德? |
D150 | My grandmother's stories are filled with wisdom and humor. | 我阿嬤的故事充滿智慧和幽默。 | Góa ah-má ê sū-koa uì tshù-hok kap siāu-phìng. | 我祖母的故事充滿智慧和幽默。 |
… | … | … | … | … |
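The sample rows above suggest how the columns might be read into translation pairs. The sketch below assumes the dataset is delivered as a CSV file whose headers match the sample table; the actual delivery format is not stated here, and the helper names `load_rows` and `translation_pairs` are hypothetical.

```python
import csv
import io

# Assumed CSV layout, using the headers and one row from the sample table.
SAMPLE_CSV = (
    "id,English,Hokkien (Traditional Chinese),Hokkien (POJ Latin),"
    "Mandarin (Traditional Chinese)\n"
    "D150,My grandmother's stories are filled with wisdom and humor.,"
    "我阿嬤的故事充滿智慧和幽默。,"
    "Góa ah-má ê sū-koa uì tshù-hok kap siāu-phìng.,"
    "我祖母的故事充滿智慧和幽默。\n"
)


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def translation_pairs(rows, source, target):
    """Return (source sentence, target sentence) tuples for two columns."""
    return [(row[source], row[target]) for row in rows]


rows = load_rows(SAMPLE_CSV)
pairs = translation_pairs(rows, "English", "Hokkien (Traditional Chinese)")
```

Swapping the `target` argument for `"Hokkien (POJ Latin)"` would yield pairs in the Latin script instead.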
Names * | Hokkien, Min Nan, Southern Min, Taiwanese, Taigi, Banlam, Quanzhang, 閩南語, 咱儂話, 福建話, 臺灣話, 書語 (* Though Hokkien is technically part of the Southern Min (aka Min Nan) group of languages, the Hokkien language is also sometimes referred to as "Southern Min". This is because Hokkien is the most widely spoken Southern Min language.) |
Population | 40-50 million |
Regions | Primarily East and South-East Asia (Singapore, Taiwan, Malaysia, Philippines, Indonesia, Cambodia, Myanmar, Hong Kong, Thailand, Brunei, Vietnam, China) |
BCP 47 IETF Language Code | zh-hkm |
Glottolog Code | hokk1242 |
ISO 639-3 | nan |
Highest Resource Variant | Taiwanese Hokkien |
Written Scripts | Traditional Chinese Script, Simplified Chinese Script, Mixed Script (Hanlo), Latin Script (POJ or Tailo) |
Recommended Translator Script | Traditional Chinese Script (See reasoning) |
Recommended User-Facing Script | Latin Script (POJ or Tailo) (See reasoning) |