A Computer Simulation and Monitoring Method for Taiwanese Tone Sandhi

Yu-chu Chang
Nanhua University, Taiwan
poirotdavid@yahoo.com.tw

Abstract

This research uses knowledge representation methods to simulate intra-sentence tone sandhi in Taiwanese. The simulation system comprises a Taiwanese tone group parser and a simulation-monitoring interface; it employs expert system techniques, combined with linguistic knowledge and practical experience with the Taiwanese language, to build a tone sandhi processing model. The study makes three main contributions. First, it proposes the Tonal Derivation Strategy, which holds that determining the tone forms of the words in a sentence also determines the boundaries of its Taiwanese tone groups, and verifies this claim by implementing the tone group parser. Second, it describes how default tone forms, default parts of speech (POS), and context mode marks are used to convert linguistic knowledge and experience into a knowledge base, and illustrates how the tones of words are determined through the connection between the inference engine and the knowledge base. Third, the simulation-monitoring procedure further improves debugging efficiency. The study applied the rule-based expert system in inside testing and then optimized the model. After ten rounds of supervised training, outside testing on randomly selected articles yielded an average tone sandhi accuracy of 96%. Simulating Taiwanese tone sandhi is a relatively complex one-to-many mapping problem for AI applications. Combining deep learning with the knowledge representation (symbolic AI) explored in this study can reduce data and computing requirements while improving accuracy. In the near future, rule-based methods such as the Taiwanese tone group parser can complement AI models to enhance their performance. In short, if AI can correctly process the Taiwanese language, we will not be far from developing a machine that can simulate the human brain.

Keywords: Computer simulation, Taiwanese tone sandhi, Expert system, Tone group parser, Automatic tagging
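The Tonal Derivation Strategy summarized in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that the knowledge base has already tagged each word with a tone form (the word either keeps its citation tone or takes its sandhi tone). Under that assumption, a tone group boundary falls after every citation-tone word, and within a group every syllable except the last takes its sandhi tone. The `Word` class, the `group_final` tag, and the functions `tone_groups` and `realize` are illustrative names, not the author's implementation, and only the unchecked-tone sandhi circle of the northern (Taipei) variety is modeled.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Tonal Derivation Strategy (not the author's code).
# Only the unchecked-tone sandhi circle of the northern (Taipei) variety is
# modeled: 5->7, 7->3, 3->2, 2->1, 1->7. Checked tones 4 and 8 (and any other
# tone number) are passed through unchanged.
SANDHI_CIRCLE = {5: 7, 7: 3, 3: 2, 2: 1, 1: 7}


@dataclass
class Word:
    syllables: list[str]  # Romanized syllables of the word
    tones: list[int]      # citation (base) tone number of each syllable
    group_final: bool     # assumed tone-form tag: True if the word keeps its citation tone


def tone_groups(words: list[Word]) -> list[list[Word]]:
    """Split a sentence into tone groups using only the per-word tone-form tags."""
    groups: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        current.append(word)
        if word.group_final:  # a citation-tone word closes its tone group
            groups.append(current)
            current = []
    if current:  # the end of the sentence also closes the last group
        groups.append(current)
    return groups


def realize(group: list[Word]) -> list[tuple[str, int]]:
    """Apply sandhi to every syllable in a group except the last one."""
    flat = [(s, t) for w in group for s, t in zip(w.syllables, w.tones)]
    last = len(flat) - 1
    return [(s, t if i == last else SANDHI_CIRCLE.get(t, t))
            for i, (s, t) in enumerate(flat)]


# Usage: Tâi-oân-ōe ("Taiwanese language"), citation tones 5-5-7.
sentence = [Word(["Tâi", "oân"], [5, 5], False), Word(["ōe"], [7], True)]
for g in tone_groups(sentence):
    print(realize(g))  # [('Tâi', 7), ('oân', 7), ('ōe', 7)]
```

In this sketch all of the linguistic difficulty lies in assigning `group_final` correctly, which is the task the abstract attributes to the knowledge base and inference engine; once the tags are set, segmentation reduces to a single linear scan.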

Contents

1. Introduction
1.1 Statement of the Problem
1.2 Limitations of the Study
1.3 Significance of the Simulation of Taiwanese Tone Sandhi

2. Literature Review
2.1 The Nature of the Taiwanese Language: Phonological and Structural Perspectives
2.2 The Properties of Taiwanese Tone Sandhi and the Formation of Tone Groups
2.3 Rule-based Expert Systems
2.4 Knowledge Representation Methods and Examples of Linguistic Simulation
2.5 Deep Learning Methods and Modern Natural Language Processing
2.6 Current Research on Taiwanese Tone Sandhi and Text-to-Speech Systems

3. The Development of Tonal Derivation Strategy
3.1 The Dilemma: Syntactic Analysis or Tone Form Analysis First?
3.2 Use of Medium Language
3.3 Tonal Derivation Strategy
3.4 From Tonal Derivation Strategy to Developing a Feasible Method

4. Methodology
4.1 Application of Knowledge Representation Method
4.2 Using Romanized Taiwanese as the Written Language
4.3 Frame-based Corpus
4.4 Tagging of Markup Symbols and Rule Construction for the Tone Sandhi Processor
4.5 The Configuration and Management of Program Memory
4.6 Capturing Tone Groups from Taiwanese Sentences
4.7 Designing a Tone Group Feedback Function
4.8 Monitoring Interface

5. Implementation of the Taiwanese Tone Sandhi Simulation System
5.1 Coding Procedure
5.1.1 Overview of the Programming Workflow
5.1.2 Practical Coding Procedure
5.2 Construction of Tone Sandhi Processor
5.2.1 How to Build Tone Sandhi Rules
5.2.2 Rule Interference
5.2.3 Instantiation of the Rule Base
5.2.4 Boolean Verification
5.3 Recursive Feedback Mechanism for Tone Groups
5.4 Monitoring Interface for Process Tracing and Debugging
5.5 Function of the Inference Engine
5.6 Application of Simulation Data
5.7 Output of the Simulator
5.8 Evaluation of Tone Sandhi Accuracy Test
5.8.1 Testing Materials
5.8.2 Results of Tone Sandhi Accuracy Test
5.8.3 Connection Test of the Tone Group Parser and Text-to-Speech Software
5.9 Ultimate Solution to Tone Sandhi Errors

6. Conclusions and Future Outlook
6.1 Conclusions
6.2 Future Outlook

References
