Approaching the Chinese Word Segmentation Problem with CHR Grammars
Written Chinese text does not include separators between words, unlike text in European languages, which separate words with space characters. This creates the Chinese Word Segmentation Problem: given a text in Chinese, divide it correctly into segments corresponding to words, where a segmentation counts as correct if it matches how a competent Chinese language user would divide the text. CHR Grammars (CHRG) is an implemented grammar system that allows highly flexible bottom-up analyses using rule-based constraint solving techniques. We demonstrate how different approaches to the problem can be expressed concisely in CHRG, and how different principles can complement each other in this paradigm. We do not claim to have improved on the methods currently in use; our aims are a) to put forward a basis for further experimentation with solutions to the problem, and b) to show how CHRG gives rise to succinct and executable specifications of such methods. We present some preliminary and promising experiments, tested on simple examples.
