Chinese CCGbank is a 750,000-word corpus of Chinese newswire and magazine text, annotated in the Combinatory Categorial Grammar (CCG) formalism. It was developed specifically for wide-coverage Chinese parsing, and models have been successfully trained on it for the Berkeley parser, the C&C parser and ZPAR.

Chinese CCGbank is the result of an automatic corpus conversion from the Penn Chinese Treebank. Because I do not have a license to distribute derivative works of the Penn Chinese Treebank, you will need your own copy of the Penn Chinese Treebank to generate Chinese CCGbank.

The conversion code is available to researchers as a GitHub repository.

To get help with generating a copy of Chinese CCGbank, please contact me at cnccgbank@overpunch.com.

© 2013 Daniel Tse