This is an R package for converting Chinese characters into pinyin, four-corner codes, five-stroke codes, and more.
A brief introduction to pinyin can be found on Wikipedia:
Pinyin, or Hànyǔ Pīnyīn, is the official romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan. It is often used to teach Standard Chinese, which is normally written using Chinese characters. The system includes four diacritics denoting tones. Pinyin without tone marks is used to spell Chinese names and words in languages written with the Latin alphabet, and also in certain computer input methods to enter Chinese characters.
The pinyin system was developed in the 1950s by many linguists, including Zhou Youguang, based on earlier forms of romanization of Chinese. It was published by the Chinese government in 1958 and revised several times. The International Organization for Standardization (ISO) adopted pinyin as an international standard in 1982, followed by the United Nations in 1986. The system was adopted as the official standard in Taiwan in 2009, where it is used for romanization alone (in part to make areas more English-friendly) rather than for educational and computer-input purposes.
Since this package deals with Chinese characters, I presume that its users speak Chinese, so I wrote the instructions in Chinese. If you do not speak Chinese but would like to use this package, please feel free to contact me by email, although the R code in this document is self-explanatory.
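This R package is bluntly named pinyin, after the word itself, and its purpose is to convert Chinese characters into pinyin. Since v1.1.3 it can also convert Chinese characters into four-corner codes or Wubi (five-stroke) codes. Since v1.1.4 users can specify their own dictionaries and convert characters however they like.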
```r
install.packages('pinyin')
# or devtools::install_github("pzhaonet/pinyin")
```
Installation may print some warnings about the locale. They look alarming but are harmless; just ignore them.
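The usage of each function is, of course, documented in its help page. Unfortunately, Chinese text does not seem to be allowed in help pages, so a package that processes Chinese cannot show any Chinese in its help pages or examples, which is a real pity. Fortunately, the functions can be explained in Chinese here.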
pinyin 1.1.4 contains three main functions:
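pydic() loads the built-in dictionaries (pinyin, four-corner, and Wubi).
If the built-in dictionaries do not meet your needs, load_dic() loads a custom dictionary. Three custom dictionaries are provided: four-corner, Wubi 86, and Wubi 98. You can of course make your own dictionary, as long as it follows the format of these dictionaries.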
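To illustrate what a custom dictionary boils down to, the sketch below reads dictionary lines into a named character vector, with characters as names and codes as values. The file format (one entry per line, character and code separated by a tab) and the helper name parse_dic() are assumptions for illustration only; they are not the format or API of load_dic(), so check the bundled dictionaries before building your own.

```r
# Hypothetical sketch: parse_dic() and the tab-separated format are assumptions,
# not the pinyin package's load_dic() API or its dictionary file format.
parse_dic <- function(lines) {
  parts  <- strsplit(lines, "\t", fixed = TRUE)   # "char\tcode" -> c("char", "code")
  keys   <- vapply(parts, `[`, character(1), 1)   # the characters
  values <- vapply(parts, `[`, character(1), 2)   # their codes
  stats::setNames(values, keys)                   # lookup table: code by character
}

# In practice the lines could come from readLines(path, encoding = "UTF-8").
dic_lines <- c("\u4e2d\tzhong", "\u6587\twen", "\u62fc\tpin", "\u97f3\tyin")  # 中 文 拼 音
my_dic <- parse_dic(dic_lines)
my_dic[["\u97f3"]]  # "yin"
```

A named vector keeps lookups to a single indexing step, which is all a character-by-character converter needs.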
py() converts the given characters into the corresponding pinyin, four-corner, or Wubi codes by looking them up in the loaded dictionary.
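A minimal base-R sketch of the dictionary-lookup idea described for py(): split a string into characters, replace each one found in a named vector, and join the results with a separator. The function convert_by_dic() and its argument other are hypothetical; other only mimics the keep-or-replace choice offered by nonezh_replace, and the package's actual handling of separators around non-Chinese characters may differ.

```r
# Hypothetical sketch, not the package's implementation of py().
# Characters found in `dic` are replaced by their codes; the others are kept
# unchanged when `other` is NULL, or replaced by `other` otherwise.
convert_by_dic <- function(text, dic, sep = "_", other = NULL) {
  chars <- intToUtf8(utf8ToInt(text), multiple = TRUE)  # one element per character
  found <- chars %in% names(dic)
  out <- chars
  out[found] <- dic[chars[found]]                       # dictionary lookup
  if (!is.null(other)) out[!found] <- other             # replace non-dictionary chars
  paste(out, collapse = sep)
}

dic <- stats::setNames(c("zhong", "wen"), c("\u4e2d", "\u6587"))  # 中, 文
convert_by_dic("\u4e2d\u6587", dic)                # "zhong_wen"
convert_by_dic("\u4e2dA\u6587", dic)               # "zhong_A_wen"
convert_by_dic("\u4e2dA\u6587", dic, other = "-")  # "zhong_-_wen"
```

utf8ToInt() and intToUtf8(multiple = TRUE) keep each Chinese character as one element as long as the input is UTF-8; note that utf8ToInt() takes a single string, so this sketch converts one string at a time.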
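When loading the pinyin dictionary with pydic(), you can choose pinyin with tone marks (method = 'quanpin'), tones written as numbers (method = 'tone'), or no tones at all (method = 'toneless'); whether to keep only the first letter of each syllable (only_first_letter = TRUE); whether to show more than one pronunciation for polyphonic characters (multi = FALSE shows only one); and which built-in dictionary to use (dic = c("pinyin", "pinyin2")).
When converting characters with py(), you can choose the separator placed between converted characters (sep = '_'), and whether non-Chinese characters are kept as they are (nonezh_replace = NULL) or replaced with a given string (e.g. nonezh_replace = '-').
When loading a custom dictionary with load_dic(), three dictionaries are currently available: four-corner, Wubi 86, and Wubi 98. Contributions of new dictionaries are welcome.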
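The pydic() options for tones, first letters, and polyphonic characters can be related to one another with a few string operations. The sketch below is hypothetical: to_toneless(), to_first_letter(), pick_readings(), and the space-separated storage of several readings are assumptions, not the package's internals. It only shows what dropping the tone number, keeping the first letter, and keeping one versus all readings amount to, starting from tone-number syllables such as "zhong1".

```r
# Hypothetical helpers, not part of the pinyin package.
# Drop a trailing tone digit (1-4, or 5 for the neutral tone).
to_toneless <- function(syllable) sub("[1-5]$", "", syllable)

# Keep only the first letter of each syllable.
to_first_letter <- function(syllable) substr(syllable, 1, 1)

# Assumed storage of a polyphonic character: readings separated by spaces.
# Keep the first reading unless all readings are requested.
pick_readings <- function(entry, multi = FALSE) {
  readings <- strsplit(entry, " ", fixed = TRUE)[[1]]
  if (multi) readings else readings[1]
}

to_toneless(c("zhong1", "wen2"))              # c("zhong", "wen")
to_first_letter("zhong1")                     # "z"
pick_readings("zhong1 zhong4")                # "zhong1"
pick_readings("zhong1 zhong4", multi = TRUE)  # c("zhong1", "zhong4")
```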
There are also three customized functions that extend py() and serve as examples of its use:
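file.rename2py() renames files, converting the Chinese characters in their names according to the loaded dictionary.
file2py() converts all the Chinese characters in one or more text files in a given folder according to the loaded dictionary.
bookdown2py() is written specifically for the bookdown package: it automatically appends a character ID such as {#biaotipinyin} to each chapter or section heading written in Chinese. This keeps the generated HTML file names from filling up with garbled characters, and also solves the problem of headings that mix Chinese and English. Of course, this can be done by hand; it is just that doing it by hand is no fun at all.
Notes from the change log:
pinyin() supports vector calculation.
zh2py() has been removed; the main function is now pinyin().
Submitted to CRAN!
zh2py(multi = TRUE) displays multiple pronunciations of a Chinese character.
file2py() was created following Dong's comment.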
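The heading-ID idea behind bookdown2py() can be sketched as follows: find Markdown headings that do not yet end with {#...} and append an ID produced by a converter function. add_heading_ids() and the toy whole-title lookup below are hypothetical, not the package's implementation; in practice the converter could be a pinyin conversion built on py().

```r
# Hypothetical sketch, not bookdown2py() itself: append "{#id}" to Markdown
# headings that do not already end with an explicit ID.
add_heading_ids <- function(lines, to_id) {
  is_heading <- grepl("^#+ ", lines) &
    !grepl("\\{#[^}]*\\}[[:space:]]*$", lines)   # skip headings that have an ID
  for (i in which(is_heading)) {
    title <- sub("^#+ +", "", lines[i])           # heading text without the #s
    lines[i] <- paste0(lines[i], " {#", to_id(title), "}")
  }
  lines
}

# Toy converter: a whole-title lookup standing in for a pinyin conversion.
titles_py <- stats::setNames(c("jianjie", "anzhuang"),
                             c("\u7b80\u4ecb", "\u5b89\u88c5"))  # 简介, 安装
doc <- c("# \u7b80\u4ecb", "Some text.", "## \u5b89\u88c5 {#install}")
add_heading_ids(doc, function(title) titles_py[[title]])
# returns "# 简介 {#jianjie}", "Some text.", "## 安装 {#install}"
```

A real tool would also need to skip lines inside code chunks, where # starts an R comment rather than a heading.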