旺文社国語辞典第十一版

I have recorded all the regular expressions I have used for the conversion so far (except the part that adds headwords to idioms, which I forgot to write down):

Regexes
1. Add headwords

<1F09><(000[12348])>(<1F41><0160><1FE2><0001>)(.*)<1FE3>(.*)<1F61><1F0A>
Replace with: </>\n\3\n<span class="\1">\3<kana>\4</kana></span>
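As a rough illustration, the step 1 rule can be tried in Python, whose `re.sub` accepts the same `\1` and `\n` replacement syntax. The sample line is invented for the demo and may not match the real data layout; the names `HEADWORD_RE`, `HEADWORD_REPL` and `add_headword` are hypothetical.

```python
import re

# Pattern and replacement copied from step 1.
HEADWORD_RE = re.compile(
    r"<1F09><(000[12348])>(<1F41><0160><1FE2><0001>)(.*)<1FE3>(.*)<1F61><1F0A>"
)
HEADWORD_REPL = r'</>\n\3\n<span class="\1">\3<kana>\4</kana></span>'


def add_headword(line: str) -> str:
    """Rewrite one raw title line into an entry separator, a headword line and a span."""
    return HEADWORD_RE.sub(HEADWORD_REPL, line)


# Hypothetical input line (assumption: the title sits between <1FE2><0001> and <1FE3>).
sample = "<1F09><0001><1F41><0160><1FE2><0001>あい【愛】<1FE3>アイ<1F61><1F0A>"
converted = add_headword(sample)
print(converted)
```

The first line of the output is the `</>` separator, the second is the headword, and the third is the styled title span.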

2. After extracting the entries that start with Latin letters (at the end of the file) into a separate file:

(^.*<1FE2><0001>)(.*?)(<1FE3>)
Replace with: \2\n\1\2\3

(</>\n)(.*?)<....>
Replace repeatedly with: \1\2

Then put the result back at the end of the original file.
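A minimal sketch of the first rule in step 2, assuming Python's `re` module and a made-up entry. `re.MULTILINE` is my assumption so that `^` matches at the start of every line, as it does in most text editors.

```python
import re

# Pattern and replacement copied from step 2.
TITLE_RE = re.compile(r"(^.*<1FE2><0001>)(.*?)(<1FE3>)", re.MULTILINE)
TITLE_REPL = r"\2\n\1\2\3"

# Hypothetical entry that starts with Latin letters.
sample = "<1F09><0001><1F41><0160><1FE2><0001>ABC<1FE3>エービーシー<1F61><1F0A>"

# The title text is copied onto its own line above the untouched original line.
with_headword = TITLE_RE.sub(TITLE_REPL, sample)
print(with_headword)
```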

3. Adjust headwords:
(</>\n)(.*?)<1F0E>.<1F0F>
Replace repeatedly with: \1\2

(</>\n)(.*?)<B947>(.+?)<B948>
(</>\n)(.*?)<1F04>(.+?)<1F05>
(</>\n)(.*?)<B249>(.+?)<B24A>
Replace each repeatedly with: \1\2\3
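These rules need several passes because each match consumes the `</>\n` prefix, so one pass removes only one tag pair per headword line. A small sketch of "replace until nothing changes", assuming Python; `sub_until_stable` and the sample text are hypothetical.

```python
import re


def sub_until_stable(pattern: str, repl: str, text: str, max_passes: int = 100) -> str:
    """Apply one regex replacement repeatedly until it no longer matches."""
    compiled = re.compile(pattern)
    for _ in range(max_passes):
        text, count = compiled.subn(repl, text)
        if count == 0:
            break
    return text


# Made-up entry: two superscript markers in the headword line, one in the body.
sample = '</>\nあ<1F0E>1<1F0F>い<1F0E>2<1F0F>\n<span class="0001">あ<1F0E>1<1F0F>い</span>\n'
cleaned = sub_until_stable(r"(</>\n)(.*?)<1F0E>.<1F0F>", r"\1\2", sample)
print(cleaned)
```

Because `.` does not match a newline, only the headword line directly after `</>` is cleaned; the body line keeps its markers for the later tag replacement.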

Then go through the remaining matches one by one: </>\n.*<
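For the manual check, a short Python sketch that lists every headword line still containing a tag. The capture group is my addition so that the whole line is returned; the sample text is invented.

```python
import re

# Same idea as the search pattern above, with the headword line captured.
LEFTOVER_RE = re.compile(r"</>\n(.*<.*)")


def leftover_headwords(text: str) -> list:
    """Return headword lines that still contain at least one '<'."""
    return [m.group(1) for m in LEFTOVER_RE.finditer(text)]


sample = (
    '</>\nあい\n<span class="0001">…</span>\n'
    '</>\n<B226>豆\n<span class="0001">…</span>\n'
)
remaining = leftover_headwords(sample)
print(remaining)
```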
Replace gaiji (characters outside the standard character set):
<B226>	蔲
<B227>	𧘱
<B533>	麬
<B77B>	靠
<B666>	廴
<B66C>	蟒
<B324>	鰧
<B33E>	鱲
<B73A>	赳
<B73F>	響
<B741>	鞏
<B650>	靭
<B73D>	馭
<B752>	倥
<B372>	蠁

4. Replace gaiji:

Gaiji codes (mapped to glyphs in a gaiji font):
<B15B>	<gaiji>砿</gaiji>
<B15C>	<gaiji>鋼</gaiji>
<B15D>	<gaiji>閤</gaiji>
<B15E>	<gaiji>降</gaiji>

Characters enclosed in a box:
<B232>	<rect>春</rect>
<B233>	<rect>夏</rect>
<B234>	<rect>秋</rect>
<B235>	<rect>冬</rect>
<B236>	<rect>新年</rect>
<B237>	<rect>文</rect>
<B238>	<rect>他</rect>
<B239>	<rect>自</rect>
<B23A>	<rect>可能</rect>
<B23B>	<rect>ちがい</rect>

Other symbols:
<B248>	⟷
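Code-to-text tables like these can be applied in one pass with a dictionary lookup instead of one search per row. A minimal sketch, assuming Python and assuming every gaiji code has the form `<B` plus three hex digits; only a few rows are copied here and `GAIJI` and `replace_gaiji` are hypothetical names.

```python
import re

# A few rows copied from the tables above; the remaining rows would be added the same way.
GAIJI = {
    "B15B": "<gaiji>砿</gaiji>",
    "B232": "<rect>春</rect>",
    "B236": "<rect>新年</rect>",
    "B248": "⟷",
}

GAIJI_RE = re.compile(r"<(B[0-9A-F]{3})>")


def replace_gaiji(text: str, table=GAIJI) -> str:
    """Replace known gaiji codes; codes missing from the table are left untouched."""
    return GAIJI_RE.sub(lambda m: table.get(m.group(1), m.group(0)), text)


sample = "<B232>花<B248>紅葉<B999>"
print(replace_gaiji(sample))
```

Leaving unknown codes in place makes it easy to search for whatever is still unmapped afterwards.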

5. Replace tags:
^<1F09><(....)>
Replace with: <span class="\1">
<1F0A>
Replace with: </span>
<1F0E>
Replace with: <sup>
<1F0F>
Replace with: </sup>

(">)(.*?)<1F04> <1F05>
Replace with: \1<order>\2</order>

(">)(.+?)<1F04> (（.*?)<1F05>
Replace with: \1<order>\2</order>\3

<1F04>(（*?)<1F05>
Replace with: \1

<1F04>(）*?)<1F05>
Replace with: \1

<B647>(.+?)<B648>
Replace with: <furigana>\1</furigana>

<B93A>(.+?)<B93B>
Replace with: <foreign>\1</foreign>

<1FE0><0000>(.+?)<1FE1>
Replace with: <b>\1</b>

<1FE0><0001>(.+?)<1FE1>
Replace with: <i>\1</i>

<B946>(.+?)<B948>
Replace with: <formula>\1</formula>

<B938>(.+?)<B939>
Replace with: <katsuyou>\1</katsuyou>

">(.+?)<1F04> <1F05>
Replace with: "><order>\1</order>

<1F42>(.+?)(<.*?)1F62>
Replace with: <a href="entry://\1">\1\2/a>

<1FE2><0001>(.*?)<1FE3>
Replace with: <ttl>\1</ttl>

<1F04>
Replace with: <xmp>
<1F05>
Replace with: </xmp>
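Since the order of these rules matters (paired patterns have to run before the bare `<1F04>` and `<1F05>` replacements), they can be kept in an ordered list and applied in sequence. A sketch assuming Python; only four of the rules are included, and `RULES`, `apply_rules` and the sample line are hypothetical.

```python
import re

# (pattern, replacement) pairs copied from the list above, applied top to bottom.
RULES = [
    (r"<B647>(.+?)<B648>", r"<furigana>\1</furigana>"),
    (r"<1FE0><0000>(.+?)<1FE1>", r"<b>\1</b>"),
    (r"<1FE0><0001>(.+?)<1FE1>", r"<i>\1</i>"),
    # Group 2 ends with the '<' of '<1F62>', which is reused as the '<' of '</a>'.
    (r"<1F42>(.+?)(<.*?)1F62>", r'<a href="entry://\1">\1\2/a>'),
]


def apply_rules(text: str, rules=RULES) -> str:
    """Run every rule once, in order, over the whole text."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text


sample = "漢<B647>かん<B648>字 <1FE0><0000>太字<1FE1> <1F42>愛<1F62>"
print(apply_rules(sample))
```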

6. Continue adjusting headwords, add headword index entries, and link the CSS:
(</>\n.*?)‐
Replace repeatedly with: \1

(</>\n.*?)・(.*?【)
Replace with: \1\2

(</>\n[^【]+?)・([^【]+?\n)
Replace with: \1\2

</>\n(.*)(【)(.*)(】)
Replace with: </>\n\1\n@@@LINK=\1\2\3\4\n</>\n\3\n@@@LINK=\1\2\3\4\n</>\n\1\2\3\4

(</>\n)([^【]+?)・([^【]+?)(\n@@@.*\n)
Replace with: \1\2\4\1\3\4

(</>\n)([^【]+?)〔(.+)〕([^【]+?)(\n@@@.*\n)
Replace with: \1\2\4\5\1\2\3\4\5

</>\n(.*)(【)(.*)(】)\n(@@@.*)\n
Replace with: </>\n\1\n\5\n</>\n\3\n\5\n

(</>\n.*\n)<s
Replace with: \1<link rel="stylesheet" href="OBS.css" media="all"/></head><body>\n<s

n>(\n</>)
Replace with: n></body>\1
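To show what the index rule does, here is a sketch in Python of the `【…】` rule that creates two `@@@LINK` entries (one for the reading, one for the kanji form) in front of the full entry. The sample entry is invented and `add_index_entries` is a hypothetical name.

```python
import re

# Pattern and replacement copied from the fourth rule of step 6.
LINK_RE = re.compile(r"</>\n(.*)(【)(.*)(】)")
LINK_REPL = (
    r"</>\n\1\n@@@LINK=\1\2\3\4\n"
    r"</>\n\3\n@@@LINK=\1\2\3\4\n"
    r"</>\n\1\2\3\4"
)


def add_index_entries(text: str) -> str:
    """Prepend redirect entries for the reading and the kanji form of each headword."""
    return LINK_RE.sub(LINK_REPL, text)


sample = '</>\nあい【愛】\n<span class="0001">…</span>'
indexed = add_index_entries(sample)
print(indexed)
```

Looking up either あい or 愛 then redirects to the single full entry あい【愛】, so the body text is stored only once.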

7. Rename the class values (a class whose name is a bare number cannot be targeted by a plain CSS class selector):
"0001	"title
"0002	"title 和歌
"0003	"title 汉字
"0004	"title 专有
"0005	"title sub 专有
"0006	"title sub 标题
"0007	"title sub 项目
"0008	"title 五十音
"9999	"编号
"0010	"kanji
"0011	"kanji 专有
"0012	"kanji sub 专有
"0013	"kanji sub
"0014	"kanji 单字人名
"0015	"kanji 单字
"0016	"kanji 单字非常用
"0017	"kanji 旧字
"0018	"kanji 单字人名2
"0019	"kanji 单字外字
"0020	"kanji_level
"0021	"pronunc 音读 1
"0022	"pronunc 音读 2
"0023	"pronunc 训读 1
"0024	"pronunc 训读 2
"0025	"foreign
"0026	"foreign sub
"0027	"foreign 专有
"0028	"historical_kana
"0029	"historical_kana 专有
"0030	"historical_kana 单字
"0031	"historical_kana sub
"0032	"historical_kana sub 专有
"0033	"modern_kana
"0034	"modern_kana sub
"0100	"exp 0
"0101	"exp 1
"0102	"exp 2
"0103	"exp 3
"0104	"exam
"0105	"反义
"0106	"季语
"0107	"参见
"0108	"变形
"0109	"惯用
"0110	"用法
"0111	"语源
"0112	"参考
"0113	"历史
"0114	"类语
"0115	"外来
"0116	"复合
"0117	"难读
"0118	"人名
"0119	"和歌
"0120	"和歌作者
"0121	"和歌释义
"0122	"区别表现
"0123	"区分标题
"0124	"区分内容
"0125	"表现内容
"0126	"中心义
"0127	"exp 五十音
"0128	"使用区分
"0129	"表格
"0200	"笔顺
"0201	"图像
"1001	"title 英文缩写
"1002	"foreign 全文
"1101	"exp 英文缩写
"1102	"exam 英文缩写
"9999	"编号
<B641>	<rect><red>使い分け</red></rect>
<B63F>	<rect><red>ちがい</red></rect>
<B63C>	<rect><red>表現</red></rect>
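The class renaming can also be done in a single pass with a lookup table. This is a variation on the literal `"0001` to `"title` replacements above, not the same method: it assumes Python and anchors on `class="…"` so that other four-digit strings are not touched. Only three rows are copied, and `CLASS_NAMES` and `rename_classes` are hypothetical names.

```python
import re

# A few rows from the table above; the full mapping would list every code.
CLASS_NAMES = {
    "0001": "title",
    "0104": "exam",
    "0129": "表格",
}

CLASS_RE = re.compile(r'class="(\d{4})"')


def rename_classes(html: str, names=CLASS_NAMES) -> str:
    """Swap numeric class values for readable ones; unknown codes stay as they are."""
    def swap(match):
        code = match.group(1)
        return 'class="{}"'.format(names.get(code, code))
    return CLASS_RE.sub(swap, html)


sample = '<span class="0001">あい</span><span class="0104">例</span><span class="7777">?</span>'
renamed = rename_classes(sample)
print(renamed)
```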

8. Other fixes:
Fix the table structure:
(<span class="表格">.*?)</span>
Replace with: \1
(^<span class="[^表].*\n)<span class="表格">
Replace with: \1<table>
(^<span class="表格">.*?)(\n<span class="[^表])
Replace with: \1</table>\2
^<span class="表格">
Replace with nothing (delete the match).
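The four table rules can be checked together on a small sample. A sketch assuming Python with `re.MULTILINE` so that `^` matches at every line start; the row contents (`<tr><td>…`) are invented for the demo and `fix_tables` is a hypothetical name.

```python
import re

# The four rules of step 8, in the order given.
TABLE_RULES = [
    (r'(<span class="表格">.*?)</span>', r"\1"),
    (r'(^<span class="[^表].*\n)<span class="表格">', r"\1<table>"),
    (r'(^<span class="表格">.*?)(\n<span class="[^表])', r"\1</table>\2"),
    (r'^<span class="表格">', ""),
]


def fix_tables(text: str) -> str:
    """Merge consecutive table-row spans into one <table> element."""
    for pattern, repl in TABLE_RULES:
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    return text


sample = "\n".join([
    '<span class="exp 1">説明</span>',
    '<span class="表格"><tr><td>A</td></tr></span>',
    '<span class="表格"><tr><td>B</td></tr></span>',
    '<span class="exam">例</span>',
])
fixed = fix_tables(sample)
print(fixed)
```

With these rules a table that occupies only one line gets its `<table>` opener but no `</table>`, because the third rule no longer sees a line starting with the table span, so such cases may need a manual check.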