[Original poster]:
UCDOS 5.0 curve font file format
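UCDOS is the most capable and most compatible add-on Chinese character system for DOS. People have long been looking for the format of the UCDOS 5.0 curve fonts. After searching online I finally found a few documents on the subject, but while implementing the algorithm I found several errors in them. I have corrected those errors and am posting the result here for reference.
Note: the address formula for English characters has been corrected. The multiplier should be 94, not 100; the text below has been updated accordingly.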
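Analysis of the UCDOS curve outline font structure
UCDOS is a Chinese character operating system developed in China with many strong features, and its rich set of PostScript curve outline fonts in particular has left a deep impression on users. Anyone who wants to reuse the existing UCDOS curve fonts, whether to build attractive software interfaces, to extract a small curve font for their own system, or to convert them to other font formats, first needs to understand how the curve outline font files are structured.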
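1. Structure of the curve outline font files and address calculation
The UCDOS curve outline fonts are stored in a dedicated UCDOS directory (for example C:\UCDOS\FNT) and are told apart by file name, for example:
ASCPS English curve outline fonts
HZKPST Graphic symbol curve outline font
HZKPSKTJ Simplified Chinese Kai (楷体) curve outline font
HZKPSSTJ Simplified Chinese Song (宋体) curve outline font
HZKPSHTJ Simplified Chinese Hei (黑体) curve outline font
HZKPSFSJ Simplified Chinese Fangsong (仿宋) curve outline font
More than thirty additional Simplified and Traditional Chinese curve outline fonts are also available as extensions.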
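Each curve outline font file consists of two parts: a character index area and a glyph data area. The index area is a sequence of index entries. Each entry is six bytes: 4 bytes for the offset of the glyph data and 2 bytes for its length. The offset and length in a character's index entry are all that is needed to fetch its glyph data.
To find a character's index entry, start from its zone-position code (区位码) or its internal code (机内码). GB2312-80 defines 7445 characters: 6763 Chinese characters and 682 non-Chinese graphic characters. For ease of encoding and management they are arranged in 94 zones (区) of 94 positions (位) each. To distinguish Chinese characters from English text, a Chinese character is stored in the computer as a two-byte internal code. Every Chinese character or graphic character corresponds to exactly one zone-position code and one internal code. The two are related as follows: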
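internal code = zone-position code + 0xA0A0 (that is, add 0xA0 to the zone byte and to the position byte)
The offset of a Chinese character's entry in the index area follows from either code:
OFFSET = ((high byte of internal code - 0xA0 - 16) × 94 + (low byte of internal code - 0xA0 - 1)) × 6
       = ((zone - 16) × 94 + (position - 1)) × 6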
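For example, the character “啊” has zone-position code 1601 (zone 16, position 1). Its internal code and its offset in the index area are:
internal code = zone-position code + 0xA0A0 = 0x1001 + 0xA0A0 = 0xB0A1 (zone 16 = 0x10, position 1 = 0x01)
OFFSET = ((0xB0 - 0xA0 - 16) × 94 + (0xA1 - 0xA0 - 1)) × 6 = 0
or OFFSET = ((16 - 16) × 94 + (1 - 1)) × 6 = 0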
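As a quick check of the formula, the following Python sketch computes the index offset from the two bytes of an internal code. The function name `hanzi_index_offset` is made up for this illustration; the only library call is Python's built-in `gb2312` codec, used to obtain the internal code of “啊”.

```python
def hanzi_index_offset(hi: int, lo: int) -> int:
    """Offset of a Chinese character's 6-byte index entry (zones 16 and up).

    hi and lo are the high and low bytes of the internal code.
    """
    zone, pos = hi - 0xA0, lo - 0xA0  # back to the zone-position code
    if zone < 16 or not 1 <= pos <= 94:
        raise ValueError("not a Chinese character in zone 16 or later")
    return ((zone - 16) * 94 + (pos - 1)) * 6


# "啊" is zone 16, position 1, so its internal code is B0 A1.
internal = "啊".encode("gb2312")
assert internal == b"\xb0\xa1"
print(hanzi_index_offset(*internal))  # prints 0
```

The first character of zone 17 (internal code B1 A1) comes out at 94 × 6 = 564, one full zone of entries further on.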
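The symbol font is a separate file from the Chinese character fonts. For graphic symbols in the zones before zone 16, the offset is:
offset = ((high byte of internal code - 0xA0 - 1) × 94 + (low byte of internal code - 0xA0 - 1)) × 6 = ((zone - 1) × 94 + (position - 1)) × 6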
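Because symbols and Chinese characters live in different files with different zone bases, a lookup has to pick both the file and the offset. A minimal Python sketch of that choice follows; the function name `locate_entry` and its `hanzi_file` parameter are hypothetical, and the file names are the ones listed earlier in this post.

```python
ENTRY_SIZE = 6  # bytes per index entry


def locate_entry(hi: int, lo: int, hanzi_file: str = "HZKPSSTJ"):
    """Return (file name, index offset) for a two-byte internal code."""
    zone, pos = hi - 0xA0, lo - 0xA0
    if not (1 <= zone <= 94 and 1 <= pos <= 94):
        raise ValueError("not a GB2312 internal code")
    if zone < 16:
        # Graphic symbols: zones count from 1 in the symbol font.
        return "HZKPST", ((zone - 1) * 94 + (pos - 1)) * ENTRY_SIZE
    # Chinese characters: zones count from 16 in the chosen typeface file.
    return hanzi_file, ((zone - 16) * 94 + (pos - 1)) * ENTRY_SIZE
```

For instance, internal code A1 A3 (zone 1, position 3) maps to offset 12 in HZKPST, while B0 A1 maps to offset 0 in the selected Chinese font file.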
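The English fonts are a special case. The file holds 10 fonts. The source documents describe each font as having 100 English characters, but the index entries are spaced 94 per font (see the note at the top). With font number N and the character's ASCII code CC, the offset is:
offset = (N × 94 + (CC - 32)) × 6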
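The English-font formula can be sketched in Python as below. This is only an illustration under assumptions the post does not confirm: that font numbers start at 0, and that the operator between the two terms is an addition. The name `ascii_index_offset` is hypothetical.

```python
ENTRIES_PER_FONT = 94  # index stride per font, per the corrected formula
ENTRY_SIZE = 6         # bytes per index entry


def ascii_index_offset(font_no: int, ch: str) -> int:
    """Index offset of an English character in ASCPS (assumed 0-based font_no)."""
    cc = ord(ch)
    # With a base of 32 and a stride of 94, codes 32..125 stay inside one
    # font's block; code 126 would land on the next font's first entry.
    if not 32 <= cc < 32 + ENTRIES_PER_FONT:
        raise ValueError("character outside the range covered by the formula")
    return (font_no * ENTRIES_PER_FONT + (cc - 32)) * ENTRY_SIZE
```

Under these assumptions, "A" (ASCII 65) in font 0 sits at (0 × 94 + 33) × 6 = 198, and a space in font 1 at 94 × 6 = 564.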
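In UCDOS 5.0, read 4 bytes starting at this offset and interpret them as a long integer: this is the address of the glyph data (Address). The next 2 bytes, interpreted as an integer, are the length of the glyph data (Length).
In UCDOS 6.0, subtract hexadecimal 0x10000000 from the 4-byte long integer to get the glyph data address; the next 2 bytes are again the length (Length). This adjustment also applies to the symbol font, but not to the English fonts.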
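Reading an index entry and then the glyph bytes might look like the Python sketch below. It assumes little-endian byte order (usual for DOS on x86, but not stated in the source) and that the address is measured from the start of the file. The function names and the `ucdos6_adjust` flag are invented for this example.

```python
import struct


def read_index_entry(f, offset: int, *, ucdos6_adjust: bool = False):
    """Return (address, length) from the 6-byte index entry at offset."""
    f.seek(offset)
    raw = f.read(6)
    if len(raw) != 6:
        raise EOFError("index entry is truncated")
    # "<IH": little-endian unsigned 32-bit address, unsigned 16-bit length.
    address, length = struct.unpack("<IH", raw)
    if ucdos6_adjust:
        # UCDOS 6.0 Chinese and symbol fonts; not used for the English fonts.
        address -= 0x10000000
    return address, length


def read_glyph(f, offset: int, *, ucdos6_adjust: bool = False) -> bytes:
    """Fetch the raw glyph data that an index entry points at."""
    address, length = read_index_entry(f, offset, ucdos6_adjust=ucdos6_adjust)
    f.seek(address)
    return f.read(length)
```

Any seekable binary file object works, for example `open(path, "rb")` on a font file or an `io.BytesIO` holding test data.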
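2. Format of the glyph data
The glyph data of the cubic curve fonts must be interpreted and reassembled. The data is read four bits at a time: the first read of a byte returns its low four bits, and the next read returns its high four bits. A four-bit value that is not consumed as part of a coordinate is treated as a command and interpreted. If the current command takes coordinates, each one is either an absolute coordinate or a coordinate increment.
An absolute coordinate is one byte assembled from two four-bit reads: the first read supplies the high four bits and the second read the low four bits.
A coordinate increment is a four-bit or six-bit binary number whose highest bit is the sign bit. A four-bit increment is simply the four bits just read. Six-bit increments are assembled in pairs from three four-bit reads. The first increment takes the first read as its high four bits and the high two bits of the second read as its low two bits. The second increment takes the low two bits of the second read as its high two bits and the third read as its low four bits.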
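3. Meaning of the control words
A UCDOS control word is 4 bits (half a byte), giving 16 control words with different functions, each producing a different kind of segment of the glyph outline. They are defined as follows:
Table 1. Control words and their functions
| Control word | Parameters | Parameter length | Function | Current point afterwards |
|---|---|---|---|---|
| 0000 | X1,Y1 | 16 bits | Start of a stroke: move the current point to (X1,Y1) | X1,Y1 |
| 0001 | X1 | 8 bits | Draw a horizontal line from the current point to X1 | X1,CY |
| 0010 | Y1 | 8 bits | Draw a vertical line from the current point to Y1 | CX,Y1 |
| 0011 | X1,Y1 | 16 bits | Draw a line from the current point to (X1,Y1) | X1,Y1 |
| 0100 | X1,Y1,X2,Y2 | 32 bits | Draw a quadratic Bezier curve with the current point, (X1,Y1) and (X2,Y2) as control points | X2,Y2 |
| 0101 | X1,Y1,X2,Y2,X3,Y3 | 48 bits | Draw a cubic Bezier curve with the current point, (X1,Y1), (X2,Y2) and (X3,Y3) as control points | X3,Y3 |
| 0110 | X1,Y1,X2,Y2 | 32 bits | Draw a rectangle with (X1,Y1) as the top-left corner and (X2,Y2) as the bottom-right corner | unchanged |
| 0111 | #X1,Y1 | 12 bits | Draw a line from the current point to (CX+#X1,Y1) | CX+#X1,Y1 |
| 1000 | X1,#Y1 | 12 bits | Draw a line from the current point to (X1,CY+#Y1) | X1,CY+#Y1 |
| 1001 | #X1,#Y1 | 8 bits | Draw a line from the current point to (CX+#X1,CY+#Y1) | CX+#X1,CY+#Y1 |
| 1010 | &X1,&Y1 | 12 bits | Draw a line from the current point to (CX+&X1,CY+&Y1) | CX+&X1,CY+&Y1 |
| 1011 | #X1,#Y1,#X2,#Y2 | 16 bits | Draw a quadratic Bezier curve with the current point, (CX+#X1,CY+#Y1) and (CX+#X1+#X2,CY+#Y1+#Y2) as control points | CX+#X1+#X2,CY+#Y1+#Y2 |
| 1100 | &X1,&Y1,&X2,&Y2 | 24 bits | Draw a quadratic Bezier curve with the current point, (CX+&X1,CY+&Y1) and (CX+&X1+&X2,CY+&Y1+&Y2) as control points | CX+&X1+&X2,CY+&Y1+&Y2 |
| 1101 | #X1,#Y1,#X2,#Y2,#X3,#Y3 | 24 bits | Draw a cubic Bezier curve with the current point, (CX+#X1,CY+#Y1), (CX+#X1+#X2,CY+#Y1+#Y2) and (CX+#X1+#X2+#X3,CY+#Y1+#Y2+#Y3) as control points | CX+#X1+#X2+#X3,CY+#Y1+#Y2+#Y3 |
| 1110 | &X1,&Y1,&X2,&Y2,&X3,&Y3 | 36 bits | Draw a cubic Bezier curve with the current point, (CX+&X1,CY+&Y1), (CX+&X1+&X2,CY+&Y1+&Y2) and (CX+&X1+&X2+&X3,CY+&Y1+&Y2+&Y3) as control points | CX+&X1+&X2+&X3,CY+&Y1+&Y2+&Y3 |
| 1111 | X1,Y1 | 16 bits | Read two absolute coordinates and do nothing else | unchanged |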
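Notes:
① X1, Y1, X2, Y2, X3, Y3 are 8-bit coordinate values; they are non-negative, with range 0 to 255.
② #X1, #Y1, #X2, #Y2, #X3, #Y3 are 4-bit increments relative to the current point. The highest bit is the sign bit (0 for positive, 1 for negative), and the range is -7 to +7.
③ &X1, &Y1, &X2, &Y2, &X3, &Y3 are 6-bit increments relative to the current point. The highest bit is the sign bit (0 for positive, 1 for negative), and the range is -31 to +31.
④ CX and CY are the coordinates of the current point.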
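Putting the table together, a decoder can walk the four-bit stream and emit drawing operations. The Python sketch below is one possible reading of the table, not a verified implementation. It assumes sign-and-magnitude increments, a starting current point of (0,0), and that a command cut short at the end of the data is padding to be ignored. The names `decode_glyph` and `OPS` and the operation labels are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[int, int]
OPS = {0x0: "move", 0x1: "line", 0x2: "line", 0x3: "line", 0x4: "quad",
       0x5: "cubic", 0x6: "rect", 0x7: "line", 0x8: "line", 0x9: "line",
       0xA: "line", 0xB: "quad", 0xC: "quad", 0xD: "cubic", 0xE: "cubic"}


def _signed(raw: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    magnitude = raw & (sign - 1)
    return -magnitude if raw & sign else magnitude


def decode_glyph(data: bytes) -> List[Tuple[str, List[Point]]]:
    # Low four bits of each byte first, then the high four bits.
    nibs = iter([n for b in data for n in (b & 0x0F, b >> 4)])

    def ab() -> int:  # 8-bit absolute coordinate, first read is the high half
        high = next(nibs)
        return (high << 4) | next(nibs)

    def pair4() -> Point:  # two 4-bit increments
        dx = _signed(next(nibs), 4)
        return dx, _signed(next(nibs), 4)

    def pair6() -> Point:  # two 6-bit increments packed into three reads
        a, b, c = next(nibs), next(nibs), next(nibs)
        return _signed((a << 2) | (b >> 2), 6), _signed(((b & 3) << 4) | c, 6)

    def abs_pts(count: int) -> List[Point]:
        pts = []
        for _ in range(count):
            x = ab()
            pts.append((x, ab()))
        return pts

    def rel_pts(read_pair, count: int, x: int, y: int) -> List[Point]:
        pts = []
        for _ in range(count):  # each increment builds on the previous point
            dx, dy = read_pair()
            x, y = x + dx, y + dy
            pts.append((x, y))
        return pts

    rel = {0x9: (pair4, 1), 0xA: (pair6, 1), 0xB: (pair4, 2),
           0xC: (pair6, 2), 0xD: (pair4, 3), 0xE: (pair6, 3)}
    abs_counts = {0x0: 1, 0x3: 1, 0x4: 2, 0x5: 3, 0x6: 2}
    cx = cy = 0
    path = []
    try:
        for cmd in nibs:
            if cmd in abs_counts:
                pts = abs_pts(abs_counts[cmd])
            elif cmd == 0x1:
                pts = [(ab(), cy)]
            elif cmd == 0x2:
                pts = [(cx, ab())]
            elif cmd == 0x7:
                dx = _signed(next(nibs), 4)
                pts = [(cx + dx, ab())]
            elif cmd == 0x8:
                x = ab()
                pts = [(x, cy + _signed(next(nibs), 4))]
            elif cmd in rel:
                read_pair, count = rel[cmd]
                pts = rel_pts(read_pair, count, cx, cy)
            else:  # 0xF: consume two absolute coordinates, draw nothing
                abs_pts(1)
                continue
            path.append((OPS[cmd], pts))
            if cmd != 0x6:  # the rectangle leaves the current point unchanged
                cx, cy = pts[-1]
    except StopIteration:
        pass  # assumption: a truncated trailing command is padding
    return path
```

Each returned tuple holds an operation label and the absolute points it needs, so a renderer only has to supply line, rectangle, and Bezier drawing.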