Free Hero Mesh

codepage.doc at [50395b8093]
Login
This is a mirror of the main repository for Free Hero Mesh. New tickets and changes will not be accepted at this mirror.

File codepage.doc artifact d8f7c530cd part of check-in 50395b8093



=== Code page numbers ===

Code page numbers are assigned according to the following:

* Code page numbers are 16-bit numbers. Numbers 0, 65534, and 65535 are
not used; they are reserved for internal use of the software.

* IBM and Microsoft code page numbers can be used. (Mostly, only IBM
code page numbers should be used, but in some cases (described below)
Microsoft code page numbers might be applicable.)

* If IBM and Microsoft use the same code page number for a different
character set, or if they use different code page numbers for the same
character set/encoding, then IBM takes precedence. However, if Microsoft
uses the code page number for a strict superset of IBM's definition,
then Microsoft's definition will be used.

* Extensions of FreeDOS are used if they do not conflict with IBM and
are not already present as official IBM code pages.

* The areas designated as private use areas by IBM (57344-61439 and
65280-65533) may be defined in future by myself (or others, with my
approval). Some ranges may also later be reserved as user-defined or
application-specific areas.

Note that not all of the code page numbers according to the above are
actually used in Free Hero Mesh; Free Hero Mesh only uses ASCII-based
code pages with 8-bit characters (although the implementation may be
improved in future to allow multibyte code pages, but currently it
doesn't). The full set of code page numbers may still be useful for
other applications, though.


=== List of code page numbers ===

In the below list, "=" means that it is already included in the provided
codepage.har file, and "-" means it isn't included (it might or might not
be included in future).

   437 = default PC character set
   850 = PC-like Western Latin
   852 - PC-like Central European
   858 - PC-like Western Latin including Euro currency
   860 - PC-like Portuguese
   861 - PC-like Icelandic
   865 = PC-like Nordic
   866 = DOS Cyrillic Russian
  1041 - Katakana (Japanese)
  1125 = DOS Cyrillic Ukrainian/Belarusian
  1250 - Windows Central Europe
  1251 - Windows Cyrillic
  1252 - Windows Western
  1254 - Windows Turkish


=== File format ===

The codepage.har file is a Hamster archive containing one lump for each
code page, named by the code page number in decimal. Code page 437 need
not be included in this file; pcfont.h is used instead.

The data of the lump can be one of three formats:

* Full set: Consists of 2048 bytes.

* Half set: Consists of 1024 bytes, with only the high half of the
character set. The low half will be the same as the built-in font.

* Compressed set: Any length less than 2048 bytes except 1024 bytes. See
below section for the description of this format.

In all cases, character code 0 is not used; if it contains anything, it
will be erased after it is loaded.


=== Compressed set format ===

The first 32 bytes indicate which characters differ from the definitions
in pcfont.h. The first byte is for character codes 0-7, where the low bit
means character code 0.

For each character whose bit is set in the header, the first byte tells
which scanlines to be copied from the base character, the second byte
tells the base character to use, and the rest of the bytes are one per
scanline that is changed, with the data to be replaced with.

The second byte is omitted if no scanlines are copied from the base
character (the first byte is zero). If it is less than the current
character code, it is copied from the current code page; if equal or
greater than the current character code, it is copied from the default
code page (pcfont.h).