Free Hero Mesh

codepage.doc at [5c59b19d96]
Login
This is a mirror of the main repository for Free Hero Mesh. New tickets and changes will not be accepted at this mirror.

File codepage.doc artifact 1ddaf3a23f part of check-in 5c59b19d96



=== Code page numbers ===

Code page numbers are assigned according to the following:

* Code page numbers are 16-bit numbers. Numbers 0, 65534, and 65535 are
not used; they are reserved for internal use of the software. (TODO:
Maybe this should be changed to 23-bit code page numbers that I will
maintain a list of. The below will still apply for code page numbers
that fit in 16-bits, though.)

* Mainly, IBM code page numbers are used.

* If there is something from Microsoft or FreeDOS that does not already
have a IBM code page number and does not conflict with IBM assignments,
then they may also be used.

* If a Microsoft code page extends a IBM code page and is otherwise 100%
backward compatible with it, and ahs the same number as the corresponding
IBM code page, then the extensions added by Microsoft will also apply to
that code page (only).

Note that not all of the code page numbers according to the above are
actually used in Free Hero Mesh; Free Hero Mesh only uses ASCII-based
code pages with 8-bit characters (although the implementation may be
improved in future to allow multibyte code pages, but currently it
doesn't). The full set of code page numbers may still be useful for
other applications, though.

(Even if multibyte code pages are implemented in future (as well as the
possibility of different font sizes than 8x8, and for narrow/wide
characters), Shift-JIS is not suitable for Free Hero Mesh, because Free
Hero Mesh uses a backslash for string escaping and a multibyte character
in Shift-JIS can contain the ASCII code for a backslash. However, EUC-JP
may be suitable.)


=== List of code page numbers ===

In the below list, "=" means that it is already included in the provided
codepage.har file, and "-" means it isn't included (it might or might not
be included in future).

   367 = ASCII only
   437 = default PC character set
   850 = PC-like Western Latin
   852 - PC-like Central European
   858 - PC-like Western Latin including Euro currency
   860 - PC-like Portuguese
   861 - PC-like Icelandic
   865 = PC-like Nordic
   866 = DOS Cyrillic Russian
  1041 - Katakana (Japanese)
  1125 = DOS Cyrillic Ukrainian/Belarusian
  1250 - Windows Central Europe
  1251 - Windows Cyrillic
  1252 - Windows Western
  1254 - Windows Turkish

(Note: Code page 437 is hard-coded and is compiled into the executable
file, and if you specify code page 367 then it will automatically change
it to 437 (since code page 437 is a superset of code page 367).)


=== File format ===

The codepage.har file is a Hamster archive containing one lump for each
code page, named by the code page number in decimal. Code page 437 need
not be included in this file; pcfont.h is used instead.

The data of the lump can be one of three formats:

* Full set: Consists of 2048 bytes.

* Half set: Consists of 1024 bytes, with only the high half of the
character set. The low half will be the same as the built-in font.

* Compressed set: Any length less than 2048 bytes except 1024 bytes. See
below section for the description of this format.

In all cases, character code 0 is not used; if it contains anything, it
will be erased after it is loaded.


=== Compressed set format ===

The first 32 bytes indicate which characters differ from the definitions
in pcfont.h. The first byte is for character codes 0-7, where the low bit
means character code 0.

For each character whose bit is set in the header, the first byte tells
which scanlines to be copied from the base character, the second byte
tells the base character to use, and the rest of the bytes are one per
scanline that is changed, with the data to be replaced with.

The second byte is omitted if no scanlines are copied from the base
character (the first byte is zero). If it is less than the current
character code, it is copied from the current code page; if equal or
greater than the current character code, it is copied from the default
code page (pcfont.h).