Difference between revisions of "Text Tables"

From NES Hacker Wiki

Latest revision as of 09:53, 23 June 2017

In simple terms, a text table is a lookup table that maps the byte values a game uses to the characters they represent. Specialized programs use text tables to make it easier to alter the text in a game.

Example

For example, if you look at the game Excitebike, you notice that it has a text table that looks like this:

00=0
01=1
02=2
03=3
04=4
05=5
06=6
07=7
08=8
09=9
0A=A
0B=B
0C=C
0D=D
0E=E
0F=F
10=G
11=H
12=I
13=J
14=K
15=L
16=M
17=N
18=O
19=P
1A=Q
1B=R
1C=S
1D=T
1E=U
1F=V
20=W
21=X
22=Y
23=Z
26=*
3A=©
79=#
F8==
F9='
FA=!
FB=:
FC= 

Computers use bytes. They can use letters only if they're encoded as bytes. A character encoding is a mapping between letters and bytes, and a text table is how this encoding is described to a hex editor. The number on the left of the text table is a byte-value, and the value on the right is the corresponding letter/number/symbol.
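As a rough illustration of how a program might read a table like the one above, here is a minimal Python sketch. The function name `parse_table` and the one-entry-per-line `HEX=char` layout are assumptions based on the example; real table formats used by specific editors may support more features.

```python
def parse_table(text):
    """Parse 'HEX=char' lines into a dict mapping byte value -> string."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so 'F8==' maps byte F8 to '='
        # and 'FC= ' keeps its space character as the value.
        hex_part, value = line.split("=", 1)
        table[int(hex_part, 16)] = value
    return table


# A few entries taken from the Excitebike table above.
sample = "0A=A\n0B=B\nF8==\nFC= \n"
table = parse_table(sample)
print(table)
```

Splitting on the first `=` only is the one subtle point: it lets the table describe the `=` character itself and a plain space.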

For example, in Excitebike, the phrase "IT'S A NEW RECORD!" appears if you get a high score on a track. The player sees text on the screen, but the programmers had to spell it out using the byte-value of each letter, which looks like this:

12 1D F9 1C FC 0A FC 17 0E 20 FC 1B 0E 0C 18 1B 0D FA
I  T  '  S     A     N  E  W     R  E  C  O  R  D  !

If you open the Excitebike ROM in a hex editor, you'd find this very string of numbers at address 15DB. Changing the hex values would, in turn, change the text that's written to the screen. But that's a rather tedious process.
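The translation a table-aware editor performs can be sketched in a few lines of Python. This is only an illustration: the names `TABLE`, `decode`, and `encode` are made up for the example, and the table holds just the digits, letters, and a few symbols from the Excitebike table above.

```python
# Digits 0-9 map to 00-09 and letters A-Z map to 0A-23, as in the table above.
DIGITS_AND_LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TABLE = {i: ch for i, ch in enumerate(DIGITS_AND_LETTERS)}
TABLE.update({0xF9: "'", 0xFA: "!", 0xFB: ":", 0xFC: " "})


def decode(data, table):
    """Turn raw bytes into text; unknown bytes are shown as '?'."""
    return "".join(table.get(b, "?") for b in data)


def encode(text, table):
    """Turn text into bytes by reversing the table (1-to-1 entries only)."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse[ch] for ch in text)


raw = bytes.fromhex("12 1D F9 1C FC 0A FC 17 0E 20 FC 1B 0E 0C 18 1B 0D FA")
print(decode(raw, TABLE))
print(encode("IT'S A NEW RECORD!", TABLE).hex(" ").upper())
```

Note that a replacement string written back into a ROM this way generally has to fit in the space of the original string, since the surrounding data does not move.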

Thankfully, several programs have been written to make this process easier. Normally, when you look at the game's ROM data, all you see are a bunch of hexadecimal numbers, but these programs allow you to open up both the ROM and a text table at the same time in order to read and write to a ROM using the game's alphabet. See the images below for an example.

Here's what the Excitebike ROM looks like without a text table. There isn't any readable text to be found. This is because most hex editors use the built-in ASCII table unless told otherwise, and this text is not encoded in ASCII.

Text1.png
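To see why the text is unreadable in this view, the sketch below mimics the text column of a typical hex editor: bytes in the printable ASCII range are shown as characters and everything else as a dot. The function name `ascii_column` is hypothetical, and individual editors may render non-printable bytes differently.

```python
def ascii_column(data):
    """Render bytes the way a hex editor's ASCII column usually does."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# The "IT'S A NEW RECORD!" bytes from the example above.
raw = bytes.fromhex("12 1D F9 1C FC 0A FC 17 0E 20 FC 1B 0E 0C 18 1B 0D FA")
print(ascii_column(raw))

# Text that really is stored as ASCII stays readable.
print(ascii_column(b"HELLO"))
```

Almost every byte of the Excitebike string falls outside the printable ASCII range, so the column is mostly dots. The one exception is byte 20, which is "W" in the game's table but a space in ASCII.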

And here is the same data with a text table for Excitebike applied. Notice how the text is decoded, and we can read the game messages. The phrase "IT'S A NEW RECORD!" is now visible.

Text2.png

Some games, such as Blaster Master, actually encode their text in ASCII, so it can be viewed in a regular hex editor.

Other Encoding Types

Most games use a single 1-to-1 encoding system like the one described above because it is extremely simple to implement. However, games with a lot of text, like RPGs, have so much dialogue that it can't fit in the tiny ROM chips used in NES games. For games like this, the text has to be compressed. Various games use different compression schemes; here are some of them.

Dual-Tile Encoding (DTE)

Dual-tile encoding allows you to store two letters in a single byte, in addition to the usual 1-to-1 encoding. The text table is larger, but the text itself shrinks by a significant amount. Here is an example:

01=a
02=b
03=c
...
80=th
81=ee
82=st
83=ng
...

Notice how two-letter combinations common to the English language are represented. The word "street", which would normally take six bytes, could be represented with four bytes:

13 14 12 05 05 14 = S T R E E T
82 12 81 14       = ST R EE T

You can also use common punctuation and spacing. For example, a lot of words end with an "e" and are followed by a space, so having "e " in the text table would shrink the text a lot. Likewise, a comma or period is almost always followed by a space, so ". " and ", " are also good candidates.

Displaying DTE-encoded dialogue on the screen requires more work from the NES CPU than the normal 1-to-1 text lookup, but the space saved in the ROM makes it worth it for games with a lot of dialogue.

This system is used by Final Fantasy.
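The "street" example above can be reproduced with a short Python sketch. It assumes the hypothetical table from the example (01 to 1A for "a" to "z", plus four pair entries) and uses a simple greedy encoder that tries a two-letter entry before falling back to a single letter. This is not the routine any particular game or tool uses, and greedy matching does not always give the shortest possible encoding.

```python
import string

# Hypothetical DTE table from the example: 01=a ... 1A=z, then pair entries.
TABLE = {i + 1: ch for i, ch in enumerate(string.ascii_lowercase)}
TABLE.update({0x80: "th", 0x81: "ee", 0x82: "st", 0x83: "ng"})


def dte_decode(data, table):
    """Each byte expands to one or two characters."""
    return "".join(table[b] for b in data)


def dte_encode(text, table):
    """Greedy encoder: prefer a two-character entry when one matches."""
    reverse = {s: b for b, s in table.items()}
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in reverse:
            out.append(reverse[pair])
            i += 2
        else:
            out.append(reverse[text[i]])
            i += 1
    return bytes(out)


encoded = dte_encode("street", TABLE)
print(encoded.hex(" ").upper())
print(dte_decode(encoded, TABLE))
```

Decoding is the cheap direction: it is still one table lookup per byte, with the only difference that a lookup may produce two tiles instead of one.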

Dictionary Encoding

While DTE can shrink two-letter combinations into a single byte, dictionary encoding can shrink entire words or phrases into a single byte. The lookup table would look something like this:

01=a
02=b
03=c
...
80=the
81=is
82=warrior
83=go to the
...

Just like with DTE, drawing the dialogue to the screen takes more CPU work, because the NES has to make frequent lookups in the dictionary, but you can save a lot of space in the ROM with an encoding system like this. You can also combine DTE and a dictionary, so that the table holds common two-letter combinations alongside whole words and phrases.

This system is used by StarTropics.
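A dictionary scheme can be sketched the same way. The Python below uses the hypothetical table from the example above and adds one assumed entry, 00 for a space, since the example table does not list one. The encoder tries the longest matching entry first; the names `dict_encode` and `dict_decode` are made up for illustration and do not describe StarTropics' actual routine.

```python
import string

# Hypothetical table from the example, plus an assumed 00 = space entry.
TABLE = {i + 1: ch for i, ch in enumerate(string.ascii_lowercase)}
TABLE.update({0x00: " ", 0x80: "the", 0x81: "is",
              0x82: "warrior", 0x83: "go to the"})


def dict_decode(data, table):
    """Each byte expands to a letter, a word, or a whole phrase."""
    return "".join(table[b] for b in data)


def dict_encode(text, table):
    """Longest-match encoder: try the longest entry that fits first."""
    reverse = {s: b for b, s in table.items()}
    longest = max(len(s) for s in reverse)
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += n
                break
        else:
            raise ValueError(f"no table entry for {text[i]!r}")
    return bytes(out)


encoded = dict_encode("go to the warrior", TABLE)
print(encoded.hex(" ").upper(), "=", len(encoded), "bytes")
print(dict_decode(encoded, TABLE))
```

With this table, the 17-character phrase shrinks to three bytes, which shows why dictionary entries are worth their table space when a word or phrase appears often.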

Huffman Encoding

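This system assigns each letter or letter combination a variable-length bit code, so that the most common ones take only a few bits instead of a full byte. The codes are stored as a binary tree, and the decoder walks the tree one bit at a time until it reaches a letter.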

This system is used by Battletoads.
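The idea can be sketched with a generic, textbook Huffman coder in Python. This is not the format Battletoads actually uses: the tree here is built from the letter frequencies of the sample string itself, bits are kept as a string of "0" and "1" for readability, and all function names are made up for the example. It assumes the text contains at least two different characters.

```python
import heapq
from collections import Counter


def build_tree(text):
    """Build a Huffman tree; a leaf is a str, a branch is a (left, right) tuple."""
    # Heap items are (frequency, tie_breaker, node); the unique tie_breaker
    # keeps Python from ever comparing two nodes directly.
    heap = [(freq, i, sym)
            for i, (sym, freq) in enumerate(sorted(Counter(text).items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    return heap[0][2]


def make_codes(node, prefix="", codes=None):
    """Walk the tree: going left adds a '0', going right adds a '1'."""
    if codes is None:
        codes = {}
    if isinstance(node, str):
        codes[node] = prefix
    else:
        make_codes(node[0], prefix + "0", codes)
        make_codes(node[1], prefix + "1", codes)
    return codes


def huff_decode(bits, tree):
    """Follow one branch per bit; emit a letter whenever a leaf is reached."""
    out = []
    node = tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)


text = "IT'S A NEW RECORD!"
tree = build_tree(text)
codes = make_codes(tree)
bits = "".join(codes[ch] for ch in text)
print(len(bits), "bits instead of", 8 * len(text))
print(huff_decode(bits, tree))
```

The bit-by-bit tree walk in `huff_decode` is the part a game has to perform at run time, which is why this scheme costs more CPU work than a table lookup.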

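Further Reading

* List of Text Editors

External Links

* Wikipedia:ASCII
* Wikipedia:Character encoding
* Wikipedia:Byte pair encoding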