The computer images series has covered pixels, lines, polygons and splines. You need to understand these before diving into text.
Text is something most people take for granted. In a book you just read it. You write it with barely a thought. You type on a keyboard and letters appear on a monitor. Quite a bit goes on behind the scenes with text, at least in computers.
Before getting into details, consider fonts. There are fixed-width and variable-width fonts. The first computers used fixed-width fonts. These fonts had a limited number of pre-made characters, each associated with a number. The ASCII table is one such standard. In early computers, the display of characters was handled by the graphics card or processor. This was done for speed. To make this easy, each character occupied a fixed cell, commonly 8×8 pixels. As computers became more powerful, new fonts were possible. These include variable-width fonts, where narrow letters like i or l take less width. Variable-width fonts are closer to what you see in books and are easier to read due to better spacing.
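To make the 8×8 idea concrete, here is a minimal sketch of how such a character could be stored and drawn. The bit pattern for the letter A and the name `render_glyph` are made up for illustration; real hardware fonts used their own patterns and layouts.

```python
# Hypothetical 8x8 glyph for the letter "A": one byte per pixel row,
# with the most significant bit as the leftmost pixel.
GLYPH_A = [
    0b00011000,
    0b00100100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b00000000,
]


def render_glyph(rows, on="#", off="."):
    """Return the glyph as a list of 8 strings, one per pixel row."""
    lines = []
    for row in rows:
        line = ""
        for bit in range(7, -1, -1):  # bit 7 is the leftmost pixel
            line += on if (row >> bit) & 1 else off
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(render_glyph(GLYPH_A)))
```

Eight bytes per character is all the storage needed, which is part of why this layout suited early machines.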
Many different forms of writing exist, but to keep things simple we will only consider Roman type. Since you’re reading this, that’s probably the type you know.
Fonts can come in various options, such as regular, bold and italic. Traditionally, regular, bold and italic are separate fonts, but they can be part of the same typeface. So the typeface could be Arial, whereas the fonts could be Arial, Arial Bold and Arial Italic. Computers have merged the terms font and typeface, which can cause confusion. For example, in Microsoft Word you can select the font Arial. Strictly speaking, that is a typeface rather than a font, because you can then use regular, bold and italic together.
The next term is serif. A serif is a small decorative line attached to the ends of a letter’s strokes, particularly at the top and bottom corners. Sans-serif fonts lack these smaller elements and look plainer. The theory is that these small lines help connect one letter with the next, helping the reader’s eye flow between letters, so serif fonts suit long passages of text. By the same reasoning, a sans-serif font is better for titles, where you want the reader’s eye to linger. The research behind these claims is not conclusive, but the claims are generally accepted. There are different levels of serif too. The image below shows serif and sans-serif fonts.
Roman characters are written horizontally, so they have to be properly aligned with each other. They are generally aligned along the baseline. When learning to write English, you write the letters on a line. That’s the baseline. When characters of different sizes share a row, you line up their baselines. Additional lines are shown below.
Finally, body height is the total height of the font plus the spacing between rows. In computers, the requested font height doesn’t always match the measured height of the characters. Some applications, like browsers, generally come close. If you tell a browser to display a forty pixel font, then you’ll get a distance of about 37 pixels between descender and ascender.
Superscripts and subscripts are also common. Unfortunately there is no defined standard for their size or position. So make something up that looks good and go with it.
Next, there is spacing. The space between letters is not constant. It depends on the letters and their geometry. For example, in some fonts ‘AW’ can slightly overlap. The general goal is to make the space feel uniform. A starting point is to keep the same minimum distance between the geometry of adjacent characters. Fonts usually start with that, then tweak individual letter combinations by hand, a process known as kerning. Next is the space between words, or the width of the space character. This is typically constant regardless of the letters to either side. The spacing table can be handled using various algorithms. If it is an ASCII font, then the simplest method is to store all character pairings in a large array. At one byte per pairing, this array is 128 × 128 = 16384 bytes long, so it isn’t overly large, and it’s very fast. When making a Unicode font, the number of possible pairings makes a full array impractical, so an algorithm is typically far better. The normal approach is to look for a specific value for the pair, and if one doesn’t exist, return the default spacing.
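The full-array method for an ASCII font could be sketched as follows. The names `spacing_table`, `set_spacing` and `get_spacing`, the default gap of one pixel and the ‘AW’ adjustment are all assumptions for illustration. Signed bytes are used so that a pair can overlap with a negative value.

```python
from array import array

ASCII_COUNT = 128
DEFAULT_SPACING = 1  # assumed default gap between letters, in pixels

# One signed byte per (left, right) pair: 128 * 128 = 16384 bytes.
spacing_table = array("b", [DEFAULT_SPACING]) * (ASCII_COUNT * ASCII_COUNT)


def _index(left, right):
    """Map a pair of ASCII characters to a position in the flat table."""
    a, b = ord(left), ord(right)
    if a >= ASCII_COUNT or b >= ASCII_COUNT:
        raise ValueError("this table only covers ASCII characters")
    return a * ASCII_COUNT + b


def set_spacing(left, right, value):
    spacing_table[_index(left, right)] = value


def get_spacing(left, right):
    return spacing_table[_index(left, right)]


# Hand tweak: let 'A' and 'W' overlap by one pixel.
set_spacing("A", "W", -1)
```

Lookup is a single multiply, add and index, which is why this approach is so fast.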
Any font drawing algorithm must include word wrapping. When a line of text fills up, the text continues on the next line. This isn’t automatic; it must be programmed, which means you must also handle line spacing. Also, when text wraps, you don’t want it to break in the middle of a word.
Text justification is another common term. Most text is left justified, meaning the first character of each line has the same x value. Text drawing should also support center and right justified text. You may also provide left and right justification together, often called full justification, which stretches the spacing between words so that both the left and right edges line up.
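The arithmetic behind each kind of justification is short. The sketch below assumes you already know the width of each line or word; `line_start_x` and `justified_word_positions` are hypothetical names, and x is measured from the left edge of the text box.

```python
def line_start_x(line_width, box_width, align="left"):
    """Return the x offset of a line inside a box of the given width."""
    if align == "left":
        return 0
    if align == "center":
        return (box_width - line_width) / 2
    if align == "right":
        return box_width - line_width
    raise ValueError("unknown alignment: " + align)


def justified_word_positions(word_widths, box_width):
    """Return the x position of each word for left and right justification.

    The leftover space is shared equally between the gaps, so the first
    word starts at 0 and the last word ends at box_width.
    """
    if len(word_widths) < 2:
        return [0.0] * len(word_widths)  # nothing to stretch
    gap = (box_width - sum(word_widths)) / (len(word_widths) - 1)
    positions = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + gap
    return positions
```

In practice the last line of a paragraph is usually left alone rather than stretched, so a caller would skip `justified_word_positions` for that line.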
In addition, a text library nearly always provides functions that return the height or width of a given piece of text. Calling code needs these values for layout calculations, such as centering text or sizing a box around it.
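A measuring function might look something like this. The widths in `ADVANCE`, the `LINE_HEIGHT` value and the function names are invented for the example, and pair spacing is left out to keep it short.

```python
# Hypothetical per-character widths in pixels for a variable width font.
ADVANCE = {"i": 3, "l": 3, "m": 9, " ": 4}
DEFAULT_ADVANCE = 6  # assumed width for any character not listed above
LINE_HEIGHT = 12     # assumed body height: font height plus row spacing


def text_width(text):
    """Width of a single line of text, ignoring pair spacing tweaks."""
    return sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)


def text_size(text):
    """Return (width, height) of text that may contain line breaks."""
    lines = text.split("\n")
    width = max(text_width(line) for line in lines)
    height = LINE_HEIGHT * len(lines)
    return width, height
```

For example, `text_size("hi\nmill")` measures the wider second line and two rows of height.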
If you haven’t been scared off by this point, then it’s a good sign. You may not require all the features listed above, but you should know what they are. The terms crop up everywhere. The next step is a character drawing function. This is where you can have fun making your own character set. A good idea is to start with ASCII and expand to Unicode only if needed. Unicode is a whole lot of work. If you don’t want to make your own character set, then you’ll have to learn the TrueType font file format (or an alternative format) and go from there. You generally define a character with a set of simple drawing commands. These form zero or more polygons, with zero for a space or tab.
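Drawing a character can be done with a modified polygon routine. Instead of sending a single polygon, you send several polygons and render them together. If you use a scan-line algorithm, then hollows, such as the inside of an ‘o’, are handled intrinsically, because each scan line simply alternates between inside and outside every time it crosses an edge.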
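As a rough sketch of that idea, the function below fills several polygons together using the even-odd rule, sampling each scan line at the pixel centers. The name `fill_contours` and the text output are assumptions made for the example; a real renderer would write into an image and probably add anti-aliasing.

```python
def fill_contours(contours, width, height):
    """Rasterize a set of closed polygons together with the even-odd rule.

    contours is a list of polygons, each a list of (x, y) points.
    Returns one string per pixel row, '#' for filled and '.' for empty.
    """
    rows = []
    for py in range(height):
        y = py + 0.5  # sample at the vertical center of the pixel row
        crossings = []
        for poly in contours:
            for i in range(len(poly)):
                x0, y0 = poly[i]
                x1, y1 = poly[(i + 1) % len(poly)]
                # Only edges that straddle the scan line produce a crossing.
                if (y0 <= y) != (y1 <= y):
                    t = (y - y0) / (y1 - y0)
                    crossings.append(x0 + t * (x1 - x0))
        crossings.sort()
        row = ["."] * width
        # Fill between each pair of crossings: inside, outside, inside...
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for px in range(width):
                if left <= px + 0.5 < right:
                    row[px] = "#"
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    outer = [(1, 1), (7, 1), (7, 7), (1, 7)]
    hole = [(3, 3), (5, 3), (5, 5), (3, 5)]
    print("\n".join(fill_contours([outer, hole], 8, 8)))
```

The hole needs no special treatment. Its edges add two extra crossings on the rows it covers, so the fill switches off and back on again by itself.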
Once all characters are designed, you calculate the spacing between letters. This is just a large array with 128×128 items for normal ASCII. You can compress it by defining a default spacing and recording only the character combinations that differ from it. Either form may be more efficient, depending on how many pairs in the font need their own value.
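One possible way to do the compression is shown below, assuming the full table is a flat sequence indexed by left character times 128 plus right character. The names `compress_table` and `pair_spacing` are hypothetical.

```python
ASCII_COUNT = 128


def compress_table(dense, default):
    """Keep only the pairs whose spacing differs from the default."""
    overrides = {}
    for index, value in enumerate(dense):
        if value != default:
            left, right = divmod(index, ASCII_COUNT)
            overrides[(chr(left), chr(right))] = value
    return overrides


def pair_spacing(overrides, left, right, default):
    """Look for a specific value, else fall back to the default spacing."""
    return overrides.get((left, right), default)


if __name__ == "__main__":
    dense = [1] * (ASCII_COUNT * ASCII_COUNT)
    dense[ord("A") * ASCII_COUNT + ord("W")] = -1  # hand tweaked pair
    table = compress_table(dense, default=1)
    print(len(table), pair_spacing(table, "A", "W", 1))
```

The same lookup works unchanged for a Unicode font, since the dictionary only stores the pairs that were tweaked.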
Serif or sans-serif depends on what you draw. Regular, bold or italic is also a choice. If you want all three, then you need three different fonts. You can try generating them algorithmically, but the results usually need tweaking afterwards. The various lines, such as the baseline and descender, are simply defined, not calculated. You may not even implement them, depending on the font. It may be a unique font used in a video game, in which case it doesn’t matter. If you are building a general purpose graphics library, then you’ll want to include the lines.
Word wrapping is a fairly simple algorithm. Just find all clean break points, such as spaces, and calculate the width of each word between them. If the next word would go past the end of the line, then move it to the start of the next line. There are special cases, such as a long string of characters that can’t be broken. In these cases, word wrap is generally ignored and the text is allowed to overflow.
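A minimal version of that algorithm could look like the sketch below. It treats spaces as the only clean break points, and `wrap_text` and its `measure` parameter are names chosen for the example. By default it measures in characters, as a fixed width font would, but you could pass in a pixel measuring function instead.

```python
def wrap_text(text, max_width, measure=len):
    """Break text into lines no wider than max_width where possible.

    A single word wider than max_width is placed on its own line and
    allowed to overflow, since there is no clean place to break it.
    """
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if measure(candidate) <= max_width or not current:
            current = candidate  # the word fits, or the line is empty
        else:
            lines.append(current)  # line is full, start the next one
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap_text("the quick brown fox", 10):
        print(line)
```

Measuring the whole candidate line, rather than adding up word widths, means pair spacing is accounted for if the measuring function supports it.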
Making your own font and font drawing algorithm can be an interesting challenge if you haven’t done it before. Once you’ve made one, you’ll find the next much easier. It’s also a very rewarding project.