Alphabet Soup

For a practical project, this page might be called a “Manual” instead of a “Manifesto.”

News

18 July 2011: My improvements to the Inkscape plugin have been merged into Inkscape's core, and should be part of Inkscape's next major release.
21 Nov, 2009: Alphabet Soup is now included with Inkscape 0.47 and newer.
19 Mar, 2008: Joel Holdsworth has ported Alphabet Soup to Inkscape/SVG. The plug-in is already available from Inkscape's SVN, and should be included in Inkscape 0.47. Many thanks to Joel & Inkscape. Read more here.

What is it?

Alphabet Soup is a project which attempts to determine a number of things about the shapes of letters in several different writing systems. First, it hypothesizes a set of basic building blocks that all letters are built up from. Second, it hypothesizes a set of rules, a grammar or syntax, which defines how those pieces combine to make different letters.

The project will eventually include the Roman alphabet in upper and lower case, Arabic numerals, the Cyrillic alphabet, the International Phonetic Alphabet, and much of the uppercase Greek alphabet. It can currently generate 2,099,776 letter-like symbols, most of which are not in any alphabet, although they look quite plausible.

Alphabet Soup is implemented in a computer program which uses the building blocks and grammar to do several things. It can generate individual letters. It can take an input string and randomly vary the letters, producing a string which is readable but strange looking. Or it can generate random strings of symbols. The program includes an optical kerning algorithm to ensure that the letters are kerned sensibly.
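The kerning algorithm itself is not described on this page. As a rough illustration only, here is one simple shape-based approach: slide the right-hand glyph toward the left-hand one until the closest pair of inked pixels on any row is a fixed gap apart. The glyph format (lists of strings with `#` for ink) and the names `kern_offset`, `left_edge`, and `right_edge` are assumptions made for this sketch, not the program's own.

```python
def right_edge(row):
    """Column just past the last inked pixel in a row, or None if the row is blank."""
    i = row.rfind("#")
    return None if i < 0 else i + 1


def left_edge(row):
    """Column of the first inked pixel in a row, or None if the row is blank."""
    i = row.find("#")
    return None if i < 0 else i


def kern_offset(left, right, gap=1):
    """Horizontal offset of the right glyph's origin from the left glyph's origin.

    Hypothetical sketch: the offset is chosen so that the tightest row has
    exactly `gap` blank columns between the two glyphs.
    """
    offsets = []
    for left_row, right_row in zip(left, right):
        r, l = right_edge(left_row), left_edge(right_row)
        if r is not None and l is not None:
            # On this row the ink is `gap` columns apart when offset == r - l + gap.
            offsets.append(r - l + gap)
    if not offsets:
        # No row has ink in both glyphs: fall back to bounding-box spacing.
        return max((len(row) for row in left), default=0) + gap
    return max(offsets)


# Toy 3x3 glyphs, invented for the example.
GLYPH_L = ["#..",
           "#..",
           "###"]
GLYPH_T = ["###",
           ".#.",
           ".#."]

offset = kern_offset(GLYPH_L, GLYPH_T)
```

Spacing by bounding boxes alone would put the second glyph at offset 4 (width 3 plus a gap of 1). The shape-aware version tucks the stem of the T one column closer, because only the bottom row of the L reaches the right edge of its box.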

The Syntax

The syntactic rules, which describe how to put the pieces together, form a system of context-free rules. This is the same type of system which is used for explaining phrase structure in modern linguistics. From a linguistic point of view, this project could be informally thought of as a Universal Grammar for European orthography.

The syntax can be extended, potentially allowing the program to generate additional letters, or entirely different alphabets.

A syntactic rule consists of a start state, and a list of states which can be drawn and added to that state. Each state in the list of states comes with information about whether it should be reflected horizontally or vertically, and with information specifying how far vertically and horizontally it should be moved before adding it to the current state. This allows, for example, the curly tail seen in ‘f,’ ‘j,’ ‘c,’ and ‘r’ to be the same state, drawn at different locations after it has been flipped horizontally and/or vertically.
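To make the shape of a rule concrete, here is a minimal sketch of how such a rule might be represented. Every name in it (`Placement`, `RULES`, `place_pixel`, the state names, and the offsets) is invented for illustration and is not taken from the program's actual syntax files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One entry in a rule's list: a state to draw, plus how to transform it."""
    state: str            # name of the state or piece to draw
    flip_h: bool = False  # reflect horizontally (mirror left/right)
    flip_v: bool = False  # reflect vertically (mirror top/bottom)
    dx: int = 0           # horizontal move applied after reflecting
    dy: int = 0           # vertical move applied after reflecting


def place_pixel(p, x, y, width, height):
    """Map pixel (x, y) of a width-by-height piece onto the canvas."""
    if p.flip_h:
        x = width - 1 - x
    if p.flip_v:
        y = height - 1 - y
    return (x + p.dx, y + p.dy)


# Hypothetical rules: a start state mapped to the states added to it.
# The same "tail" state is reused with different flips and offsets.
RULES = {
    "f_top": [Placement("stem"), Placement("tail", dx=3)],
    "j_bottom": [
        Placement("stem"),
        Placement("tail", flip_h=True, flip_v=True, dx=-3, dy=8),
    ],
}

# Where the top-left pixel of a 4x6 tail image lands when drawn as part of "j_bottom".
tail_in_j = RULES["j_bottom"][1]
corner = place_pixel(tail_in_j, 0, 0, width=4, height=6)
```

Keeping the reflection flags and offsets on the rule entry, rather than on the piece, is what lets one drawn tail serve several letters in this sketch.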

Each symbol that the system produces is described by a “tree,” a mathematical object produced by the application of the syntactic rules. If the system is given an incomplete tree, one where some terminal nodes in the tree are not terminal nodes in the syntax, the system randomly generates subtrees to fill in the “underspecified” terminal nodes. This is how it can randomize a prespecified letterform.
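The fill-in step can be sketched in a few lines. The toy grammar below, the `(label, children)` tree encoding, and the function names `complete` and `leaves` are all assumptions made for this example; the real syntax is larger and also carries reflection and offset information.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its alternative expansions.
# Labels that do not appear as keys are terminal pieces.
GRAMMAR = {
    "LETTER": [["STEM", "BOWL"], ["STEM", "TAIL"], ["BOWL"]],
    "STEM": [["short_stem"], ["tall_stem"]],
    "BOWL": [["round_bowl"]],
    "TAIL": [["curly_tail"]],
}


def complete(tree, rng):
    """Return a copy of `tree` with every underspecified leaf expanded at random.

    A tree is a (label, children) pair. A leaf whose label is still a
    nonterminal of the grammar counts as underspecified.
    """
    label, children = tree
    if not children and label in GRAMMAR:
        expansion = rng.choice(GRAMMAR[label])
        children = [(child, []) for child in expansion]
    return (label, [complete(child, rng) for child in children])


def leaves(tree):
    """List the labels of the tree's leaves, left to right."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


# A partly specified letter: the stem is fixed, the bowl is left open.
partial = ("LETTER", [("STEM", [("tall_stem", [])]), ("BOWL", [])])
filled = complete(partial, random.Random(0))

# A fully random symbol starts from a bare root.
random_symbol = complete(("LETTER", []), random.Random(0))
```

Anything already specified in the input tree survives untouched, so the more of a letter's tree is pinned down, the more recognizable the randomized result stays.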

The Building Blocks

The pieces which comprise the letters are of my own design, and are reminiscent of a standard serif font, like Times. Part of my inspiration for this system comes from my experience designing typefaces. While designing typefaces, I would always reach a point where it seemed that any unfinished letters could be built up algorithmically from pieces of already existing, “basic” letters, like ‘o,’ ‘l,’ ‘n,’ and ‘x.’ This project is an expression of that realization that typefaces could be generated algorithmically.

To this end, the images containing the pieces can be replaced, allowing the program to generate letters in a different typeface. The system could eventually become a new framework for typeface design.

Finally, the syntax and the images could be replaced, and the program could be used to generate completely different images. One person suggested building images of insects from pieces.

What do I need to use the program?

The program, which is relatively short, is written in the Python programming language and uses the python-imaging module to create the composite images. It should work without any modification on any UNIX system with these programs installed. I haven't the faintest idea if it works on MacOS or Windows; let me know what happens if you try it out.

The source code of the compiler/interpreter, the syntax, and the building block images in this project are all released under the GNU GPL.

The latest implementation will be available from this page.

What doesn't it do?

There are a great many symbols and characters which are generally associated with typefaces (many of them included in ASCII, and all of them in Unicode). I have chosen to leave some of these out of the system:

Punctuation. With the exception of ‘?,’ ‘!,’ and ‘&,’ most punctuation does not vary greatly between typefaces or languages. It is indeed something we rely on to remain the same. Alphabet Soup handles ‘?’ and ‘!,’ but not ‘&.’ That said, it would be nice to add some basic punctuation to the Alphabet Soup system, so that the “randomize text” function preserves punctuation.
Special Symbols. Like punctuation, special symbols (many of the symbols that appear above the numbers on a regular keyboard) do not vary much between languages, and we expect them to be fixed. Alphabet Soup does not handle these.
Mathematical Symbols. There are a great many mathematical symbols. These are usually very simple and geometric, as opposed to a serif typeface with variable stroke width. Extending the program to deal with these would not generate symbols which had the feel of letters.
Diacritics. There are also a great number of diacritical symbols, from stress marks, tildes, cedillas, bars, and umlauts to a great many diacritics in the IPA that most people would be unfamiliar with. Again, these symbols are largely invariant and based on basic shapes. Unlike the others, they do combine with letters to make new letters. However, they seem to me more like a system imposed on top of the letters, and could eventually be added to Alphabet Soup, on top of the existing process.
Typesetting-based notation. There are a great many conventions used to denote relationships and meanings, such as superscript to represent exponentiation, subscript to represent indices, Sigma-notation, or small caps, which combine regular letters in different places and at different sizes to express unique meaning. Again, these seem outside the realm of Alphabet Soup, although, like the diacritics, they could be added on top of the existing system.
Ligatures. There are a great number of ligatures, both in ordinary typesetting (“fi,” “ff,” “fl,” German double S) and in the IPA (t+esh, d+yogh). The system might well generate these as well; however, this would import a great deal more complexity into the system. Additionally, the rules governing ligatures are more concerned with making overlapping letters look better on the page, and are not trivial in a system which builds letters from basic building blocks.
Non-European Writing Systems. Obviously, trying to use the same system to account for character writing systems, hieroglyphics, and so on, is at best a Herculean task, and at worst impossible.
Handwriting. Obviously, handwriting and cursive writing are much sloppier, more variable, and more free-form than text typefaces. This seems to be another tremendously large task.

Where does it go from here?

There are many different potential roles for this program:

It could be integrated as a gimp-python plug-in, and could be called to create random text strings in images.
It could be converted to draw vector graphics, and integrated into a typeface that randomized a letter each time it was rendered. The typeface Beowulf also randomizes its letters, though in a radically different fashion.
It could be converted into an xscreensaver hack, and generate marquees and xmatrix or phosphor-like screensavers.
It could be called as a CGI to create randomized headlines/banners for a website.
It could be built in to video games which need alien-looking or futuristic-looking typefaces.
It could be used for cipher-making, or for making new European-like alphabets, like Cherokee or the Mormon alphabet “Deseret.”
It will be completely skinnable/themable and could inspire users to create their own sets of images and/or their own syntax to draw different alphabets, or entirely different things.