Printing the New Testament in Greek

From Spivey's Corner

A couple of years ago, I decided to start learning the Greek of the New Testament, and this interest spread into an interest in printing Greek nicely. I can't say I've got very far with learning to read and understand the language, but I did put together a collection of stuff to help with the printing. See below for more details, or just enjoy this version of Mark's gospel, formatted for printing as an A5 booklet.

The text

For several years, James Tauber has been working on a computer text of the New Testament in which each word is analysed according to its part of speech (something linguists call parsing the text). This is a great resource for scholars, but what interested me was that the text itself, including punctuation, could be recovered by selecting one column of the many included in the file, so that it is possible to make a conventional, typeset text from it. Version 4 of Tauber's file was based on an edition called UBS4, which is the current standard text.

Here are the first two verses of the gospel of Mark:

020101 ! n- ----nsf- ------ ! *)arxh\ arxh a)rxh\ ! a)rxh/
020101 ! ra ----gsn- ------ ! tou= tou tou= ! o(
020101 ! n- ----gsn- ------ ! eu)aggeli/ou euaggeliou eu)aggeli/ou ! eu)agge/lion
020101 ! n- ----gsm- ------ ! *)ihsou= ihsou *)ihsou= ! *)ihsou=s
020101 ! n- ----gsm- ------ ! *xristou= xristou *xristou= ! *xristo/s
020101 ! n- ----gsm- ------ ! [ui(ou= [ui(ou ui(ou= ! ui(o/s
020101 ! n- ----gsm- ------ ! qeou=]. qeou]. qeou= ! qeo/s
020102 ! c- -------- ------ ! *kaqw\s kaqws kaqw\s ! kaqw/s
020102 ! v- 3xpi-s-- ------ ! ge/graptai gegraptai ge/graptai ! gra/fw
020102 ! p- -------- ------ ! e)n en e)n ! e)n
020102 ! ra ----dsm- ------ ! tw=| tw tw=| ! o(
020102 ! n- ----dsm- ------ ! *)hsai/+a| hsaia *)hsai/+a| ! *)hsai/+as
020102 ! ra ----dsm- ------ ! tw=| tw tw=| ! o(
020102 ! n- ----dsm- ------ ! profh/th|, profhth, profh/th| ! profh/ths
020102 ! x- -------- ------ ! *)idou\ idou i)dou\ ! i)dou/
020102 ! v- 1pai-s-- ------ ! a)poste/llw apostellw a)poste/llw ! a)poste/llw
020102 ! ra ----asm- ------ ! to\n ton to\n ! o(
020102 ! n- ----asm- ------ ! a)/ggelo/n aggelon a)/ggelo/n ! a)/ggelos
020102 ! rp ----gs-- ------ ! mou mou mou ! e)gw/
020102 ! p- -------- ------ ! pro\ pro pro\ ! pro/s
020102 ! n- ----gsn- ------ ! prosw/pou proswpou prosw/pou ! pro/swpon
020102 ! rp ----gs-- ------ ! sou, sou, sou ! su/
020102 ! rr ----nsm- ------ ! o(\s o(s o(\s ! o(/s
020102 ! v- 3fai-s-- ------ ! kataskeua/sei kataskeuasei kataskeua/sei ! kataskeua/zw
020102 ! ra ----asf- ------ ! th\n thn th\n ! o(
020102 ! n- ----asf- ------ ! o(do/n o(don o(do/n ! o(do/s
020102 ! rp ----gs-- ------ ! sou: sou: sou ! su/

As you can see, there is one line for each word. What interests us most is the form of the word that immediately follows that second !, because that represents the word as printed, with attendant punctuation.

A simple program can select and rearrange these words into a continuous text, taking care of a few details such as the use of * to denote a capital letter:

    )Arxh` tou= eu)aggeli'ou )Ihsou= Xristou= [ui(ou= qeou=].
    Kaqw`j ge'graptai e)n tw=| )Hsai'+a| tw=| profh'th|,
    )Idou` a)poste'llw to`n a)'ggelo'n mou pro` prosw'pou
    sou, o(`j kataskeua'sei th`n o(do'n sou:
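
As a rough illustration, such a program might look like the Python sketch below. It is not the program actually used: the function names are made up, and the rewriting rules (` for \, ' for /, j for a final sigma, the breathing moved in front of a capital) are simply read off the output shown above. The reason given for replacing the backslash is my assumption.

```python
import re

def printed_form(line):
    """Return the first word after the second '!' on one line of the file."""
    return line.split("!")[2].split()[0]

def tidy(word):
    """Rewrite one Beta word in the form shown in the output above."""
    # "*" marks a capital; any breathing or accent sits between it and the letter.
    word = re.sub(r"\*([()/\\=+|]*)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), word)
    # Assumption: backslash is awkward in TeX input, so grave and acute become ` and '.
    word = word.replace("\\", "`").replace("/", "'")
    # A sigma that is not followed by another letter is final, written j.
    return re.sub(r"s(?![A-Za-z])", "j", word)

def running_text(lines):
    """Join the printed forms of all non-blank lines into continuous text."""
    return " ".join(tidy(printed_form(line)) for line in lines if line.strip())

sample = [
    r"020101 ! n- ----nsf- ------ ! *)arxh\ arxh a)rxh\ ! a)rxh/",
    r"020101 ! ra ----gsn- ------ ! tou= tou tou= ! o(",
    r"020102 ! c- -------- ------ ! *kaqw\s kaqws kaqw\s ! kaqw/s",
]
print(running_text(sample))
```

A real version would also start a new paragraph or print a verse number whenever the six-digit reference in the first column changes.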

This transliterated text represents the Greek, with each letter replaced by a loosely corresponding Roman one, and accents and breathings represented by other characters. This scheme (called Beta) was commonly used before the advent of Unicode, and still has the attraction that it is easy to type on an ordinary keyboard, even if it is not easy to read.
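
To make the correspondence concrete, here is a hedged Python sketch that turns Beta text, in the form used in Tauber's file, into Unicode. The letter table covers only the 24 lower-case letters, and the names LETTERS, MARKS, ORDER, PUNCT and beta_to_unicode are my own. It relies on Unicode normalisation (NFC) to build the pre-composed characters, so an acute accent may come out as the "tonos" rather than the "oxia" code point; the two are canonically equivalent.

```python
import unicodedata

LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
# Beta diacritic -> Unicode combining mark
MARKS = {")": "\u0313", "(": "\u0314", "+": "\u0308", "/": "\u0301",
         "\\": "\u0300", "=": "\u0342", "|": "\u0345"}
# Order in which NFC expects the marks: breathing, diaeresis, accent, iota subscript
ORDER = ")(+/\\=|"
PUNCT = {":": "\u00b7"}  # Beta colon is the Greek raised dot

def beta_to_unicode(text):
    out, i, n = [], 0, len(text)
    while i < n:
        upper = text[i] == "*"
        if upper:
            i += 1
        marks = []
        # After "*", the diacritics come before the letter.
        while upper and i < n and text[i] in MARKS:
            marks.append(text[i])
            i += 1
        if i < n and text[i] in LETTERS:
            letter = text[i]
            i += 1
            while i < n and text[i] in MARKS:
                marks.append(text[i])
                i += 1
            base = LETTERS[letter]
            # A sigma with no letter after it takes the final form.
            if letter == "s" and not upper and (i == n or text[i] not in LETTERS):
                base = "ς"
            if upper:
                base = base.upper()
            marks.sort(key=ORDER.index)
            out.append(base + "".join(MARKS[m] for m in marks))
        elif i < n:
            out.append(PUNCT.get(text[i], text[i]))
            i += 1
    return unicodedata.normalize("NFC", "".join(out))

print(beta_to_unicode("*)arxh\\ tou= eu)aggeli/ou"))
```

The sort step matters for words like )Hsai/+a|, where Beta writes the accent before the diaeresis but the pre-composed character is only found if the diaeresis comes first.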

The Greek reads

1 Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ [υἱοῦ θεοῦ]. 2 Καθὼς γέγραπται ἐν τῷ Ἠσαΐᾳ τῷ προφήτῃ, Ἰδοὺ ἀποστέλλω τὸν ἄγγελόν μου πρὸ προσώπου σου, ὃς κατασκευάσει τὴν ὁδόν σου·

Rough translation:

1 The beginning of the good news of Jesus Christ [the son of god]. 2 As is written in Isaiah the prophet, See, I am sending my messenger before your face, who will prepare your path:

Sadly for us, the German Bible Society has decided to act as if it owns the copyright in UBS4, and has written to Tauber and others, asking them to discontinue offering files based on this text on the web. I take no view about whether they have the legal or moral right to do this, but the fact remains that Tauber's file is no longer available in a form that contains the complete, punctuated text. Later versions of Tauber's work contain only a list of the words, and therefore can't be used as a basis for printing.

Other editions of the New Testament are still available in a form that contains the whole text. The Tischendorf text, for example, can be found in several forms, including one based on the Beta encoding that is used by Tauber.

The font

Good typesetting of Greek depends on finding a suitable font containing decent Greek letters, preferably pre-composed with accents. One possibility is the free font Gentium [1]; another is Adobe Minion, which is not free but is widely available. Both these fonts contain a complete set of polytonic Greek diacritics in their Unicode positions. Gentium is a bit easier to use: in Minion, if an upper-case vowel has an accent and breathing, they stick out of the character box to the left, and it's sometimes necessary to add a kern to compensate. Gentium does not have this problem.

There are some errors in the kerning information for the current version of Gentium that need fixing.

I'm sure plenty of other fonts with extensive Unicode support have emerged in the last couple of years, and they could also be used.

The software

TeX! But alas, TeX doesn't understand Unicode ...

What TeX needs to make sense of the Beta-based input is a virtual font where the basic layout matches the assignment of Roman letters to Greek ones in Beta. On top of that, the font can use ligatures to access accented forms of the letters. So the position corresponding to ASCII h will contain the Greek character η, and there will be ligatures η+( → ἡ and ἡ+' → ἥ that mean the sequences h( and h(' produce ἡ and ἥ respectively.
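
The way ligatures build up an accented letter step by step can be imitated in a few lines of Python. This is only a simulation of the idea, not TeX's actual ligature machinery; the table holds just the two ligatures mentioned above, and the names BASE, LIGATURES and apply_ligatures are invented for the example.

```python
# Character positions of the font: ASCII h holds the Greek eta.
BASE = {"h": "η"}

# (current glyph, next character) -> replacement glyph
LIGATURES = {
    ("η", "("): "ἡ",
    ("ἡ", "'"): "ἥ",
}

def apply_ligatures(text, base=BASE, ligatures=LIGATURES):
    """Fold each character into the previous glyph when a ligature exists."""
    glyphs = []
    for ch in text:
        glyph = base.get(ch, ch)
        if glyphs and (glyphs[-1], glyph) in ligatures:
            # Replace the previous glyph by the combined one, as a ligature does.
            glyphs[-1] = ligatures[(glyphs[-1], glyph)]
        else:
            glyphs.append(glyph)
    return glyphs

print(apply_ligatures("h("))
print(apply_ligatures("h('"))
```

Because the result of one ligature can itself start another, a letter with both breathing and accent needs no special treatment: it is reached in two steps.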

Although ordinary TeX allows no more than 256 characters in a font, this is just about enough to include all the accented letters that actually occur in the text. With the help of the program afm2tfm, it's possible to create a font metric file for TeX that contains the necessary ligatures and maps each character position to an appropriate character chosen from a Unicode font.
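
One way to check that the 256 positions suffice is to count the distinct letter-plus-diacritic combinations that occur in the Beta text. The Python sketch below does this under the assumption that the input is a list of Beta words in the form used in Tauber's file; the names UNIT and distinct_units are made up, and punctuation and the final sigma would need a few further positions that are not counted here.

```python
import re

# One unit is either "*", optional diacritics and a letter (a capital),
# or a lower-case letter followed by its diacritics.
UNIT = re.compile(r"\*[()/\\=+|]*[a-z]\|?|[a-z][()/\\=+|]*")

def distinct_units(words):
    """Collect every distinct letter-with-diacritics unit found in the words."""
    seen = set()
    for word in words:
        seen.update(UNIT.findall(word))
    return seen

units = distinct_units(["*)arxh\\", "tou=", "eu)aggeli/ou"])
print(len(units), "distinct units; fits in one font:", len(units) <= 256)
```

Run over the whole text, the size of the set says how many positions the virtual font has to provide.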


A nice printing of the text will use hyphenation to even up the layout. For TeX to do this, we need a set of hyphenation patterns that suit ancient Greek.