Introduction

This is a follow-up to my attempt to look at the relative difficulty of the New Testament books. I began to wonder what the best reading order for the NT books would be if you wanted to read a whole book at a time and learn as few new words as possible each time. This turned out not to be a very insightful question to ask, as the rest of this page demonstrates. But I'm glad that I followed up on my idea at least. :-) Here it is in excruciating detail...

Vocabulary Properties

Number of occurrences of each lemma, in descending order of frequency:


SELECT COUNT(lemma) AS cnt FROM sblgnt GROUP BY lemma ORDER BY cnt DESC;

Not a lot of surprises here:

Here it is with a logarithmic scale:

So you might want to ask, how many lemmas occur 10 times or more?


SELECT COUNT(lemma) FROM (SELECT lemma,COUNT(lemma) AS cnt FROM sblgnt GROUP BY lemma) AS tmp WHERE cnt >= 10;
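If you want to try several cutoffs, the same count can be wrapped in a small function. This is only a sketch: it assumes the database is SQLite (the AUTOINCREMENT and IFNULL syntax used on this page suggests so), it is driven from Python's standard sqlite3 module, and the function name lemmas_at_least and the tiny in-memory table are made up for illustration.

```python
import sqlite3

def lemmas_at_least(conn, threshold):
    """Count the lemmas that occur at least `threshold` times in sblgnt."""
    row = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT lemma FROM sblgnt GROUP BY lemma HAVING COUNT(*) >= ?"
        ")",
        (threshold,),
    ).fetchone()
    return row[0]

# Toy stand-in for the real sblgnt table (hypothetical data, one row per word).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sblgnt (lemma TEXT, book_name TEXT)")
conn.executemany(
    "INSERT INTO sblgnt VALUES (?, ?)",
    [("ὁ", "A")] * 3 + [("καί", "A")] * 2 + [("λόγος", "B")],
)

print(lemmas_at_least(conn, 2))  # prints 2
```

HAVING does the same job as the outer WHERE cnt >= 10 in the query above, and the ? placeholder lets the cutoff vary without editing the SQL.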

The best order in which to read the NT books

I started wondering what the best order to read the books would be, if your priority was to read complete books, and learn as little vocabulary as possible at one go. (That is, once you've read the first book and know all of those words, you don't need to relearn them to read the second book.) I couldn't think of a way to figure this out with a simple SQL query. I had to create an algorithm.


-- this is a temporary copy of sblgnt
CREATE TEMPORARY TABLE words ( _id INTEGER PRIMARY KEY AUTOINCREMENT, lemma TEXT, book_name TEXT );
INSERT INTO words (lemma,book_name) SELECT lemma,book_name FROM sblgnt;

-- another table to hold used words
CREATE TEMPORARY TABLE used_words ( _id INTEGER PRIMARY KEY AUTOINCREMENT, lemma TEXT, book_name TEXT );

do this 27 times{
	-- take the book_name with the fewest new words (this is the value bound to ? below):
	SELECT COUNT(DISTINCT lemma) AS cnt,book_name FROM words WHERE lemma NOT IN (SELECT lemma FROM used_words) GROUP BY book_name ORDER BY cnt ASC LIMIT 1;
	-- put that book's words into used_words
	INSERT INTO used_words (lemma,book_name) SELECT lemma,book_name FROM sblgnt WHERE book_name=?;
	-- remove them from words
	DELETE FROM words WHERE book_name=?;
}
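The "do this 27 times" loop above is pseudocode, so something has to drive it. Here is one way it might be done, as a minimal sketch using Python's standard sqlite3 module. The function name reading_order, the tie-break on book_name, and the toy data are assumptions for illustration; the sketch also copies lemmas into used_words from words rather than from sblgnt, which is slightly different from the script above.

```python
import sqlite3

def reading_order(conn):
    """Greedy order: repeatedly take the book with the fewest unseen lemmas."""
    cur = conn.cursor()
    # Working copy of sblgnt, plus a table of lemmas already learned.
    cur.execute("CREATE TEMPORARY TABLE words AS SELECT lemma, book_name FROM sblgnt")
    cur.execute("CREATE TEMPORARY TABLE used_words (lemma TEXT)")
    order = []
    while True:
        row = cur.execute(
            "SELECT COUNT(DISTINCT lemma) AS cnt, book_name FROM words "
            "WHERE lemma NOT IN (SELECT lemma FROM used_words) "
            "GROUP BY book_name ORDER BY cnt ASC, book_name ASC LIMIT 1"
        ).fetchone()
        if row is None:
            # No remaining book has a new lemma, so the loop stops here.
            break
        cnt, book = row
        order.append((book, cnt))
        # Mark this book's lemmas as learned, then drop the book.
        cur.execute(
            "INSERT INTO used_words SELECT DISTINCT lemma FROM words WHERE book_name = ?",
            (book,),
        )
        cur.execute("DELETE FROM words WHERE book_name = ?", (book,))
    return order

# Toy data (hypothetical): three tiny "books" that share some lemmas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sblgnt (lemma TEXT, book_name TEXT)")
conn.executemany(
    "INSERT INTO sblgnt VALUES (?, ?)",
    [("a", "A"), ("b", "A"),
     ("a", "B"), ("b", "B"), ("c", "B"),
     ("a", "C"), ("e", "C"), ("f", "C")],
)
order = reading_order(conn)
print(order)  # [('A', 2), ('B', 1), ('C', 2)]
```

Looping until the SELECT returns nothing, instead of a fixed 27 times, keeps the sketch independent of the number of books.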

This gives the following order. The number shows the number of new words you have to learn to read that book.

At first it's surprising that you have to learn fewer words for 3 John than for 2 John, but that's because you already learned a bunch of words for 2 John (95, in fact). 3 John has only 108 distinct lemmas to begin with, and once you've studied 2 John, only 56 of them are new.

This wasn't as interesting as I'd hoped it would be, because the measure is strongly biased by the length of the book. There's a comparison below between this order and simply sorting the books by how many distinct lemmas each one contains. You don't get a dramatically different answer than you would if you just started with the books with the fewest distinct lemmas. (And really, who's going to wait until the end to read the gospels?)

Sort by unique lemmas      The “ideal” order
Lemmas  Book               Book             New words
95      2 John             2 John           95
108     3 John             3 John           56
140     Philemon           Philemon         80
226     Jude               1 John           134
233     1 John             2 Thessalonians  126
249     2 Thessalonians    Jude             127
300     Titus              Titus            150
361     1 Thessalonians    1 Thessalonians  147
399     2 Peter            2 Peter          171
430     Colossians         Colossians       163
442     Philippians        Philippians      161
452     2 Timothy          Ephesians        155
519     Galatians          2 Timothy        157
528     Ephesians          Galatians        157
538     1 Timothy          1 Timothy        170
543     1 Peter            1 Peter          163
555     James              James            173
786     2 Corinthians      2 Corinthians    211
911     Revelation         1 Corinthians    242
952     1 Corinthians      Romans           227
999     John               Hebrews          288
1029    Hebrews            Revelation       272
1054    Romans             John             241
1341    Mark               Mark             327
1680    Matthew            Matthew          287
2032    Acts               Luke             407
2046    Luke               Acts             574
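The "Sort by unique lemmas" side of the comparison needs no loop at all, just one grouped query. A minimal sketch, again assuming SQLite driven from Python's sqlite3 module; the function name distinct_lemmas_per_book and the toy rows are invented for illustration.

```python
import sqlite3

def distinct_lemmas_per_book(conn):
    """Return (book_name, distinct lemma count) pairs, smallest book first."""
    return conn.execute(
        "SELECT book_name, COUNT(DISTINCT lemma) AS cnt FROM sblgnt "
        "GROUP BY book_name ORDER BY cnt ASC, book_name ASC"
    ).fetchall()

# Toy data (hypothetical): repeated words count only once per book.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sblgnt (lemma TEXT, book_name TEXT)")
conn.executemany(
    "INSERT INTO sblgnt VALUES (?, ?)",
    [("a", "X"), ("a", "X"), ("b", "X"),
     ("a", "Y"),
     ("c", "Z"), ("d", "Z"), ("e", "Z")],
)
baseline = distinct_lemmas_per_book(conn)
print(baseline)  # [('Y', 1), ('X', 2), ('Z', 3)]
```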

A somewhat more pedagogically relevant question would be, “Suppose someone already knew all the words that occur 25 times or more. Then what would the ideal reading order be?” To answer this, I ran this command before I ran the ‘script’ above.


DELETE FROM words WHERE lemma IN (SELECT lemma FROM (SELECT lemma,COUNT(lemma) AS cnt FROM sblgnt GROUP BY lemma) AS tmp WHERE cnt >= 25);

Unfortunately that doesn't change much:

The “ideal” order           The “ideal” order, setting aside ≥25x words
Book             New words  Book             New words
2 John           95         2 John           11
3 John           56         3 John           23
Philemon         80         Philemon         45
1 John           134        1 John           53
2 Thessalonians  126        2 Thessalonians  80
Jude             127        Jude             87
Titus            150        1 Thessalonians  116
1 Thessalonians  147        Titus            127
2 Peter          171        Colossians       150
Colossians       163        2 Peter          146
Philippians      161        Philippians      142
Ephesians        155        Ephesians        140
2 Timothy        157        2 Timothy        147
Galatians        157        Galatians        144
1 Timothy        170        1 Timothy        162
1 Peter          163        1 Peter          154
James            173        James            162
2 Corinthians    211        2 Corinthians    203
1 Corinthians    242        1 Corinthians    232
Romans           227        Romans           225
Hebrews          288        Hebrews          281
Revelation       272        Revelation       264
John             241        John             233
Mark             327        Mark             325
Matthew          287        Matthew          286
Luke             407        Luke             407
Acts             574        Acts             574
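The "already knows the frequent words" variant can also be sketched without any temporary tables, using plain Python sets. This is not the method used on this page, just an equivalent formulation under stated assumptions: the rows are (lemma, book_name) pairs, one per word occurrence, as in sblgnt; the function name reading_order, the known_threshold parameter, and the toy data are hypothetical; ties are broken by book name.

```python
from collections import Counter

def reading_order(rows, known_threshold=None):
    """Greedy reading order over (lemma, book_name) pairs.

    Lemmas occurring at least `known_threshold` times overall are treated
    as already known before the first book is chosen.
    """
    rows = list(rows)
    freq = Counter(lemma for lemma, _ in rows)
    known = set()
    if known_threshold is not None:
        known = {lemma for lemma, n in freq.items() if n >= known_threshold}
    # Map each book to the set of distinct lemmas it contains.
    books = {}
    for lemma, book in rows:
        books.setdefault(book, set()).add(lemma)
    order = []
    while books:
        # Fewest unknown lemmas first; ties broken alphabetically.
        book = min(books, key=lambda b: (len(books[b] - known), b))
        new = books.pop(book) - known
        order.append((book, len(new)))
        known |= new
    return order

# Toy data (hypothetical): "a" occurs four times, everything else less.
rows = [("a", "A"), ("a", "A"), ("b", "A"),
        ("a", "B"), ("c", "B"), ("d", "B"),
        ("a", "C"), ("b", "C"), ("e", "C")]
plain = reading_order(rows)
with_known = reading_order(rows, known_threshold=3)
print(plain)       # [('A', 2), ('C', 1), ('B', 2)]
print(with_known)  # [('A', 1), ('C', 1), ('B', 2)]
```

As in the tables above, pre-learning the frequent words lowers the counts for the early books but leaves the order unchanged in this toy case.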

(Can you tell I'm writing this as I go along?)

Or you could ask, “Suppose someone doesn't care about learning a word unless it comes up at least five times. Then what would the ideal reading order be?” To answer this, I ran this command before I ran the ‘script’ above.


DELETE FROM words WHERE lemma NOT IN (SELECT lemma FROM (SELECT lemma,COUNT(lemma) AS cnt FROM sblgnt GROUP BY lemma) AS tmp WHERE cnt >= 5);

But this actually breaks my algorithm, because using this method you get some “freebie” books (i.e., no new vocabulary), which my algorithm does not expect: a book with no new lemmas produces no row in the GROUP BY result, so it can never be selected. So, it must be revised:


-- this is a temporary copy of sblgnt
CREATE TEMPORARY TABLE words ( _id INTEGER PRIMARY KEY AUTOINCREMENT, lemma TEXT, book_name TEXT );
INSERT INTO words (lemma,book_name) SELECT lemma,book_name FROM sblgnt;
DELETE FROM words WHERE lemma NOT IN (SELECT lemma FROM (SELECT lemma,COUNT(lemma) AS cnt FROM sblgnt GROUP BY lemma) AS tmp WHERE cnt >= 5);

-- another table to hold used words
CREATE TEMPORARY TABLE used_words ( _id INTEGER PRIMARY KEY AUTOINCREMENT, lemma TEXT, book_name TEXT );

-- a table to hold book names
CREATE TEMPORARY TABLE books ( book_name TEXT );
INSERT INTO books SELECT name FROM book_names;

do this 27 times{
	-- take the book_name with the fewest new words (this is the value bound to ? below);
	-- the LEFT JOIN keeps books that have no new words at all
	SELECT
		IFNULL(cnt,0),books.book_name FROM
			books
		LEFT JOIN
			( SELECT COUNT(DISTINCT lemma) AS cnt,book_name FROM words WHERE lemma NOT IN (SELECT lemma FROM used_words) GROUP BY book_name ) AS tmp
		ON books.book_name = tmp.book_name
	ORDER BY cnt ASC LIMIT 1;

	-- put that book's words into used_words
	INSERT INTO used_words (lemma,book_name) SELECT lemma,book_name FROM sblgnt WHERE book_name=?;
	-- remove them from words
	DELETE FROM words WHERE book_name=?;
	-- remove it from the temporary book name table
	DELETE FROM books WHERE book_name=?;
}
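Here is a sketch of how the revised loop might be driven from Python's standard sqlite3 module, assuming the database is SQLite. Several things are assumptions for illustration: the function name reading_order and its min_count parameter, the tie-break on book_name, and the toy data. The sketch also builds the books table from sblgnt itself rather than from the book_names table used above, so that it is self-contained.

```python
import sqlite3

def reading_order(conn, min_count=1):
    """Greedy order that also handles books with no new lemmas ("freebies")."""
    cur = conn.cursor()
    cur.execute("CREATE TEMPORARY TABLE words (lemma TEXT, book_name TEXT)")
    # Keep only lemmas that occur at least min_count times in the whole corpus.
    cur.execute(
        "INSERT INTO words SELECT DISTINCT lemma, book_name FROM sblgnt "
        "WHERE lemma IN (SELECT lemma FROM sblgnt GROUP BY lemma HAVING COUNT(*) >= ?)",
        (min_count,),
    )
    cur.execute("CREATE TEMPORARY TABLE used_words (lemma TEXT)")
    cur.execute("CREATE TEMPORARY TABLE books AS SELECT DISTINCT book_name FROM sblgnt")
    order = []
    while True:
        # LEFT JOIN keeps every remaining book; a missing count becomes 0.
        row = cur.execute(
            "SELECT books.book_name, IFNULL(tmp.cnt, 0) AS new_words "
            "FROM books LEFT JOIN ("
            "  SELECT book_name, COUNT(DISTINCT lemma) AS cnt FROM words "
            "  WHERE lemma NOT IN (SELECT lemma FROM used_words) "
            "  GROUP BY book_name"
            ") AS tmp ON books.book_name = tmp.book_name "
            "ORDER BY new_words ASC, books.book_name ASC LIMIT 1"
        ).fetchone()
        if row is None:
            break  # no books left
        book, new_words = row
        order.append((book, new_words))
        cur.execute("INSERT INTO used_words SELECT lemma FROM words WHERE book_name = ?", (book,))
        cur.execute("DELETE FROM words WHERE book_name = ?", (book,))
        cur.execute("DELETE FROM books WHERE book_name = ?", (book,))
    return order

# Toy data (hypothetical): book D contains only a lemma that occurs once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sblgnt (lemma TEXT, book_name TEXT)")
conn.executemany(
    "INSERT INTO sblgnt VALUES (?, ?)",
    [("a", "A"), ("a", "A"), ("b", "A"),
     ("a", "B"), ("c", "B"),
     ("b", "C"), ("d", "C"), ("d", "C"),
     ("e", "D")],
)
order = reading_order(conn, min_count=2)
print(order)  # [('D', 0), ('B', 1), ('A', 1), ('C', 1)]
```

The script above sorts on the raw cnt column, which works in SQLite because NULL sorts before any number in ascending order. Sorting on the IFNULL value, as this sketch does, makes the "freebies first" behaviour explicit instead of relying on that rule.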

Prepare yourself for another anticlimax:

The “ideal” order           The “ideal” order, setting aside <5x words
Book             New words  Book             New words
2 John           95         2 John           92
3 John           56         3 John           46
Philemon         80         Philemon         59
1 John           134        Jude             116
2 Thessalonians  126        1 John           100
Jude             127        2 Thessalonians  86
Titus            150        Titus            78
1 Thessalonians  147        2 Peter          90
2 Peter          171        1 Thessalonians  87
Colossians       163        2 Timothy        81
Philippians      161        1 Timothy        74
Ephesians        155        Colossians       68
2 Timothy        157        Philippians      63
Galatians        157        Ephesians        52
1 Timothy        170        1 Peter          61
1 Peter          163        Galatians        59
James            173        James            67
2 Corinthians    211        2 Corinthians    63
1 Corinthians    242        Romans           63
Romans           227        1 Corinthians    59
Hebrews          288        Hebrews          57
Revelation       272        Revelation       92
John             241        John             90
Mark             327        Mark             86
Matthew          287        Matthew          35
Luke             407        Luke             13
Acts             574        Acts             21

So even setting aside all of the infrequent words in the gospels and Acts, you'd still be best off learning them last.

Acknowledgements

All contents © 2024 Adam Baker, except where otherwise noted.