I have been toying with the morphologically tagged New Testament prepared by Dr. Tauber, based on the NA-26 Greek text. The feature that I needed was the resolution of each word in the New Testament into its lexical form.
My reason for doing so was to look at an old problem, the lexical frequencies of the Pauline epistles, from a slightly new perspective: my own "max hapax" formula. ("Max hapax" may be a bit of a misnomer, but it sounds nice.)
It can be described in simple enough terms. First, one divides the material into chunks that are as large as possible while still telling you something interesting about your data set, without producing too many of them (too many chunks take too long to process, and each chunk becomes too small a sample). For a quick analysis, I chose to use just 8 chunks:
00 Romans + Galatians (9341 words)
01 First and Second Corinthians (11307 words)
02 Philippians and First Thessalonians (3110 words)
03 Colossians (1582 words)
04 Ephesians (2422 words)
05 Second Thessalonians (823 words)
06 First Timothy, Second Timothy, Titus (3488 words)
07 Hebrews (4953 words)
I left out Philemon, for now, because it may be too short to analyze.
Then one chooses the number of authors among whom the chunks can be parceled out. I chose 2, 3, 4, 5, and 6. (This also adds to processing time, since the program has to cycle through every possible distribution of the chunks among the authors.)
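The step of cycling through every distribution can be sketched in Python. This is a minimal illustration under stated assumptions, not the program behind the results below: it assumes a distribution is a split of the chunks into exactly k non-empty groups, and that the groups are unlabeled (handing Author 1's chunks to Author 2 and vice versa is not a new distribution). The function name `partitions_into` is hypothetical.

```python
from typing import Iterator, List, Sequence


def partitions_into(items: Sequence[str], k: int) -> Iterator[List[List[str]]]:
    """Yield every split of `items` into exactly k non-empty, unlabeled groups.

    Assumption: the groups ("authors") are interchangeable, so each
    distribution is produced once, not once per relabeling of the authors.
    """
    n = len(items)

    def extend(i: int, groups: List[List[str]]) -> Iterator[List[List[str]]]:
        # Prune: too few chunks remain to give every missing author one.
        if len(groups) + (n - i) < k:
            return
        if i == n:
            yield [list(g) for g in groups]  # copy, because `groups` is reused
            return
        item = items[i]
        # Option 1: give this chunk to an author who already has chunks.
        for g in groups:
            g.append(item)
            yield from extend(i + 1, groups)
            g.pop()
        # Option 2: make this chunk the first one of a new author.
        if len(groups) < k:
            groups.append([item])
            yield from extend(i + 1, groups)
            groups.pop()

    yield from extend(0, [])


if __name__ == "__main__":
    chunks = ["00", "01", "02", "03", "04", "05", "06", "07"]
    for k in range(2, 7):
        print(k, sum(1 for _ in partitions_into(chunks, k)))
```

Under this assumption, the eight chunks give 127, 966, 1701, 1050 and 266 distributions for two through six authors (the Stirling numbers of the second kind), 4,110 in all.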
The "max hapax" formula works as follows. For each word, the number of occurrences per 500 words is calculated for each author, and the highest rate of occurrence is found. Then one goes through the rest of the authors: for each author who does not have the word, or has it less than once per 500 words, the value of that highest rate of occurrence is added to the "hapax" score for that particular distribution of authors.
The "max" part comes from displaying the two distributions of authors with the highest "hapax" scores.
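The scoring and the "max" step can likewise be sketched in Python. This is one possible reading of the formula, not the original program, and several details are assumptions: each chunk is a `Counter` mapping lemma to number of occurrences (so a chunk's word count is the sum of its counts); a word's highest rate is credited to a single author (the first one in case of a tie) and only the other authors are checked; "less than 1 time per 500 words" means a rate strictly below 1. The names `author_rates`, `hapax_score` and `top_two`, and the toy counts at the end, are hypothetical and invented for illustration.

```python
import heapq
from collections import Counter
from typing import Dict, List, Sequence

Distribution = List[List[str]]  # each inner list: the chunk labels given to one author


def author_rates(chunks: Dict[str, Counter], author: Sequence[str],
                 per: int = 500) -> Dict[str, float]:
    """Occurrences of each lemma per `per` words, pooled over one author's chunks."""
    pooled: Counter = Counter()
    for label in author:
        pooled.update(chunks[label])
    total = sum(pooled.values())  # assumes every word token is counted under some lemma
    return {lemma: n * per / total for lemma, n in pooled.items()}


def hapax_score(chunks: Dict[str, Counter], distribution: Distribution,
                per: int = 500, threshold: float = 1.0) -> float:
    rates = [author_rates(chunks, author, per) for author in distribution]
    score = 0.0
    for lemma in set().union(*rates):
        by_author = [r.get(lemma, 0.0) for r in rates]
        highest = max(by_author)
        top = by_author.index(highest)  # tie: the first author with the highest rate
        # Every other author who lacks the word or falls below the threshold
        # adds the highest rate once more to the score.
        misses = sum(1 for i, rate in enumerate(by_author)
                     if i != top and rate < threshold)
        score += highest * misses
    return score


def top_two(chunks: Dict[str, Counter],
            candidates: Sequence[Distribution]) -> List[Distribution]:
    """The "max" step: the two candidate distributions with the highest scores."""
    return heapq.nlargest(2, candidates, key=lambda d: hapax_score(chunks, d))


# Toy counts, invented for illustration (each chunk is 500 words long).
toy = {
    "A": Counter({"kai": 250, "pistis": 250}),
    "B": Counter({"kai": 250, "pistis": 250}),
    "C": Counter({"kai": 250, "hiereus": 250}),
}
candidates = [[["A", "B"], ["C"]], [["A"], ["B", "C"]], [["A", "C"], ["B"]]]
best = top_two(toy, candidates)
```

Here grouping A with B scores highest, because "pistis" and "hiereus" each belong to one author only. In practice the candidates would be every distribution of the eight chunks for a given number of authors.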
The reasoning behind this is, basically, that the more distinctive the lexical style of each author in the distribution, the more likely that distribution is. And, of course, I wanted to see what would happen if one went forward with this kind of analysis.
Here are the results.
For two authors:
Highest: Pastorals by themselves, the rest grouped together
Second Highest: Hebrews by itself, the rest grouped together
For three authors:
Highest: Hebrews; Pastorals; the rest grouped together
Second Highest: Hebrews; Pastorals and 2 Thessalonians; the rest
For four authors:
Highest: Hebrews; Pastorals; 2 Thessalonians; the rest
Second Highest: Hebrews; Pastorals; Ephesians; the rest
For five authors:
Highest: Hebrews; Pastorals; 2 Thess; Ephesians; the rest
Second Highest: Hebrews; Pastorals; 2 Thess; Colossians; the rest
For six authors:
Highest: Hebrews; Pastorals; 2 Thess; Eph; Philippians + 1 Thess; the rest
Second Highest: Hebrews; Pastorals; 2 Thess; Eph; Colossians; the rest
The results interpreted.
Romans and Galatians are always grouped together with 1 and 2 Corinthians. These four epistles may be attributed to Paul and provide the basis for determining what counts as Pauline style.
Hebrews and the Pastorals are certainly outliers in terms of lexical style. Hebrews is usually taken as non-Pauline; the Pastorals should be too, and in fact they often are, on separate grounds.
2 Thessalonians is also on the periphery of Pauline lexical style. So is Ephesians. Their non-Pauline status is probable, though not as certain as in the case of Hebrews and the Pastorals.
Colossians is up in the air for me. Philippians and First Thessalonians I would take as Pauline more probably than not.
Thoughts, suggestions, criticisms, requests for source code?
The theoretical basis for this kind of analysis, in terms of both statistics and linguistics, is probably the weak point of what I have produced so far. Basically, I'm not sure why my formula "works," insofar as it lines up, undesignedly, with what many scholars have been saying about the Pauline epistles for a long time. One thing I'd like to do is to apply this type of procedure to a different body of texts. The difficulty is getting a corpus that is tagged with the lexical forms of the words. I made a request to B-Greek for notice of any such corpora. If anyone here knows of one (even if it is proprietary and may be difficult to obtain), please let me know.
I may do different types of studies with Tauber's NT in the future. Let me know if you have ideas.
1 Comment:
Have you ever looked at Trobisch's work on collation?
http://www.religion-online.org/showarticle.asp?title=91
Although I am certainly NOT in disagreement with your results thus far, I am wondering whether a 'strict' lexical study would be able to determine authorship beyond who the principal author might have been. My intended meaning is that you have demonstrated to me that Romans etc. lexically belong together whereas Ephesians does not, but this does not in and of itself demonstrate to me that Paul was not behind Phil., Eph. and Col. in a prominent manner. P.S. I do not belong to the inerrantist perspective but have maintained an interest in textual criticism etc. for a long time. As far as the Pastoral Epistles are concerned, it would amuse me IF they were really non-Pauline...