User's Guide for the MH Program (v. 1.2)

Tests of Marginal Homogeneity for Two-Way Tables

1 Introduction

This document describes use of the MH computer program for testing marginal homogeneity in N×N tables. Marginal homogeneity (Barlow, 1998; Bishop, Fienberg & Holland, 1975) means that the marginal frequencies or proportions are the same (do not differ significantly) between the row and column variables. For example, one may wish to test if two raters are similar in terms of how often they use each category when rating the same cases. Or one may wish to test pre- to post-treatment change on an ordered-category variable; this can be done by examining whether the marginal distributions of the variable differ pre- vs. post-treatment.

2 Running the Program

Note. This program is almost ridiculously easy to use. It runs in a Command Prompt (DOS) window under Windows 95/98/NT/2000/XP. If you're not familiar with using the Command Prompt window, check my Command Prompt Quick and Handy Guide.

MH can be run in several ways. The easiest is just to navigate within Windows to the folder where the program (mh.exe) resides, and click its icon.

Some users may prefer to open a Command Prompt window themselves, navigate to the folder where mh.exe resides, and at the prompt type:

mh

and press the Enter key.

The program will prompt for the name of the input file. Supply the input file name, and include a path if the file is in a different folder, for example:

     c:\mh\input.mh

If you press Enter without supplying a file name, the default input file name of input.txt will be assumed.

If the input file is in a different folder than mh.exe, then unless you supply the path with the file name as shown above, you will get an error message and the program will not run.

Because the only program file is mh.exe (i.e., there are no additional profile, configuration or library files) you can copy this small file and place it in any folder, have several versions on your machine, etc. (Hint: if you place a copy in the same folder as your data files, then there is no need to type path names.)

There are also several clever ways of sending filenames to your Command Prompt window without typing anything. See the Quick and Handy Command Prompt guide for details.

Next supply an output file name, with optional path, in response to the program prompt. If no output file name is specified, a default file name of output.txt will be assumed.

The input and output file names must not exceed 60 characters each, including the path specification, if supplied. (If it seems that you have supplied the file/path names correctly but the program does not appear to run, try limiting the file name to 7 characters or fewer and the file extension to 3 characters or fewer, and use no path.)

The program will then read and process the input file and write results to the output file.

(Note: If by any chance you have a pre-Pentium-era machine without a math co-processor, this version of MH will not run. Please contact the author to obtain a suitable version.)

3 Input File

The input file contains five command lines plus the data to be analyzed. All command lines must be present, even if some are left blank. The five command lines are as follows:

Title. A brief descriptive title of up to 72 characters.
Number of categories/rating levels. This line gives the number of categories or levels associated with the variable; format is free-field. Since this is a test of marginal homogeneity, it is assumed that the row variable and the column variable have the same number of categories/levels. It is also assumed that categories/levels are in the same order for the row and column variables.
Row variable name. In the first seven columns, supply a name for the row variable. If this line is blank, a default name of ROWVAR will be used.
Column variable name. In the first seven columns, supply a name for the column variable. If this line is blank, a default name of COLVAR will be used.
Data type (ordered-category or nominal). If the data are ordered-categorical, specify
```
        ord
```
in the first three columns. If the categories are purely nominal (i.e., unordered), specify
```
        nom
```
in the first three columns.

Following these lines are the data to be analyzed. The data consist of an N×N table of frequencies, where N is the number of categories. The format is free-field, but it is helpful to physically arrange the data as a square array.

Important! Be sure to press the Enter key after entering the last number. This ensures that an ASCII end-of-record mark is placed on the last line. (Many editors do this automatically, but others, including Notepad, do not.) Otherwise the data may not be read completely, producing a Fortran error message. To be really safe, you can add an extra line, with just a blank character or two, following your data.

An example input file is as follows:

   Classification of 113 screening mammograms (Source: Barlow, 1998)
   5
   Rater 1
   Rater 2
   ord
       75     1     3     1     0
        1     1     0     0     1
        5     2     4     0     1
        0     0     2     1     3
        0     0     0     0    12
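
For readers who prefer to generate input files with a script, the layout above can be sketched as follows. This is only an illustration in Python, which is not part of MH; the function name `format_mh_input` and its parameters are hypothetical. It builds the five command lines followed by the table, and ends the last line with a newline so that the end-of-record mark is present.

```python
def format_mh_input(title, row_name, col_name, data_type, table):
    """Return the text of an MH input file: five command lines plus data."""
    n = len(table)
    if any(len(row) != n for row in table):
        raise ValueError("MH requires a square table")
    if data_type not in ("ord", "nom"):
        raise ValueError("data type must be 'ord' or 'nom'")
    # Title is limited to 72 characters, variable names to 7 columns.
    lines = [title[:72], str(n), row_name[:7], col_name[:7], data_type]
    # Free-field data, arranged as a square array for readability.
    lines += [" ".join(f"{freq:5d}" for freq in row) for row in table]
    # Terminate every line, including the last one, with a newline.
    return "\n".join(lines) + "\n"
```

The returned text can then be written to a file such as input.txt with an ordinary file write.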

In constructing the input file, verify that the row and column variables are correctly labeled. Note that with, for example, rater agreement data, some sources make Rater 1 the row variable, whereas others make Rater 1 the column variable.

The largest table MH will analyze is 50×50. If this is insufficient, please contact the author.

MH requires a square data table. That is consistent with the premise of testing marginal homogeneity--i.e., that the row and column categories are exactly the same. If a row or column has all 0 frequencies, include the row or column as required to maintain a square table. However, this should be done only if the row or column could potentially have had non-zero frequencies.

4 Tests Performed

MH always performs the following tests:

For each category, a McNemar test comparing the row marginal with the corresponding column marginal
The Bhapkar and Stuart-Maxwell tests of overall marginal homogeneity
A Bowker symmetry test

For ordered-category data, the following tests are also performed:

A McNemar test of overall bias or overall direction of change
For each category, a McNemar test comparing the cumulative row marginal proportion with the corresponding cumulative column marginal proportion; this can be interpreted as a comparison of category thresholds for the row and column variables.

The specific tests are briefly described below.

4.1 The McNemar test

Consider a 2×2 table that summarizes crossclassifications of a sample of cases on two categorical variables, each with the same categories. For example, the table might summarize ratings by two raters on presence (+) or absence (-) of a trait, as in Table 1, with observed frequencies a, b, c and d.

+--------------------------------------------+
|                                            |
|                    Table 1                 |
|                                            |
|                    Rater 2                 |
|                   -       +                |
|               +-------+-------+            |
|             - |   a   |   b   | a + b      |
|    Rater 1    +-------+-------+            |
|             + |   c   |   d   | c + d      |
|               +-------+-------+            |
|                 a + c   b + d   total      |
|                                            |
+--------------------------------------------+

The McNemar test (McNemar, 1947; Everitt, 1977; Somes, 1983) tests whether the row and column marginal frequencies are equal. This is equivalent to simply testing whether b = c. (We should note that, strictly speaking, the test is not of the observed marginal frequencies, but rather of the corresponding marginal rates in the population.)

The MH program calculates the McNemar statistic as

X² = (b - c)²/(b + c). [1]

The value X² can be viewed as a chi-squared statistic with 1 df. A significant value (e.g., p < .05) implies that the marginal rates significantly differ between the rows and columns. The chi-squared test is inherently two-tailed. In theory, one could adapt the method to perform a one-tailed McNemar test.

If (b + c) < 10, a two-tailed exact test, based on the cumulative binomial distribution, is performed instead of calculating chi-squared.
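
The calculation in Equation 1 and the switch to an exact test can be sketched in Python as follows. This is an illustration, not the MH source code: the function name `mcnemar` is hypothetical, and the exact p value is obtained by doubling the smaller binomial tail, which is one common convention and is an assumption about how MH proceeds.

```python
import math

def mcnemar(b, c, exact_below=10):
    """McNemar test on the discordant counts b and c of a 2x2 table.

    Returns (chi_squared, p_value); chi_squared is None for the exact test.
    """
    n = b + c
    if n == 0:
        return None, 1.0  # no discordant cases, nothing to test
    if n < exact_below:
        # Two-tailed exact test: double the smaller binomial tail (p = 1/2).
        tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
        return None, min(1.0, 2 * tail)
    x2 = (b - c) ** 2 / n
    # Upper-tail probability of a chi-squared variable with 1 df.
    return x2, math.erfc(math.sqrt(x2 / 2))
```

For example, `mcnemar(5, 15)` uses the chi-squared form with X² = 5.0, whereas `mcnemar(1, 5)` falls back to the exact test because b + c = 6.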

4.2 McNemar tests for each category

MH first tests marginal homogeneity separately for each category. For each of these tests the N×N table is collapsed to form a 2×2 table. Specifically, for each rating category k (k = 1, ..., N), all categories other than k are combined, producing a 2×2 table for the k vs. not-k distinction. The McNemar test is then performed on this table.

N such tests are performed. Of these, N - 1 are independent. To account for the multiple tests, one may wish to adjust (decrease) the p value required for statistical significance. The MH program reports a Bonferroni-adjusted significance level, calculated as .05/(N - 1). However the user may instead wish to use a less conservative adjustment, or no adjustment.
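
The collapsing step and the Bonferroni criterion can be sketched in Python as follows. The function names are hypothetical, categories are indexed from 0, and the assignment of a, b, c, d assumes that "+" in Table 1 stands for category k and "-" for all other categories.

```python
def collapse_category(table, k):
    """Collapse an N x N table to the 2x2 table for category k vs. not-k.

    `table` is a list of rows of frequencies; k is 0-based.
    Returns (a, b, c, d) laid out as in Table 1, with '+' meaning category k.
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    row_k = sum(table[k])
    col_k = sum(table[i][k] for i in range(n))
    d = table[k][k]
    c = row_k - d   # row variable is k, column variable is not
    b = col_k - d   # column variable is k, row variable is not
    a = total - b - c - d
    return a, b, c, d

def bonferroni_alpha(n_categories, alpha=0.05):
    """Adjusted significance level alpha / (N - 1)."""
    return alpha / (n_categories - 1)
```

Applied to the example table of Section 3, the first category gives (a, b, c, d) = (27, 6, 5, 75), and the adjusted significance level for N = 5 is .0125.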

4.3 Bhapkar and Stuart-Maxwell tests

As overall tests of marginal homogeneity (i.e., across all categories simultaneously) MH performs the Bhapkar test (Bhapkar, 1966) and the Stuart-Maxwell test (Stuart, 1955; Maxwell, 1970; Everitt, 1977).

The Stuart-Maxwell statistic is interpreted as a chi-squared value. The df are ordinarily N - 1, where N is the number of categories. If, for any category k, all frequencies in Row k and Column k are 0, except possibly for the main diagonal element (e.g., for agreement data, if there is perfect agreement for category k or the category is never used), then the category is not included in the test. The df for the test then could be considered to be N - m - 1, where m is the number of categories dropped from the test. However, a more conservative approach is to regard the df as N - 1 even though some categories were not included in the calculations. MH reports the p values associated with both df.

The Bhapkar test is a more powerful alternative to the Stuart-Maxwell test. It is similar to the latter in computational details, and again produces a test statistic which is interpreted as a chi-squared value. The df are as described above. See Agresti (2002, p. 422) for details.

The Bhapkar and Stuart-Maxwell tests are asymptotically equivalent. With a large sample, both will produce nearly the same chi-squared value. As it is more powerful, the Bhapkar test is preferred in most circumstances. The Stuart-Maxwell statistic is included mainly for comparison with other results of models one might apply to the data, such as log-linear models.
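
As a rough sketch of the computations, the Python code below follows the usual textbook form of the Stuart-Maxwell statistic: a quadratic form in the differences between the row and column marginal totals of the first N - 1 categories. It is not taken from the MH source, the function names are hypothetical, and it assumes that no category has to be dropped. The Bhapkar statistic is obtained from the known identity W = X²/(1 - X²/n), where n is the total number of cases.

```python
def stuart_maxwell(table):
    """Stuart-Maxwell chi-squared for a square table of frequencies."""
    n = len(table)
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(n)) for j in range(n)]
    m = n - 1  # the last category is redundant
    d = [rows[i] - cols[i] for i in range(m)]
    # Null covariance-type matrix of the marginal differences.
    s = [[rows[i] + cols[i] - 2 * table[i][i] if i == j
          else -(table[i][j] + table[j][i])
          for j in range(m)] for i in range(m)]
    # Solve s * x = d by Gaussian elimination with partial pivoting.
    aug = [[float(v) for v in s[i]] + [float(d[i])] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular matrix: a category may need dropping")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - rest) / aug[i][i]
    return sum(d[i] * x[i] for i in range(m))  # d' S^-1 d

def bhapkar_from_stuart_maxwell(x2, n_cases):
    """Bhapkar statistic via the identity W = X2 / (1 - X2 / n)."""
    return x2 / (1 - x2 / n_cases)
```

For a 2×2 table the Stuart-Maxwell value reduces to the McNemar statistic of Equation 1, which gives a quick check of the code.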

4.4 Bowker symmetry test

This tests symmetry of the table above and below the main diagonal. The null hypothesis is that p(i,j) = p(j,i) for all i ≠ j, where p(i,j) is the probability of an observation falling in row category i and column category j. The statistic is calculated as:



S = Σ(i<j) [f(i,j) - f(j,i)]² / [f(i,j) + f(j,i)], [2]

with the sum taken over all pairs of categories i < j,

where f(i,j) is the number of cases in cell (i,j) of the table. For large samples, the statistic has an asymptotic chi-squared distribution with N(N-1)/2 degrees of freedom, where N is the number of rows/columns.
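
Equation 2 can be sketched in Python as follows. The function name `bowker` is hypothetical. Pairs of cells that are both empty are skipped to avoid division by zero; how MH treats such pairs is not documented here, so that choice is an assumption.

```python
def bowker(table):
    """Bowker symmetry statistic S and its nominal df, N(N-1)/2."""
    n = len(table)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pair = table[i][j] + table[j][i]
            if pair > 0:  # skip empty pairs to avoid dividing by zero
                s += (table[i][j] - table[j][i]) ** 2 / pair
    return s, n * (n - 1) // 2
```

For the example table of Section 3 this gives S = 10.5 with a nominal 10 df.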

The tests described above are always performed by MH. If data are specified as ordered-categorical, the following tests are also performed.

4.5 McNemar test of overall bias or directional change

This compares the total frequency of cases above the main diagonal of the data table with the total frequency of cases below the main diagonal using the McNemar test (Bishop, Fienberg & Holland, 1975; pp. 284-285). The test's interpretation depends on the particular application. For example, with rater agreement data a significant result implies that one rater's ratings are generally higher or lower than the other rater's ratings, indicating overall bias. If the row and column variables are pre- and post-treatment measures, a significant result implies overall improvement or worsening of cases associated with treatment.
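
A minimal Python sketch of this comparison is given below. The function name `overall_bias` is hypothetical, and only the chi-squared form of the McNemar statistic is shown; when the two totals sum to less than 10, the exact test described in Section 4.1 would apply instead.

```python
def overall_bias(table):
    """Return (above, below, chi_squared) for the test of overall bias."""
    n = len(table)
    # Total frequency above and below the main diagonal.
    above = sum(table[i][j] for i in range(n) for j in range(i + 1, n))
    below = sum(table[i][j] for i in range(n) for j in range(i))
    total = above + below
    x2 = (above - below) ** 2 / total if total else 0.0
    return above, below, x2
```

For the example table of Section 3, 10 cases lie above the main diagonal and 10 below, so the statistic is 0.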

4.6 McNemar tests for equal thresholds

Ordered categories often result from the discretization of a trait that is fundamentally continuous. When this is true, there is a connection between the cumulative proportion of cases below various levels of the variable and graded thresholds associated with each level:

Levels 1 < k < N result when a case exceeds the threshold for level k but does not exceed the threshold for level k + 1.

Level k = N results if a case exceeds the threshold for level N.

If the row variable and column variable have the same threshold for a level k, they will also have the same proportion of cases below level k, and vice versa. Therefore a test of homogeneous cumulative proportions is the same as a test of equal category thresholds.

MH tests homogeneity of row and column cumulative proportions of cases below each level k = 2, ..., N. Each test is done by collapsing the N×N table into a 2×2 table and performing the McNemar test. For a given level k, the 2×2 table is constructed by combining all rows/columns less than k and all rows/columns greater than or equal to k.

This produces N - 1 separate tests. For each test, a significant chi-squared value implies that the row and column variables have different cumulative proportions below level k and therefore that the row and column variables have different thresholds for level k. As before, a Bonferroni or similar adjustment to the alpha level may be made to account for the multiple comparisons.
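
The collapsing step for a given level k can be sketched in Python as follows. The function name `collapse_threshold` is hypothetical, and the assignment of a, b, c, d assumes that the first row and column of the fourfold table correspond to "below level k".

```python
def collapse_threshold(table, k):
    """2x2 table for 'below level k' vs. 'level k or above'.

    k is 1-based and runs from 2 to N. Returns (a, b, c, d), where a counts
    the cases that are below level k on both the row and column variables.
    """
    n = len(table)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and N")
    lo = range(k - 1)      # 0-based indices of levels 1 .. k-1
    hi = range(k - 1, n)   # 0-based indices of levels k .. N
    a = sum(table[i][j] for i in lo for j in lo)
    b = sum(table[i][j] for i in lo for j in hi)
    c = sum(table[i][j] for i in hi for j in lo)
    d = sum(table[i][j] for i in hi for j in hi)
    return a, b, c, d
```

The cumulative proportions below level k are then (a + b)/total for the row variable and (a + c)/total for the column variable, and the McNemar test compares b with c.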

5 Output File

The program output has four sections: the Input section, the Basic Tests section, the Tests for Ordered-Category Data section and the Graphic Output section.

5.1 Input section

This section prints the command file and the total number of cases.

5.2 Basic tests

Fourfold tables

This section first prints the frequencies (i.e., a, b, c, d, in that order, of Table 1) for the collapsed tables associated with the marginal homogeneity test of each category (see section 4.2 above).

Marginal homogeneity tests for each category

The next table shows the results of the McNemar test of row/column marginal homogeneity for each category:

Column 1 shows the index number (k) associated with each category or level.
Columns 2 and 3 show the row and column marginal frequencies for this category.
Columns 4 and 5 show the corresponding marginal proportions.
Column 6 shows the value of the McNemar test chi-squared; if an exact test was performed, the phrase "exact test" appears in this column instead of a chi-squared value.
Column 7 reports the two-tailed significance value. An asterisk indicates that the value is below the Bonferroni-adjusted significance criterion.

Bhapkar and Stuart-Maxwell tests

Next the results of the Bhapkar and Stuart-Maxwell tests of overall marginal homogeneity appear. Reported are the calculated chi-squared values, the df, and the associated p value.

If categories were not included in the tests because of reasons discussed in Section 4.3, the number of such categories is reported; df and p values for both the conservative (i.e., with respect to all categories) and the nonconservative (i.e., with respect only to the categories used for calculations) interpretations of the tests are shown.

Bowker symmetry test

This section shows the chi-squared value, the df, and the p value for the test of table symmetry.

5.3 Tests for Ordered-Category Data

Test of overall bias or direction of change

This section reports the number of cases above the main diagonal of the data table, the number of cases below the main diagonal, and the chi-squared value, df, and p value for the McNemar test of overall bias or directional change.

Fourfold tables

This section first prints the frequencies (i.e., a, b, c, d, in that order, of Table 1) for the fourfold table associated with the test of equal thresholds for each level of the variable.

Tests of equal thresholds

This table shows the results of the McNemar test of equality of the row and column thresholds for each level of the variable:

Column 1 shows the index number (k) associated with each level.
Columns 2 and 3 show the proportion of cases below Level k for the row and column variables, respectively.
Columns 4 and 5 show corresponding estimated category thresholds. The thresholds are calculated under the assumption that the underlying continuous trait is normally distributed. The formula for their calculation is:

t_k = F^-1 (C_k) = probit(C_k), [3]
where C_k is the cumulative proportion of cases below Level k and F^-1 is the inverse of the standard normal cumulative distribution function.
Note that the assumption of a normally distributed trait is not tested. The threshold values are printed for comparison purposes only and should generally not be reported. (If one wishes to test the assumption of a normally distributed underlying trait and estimate thresholds under this assumption, one can calculate the polychoric correlation, for example with a program written for that purpose.)
Note, however, that the normality assumption does not enter into the calculation of p values--the tests themselves are nonparametric. A significant p value implies that the row and column thresholds for Level k differ, even though the actual values of the thresholds may be unknown. Thresholds, regardless of distributional assumptions, are monotonically related to the cumulative proportions. In reporting results, then, one may give the cumulative proportions below Level k for the row and column variables and note that the threshold of the variable with the larger cumulative proportion significantly exceeds the threshold of the other variable.
The McNemar chi-squared statistics and p values for each test are shown in Columns 6 and 7.
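
The probit calculation of the thresholds can be sketched in Python as follows, using the standard library's inverse normal distribution function. The function name `probit_thresholds` is hypothetical. Returning `None` when a cumulative proportion is 0 or 1, where the probit is undefined, is an assumption and not necessarily what MH prints.

```python
from statistics import NormalDist

def probit_thresholds(marginals):
    """Estimated thresholds for levels k = 2, ..., N.

    `marginals` holds the marginal frequencies of levels 1..N for one
    variable. Each threshold is the probit of the cumulative proportion
    of cases below that level.
    """
    total = sum(marginals)
    thresholds, below = [], 0
    for freq in marginals[:-1]:
        below += freq                      # cases below the next level
        c_k = below / total
        ok = 0 < c_k < 1                   # probit undefined at 0 and 1
        thresholds.append(NormalDist().inv_cdf(c_k) if ok else None)
    return thresholds
```

For marginal frequencies of 25, 50 and 25, the two thresholds come out at about -0.67 and 0.67 on the standard normal scale.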

5.4 Graphic output

Marginal distribution histogram

If there are 12 or fewer categories, a histogram is printed comparing the marginal distributions of categories for the row and column variables.

Cumulative proportions figure

If there are 20 or fewer levels, MH will print figures showing the cumulative proportions of cases below each level and the associated probit-based category thresholds.

The graph of cumulative proportions shows the proportion of cases below levels k = 1, ..., N for the row and column variables. Levels 1 to 9 are labeled with the integers 1 to 9. Level 10 is labeled with a 0. Levels 11-20 are labeled with the lower-case letters a, b, ..., j. Note that some labels may overprint others.
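
The labeling scheme just described amounts to a simple mapping from level number to plot character, sketched here in Python with a hypothetical function name:

```python
def level_label(k):
    """Plot label for level k: 1-9, then 0 for level 10, then a-j."""
    if 1 <= k <= 9:
        return str(k)
    if k == 10:
        return "0"
    if 11 <= k <= 20:
        return chr(ord("a") + k - 11)
    raise ValueError("labels are defined only for levels 1 to 20")
```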

Category thresholds figure

The graph of probit-based thresholds shows the estimated thresholds of levels k = 2, ..., N for the row and column variables. Thresholds are labeled as described above. The scale is relative to the standard normal curve (e.g., -3 means three standard deviations below the mean, etc.).

Disclaimer

This program is distributed as-is. It has not undergone extensive testing. The author does not guarantee accuracy and assumes no responsibility for unintended consequences of its use.

References

Categorical data analysis

Barlow W. Modeling of categorical agreement. The encyclopedia of biostatistics, P. Armitage, T. Colton, eds., pp. 541-545. New York: Wiley, 1998.

Bhapkar VP. A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association, 1966, 61, 228-235.

Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis: theory and practice. Cambridge, Massachusetts: MIT Press, 1975.

Bowker AH. A test for symmetry in contingency tables. Journal of the American Statistical Association, 1948, 43, 572-574.

Everitt BS. The analysis of contingency tables. London: Chapman & Hall, 1977.

Fleiss JL. Statistical methods for rates and proportions (second ed.) New York: Wiley, 1981.

Maxwell AE. Comparing the classification of subjects by two independent judges. British Journal of Psychiatry, 1970, 116, 651-655.

McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 1947, 12, 153-157.

Sheskin DJ. Handbook of parametric and nonparametric statistical procedures (second edition). Boca Raton: Chapman & Hall, 2000.

Somes G. McNemar test. Encyclopedia of statistical sciences, vol. 5, S. Kotz & N. Johnson, eds., pp. 361-363. New York: Wiley, 1983.

Stuart AA. A test for homogeneity of the marginal distributions in a two-way classification. Biometrika, 1955, 42, 412-416.

Availability and Author Contact

The MH program can be downloaded at: http://john-uebersax.com/bin/mh.zip

This manual is available online at: http://john-uebersax.com/stat/mh.htm

I hope you find the MH program helpful. Please let me know if the program does not work correctly. If so, please include the input file you tried to process along with your email.

Citation

Either of the following formats may be used to cite the MH program (or this page):

Uebersax JS. User Guide for the MH Program (Vers. 1.2). Computer program documentation. 2006.

Uebersax JS. User guide for the MH program (vers. 1.2). Statistical Methods for Rater Agreement website. 2006. Available at: http://john-uebersax.com/stat/mh.htm. Accessed: month dd, yyyy.

Last updated: 10 April 2007