A REVIEW OF THE METHODOLOGY FOR THE U.S. NEWS & WORLD REPORT'S RANKINGS OF UNDERGRADUATE COLLEGES AND UNIVERSITIES
This report critically reviews the methodology used by the U.S. News & World Report to rate American colleges and universities and suggests ways in which it can be improved. We began by reviewing the relevant literature with the goal of developing a defensible framework for establishing measures of academic quality. We note, however, that the U.S. News ratings are developed for traditional students entering college shortly after high school, that is, students 18 to 24 who attend full time and may have applied to and chosen among several institutions. Thus, this discussion is about criteria for such traditional students. We believe it is impossible to rate institutions with the same set of indicators for both traditional and nontraditional students. As the proportion of nontraditional students attending higher education institutions grows, U.S. News might want to consider developing a separate rating system and publishing a separate guide for nontraditional students.
Review of the Literature
As part of our evaluation process, we reviewed a number of different works, among them:
Institutional research about America's universities and colleges;
Quality assessment literature on higher education;
Survey literature on student attitudes and experiences;
Economic choice literature on choosing a college;
Histories of the rating and ranking of institutions of higher education;
Rating efforts in other areas such as health care;
Critiques of rating procedures and methodologies both in this area and elsewhere;
Competing American college guidebooks; and
Ratings done in foreign countries.
The literature in general failed to yield an agreed-upon model for assessing the quality of institutions, but it did indicate general agreement that quality is multidimensional and that no single measure will capture all dimensions. There was little agreement about which group of measures was essential or which set of dimensions needed to be portrayed. Most writers, either implicitly or explicitly, used a highly general input, process, and output model to think systematically about measuring quality. This general model is similar in many ways to the structure, process, and outcome model used in research on hospital quality (Donabedian, 1966).
The quality of the literature varied. The institutional research literature produced a large quantity of information, but much of it exists in a form that does not allow comparative ratings of institutions. The quality assessment literature, an enterprise seemingly now in vogue in the British Isles and in northern Europe, focuses mainly on techniques for self-assessment by institutions or for use by accrediting bodies, and yields little in the way of quantitative or objective measures.
The survey literature on student attitudes and experiences produces much that is of interest for rating instruments. One important finding is that students' satisfaction with their college experience is strongly related to their "immersion" in their environment, such as nonclassroom and extracurricular activity. Unfortunately, the kind of information produced by the survey literature is not currently collected on an institution-by-institution basis and thus is unavailable for rating or ranking individual institutions. The cost of obtaining such information on an institutional basis is an obstacle to its use for individual institutional comparisons. Below we suggest some possible additions to the measures currently used by U.S. News that might address these concerns.
The other literature that we examined provided little more than background information but did help us compile the list of variables we considered. Appendix A offers a comprehensive set of variables suggested in the literature to reflect quality in institutions of higher learning. Appendix B is a bibliography of the literature we consulted.
General Considerations in Ratings
Quality ratings of institutions are commonly performed in light of institutional goals. Compared with hospitals and graduate programs, undergraduate colleges are more heterogeneous in their goals. Goals may include liberal arts education; vocational preparation; preprofessional and scholarly preparation; middle-class socialization; and service and leadership development. Institutions also pursue varying mixes of these goals, and in large institutions there is considerable variance within the institution itself, for example, among different subunits ("colleges"). The great variance both across and within institutions makes it very difficult to get consensus on quality criteria or on measures for undergraduate programs in general, or even for groups of colleges or universities that might appear similar. Compared with hospitals, there is also a paucity of publicly available, comparable, up-to-date data on postsecondary institutions, thus necessitating more reliance on data collected directly from the schools.
Students also differ in their goals, and there will be no "best" college for all students. Criteria such as size, being in a comfortable academic and social climate, cost, proximity to (or distance from) home, or being in an urban or nonurban area loom large in the choice of a college. Quality ratings are of most value when done within the major factors that affect student decision making.
Because of the heterogeneity of goals, it is difficult to get any consensus on a standard against which one can validate measures. Academic reputation has been the traditional measure most accepted by academic institutions and faculty members themselves, but this measure has severe limitations. The principal limitations are its inherently subjective nature and the fact that academic excellence, at least as traditionally defined, is not the goal of all, or perhaps even the majority, of colleges or students. In addition, it is generally assumed that reputations change more slowly than institutions actually do, thus overvaluing institutions that may in fact be declining and undervaluing institutions that are improving.
In the absence of clear validity criteria, how does one proceed? Basically, there are two approaches. The first is a bootstrap approach in which one asks knowledgeable people in the field to identify the indicators of quality for their segment of the field and the best measures of those indicators. The information from these experts can then be used to construct the measures and weight them accordingly.
The second approach is statistical: assembling a set of measures, inspecting their statistical structure (that is, their degree of relatedness or correlation), and then constructing a model that maximizes the relationship with some validity criterion, such as the reputational measure. We recommend using both approaches.
In addition, if ratings are to be reported on a yearly basis and comparison between years is to be valid, the method used to construct the ratings has to remain constant. When changes are made in methods, ratings from other years need to be recomputed to reflect the same measuring method.
When subjective measures, such as reputation, are used, there will be some random variation from year to year due to chance factors such as unreliability of ratings and changes in the raters. A common technique for smoothing out such "noise" in the ratings is to use three-year moving averages, in which the ratings for the current and past two years are averaged and this average is reported as the score of the institution. (For example, one well-known rating of MBA programs uses a three-year average but assigns 50 percent of the weight to the most recent year.) We recommend that U.S. News use this method: report the final ranking on a consistent basis and use three-year moving averages.
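To make the smoothing concrete, the trailing three-year average described above can be sketched as follows. This is a minimal illustration; the function name and the equal-weight averaging are our own, not a published U.S. News procedure:

```python
from collections import deque

def three_year_moving_average(yearly_scores):
    """Trailing mean of up to the last three yearly scores.

    yearly_scores: list of an institution's scores, oldest first.
    Early years with fewer than three observations average what exists.
    """
    window = deque(maxlen=3)   # keeps only the three most recent years
    smoothed = []
    for score in yearly_scores:
        window.append(score)
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A weighted variant, such as the MBA rating's 50 percent weight on the most recent year, would simply replace the equal-weight mean with a weighted one.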
Review of the Current U.S. News Methodology
In order to control for the heterogeneity of colleges and universities, U.S. News first divides the institutions into four categories based on a classification scheme devised by the Carnegie Foundation for the Advancement of Teaching. These categories are: national universities, national liberal arts colleges, regional universities, and regional liberal arts colleges. The regional schools are then divided into four regional groups: North, South, Midwest, and West.
U.S. News currently uses 15 independent data items assembled from various data sources, both public and private. These 15 data items are first combined into 7 variables: academic reputation (1 item), retention (2 items), faculty resources (5 items), student selectivity (4 items), financial resources (2 items), value added (1 item derived from data already included in other variables), and alumni giving rate (1 item). The variables are then combined into a single score that is scaled against the top score and expressed as a percentage of that top score. Slightly different weights and variables are used for colleges and universities in different categories.
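The combination-and-rescaling step can be sketched as follows. The school names, variable names, data, and weights below are hypothetical illustrations; the actual U.S. News weights vary by category and are not reproduced here:

```python
def overall_scores(school_variables, weights):
    """Combine per-school variable values into one score, scaled so the
    top school gets 100.

    school_variables: dict school -> dict variable -> standardized value
    weights: dict variable -> weight (hypothetical values only)
    """
    raw = {school: sum(weights[v] * values[v] for v in weights)
           for school, values in school_variables.items()}
    top = max(raw.values())
    return {school: round(100 * score / top, 1)
            for school, score in raw.items()}

# Illustrative data only: two schools, two of the seven variables.
example = overall_scores(
    {"College A": {"reputation": 1.00, "retention": 0.90},
     "College B": {"reputation": 0.80, "retention": 1.00}},
    {"reputation": 0.6, "retention": 0.4},
)
```

Note that because the final score is a ratio to the top school's score, any change that moves the top school's raw score also moves every other school's reported percentage.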
Of these 16 measures (15 independent data items plus one derived data item), we would classify 7 as input measures (average faculty salaries, percent of faculty full time, faculty degrees, test scores, high school class standing, acceptance rate, and yield), 4 as proxies for process variables (class size, student/faculty ratio, educational expenditures, and other expenditures), and 4 as output measures (freshman retention, graduation rate, value added, and alumni giving). Reputation is a global measure that captures some aspects of inputs, process, and output. The data sources for these measures seem reasonably solid and not too costly to obtain, although many of them must be provided directly by the schools themselves.
The measures have evolved over the years and have been informed by discussions and criticisms from a wide variety of sources. Particular attention has been given to checking on the accuracy of the data: U.S. News has received praise for its leadership in standardizing data definitions and establishing a common core of data used by all publications presenting these data on colleges and universities. Obtaining agreement on common definitions and data sources is a considerable accomplishment, and U.S. News has done a service to the academic community in bringing this about.
Despite the many strengths of the measures and the procedures by which the data are obtained, our review of these measures and their use in the U.S. News rating suggests several weaknesses that need to be investigated more fully:
1. The principal weakness of the current approach is that the weights used to combine the various measures into an overall rating lack any defensible empirical or theoretical basis. Recent studies of the measure by McGuire (1995) and Machung (1995) indicate that the ratings are sensitive to relatively small changes in the weighting scheme. Much of the negative reaction by colleges and universities centers on the apparently arbitrary weighting scheme. Such criticism does not mean that the present weighting system is necessarily wrong, but it does mean that it is difficult to defend on any grounds other than the U.S. News staff's best judgment on how to combine the measures. Since the method of combining the measures is critical to the eventual ratings, the weights are the most vulnerable part of the methodology.
A related criticism is that the weights continue to change, in part as college and university officers complain about their ratings and suggest alternative formulations which, not coincidentally, play to their own strengths. U.S. News has been more than accommodating to these complaints.
2. Apart from the weights, however, we were disturbed by how little was known about the statistical properties of the measures or how knowledge of these properties might be used in creating the measures. For example, the simple correlation matrix among the variables has apparently not been computed. This would tell us whether some of the present measures are redundant, or whether some are contributing more to the discrimination among colleges and universities than others.
In some cases, separate measures are combined into a single variable, such as student selectivity or faculty resources, with varying numbers of measures in different variables. It is not clear whether these separate variables are really distinct from a statistical point of view, that is, whether they form a latent class, or whether the weighting used in combining the measures to form the variables is optimal. Further statistical analysis is necessary to answer questions such as these.
At least one measure, graduation rate, enters the ratings twice, once directly and once as part of the value-added measure, thus giving it more weight than it might appear to have. There is also some question whether certain measures should be used in a derived form rather than directly. For example, should yield rates be used directly, or should they be adjusted by type of institution? Does a yield rate for a highly selective institution have the same meaning as the yield rate for a state school with a mandated open enrollment policy? Do yield rates have the same meaning at institutions where the majority of students apply only to that institution as at institutions where applicants typically apply to more than one? Should academic and other financial resource measures be used as direct measures or in some derived form, such as a ratio corrected for institutional characteristics?
3. The meaning of some measures as indicators of the variables they purport to measure needs further examination and possible refinement. Alumni giving, for example, may be more a function of the vigor of the development office and a long history of fund raising at an institution than of alumni satisfaction. It also has different connotations at private as contrasted with public institutions, although some public institutions have recently embarked on vigorous alumni and capital campaigns. As presently measured, alumni giving also reflects the "satisfaction" of alumni from as much as 40 or 50 years ago and may not be a good proxy for the views of more recent alumni. Restricting the measure to more recent alumni might yield a better indicator.
Graduation rates, like yield rates, may have different meanings at different institutions. A simple inspection of the value-added measure suggests that institutions with heavy mathematics and science requirements have lower graduation rates than expected when that expectation is based on a statistical regression across all institutions. One would anticipate that the meaning of graduation rates might also differ at institutions with open enrollment policies. It may not be sufficient to base the value-added measure on a simple regression across all institutions, or even on separate regressions for the present major groups of institutions.
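The value-added logic, predicting graduation rates from inputs and scoring each school on its residual, can be illustrated with a one-predictor least-squares sketch. The single predictor here is our simplification for illustration; the actual measure draws on several inputs:

```python
def value_added(grad_rates, predictor):
    """Residuals from a one-predictor least-squares fit.

    grad_rates: observed graduation rates, one per school.
    predictor: an input measure (e.g. mean entering test score), same order.
    A positive residual means the school graduates more students than
    its inputs would predict; this residual is the "value added".
    """
    n = len(grad_rates)
    mx = sum(predictor) / n
    my = sum(grad_rates) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(predictor, grad_rates))
            / sum((x - mx) ** 2 for x in predictor))
    alpha = my - beta * mx
    return [y - (alpha + beta * x) for x, y in zip(predictor, grad_rates)]
```

The criticism in the text translates directly: if engineering-heavy schools sit systematically below the fitted line for reasons unrelated to quality, a single pooled regression will assign them spuriously negative value added.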
4. The reputational measure plays a large role in the total rankings. The rating task for the reputational raters is being changed to be similar to the National Research Council's (NRC) ratings for graduate programs. This change is welcome because placing institutions into quartiles is an almost impossible cognitive task. The rating task, however, will still require raters to make judgments about institutions within the broad groups defined by U.S. News based on the Carnegie classification. While this classification scheme is widely used, it is also widely criticized because it does not capture features of academic institutions that make them similar and comparable. The large number of institutions within each classification means that each rater is asked to rate about 2000 institutions. In the NRC ratings of graduate programs, raters were asked to rate no more than 50 programs.
Through statistical analysis of data from the past few years, and by obtaining some additional data from next year's survey, we should be able to construct a more refined classification system that would both simplify the raters' task and produce more meaningful comparisons. A more meaningful classification system would also permit the refinement of measures (described above) whose meaning differs for different types of institutions.
5. The 16 measures currently used by U.S. News capture many of the variables we turned up in the review of the literature on assessing quality of academic institutions. We believe that there are two areas where some sort of measures should be added. These areas are student experiences and curriculum. We recognize that it is difficult to find measures that are both defensible and for which adequate comparable data can be obtained at a reasonable cost. However, we believe some measures can be developed with research, and we offer the following suggestions.
As we noted in the introduction, the survey literature, notably the work by Astin and his colleagues (1993), suggests that students' experiences outside the classroom and their degree of involvement in the life of the college are strongly related to their satisfaction with their college experience. Ideally, one would survey a sample of students at each college to get measures of their involvement plus direct measures of satisfaction. We can think of no way in which such data could be collected at a statistically defensible scale without at the same time bankrupting U.S. News. There are, however, several measures obtainable from the colleges that might serve as proxies for the important student experiences. The easiest is the proportion of students who live on campus or in college housing. A second, though possibly more difficult to obtain, is an estimate of the amount of time faculty spend with students outside the classroom. A third might be the number of organized student groups on campus (absolute, or adjusted for the size of the student body), grouped, if possible, into types such as cultural, social service, political, or literary. Further investigation might turn up other measures that reflect the types of experiences the literature indicates are important determinants of college satisfaction.
The other area that is absent from the current set of measures relates to the academic demands of the curriculum. This is a very difficult area to conceptualize, and U.S. News is not alone in finding it difficult to know what to do. There is not a good taxonomy of curricula, and the literature in this area is not particularly helpful. One dimension that seems to be important is the amount of math and science required in the curriculum. Schools with heavier math and science requirements, or with high proportions of engineering students (which may be partially a proxy for curricular orientation toward math and science) clearly are "harder" than schools that have only minimal or no such requirements. Universities that have "colleges," such as education, business or communication/journalism, may have easier curricula for many students than do schools with large arts and sciences enrollments. Similarly, honors colleges or other special programs within large universities may have harder curricula for students in those programs.
One clear trend in recent years is the steady shift of undergraduate students away from traditional arts and sciences enrollments and majors toward preprofessional concentrations. At present, the only proxy for the academic rigor of an institution is the quality of the entering students as measured by SAT/ACT scores or the proportion of students in the top 10 percent of their high school class. While we are less confident about what measure is best in this regard, we do feel that the area needs attention. Research should be done to see if at least an adequate measure of academic rigor can be found. If such a measure could be developed and used in conjunction with graduation rates, it would blunt the criticism that the present use of graduation rates penalizes schools with high standards.
U.S. News should obtain empirical ratings on the value of different measures for measuring the quality of schools. This can be done by adding a section to the questionnaire used in the reputational survey to get ratings on the utility of different measures, both those currently used and possible additions that have been suggested by critics or derived from the literature review. As part of this survey, U.S. News should consider as many reasonable new measures as possible.
U.S. News should undertake statistical analysis of the data it has been using in order to understand more completely the statistical structure of the data and the sensitivity of the ratings to different weighting algorithms, and to look for possible anomalies that may point to limitations in the interpretation of different measures.
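One form such a sensitivity analysis could take is to perturb the weights randomly and record how often the top-ranked school changes. The following sketch uses hypothetical data and weights; the perturbation size and trial count are arbitrary choices for illustration:

```python
import random

def top_school(school_variables, weights):
    """School with the highest weighted score."""
    return max(school_variables,
               key=lambda s: sum(weights[v] * school_variables[s][v]
                                 for v in weights))

def weight_sensitivity(school_variables, weights, jitter=0.05,
                       trials=200, seed=0):
    """Fraction of random weight perturbations that change which school
    ranks first.

    Each weight is moved by up to +/- jitter, floored at a tiny positive
    value, then the weights are renormalized to sum to 1. A high fraction
    indicates the ranking is fragile with respect to the weighting scheme.
    """
    rng = random.Random(seed)
    baseline = top_school(school_variables, weights)
    changed = 0
    for _ in range(trials):
        perturbed = {v: max(1e-9, w + rng.uniform(-jitter, jitter))
                     for v, w in weights.items()}
        total = sum(perturbed.values())
        perturbed = {v: w / total for v, w in perturbed.items()}
        if top_school(school_variables, perturbed) != baseline:
            changed += 1
    return changed / trials
```

The same perturbation scheme extends naturally to counting rank movements across the whole list, not just the top position, which is closer to the analyses of McGuire (1995) and Machung (1995) cited above.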
When a revised rating methodology is adopted (or the current one affirmed), the ratings should be reported as a three-year moving average to smooth out short-term fluctuations, random errors in reporting, or other factors that might cause implausibly large movements in the rankings of particular institutions.
Once U.S. News settles on a methodology, it should remain constant unless there is compelling evidence for change. It would be wise to have an outside body review the methodology once every five to seven years, but the presumption should be against change unless there is strong evidence of a change in the validity of the measures.
A standing body of outside experts should be organized to meet with U.S. News staff on a regular basis to discuss possible refinements or revisions to the ratings system. Such a group could give U.S. News valuable outside advice and evaluate criticism from interested parties. Such a body could be constituted to consider not only the college and university ratings, but also the graduate and professional school ratings.
Appendix A: Selected Characteristics of Colleges/Universities that May Be Important to Undergraduate Students
Requirements for Admission (in terms of academic preparation and test scores)
Appendix B: Bibliography

Hossler, Don, John Braxton, and Georgia Coopersmith. "Understanding Student College Choice," in Higher Education: Handbook of Theory and Research, Volume 5, 1989.
Hossler, Don and Larry Litten. Mapping the Higher Education Landscape, New York: College Entrance Examination Board, 1993.
Jacobi, Maryann, Alexander Astin, and Frank Ayala, Jr. College Student Outcomes Assessment, 1987.
Jordan, Thomas E. Measurement and Evaluation in Higher Education, London: The Falmer Press, 1989.
Kogan, Maurice, ed. Evaluating Higher Education, Higher Education Policy Series 6, International Journal of Institutional Management in Higher Education, London: Jessica Kingsley Publishers Ltd., 1989.
Machung, Anne. "Changes in College Rankings: How Real Are They?," Paper presented at the 35th Annual AIR Forum, Boston, MA, 1995.
McGuire, Michael D. "Validity Issues for Reputational Studies," in Walleri and Moss, 1995.
Miller, Richard I. The Assessment of College Performance: A Handbook of Techniques and Measures for Institutional Self-Evaluation, San Francisco: Jossey-Bass Publishers, 1979.
Pascarella, Ernest and Patrick Terenzini. How College Affects Students, 1991.
Peterson, Dorothy G. Accrediting Standards and Guidelines: A Study of the Evaluative Standards and Guidelines of 52 Accrediting Agencies Recognized by the Council on Postsecondary Education, Washington, D.C.: Council on Postsecondary Accreditation, 1979.
Postsecondary Education Opportunity. "Actual versus Predicted Institutional Graduation Rates for 1100 Colleges and Universities," April 1997.
Stark, Joan, ed. Promoting Consumer Protection for Students, New Directions for Higher Education, No. 13, Spring 1976.
Terenzini, Patrick T. and Ernest T. Pascarella. "Living with Myths," Change, Jan/Feb 1994.
Walleri, R. Dan and Marsha K. Moss, eds. Evaluating and Responding to College Guidebooks and Rankings, New Directions for Institutional Research, No. 88, Winter 1995, San Francisco: Jossey-Bass Publishers.
Westerheijden, Don F., John Brennan, and Peter A.M. Maassen, eds. Changing Context of Quality Assessment: Recent Trends in West European Higher Education, 1994.
Westerheijden, Don F., John Brennan, Peter A.M. Maassen, eds. Changing Context of Quality Assessment: Recent Trends in West European Higher Education, 1994.
The College Handbook, The College Board, 1998.
The Fiske Guide to Colleges, 1998.
Peterson's Four Year Colleges, 1998.
The Best 311 Colleges, The Princeton Review, 1998.
"The Best College For You," Time/Princeton Review, 1998.
"America's Best Colleges," U.S. News & World Report, 1998 (and various years).