On Assessing and Comparing Critical Thinking Programs: A Response to Hitchcock


Donald Hatcher

Philosophy and Religion

Baker University


First, I want to thank David Hitchcock for providing us with a model for assessing critical thinking (CT) courses, for his careful comparison of his outcomes with van Gelder’s and mine, and for his thoughtful critique of the California Critical Thinking Skills Test (CCTST). This sort of work is long overdue. For years, it was ironic that teachers of critical thinking, who preached the virtue of honestly evaluating alternatives, were so reluctant to evaluate the outcomes of their own courses honestly. As a result, comparing programs was impossible. Besides academic insecurity, this reluctance might be explained by the fact that most of us are trained in philosophy rather than the social sciences, and so lack the needed knowledge of statistics. Even when we did try to assess outcomes, we were not equipped to evaluate the results. David Hitchcock is clearly an exception to this shortcoming.

            With respect to Hitchcock’s analysis, I have little to criticize. For the sake of a clearer comparison of the McMaster program and mine, I will provide some background on the Baker courses and then raise a few concerns about claims and suggestions Hitchcock makes in his comparison. In general, I am surprised that over the years the Baker program has had an average gain of 2.8, while the McMaster program has a mean gain of 2.18. To me, several factors should give the McMaster course an advantage in student motivation to do well on the CCTST post-test.

            First, let me describe our program. Our approach to teaching critical thinking is a required two-semester freshman sequence that combines instruction in critical thinking and written composition. We use our own text, Reasoning and Writing: From Critical Thinking to Composition, which was written specifically to achieve the program’s primary goal: to teach students to write well-argued position papers that demonstrate an honest evaluation of alternatives with respect to available evidence and arguments. This is our conception of critical thinking. After explaining the nature and value of critical thinking, we provide instruction in three essential areas: 1) how to read and understand difficult material, 2) how to evaluate the reasonableness of claims, and 3) how to articulate reasoned judgments in the form of position papers. We also spend considerable time explaining to students why what they are about to learn matters in school, at work, and in life, and we include a lengthy chapter on applying critical thinking skills to ethical issues.

            After only seven weeks of instruction in logic and critical thinking skills, students begin writing their first position papers. The process consists of constructing argument strategies for and against a chosen position, a formal outline, a first draft, and a final copy. After completing each part, students schedule a conference with their instructors for consultation and critique of their work. (One cannot have 400 students in a class such as this.)

            During the second semester, students write five critical papers in response to chosen readings. Whenever possible, we choose readings that take opposing positions on an issue, e.g., Adam Smith and Karl Marx on the merits of a market economy, or Epictetus and B.F. Skinner on human freedom. To minimize plagiarism, we choose new readings each year.

            When we designed this program, I thought it was light on instruction in logic and critical thinking, so we set out to assess it to see whether our students’ outcomes were comparable to those of students who took more traditional one-semester courses in logic or critical thinking. Currently, we give the CCTST at the beginning of the first semester and again during the last week of classes in the second semester. See the chart at the end for the data.[i] So far, so good.

            This brings me to my first concern about the comparison of Hitchcock’s course with ours. I think that student attitudes toward Hitchcock’s course may naturally be more positive than toward ours. Our sequence is what we call a Core General Education requirement: without exception, entering freshmen must enroll and remain enrolled until they have successfully completed the second-semester course. David Hitchcock’s course, as I understand it, satisfies a requirement for science and social science majors. Given that most students choose a major because the area interests them, I would suggest that student attitudes toward requirements for their majors are more positive than toward university-wide core requirements. If a positive attitude means greater motivation, and motivation is an important factor in learning new material, then this attitude should give these students an advantage over students forced to take a similar course. All things being equal, this should translate into stronger gains on the CCTST by the McMaster students.

            A second concern about the comparison is that our students are not allowed to drop the course. No matter how badly they are doing, they must complete the course and take the post-test. If they were allowed to drop, I assume that a fair number of weak students would have done so and not taken the post-test, and without them our mean scores would be higher. In Hitchcock’s study, however, students may drop the course or simply not take the post-test. I would suggest that it is often the weaker students who drop or skip the post-test; as Hitchcock points out, they might reason that their time is better spent studying for the final. Leaving these students out of the mix should increase the mean gain of the McMaster students relative to students in a course where drops are not permitted.
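The selection effect described above can be illustrated with a toy calculation. The sketch below uses ten invented (pre, post) score pairs, which are not data from either study, and it assumes, as a simplification, that the weakest students are also the ones who gain least. If weaker students in fact gained as much as everyone else, dropping them would raise the mean post-test score but leave the mean gain roughly unchanged. The names `students`, `completers`, and `mean_gain` are hypothetical.

```python
from statistics import mean

# Hypothetical (pre, post) scores for ten students; NOT data from either study.
# By construction, the students with the lowest pre-test scores gain least.
students = [(10, 11), (11, 11), (12, 14), (13, 15), (14, 17),
            (15, 18), (16, 19), (17, 20), (18, 21), (19, 22)]

def mean_gain(pairs):
    """Mean of (post - pre) over the given (pre, post) pairs."""
    return mean(post - pre for pre, post in pairs)

# Assumption for illustration: the two weakest students (lowest pre-test
# scores) drop the course or skip the post-test, so only eight complete.
completers = sorted(students)[2:]

print(f"all enrolled:    {mean_gain(students):.2f}")
print(f"completers only: {mean_gain(completers):.2f}")
```

With these invented numbers, the mean gain rises from 2.30 for all ten students to 2.75 for the eight who complete, even though no student’s individual score changes.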

            Third, Hitchcock suggests that because nothing turns on doing well on the pre-test, Baker students may not be motivated to do their best on it, so that their gains look larger when they take the post-test as part of the final exam. To motivate students on the pre-test, Hitchcock and van Gelder suggest telling them that the test counts for 5% of the final grade and that the higher of the two scores will be used. I wonder whether this technique is effective. The average student would reason that the post-test score will of course be higher after an entire semester of critical thinking instruction, so why take the pre-test seriously? I think that telling students they are part of an important research program with “world-wide comparisons” (U.S., Australia, and Canada) might motivate them just as well. In addition, even if some students do not take the pre-test seriously, as long as we assume the same lack of seriousness across all groups, nothing is lost in the comparison.

            So, if I am right, the McMaster students, being a more selective group, might have a natural advantage over the Baker students, and I am surprised that their gains were not greater than ours. However, as I have suggested elsewhere, the real reason for our modest success may simply be that our students do critical thinking for two semesters rather than one. Just as students cannot become proficient in a foreign language in one semester, neither do they become critical thinkers in a few short weeks. Perhaps lengthening the course by combining critical thinking with written composition deserves wider consideration.

            On a different note, after insightfully pointing out weaknesses in five of the questions on the CCTST, Hitchcock suggests that we give serious consideration to using the Ennis-Weir Critical Thinking Essay Test. I have several reservations. First, I hope that, even if the CCTST is flawed, many will still use it, because using the same instrument allows us to compare the outcomes of alternative course structures and pedagogies.



            Second, the CCTST was normed when it was validated, while the Ennis-Weir was not. According to the test booklet, the 261 students taking a course in CT gained an average of 2.0 points. Given this norm, anyone using the test can see how his or her students compare.

            Third, we used the Ennis-Weir Critical Thinking Essay Test as the primary assessment tool for the first six years of our program. It is a good test whose format mirrors exactly what we want our students to be able to do: read arguments, critically evaluate them, and respond in writing. However, because it is an essay exam, grading it is quite time-consuming. Even if one can train a team of graders to achieve high inter-grader reliability, one cannot know that the team in 1998 is grading by the same standards as the team in 2003. Likewise, it would be impossible to know whether the team that graded Hitchcock’s combined 556 pre- and post-test essays was applying the same standards as my team grading 480 pre- and post-test essays at Baker. An objective test, for all its weaknesses, makes comparison much easier and more reliable.

            Fourth, even if we assumed that inter-grader reliability across institutions and over time were possible, it is hard to know what counts as a good score or a reasonable gain on the Ennis-Weir. According to the original booklet that accompanied the test, its reliability was determined when “27 students midway through a college-level introductory informal logic course and 28 gifted eighth-grade students of English were each graded by two different graders.” The college students had a mean score of 23.8 out of 29, and the eighth graders had a mean of 18.6. As the chart at the end of this paper shows, Baker students averaged 12.8 on the post-test over six years, even after completing our two-semester critical thinking program; compared with that, both 18.6 and 23.8 are extremely high scores.[ii] A gap that large is hard to explain by differences in ability alone, which makes me question whether graders of the Ennis-Weir at different institutions can ever achieve the inter-grader reliability necessary to compare outcomes.

            Finally, the Ennis-Weir has only eight problems. I worried that students might remember the arguments, discuss them among themselves during the year, and so do better on the post-test simply from familiarity. The CCTST, with 34 questions, would be harder to recall. So, as much as I like the Ennis-Weir for mirroring what I take to be the essence of critical thinking, I do not think it is a good tool for comparative studies.

            Thank you for allowing me to comment on this fine work. I hope this will be the beginning of a long-overdue dialogue on assessing various approaches to teaching CT. It is my personal bias, but I hope that computer-based instruction will be kept as far from critical thinking classrooms as possible. To my Socratically biased mind, there is no more stimulating learning experience than having small groups of students use class time to reconstruct or formulate arguments for and against a position and then, as a class, evaluate the various arguments. Given this bias, I am very pleased that the Baker program, which uses no computer assistance, did as well as it did in the comparison.







Baker Freshmen (CCTST)  Pre-Test  Std. Dev.  Post-Test  Std. Dev.  Diff.   t     Sig.
F96/S97 (n=152)         14.9      +/-4.0     18.3       +/-4.1     +3.4
F97/S98 (n=228)         14.3      +/-3.9     17.2       +/-4.3     +2.9
F98/S99 (n=177)         15.5      +/-4.7     17.9       +/-4.7     +2.5
F99/S00 (n=153)         15.8      +/-4.3     18.3       +/-4.3     +2.5
F00/S01 (n=184)         16.0      +/-4.2     18.5       +/-4.3     +2.5
F01/S02 (n=197)         15.3      +/-4.1     17.5       +/-4.4     +2.1
Mean (n=710)            15.1      +/-4.4     17.9       +/-4.5     +2.6    8.0   <.001
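Because the Mean row pools cohorts of different sizes, a reader may want to check it against the cohort rows. The sketch below is a minimal reconstruction that assumes the pooled means are n-weighted averages of the cohort means; since the cohort values are rounded to one decimal, it can only approximate figures computed from individual scores. Under that assumption, the first four cohorts (whose sizes sum to 710, the n shown in the Mean row) give a gain of about 2.8, the figure cited at the outset, while all six cohorts (n = 1,091) give about 2.6, the gain the Mean row reports. The names `COHORTS` and `pooled` are hypothetical.

```python
# Cohort rows transcribed from the CCTST table: (label, n, pre mean, post mean).
COHORTS = [
    ("F96/S97", 152, 14.9, 18.3),
    ("F97/S98", 228, 14.3, 17.2),
    ("F98/S99", 177, 15.5, 17.9),
    ("F99/S00", 153, 15.8, 18.3),
    ("F00/S01", 184, 16.0, 18.5),
    ("F01/S02", 197, 15.3, 17.5),
]

def pooled(rows):
    """Return (total n, n-weighted pre mean, n-weighted post mean, gain).

    Assumes the pooled figures are n-weighted averages of the (rounded)
    cohort means, so results only approximate the table's Mean row.
    """
    total = sum(n for _, n, _, _ in rows)
    pre = sum(n * p for _, n, p, _ in rows) / total
    post = sum(n * q for _, n, _, q in rows) / total
    return total, pre, post, post - pre

for label, rows in [("first four cohorts", COHORTS[:4]), ("all six cohorts", COHORTS)]:
    n, pre, post, gain = pooled(rows)
    print(f"{label}: n={n}, pre={pre:.2f}, post={post:.2f}, gain={gain:.2f}")
```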





Baker Freshmen   Pre E-W  Std. Dev.  Post E-W  Std. Dev.  Diff.  Sig.
90/91 (n=169)    6.3                 12.4                 6.1
91/92 (n=119)    9.4                 12.2                 2.8
92/93 (n=178)    6.8                 12.6                 5.8
93/94 (n=178)    8.1                 14.1                 6.0
94/95 (n=164)    7.5                 13.0                 5.5
95/96 (n=169)    6.9                 12.9                 6.0
Mean (n=977)     7.5      +/-5.3     12.8      +/-5.7     5.3    .000