Addressing inconsistencies in grading practices
Thomas R. Guskey
Phi Delta Kappan, 105(8), 52-57. Apr 29, 2024
https://kappanonline.org/addressing-inconsistencies-in-grading-practices/
Coming to agreement about the purpose of grading and establishing clearer and more
accurate reporting structures can pave the way for more learning-focused grading systems.
Throughout the world today, school leaders are struggling to implement grading reforms. They
recognize that many current grading policies and practices are outdated and inadequate. They
also know these policies and practices don’t align well with recent changes in school curricula,
instructional strategies, and procedures for assessing student learning. Yet despite their
commitment and good intentions, these dedicated school leaders are facing unanticipated
opposition.
Grading reform means challenging some of education’s longest held and most firmly entrenched
traditions (Guskey & Brookhart, 2019). These challenges prompt concern among all stakeholders
and serious opposition from some. In many cases, the most adamant opposition comes from
parents and families, especially for reforms involving standards-based or competency-based
grading (Franklin, Buckmiller, & Kruse, 2016; Young, 2023).
Sources of frustration
Ironically, few parents and families oppose the basic principles of standards-based or
competency-based grading. Most support the idea of reporting students’ achievement in terms of
specific learning goals. They also understand the rationale behind giving students multiple
opportunities to demonstrate what they have learned. The frustration of parents and families, as
well as many students, comes from the failure of reform efforts to address what they consider a
primary obstacle to fairness and equity in grading: inconsistency in grading practices among
teachers in the same school (Guskey & Link, 2019). Each time students change classes, the rules
for grading change. What counts as part of the grade, what doesn’t count, and how different
aspects of students’ performance are weighed in determining grades — all can be different
(Guskey, 2024).
This inconsistency leads many students to see grading as a game they must learn to play to
succeed in school, and some students play the game quite well. They become strategists in the
grading game, constantly tallying points and calculating the minimum scores they must attain to
get the grade they want. But for other students, the grading game remains a mysterious puzzle
they must decipher in every class, and many struggle in that effort. So, when a parent asks at the
dinner table, “What grade are you going to get in this class?” the student responds in all honesty,
“I don’t know.”
Before standards-based or competency-based grading reforms can be implemented, this
inconsistency in grading must be addressed. This doesn’t mean infringing on teachers’
professional freedom. It simply requires reaching consensus about the purpose of grading and
then implementing grading policies and practices that evidence shows serve the best interests of
students and their learning.
Gaining greater consistency in grading among teachers involves three crucial steps that lay the
groundwork for standards-based and competency-based grading reforms (Guskey, 2021):
1. Reach consensus on a clear and concise purpose statement for grading.
2. Use grading scales with four to seven categories of student performance.
3. Report academic and non-academic aspects of students’ performance separately.
Develop a clear and concise purpose statement
Teachers generally don’t agree on why they give grades in the first place (Russell & Airasian,
2011). When neither teachers nor school leaders agree on what grades mean or what they are for,
grading procedures tend to vary from teacher to teacher, class to class, and school to school.
Establishing consensus
Successful grading reforms always begin with focused discussions on the purpose of grades and
report cards (Brookhart, 2011). These discussions must address three questions:
1. What information will grades communicate?
2. Who is the primary audience for that information?
3. What is the intended goal of grading?
Reaching consensus on answers to these questions provides the foundation for determining the
appropriateness of all grading policies and practices. It also establishes criteria for deciding the
optimal form and structure of the report card.
Research by Jessica Gogerty (2016) showed that when the purpose of grading is clearly
articulated, teachers become more deliberate in their approach to student learning. They
prioritize curriculum standards and adjust their instructional procedures so that content, format,
and difficulty of classroom assessments are more closely aligned. Teachers also express less
tolerance of colleagues who fail to align their teaching and learning practices to the grading
purpose. They see this failure as “negligence” that causes unnecessary confusion for students and
families (p. 154). When the grading purpose is clear, teachers are expected to uphold that
purpose.
Example purpose statements
To guarantee a shared understanding among all stakeholders, this purpose statement should be
prominently featured on the report card and included in the introduction of all grading policy
documents. This helps clarify the report card’s intent, the information it includes, and how to
interpret that information.
Numerous examples of purpose statements for grading and report cards can be found online and
in Developing Standards-Based Report Cards (Guskey & Bailey, 2010). Although these
examples vary widely, the best succinctly address the three questions described earlier. An
elementary-level example would be:
The purpose of this report card is to describe students’ learning progress to parents
and families, based on our school’s learning goals for each grade level. It is intended
to inform parents and families about learning successes and to guide improvements
when needed.
This statement specifies the aim of the report card, for whom it is intended, and how the included
information should be used. It is brief yet clear. Another example for the middle
school or high school level is:
The purpose of this report card is to communicate with parents, families, and
students about the achievement of specific learning goals. It identifies students’
current levels of performance regarding those goals, areas of strength, and areas
where additional time and effort are needed.
This statement identifies parents, families, and students as important audiences for the report
card. It further specifies that the information describes students’ “current levels of performance,”
not where they started or an average of scores over time. It also indicates how the information
should be used to guide improvement.
A third example comes from the American School of Paris, an international school where the
administrators and faculty have been especially thoughtful in their approach to grading and
reporting reform:
The primary purpose of grading is to effectively communicate student achievement
toward specific standards, at this point in time. A grade should reflect what a student
knows and is able to do. Students will receive separate feedback and evaluation on
their learning habits, which will not be included in the academic achievement grades.
Two parts of this purpose statement deserve attention. First, the phrase “at this point in time”
makes clear that teachers do not determine students’ grades by averaging scores from the entire
grading period. Instead, they assign grades based on the most current evidence they have on what
students now know and can do. In other words, grades reflect where students are in their learning
right now, not where they were weeks or months before.
Second, the statement “students will receive separate feedback and evaluation on their learning
habits” emphasizes that achievement grades represent students’ performance on specific
academic learning goals. Other aspects of students’ behavior related to learning habits, such as
homework completion, class participation, and punctuality in turning in assignments, are
reported separately.
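The difference between averaging and grading “at this point in time” can be seen in a small
numeric sketch. The scores below are hypothetical, and taking the single latest score is a
simplifying assumption made only for illustration; in practice, teachers would weigh the most
current body of evidence rather than one assessment.

```python
from statistics import mean

# Hypothetical scores for one student across a grading period, oldest first.
scores = [55, 65, 80, 92]

# Averaging blends early struggles into the final grade.
average_grade = mean(scores)

# "At this point in time": rely on the most current evidence.
# Using only the latest score is a simplification for illustration.
current_grade = scores[-1]

print(f"average of all scores: {average_grade}")
print(f"most recent evidence:  {current_grade}")
```

Under these assumed scores, the averaged grade understates what the student can do now, which
is the distinction the purpose statement draws.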
Use grading scales with four to seven performance categories
Consistency in grading implies that teachers with comparable knowledge and experience, when
presented with the same body of evidence on a student’s performance, agree on the grade.
Researchers call this “inter-rater reliability” (Gwet, 2021; Hallgren, 2012). The number of levels
or categories of performance in the grading scale plays a significant role in achieving that
agreement. Scales that include large numbers of categories increase the potential influence of
subjectivity and drastically reduce agreement among teachers.
Direct and indirect measures
The challenge of gaining acceptable levels of inter-rater reliability in grading is further
complicated by the fact that it requires teachers to summarize quantitative evidence gathered
primarily through indirect approaches to measurement. Direct measurement involves explicitly
measuring and quantifying the characteristic of a person that we want to report. For instance, to
measure a student’s height, we would ask the student to stand with their back against a wall,
place a level instrument like a ruler or book on the top of their head, mark the wall, and then
measure the distance from the floor to that mark. The recorded number represents the direct
measurement of the student’s height.
Most of the measures teachers use to determine grades are indirect measures. Indirect
measurement involves measuring something else and converting it into a measurement of the
characteristic in question (American Psychological Association, 2018). For example, we cannot
directly measure students’ achievement or proficiency by placing a measuring device on them.
Instead, we ask students to answer questions or perform certain tasks. We then make judgments
or inferences about students’ level of achievement based on their responses or performance.
Because these judgments involve personal interpretation, indirect measures are more susceptible
to bias and interpretation errors than direct measures. This makes it extremely difficult to
accurately discern and report subtle differences in students’ performance.
Failure to recognize the difference between direct and indirect measures often leads to false
assumptions about the numbers assigned to students. This is called the illusion of data validity,
and it leads to the false belief that the information we collect from and about students is always
honest, complete, and accurate (Jansen et al., 2022). This is rarely true when it comes to indirect
measures of student achievement collected for grading.
The problem of the percentage scale
In this context, the percentage grade scale with 101 discrete levels of student performance, two-
thirds of which typically designate failure, presents a noteworthy challenge. Some educators
believe the large number of levels in the percentage grade scale makes it more precise than scales
with fewer levels, such as the five-level letter-grade scale (A, B, C, D, and F) used in most
colleges and universities. But the reality is far more complex. In the absence of a truly accurate
measuring device, adding more levels to the measurement scale offers only the illusion of
precision. In fact, the large number of levels in the percentage scale, coupled with the fine
discrimination required to determine differences among those levels using indirect measures, can
lead to greater subjectivity, increased error, and diminished reliability. Researchers have
recognized these problems for well over a century (Starch & Elliott, 1912, 1913).
In defense of the percentage grade scale, some educators argue that the percentage of questions
on an assessment that students answer correctly represents a direct measure of achievement.
They reason that correctly answering 80% of the questions on an assessment means the student
has learned 80% of the material or mastered 80% of the learning goals. While the percentage of
questions answered correctly might seem like a direct measure, the interpretation of this
percentage involves numerous complexities. The format, difficulty, and alignment of the
questions to instruction, as well as other factors, can significantly impact the accuracy and
reliability of percentage-based scores. This complexity underscores the challenges in achieving
true precision in grading, even when using seemingly straightforward measures like percentage
correct. The perceived precision of percentage grading methods is far more illusory than real,
due to the inherent subjectivity and complexity of the indirect measures involved (Guskey,
2013).
Fewer levels, greater accuracy
Significant research shows that optimal discrimination, validity, and reliability are obtained
using grading scales with four to seven levels or categories (Lozano, García-Cueto, & Muñiz,
2008; Preston & Colman, 2000). Teachers with comparable knowledge and experience are far
more likely to agree when distinguishing an A level from a B level of performance than when
distinguishing a 90 from an 89 using the percentage scale. The use of clear and well-defined
scoring criteria, along with a limited number of grading categories, helps ensure a shared
understanding among teachers and promotes more consistent grading practices. This
understanding is particularly important for implementing grading reforms that prioritize fairness,
transparency, and equity.
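The contrast between agreeing on an A versus a B and agreeing on a 90 versus an 89 can be
illustrated with a small simulation. This is only a sketch under stated assumptions, not a
reproduction of the cited studies: two hypothetical teachers judge the same work, each with
independent, normally distributed interpretation error (the noise_sd value is an assumption),
and the letter cutoffs of 90/80/70/60 are likewise assumed. It reports raw exact agreement, not
a chance-corrected statistic such as those discussed in the inter-rater reliability literature.

```python
import random

def letter(score):
    """Map a 0-100 score to a five-level letter grade (assumed 90/80/70/60 cutoffs)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

def judged_score(true_score, rng, noise_sd):
    """One teacher's judgment: the true level plus personal interpretation error."""
    noisy = round(rng.gauss(true_score, noise_sd))
    return max(0, min(100, noisy))

def agreement_rates(n_students=10_000, noise_sd=3.0, seed=0):
    """Return (percentage-scale agreement, letter-scale agreement) for two raters."""
    rng = random.Random(seed)
    same_percent = same_letter = 0
    for _ in range(n_students):
        true_score = rng.uniform(50, 100)  # hypothetical spread of performance
        a = judged_score(true_score, rng, noise_sd)
        b = judged_score(true_score, rng, noise_sd)
        same_percent += (a == b)
        same_letter += (letter(a) == letter(b))
    return same_percent / n_students, same_letter / n_students

percent_rate, letter_rate = agreement_rates()
print(f"exact agreement on 101-level percentage scale: {percent_rate:.1%}")
print(f"exact agreement on five-level letter scale:    {letter_rate:.1%}")
```

Because two identical percentage scores always map to the same letter, agreement on the coarser
scale can never be lower in this model. The size of the gap depends on the assumed amount of
judgment error.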
Report multiple grades
Every marking period, teachers gather multiple forms of evidence on students’ performance that
reflect three different types of grading criteria: product, progress, and process (Guskey, 1994,
1996).
Product criteria show how well students have achieved specific academic learning goals,
standards, or competencies, typically demonstrated through major assessments,
classroom quizzes, compositions, projects, reports, and other culminating activities.
Progress criteria, sometimes called “growth” or “development” criteria, show how much
students have gained or improved in their learning. Students could make outstanding
progress, but still not be achieving at grade level, and highly skilled students might
achieve the product criteria without making notable improvement.
Process criteria describe student behaviors that facilitate, broaden, or extend learning.
These may include activities that enable learning, such as formative assessments,
homework, and class participation. They also may reflect nonacademic social-emotional
learning skills, such as collaboration, goal setting, perseverance, habits of mind, or
citizenship. In some cases, they relate to students’ compliance with procedures, like
turning in assignments on time.
A hodgepodge grade
At the end of each marking period, teachers assign weights to these different sources of evidence
to tally a final score recorded on the report card (Sun & Cheng, 2013). Researchers call this a
“hodgepodge” grade (Brookhart, 1991) because it mixes achievement and other factors related to
behavior, attitude, effort, and improvement. It makes the report card grade a confusing
amalgamation that is impossible to interpret clearly and accurately (Guskey, 2020). An A, for
example, might mean that the student knew all the concepts before instruction began (product);
that she did not achieve the learning goals but made significant improvement (progress); or that
she put forth extraordinary effort (process).
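A short arithmetic sketch shows why a single weighted score is hard to interpret. The weights
and the three student profiles below are hypothetical, chosen only to illustrate the point; they
are not drawn from any study cited here.

```python
# Hypothetical weights a teacher might assign; not taken from any cited study.
WEIGHTS = {"product": 0.50, "progress": 0.25, "process": 0.25}

# Three hypothetical students with different profiles, each scored 0-100.
students = {
    "Student 1": {"product": 100, "progress": 80, "process": 80},
    "Student 2": {"product": 80, "progress": 100, "process": 100},
    "Student 3": {"product": 90, "progress": 90, "process": 90},
}

def hodgepodge(profile):
    """Collapse product, progress, and process evidence into one weighted score."""
    return sum(WEIGHTS[criterion] * profile[criterion] for criterion in WEIGHTS)

composites = {name: hodgepodge(profile) for name, profile in students.items()}

for name, profile in students.items():
    print(name, profile, "->", composites[name])
```

All three assumed profiles collapse to the same composite score, so the single number cannot
tell a reader whether it reflects achievement, improvement, or effort.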
Recognizing these problems, some grading reform advocates recommend that teachers use only
product criteria in determining students’ grades. They point out that the more progress and
process criteria come into play, the more subjective, biased, and inequitable grades become
(Feldman, 2023). How can a teacher know, for example, how difficult a task was for students or
how hard they worked to complete it? Many teachers point out, however, that if process elements
like homework and punctuality in turning in assignments don’t count, students will lose all
motivation to do homework or complete assignments on time, and evidence from schools
implementing these practices confirms their apprehensions (Randazzo, 2023; We Are Teachers,
2023).
Multiple grades for multiple criteria
A far more effective solution is not to eliminate progress or process criteria from grading but to
report these criteria separately. Teachers simply extract evidence on the important nonacademic
aspects of students’ performance and report those in their own section of the report card and the
transcript.
Although reporting multiple grades is relatively new in most U.S. schools, the practice has a
long-established history in other countries. In Ontario, Canada, for example, teachers have
reported multiple grades for students from 1st grade through high school for decades. Every
marking period, in addition to academic grades, teachers record grades for responsibility,
independent work, initiative, organization, collaboration, and self-regulation. A major
component of students’ responsibility grade is “Completes and submits class work, homework,
and assignments according to agreed-upon timelines.” Students’ grades for responsibility and
other process elements are reported on a four-level scale with the categories Excellent, Good,
Satisfactory, and Needs Improvement (Ontario Ministry of Education, 2023).
Benefits for students, parents, teachers, and more
Teachers using multiple grades say that knowing these aspects of performance will be reported
on both the report card and transcript compels students to act more responsibly. Parents benefit
because the report card provides a more detailed, comprehensive picture of their child’s
performance. In addition, because product grades are no longer tainted by evidence based on
behavior or compliance, those grades more closely align with external measures of achievement
and content mastery, such as standardized test scores, a quality that college and university
admissions officers favor (Buckmiller & Peters, 2018). In essence, removing process elements
from the achievement (product) grade makes grades more accurate, honest, and equitable
indicators of student learning.
Most important, reporting multiple grades doesn’t require extra work for teachers. In fact, it’s
less work. Teachers already gather evidence on product, progress, and process criteria. For
example, most keep records of students’ scores on various measures of achievement, as well as
homework completion, class participation, collaboration in projects, and so on. By simply
reporting separate grades for these different aspects of learning, teachers avoid the dilemmas
involved in determining how much to weigh each element when calculating a single grade.
A more accurate picture
Establishing greater consistency in grading policies and practices doesn’t require all teachers to
grade in the same way. Just as assessment strategies must be adapted to fit the learning goals in
different subjects, grading procedures must be similarly adapted to accurately communicate
students’ achievement of those learning goals.
Schools where educators reach consensus on a purpose statement, adopt a grading scale with
four to seven categories of student performance, and report academic achievement and
nonacademic learning goals separately have the necessary foundation for more meaningful and
effective grading reform. With these three crucial steps accomplished, most teachers find it easy
to transition to standards-based or competency-based grading. They recognize how they can
break down an overall achievement grade to report on the different standards that it summarizes.
Many see this transition as a natural progression in their efforts to provide meaningful summaries
of students’ performance. Without adding to teachers’ workload, these steps address the greatest
concerns of parents and families; facilitate better communication between school and home; and
ensure greater honesty, accuracy, and equity in grading.
References
American Psychological Association. (2018). Indirect measurement. In APA dictionary of
psychology. https://dictionary.apa.org/indirect-measurement
Brookhart, S.M. (1991). Grading practices and validity. Educational Measurement: Issues and
Practice, 10 (1), 35-36.
Brookhart, S.M. (2011). Starting the conversation about grading. Educational Leadership, 69 (3),
10-14.
Buckmiller, T.M. & Peters, R.E. (2018). Getting a fair shot? School Administrator, 75 (2), 22-25.
Feldman, J. (2023). Grading for equity: What it is, why it matters, and how it can transform
schools and classrooms (2nd ed.). Corwin.
Franklin, A., Buckmiller, T., & Kruse, J. (2016). Vocal and vehement: Understanding parents’
aversion to standards-based grading. International Journal of Social Science Studies, 4 (11),
19-29.
Gogerty, J.I. (2016). The influence of district support during implementation of high school
standards-based grading practices [Unpublished doctoral dissertation]. Drake University,
Des Moines, Iowa.
Guskey, T.R. (1994). Making the grade: What benefits students. Educational Leadership, 52 (2),
14-20.
Guskey, T.R. (1996). Reporting on student learning: Lessons from the past - Prescriptions for
the future. In T.R. Guskey (Ed.), Communicating student learning (pp. 13-24). ASCD.
Guskey, T.R. (2013). The case against percentage grades. Educational Leadership, 71 (1), 68-72.
Guskey, T.R. (2020). Breaking up the grade. Educational Leadership, 78 (1), 41-46.
Guskey, T.R. (2021). Learning from failures: Lessons from unsuccessful grading reform
initiatives. NASSP Bulletin, 105 (3), 192-199.
Guskey, T.R. (2024). Engaging parents and families in grading reforms. Corwin.
Guskey, T.R. & Bailey, J.M. (2010). Developing standards-based report cards. Corwin.
Guskey, T.R. & Brookhart, S.M. (2019). What we know about grading: What works, what
doesn’t, and what’s next. ASCD.
Guskey, T.R. & Link, L.J. (2019, April). Understanding different stakeholders’ views on
homework and grading [Paper presentation]. Annual Meeting of the American Educational
Research Association, Toronto, ON, Canada.
Gwet, K.L. (2021). Handbook of inter-rater reliability: The definitive guide to measuring the
extent of agreement among raters (5th ed.). AgreeStat Analytics.
Hallgren, K.A. (2012). Computing inter-rater reliability for observational data: An overview and
tutorial. Tutorials in Quantitative Methods for Psychology, 8 (1), 23-34.
Jansen, B.J., Salminen, J., Jung, S., & Almerekhi, H. (2022). The illusion of data validity: Why
numbers about people are likely wrong. Data and Information Management, 6 (4), 1-14.
Lozano, L.M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response
categories on the reliability and validity of rating scales. Methodology, 4 (2), 73-79.
Ontario Ministry of Education. (2023). Elementary and secondary report card templates.
www.ontario.ca/page/elementary-and-secondary-report-card-templates
Preston, C.C. & Colman, A.M. (2000). Optimal number of response categories in rating scales:
reliability, validity, discriminating power, and respondent preferences. Acta Psychologica,
104 (1), 1-15.
Randazzo, S. (2023, April 26). Schools are ditching homework, deadlines in favor of “Equitable
grading.” Wall Street Journal.
Russell, M.K. & Airasian, P.W. (2011). Grading. In Classroom assessment: Concepts and
applications (7th ed.). McGraw-Hill.
Starch, D. & Elliott, E.C. (1912). Reliability of the grading of high school work in English.
School Review, 20, 442-457.
Starch, D. & Elliott, E.C. (1913). Reliability of the grading of high school work in mathematics.
School Review, 21, 254-259.
Sun, Y. & Cheng, L. (2013). Teachers’ grading practices: Meaning and values assigned.
Assessment in Education: Principles, Policy & Practice, 21, 326-343.
We Are Teachers Staff. (2023, September 7). “No zeros” is sold as an equity shortcut. It’s not.
We Are Teachers. www.weareteachers.com/equitable-grading/.
Young, J. (2023, November 30). Some schools are changing how they grade students. Here’s
why some parents are upset. USA Today.
This article appears in the May 2024 issue of Kappan, Vol. 105, No. 8, p. 52-57.
ABOUT THE AUTHOR
Thomas R. Guskey is professor emeritus at the University of Kentucky, Lexington. He is the
author of Get Set, Go! Creating a Successful Grading and Reporting System.