Assessment Scores
The following paragraphs briefly describe the types of scores provided, appropriate use of scores, and cautions for score use. For more detailed technical information on assessment scores, refer to the Technical Digest, available on the Texas Education Agency (TEA) Assessment Reports and Studies webpage.
Scale Scores
The scale score is a statistic that allows comparison of scores by adjusting for variations in the difficulty of the forms used in different administrations. Thus, the scale score can be used to determine whether a student achieved a particular performance or proficiency level, to compare one student’s performance to another’s on the same assessment, and to compare performances by cohorts of students on the same assessment in different years. The scale score can be used to evaluate a student’s progress across grades 3–8 within the same subject where there is a vertical scale (i.e., mathematics, reading language arts [RLA]). However, the scale score cannot be used to compare a student’s performance in one subject or course with his or her performance in another subject or course. See Cautions for Score Use for more information.
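To make the performance-level determination concrete, the sketch below classifies a scale score by checking it against cut scores. This is a minimal illustration in Python; the level names follow STAAR reporting, but the cut-score values are invented placeholders, not actual TEA cut scores, which vary by assessment and are published separately.

```python
# Minimal sketch: classify a scale score into a STAAR-style performance
# level. Cut-score values here are PLACEHOLDERS, not actual TEA cut
# scores, which differ by assessment and administration.
CUT_SCORES = [
    ("Masters Grade Level", 4000),      # hypothetical cut score
    ("Meets Grade Level", 3700),        # hypothetical cut score
    ("Approaches Grade Level", 3400),   # hypothetical cut score
]

def performance_level(scale_score: int) -> str:
    """Return the highest performance level whose cut score is met."""
    for level, cut in CUT_SCORES:
        if scale_score >= cut:
            return level
    return "Did Not Meet Grade Level"

print(performance_level(3800))  # -> "Meets Grade Level" under these placeholder cuts
```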
Raw Scores
The raw score represents the number of points earned on an assessment. By itself, the raw score has limited utility; it can be interpreted only in reference to the total number of possible points on the assessment. Raw scores should not be compared across test administrations. The raw score is reported for the assessment as a whole and for each reporting category. Tables for converting raw scores to scale scores are available on the Raw Score Conversion Tables webpage.
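As an illustration of how such a conversion works, the sketch below looks up a scale score from a raw score using a small table. The entries are invented placeholders for a single hypothetical form; actual conversions must come from TEA's published tables and differ by form and administration.

```python
# Sketch of a raw-to-scale conversion as a table lookup. Entries are
# invented placeholders for one hypothetical form; real conversions
# come from TEA's published Raw Score Conversion Tables.
RAW_TO_SCALE = {
    30: 3550,  # hypothetical
    31: 3580,  # hypothetical
    32: 3612,  # hypothetical
}

def to_scale_score(raw_score: int) -> int:
    """Convert a raw score to a scale score for this one form."""
    try:
        return RAW_TO_SCALE[raw_score]
    except KeyError:
        raise ValueError(f"no conversion entry for raw score {raw_score}") from None

print(to_scale_score(31))  # -> 3580 (placeholder value)
```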
Appropriate Score Uses
State assessment results have several uses, both for individual students and for comparing the performance of groups. A more detailed explanation of appropriate score uses can be found in the Technical Digest, available on the Assessment Reports and Studies webpage.
Individual Students
A student’s scale score indicates whether he or she has met a performance level and how far the student’s achievement falls above or below that level.
Assessment results can be used to compare the performance of an individual student to the performance of a demographic group, a program group, or an entire campus or district in a particular grade level or course. For example, the score for a Hispanic student in a gifted and talented program could be compared to the average scores of Hispanic students, to the scores of other gifted and talented students, or to the scores of all the students at the campus assessed at that grade level or in that course.
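A minimal sketch of such a comparison follows, assuming a small in-memory set of records; the rows and scores are invented for illustration, and real data would come from a district's assessment files.

```python
from statistics import mean

# Invented records for illustration: (student_id, group, program, scale_score).
records = [
    ("S001", "Hispanic", "Gifted/Talented", 3950),
    ("S002", "Hispanic", None,              3610),
    ("S003", "White",    "Gifted/Talented", 3880),
    ("S004", "Hispanic", None,              3720),
]

student_score = 3950  # the individual student being compared

hispanic_avg = mean(s for _, grp, _, s in records if grp == "Hispanic")
gt_avg = mean(s for _, _, prog, s in records if prog == "Gifted/Talented")
campus_avg = mean(s for *_, s in records)

print(f"student {student_score} vs Hispanic avg {hispanic_avg:.0f}, "
      f"G/T avg {gt_avg:.0f}, campus avg {campus_avg:.0f}")
```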
Groups of Students
Assessment results can be used to compare the performance of different demographic or program groups. For example, all STAAR scores for the same grade and subject or course in any single administration can be analyzed to determine which demographic or program group had the highest average scale score, the lowest percentage achieving each performance level, or the highest percentage achieving Masters Grade Level performance.
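The following sketch shows one way such a group comparison might be computed, using invented rows for a single grade, subject, and administration.

```python
from collections import defaultdict

# Invented rows for one grade/subject and a single administration:
# (group, scale_score, performance_level).
rows = [
    ("Group A", 3820, "Masters Grade Level"),
    ("Group A", 3540, "Approaches Grade Level"),
    ("Group B", 3905, "Masters Grade Level"),
    ("Group B", 3760, "Meets Grade Level"),
]

by_group = defaultdict(list)
for group, score, level in rows:
    by_group[group].append((score, level))

for group, results in by_group.items():
    avg = sum(score for score, _ in results) / len(results)
    pct_masters = 100 * sum(lvl == "Masters Grade Level" for _, lvl in results) / len(results)
    print(f"{group}: average scale score {avg:.0f}, {pct_masters:.0f}% at Masters")
```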
Assessment results can also be used to help evaluate the academic performance of demographic or program groups in core academic areas. For example, aggregations of reporting-category data can help district and campus personnel identify areas of potential academic weakness for a group of students. This same methodology can be applied to an entire campus or district.
In addition, assessment scores for a student, student group, campus, or district can be compared with regional and statewide performance within the same grade and subject or course for any test administration.
Cautions for Score Use
Cautions must be kept in mind when analyzing state assessment results. For example, scale scores allow for a comparison of assessment scores across test administrations within a particular grade and subject or course: If a student takes the STAAR Algebra I assessment in May and takes the assessment again in June, the scores from those administrations could be compared. However, scale scores cannot be used to compare achievement across subjects. For example, it is not appropriate to say that a 3800 on the STAAR Biology assessment represents the same level of achievement as a 3800 on the STAAR Algebra I assessment.
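One way to make this rule operational in analysis code is a guard that refuses cross-subject comparisons. The record layout and scores below are assumptions for illustration, not a TEA data format.

```python
# Guard that permits scale-score comparisons only within the same
# assessment (grade and subject or course). Record layout is assumed
# for illustration.
def compare_scale_scores(a: dict, b: dict) -> int:
    """Return b - a, refusing comparisons across different assessments."""
    if a["assessment"] != b["assessment"]:
        raise ValueError(
            f"cannot compare {a['assessment']!r} with {b['assessment']!r}: "
            "scale scores are not equivalent across subjects or courses"
        )
    return b["scale_score"] - a["scale_score"]

may = {"assessment": "STAAR Algebra I", "scale_score": 3720}   # hypothetical scores
june = {"assessment": "STAAR Algebra I", "scale_score": 3815}
print(compare_scale_scores(may, june))  # -> 95: same assessment, allowed
```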
Additionally, because constructed-response questions are likely to differ from one administration to the next, the raw score distributions across administrations are not directly comparable.
More detailed technical information on cautions for score use is provided in the Technical Digest, available on the Assessment Reports and Studies webpage.
Reporting-Category Information
Reporting-category information at the individual student level should be used with caution due to the limited number of questions in each reporting category. When aggregated at the campus or district level, such information might be useful in helping campus personnel identify skill areas in which further diagnosis is warranted. As with all assessments given at a single time, the data generated from this snapshot should be used in conjunction with other evaluations of performance to provide an in-depth portrait of student achievement. Once an area of possible weakness has been identified, supplementary data should be gathered to further define which instructional intervention would be most effective.
Furthermore, because each assessment is equated only at the total assessment level, year-to-year comparisons of reporting-category performance should be made with caution. Assessments are constructed so that the difficulty of a given reporting category is similar for each administration of the assessment. However, some fluctuations in the difficulty of the reporting categories do occur at every administration. Observing trends in reporting-category performance over time, identifying patterns of performance in reporting categories assessing similar skills, and comparing campus or district reporting-category performance to that of the region or state are appropriate uses of group reporting-category information.
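As a rough illustration of the last point, the sketch below compares campus reporting-category performance (share of points earned) with state figures and flags large gaps. All numbers and the flag threshold are invented placeholders.

```python
# Compare campus reporting-category performance (share of points earned)
# with the state. All figures and the 5-point flag threshold are
# invented placeholders.
campus = {"Reporting Category 1": 0.68, "Reporting Category 2": 0.54}
state  = {"Reporting Category 1": 0.71, "Reporting Category 2": 0.63}

for category, campus_pct in campus.items():
    gap = campus_pct - state[category]
    flag = "  <- candidate for further diagnosis" if gap < -0.05 else ""
    print(f"{category}: campus {campus_pct:.0%}, state {state[category]:.0%}, "
          f"gap {gap:+.0%}{flag}")
```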
Program Evaluation
Standardized assessments are a valuable tool for evaluating education programs. However, any assessment can furnish only one part of the picture. State assessments are not able to identify, let alone measure, every factor that contributes to the success or failure of an education program. Assessment results can be most helpful when considered as one component of an evaluation system.