Interpreting Results

In This Section

This section provides information about interpreting the results of the Texas Assessment Program, which includes the following assessments:

  • STAAR

  • STAAR Alternate 2

  • TELPAS

  • TELPAS Alternate

Appropriate Score Uses

State assessment results have several uses, both for individual students and for comparing the performance of groups. A more detailed explanation of appropriate score uses can be found in the Technical Digest, available on the Assessment Reports and Studies webpage.

Individual Students

A student’s scale score indicates whether the student has met a performance level and how far his or her achievement falls above or below that level.

Assessment results can be used to compare the performance of an individual student to the performance of a demographic group, a program group, or an entire campus or district in a particular grade level or course. For example, the scores for a Hispanic student in a gifted and talented program could be compared to the average scores of Hispanic students, of other gifted and talented students, or of all students at the campus assessed at that grade level or in that course.
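
With access to a student-level data file, such a comparison might be computed as in the sketch below (Python with pandas; the file name and column names are hypothetical, not an actual TEA data layout):

    import pandas as pd

    # Hypothetical student-level file: one row per tested student with a
    # scale score and demographic/program fields. All names here are
    # illustrative and would need to match the actual data layout.
    scores = pd.read_csv("staar_grade5_math_campus.csv")

    student = scores.loc[scores["student_id"] == 1234567].iloc[0]

    # Average scale score for each comparison group of interest
    hispanic_avg = scores.loc[scores["ethnicity"] == "Hispanic", "scale_score"].mean()
    gt_avg = scores.loc[scores["gifted_talented"] == "Yes", "scale_score"].mean()
    campus_avg = scores["scale_score"].mean()  # all tested students on the file

    print(f"Student scale score:     {student['scale_score']}")
    print(f"Hispanic group average:  {hispanic_avg:.0f}")
    print(f"Gifted/talented average: {gt_avg:.0f}")
    print(f"Campus average:          {campus_avg:.0f}")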

Groups of Students

Assessment scores can be used to compare the performance of different demographic or program groups. For example, all STAAR scores can be analyzed in the same grade and subject or course for any single administration to determine which demographic or program group had the highest average scale score, the lowest percentage achieving each performance level, or the highest percentage achieving Masters Grade Level performance.
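
A sketch of this kind of group summary, again with hypothetical file and column names, might look like the following:

    import pandas as pd

    # Hypothetical file of all scores for one grade/subject or course and
    # one administration; column names are illustrative only.
    scores = pd.read_csv("staar_algebra1_may.csv")

    # Average scale score and percentage achieving Masters Grade Level,
    # summarized by group
    summary = scores.groupby("program_group").agg(
        avg_scale_score=("scale_score", "mean"),
        pct_masters=("performance_level",
                     lambda s: (s == "Masters").mean() * 100),
    )
    print(summary.sort_values("avg_scale_score", ascending=False))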

Other scores can be used to help evaluate the academic performance of demographic or program groups in core academic areas. For example, aggregations of reporting-category data can help district and campus personnel identify areas of potential academic weakness for a group of students. This same methodology can be applied to an entire campus or district.

In addition, all assessment scores can be compared with regional and statewide performance within the same grade and subject or course for any test administration.

Cautions for Score Use

Several cautions must be kept in mind when analyzing state assessment results. More detailed technical information on cautions for score use is provided in the Technical Digest, available on the Assessment Reports and Studies webpage.

Scale Scores

Scale scores allow for a comparison of assessment scores across test administrations within a particular grade and subject or course. For example, if a student takes the STAAR Algebra I assessment in May and takes the assessment again in June, the scores from those administrations could be compared.

Scale scores cannot be used to compare achievement across subjects. For example, it is not appropriate to say that a 3800 on the STAAR Biology assessment represents the same level of achievement as a 3800 on the STAAR Algebra I assessment.
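
One way a local analysis tool might enforce this rule is to track which assessment each scale score came from and refuse cross-subject comparisons. The helper below is purely illustrative:

    def scale_score_gain(later, earlier, later_test, earlier_test):
        """Difference between two scale scores, allowed only within the
        same grade and subject or course."""
        if later_test != earlier_test:
            raise ValueError(
                f"{later_test} and {earlier_test} scale scores are not "
                "on a comparable scale."
            )
        return later - earlier

    # Valid: same course, different administrations (May vs. June)
    print(scale_score_gain(3810, 3750, "Algebra I", "Algebra I"))  # 60

    # Invalid: raises ValueError
    # scale_score_gain(3800, 3800, "Biology", "Algebra I")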

Reporting-Category Information

Reporting-category information at the individual student level should be used with caution due to the limited number of questions in each reporting category. When aggregated at the campus or district level, such information might be useful in helping campus personnel identify skill areas in which further diagnosis is warranted. As with all assessments given at a single time, the data generated from this snapshot should be used in conjunction with other evaluations of performance to provide an in-depth portrait of student achievement. Once an area of possible weakness has been identified, supplementary data should be gathered to further define which instructional intervention would be most effective.
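
As an illustration, campus-level aggregation and flagging might be sketched as follows (hypothetical file and column names; the flagging threshold is arbitrary, not a TEA criterion):

    import pandas as pd

    # Hypothetical campus file: one row per student per reporting
    # category, with the points earned and points possible in that
    # category.
    rc = pd.read_csv("campus_reporting_categories.csv")

    by_cat = rc.groupby("reporting_category").agg(
        earned=("points_earned", "sum"),
        possible=("points_possible", "sum"),
    )
    by_cat["pct_earned"] = by_cat["earned"] / by_cat["possible"] * 100

    # Flag categories well below the campus-wide average as candidates
    # for further diagnosis; the 10-point threshold is arbitrary.
    flagged = by_cat[by_cat["pct_earned"] < by_cat["pct_earned"].mean() - 10]
    print(flagged)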

Furthermore, because each assessment is equated only at the total assessment level, year-to-year comparisons of reporting-category performance should be made with caution. Assessments are constructed so that the difficulty of a given reporting category is similar for each administration of the assessment. However, some fluctuations in the difficulty of the reporting categories do occur at every administration. Observing trends in reporting-category performance over time, identifying patterns of performance in clusters of reporting categories assessing similar skills, and comparing campus or district reporting-category performance to that of the region or state are appropriate uses of group reporting-category information.

Raw Score Distribution for Constructed-Response Questions

Because the constructed-response questions on different administrations are usually not the same questions, the raw score distributions are not directly comparable across administrations.

Program Evaluation

Standardized assessments are a valuable tool for evaluating programs. However, any assessment can furnish only one part of the picture. State assessments are not able to identify, let alone measure, every factor that contributes to the success or failure of a program. Assessment results can be most helpful when considered as one component of an evaluation system.

Performance Standards and Points Earned

The performance standards are related to two factors: the difficulty of the questions on the assessment and the number of points students must earn to meet a specific performance standard. The performance standards are set on the original form of each grade and subject or course assessment. When different questions are used in another administration, the difficulty of the questions, and thus the overall difficulty of the assessment, might fluctuate. To compensate for slight changes in difficulty, the number of points required to meet a specific performance standard on the assessment is adjusted. This is also true for assessments with proficiency standards.
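
The idea can be illustrated with invented numbers (these are not actual STAAR conversion tables): on a slightly harder form, each raw score converts to a slightly higher scale score, so fewer points are needed to reach the same fixed scale-score standard.

    # Invented raw-to-scale conversions for two forms of the same
    # assessment; actual conversion tables come from the state's equating
    # process. The June form here is slightly harder, so each raw score
    # maps to a slightly higher scale score.
    may_form = {28: 3480, 29: 3500, 30: 3520}
    june_form = {27: 3490, 28: 3510, 29: 3530}

    MEETS_STANDARD = 3500  # the scale-score cut itself does not move

    def raw_cut(conversion, standard):
        """Lowest raw score whose scale score meets the standard."""
        return min(raw for raw, scale in conversion.items() if scale >= standard)

    print(raw_cut(may_form, MEETS_STANDARD))   # 29 points needed in May
    print(raw_cut(june_form, MEETS_STANDARD))  # 28 points needed in June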