Nine Levels #5 (IPA Webinar)
Empowered High School Model Level 5 from Illinois Principals Association on Vimeo.
Empowered High School Model Level 4 - “Formative Assessments” from Illinois Principals Association on Vimeo.
The Empowered High School Model for 21st Century Schools Level 2 - Standards and Benchmarks from Illinois Principals Association on Vimeo.
The Empowered High School Model for 21st Century Schools Level 1 - Model Overview: Managing Continuous Program Improvement from Illinois Principals Association on Vimeo.
Thanks to all the people we met at our presentation at the Ohio ACT Conference in January!
One of the principals asked the following two questions. Below is my answer to the questions, but there is so much more our contributors can add.
So if you can help out, please add your ideas. Thanks–Howard McMackin
—————————————————————-
Question 1: What examples of inter-rater reliability problems do you have?
Here are two types of inter-rater reliability concerns that we know must be addressed as part of our processes, and a third problem that we have newly encountered:
1. Inter-rater Process for Performance Assessments (papers, speeches, using a method or strategy, thinking skill processes, etc.);
a. This process begins with each team teacher scoring a sample of performances from different students. Then the team discusses how each teacher has scored their samples. After teachers have clarified their interpretation of the rubric, the team follows a consensus-building process to establish team rules for scoring. All teachers must strive to score exactly the same. It is more important that everyone agrees than that the team aligns to an external, validated performance norm; that can come later. (If team teachers use different criteria or interpret the criteria differently, the data in performance reports cannot be compared.)
b. After scoring has been discussed and agreements established, all the teachers should test their understanding and agreement by scoring another single performance. In this case, all teachers score the same performance. After comparing their scores, they need to discuss any scoring differences and readjust. Once all the team’s teachers can score a test performance uniformly, they are ready to begin scoring and producing reports.
c. Because teachers forget or change their minds, it is important to retest inter-rater reliability periodically by pulling individual performances, removing names (if possible) and having each member rescore and discuss. Over time, the team will get better, but we have experienced dramatic setbacks when this step is skipped.
d. Although teachers may fight this process as a “waste of time”, they are usually surprised by the actual score disparities. Gradually, they learn what inter-rater reliability means. It enters their vocabulary, and their sensitivity to inter-rater problems becomes routine. For instance, one teacher was recently getting much higher scores than other team members. The team’s first thought was a possible inter-rater problem. (However, the team discovered that the teacher had a better strategy.)
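The agreement check in steps b and c can be tallied quickly once every teacher has scored the same set of performances. The following is only a minimal sketch, not a tool our teams use: the function names, the teacher labels, and the sample scores are all made up for illustration. It reports the share of teacher pairs that gave identical scores and lists the performances the team still needs to discuss.

```python
from itertools import combinations


def exact_agreement(scores_by_teacher):
    """Fraction of (teacher pair, performance) comparisons with identical scores.

    scores_by_teacher maps a teacher label to a list of rubric scores,
    one per performance, in the same order for every teacher.
    """
    teachers = list(scores_by_teacher)
    n_items = len(scores_by_teacher[teachers[0]])
    agree = 0
    total = 0
    for a, b in combinations(teachers, 2):
        for i in range(n_items):
            total += 1
            if scores_by_teacher[a][i] == scores_by_teacher[b][i]:
                agree += 1
    return agree / total


def flag_disagreements(scores_by_teacher):
    """Indexes of performances where the teachers did not all give the same score."""
    columns = zip(*scores_by_teacher.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]


# Illustrative data only: three teachers scoring the same four performances.
sample = {
    "Teacher A": [3, 2, 4, 1],
    "Teacher B": [3, 3, 4, 1],
    "Teacher C": [3, 2, 4, 2],
}

print(exact_agreement(sample))     # share of matching pairwise scores
print(flag_disagreements(sample))  # performances to bring back to the team
```

Exact agreement is the strictest measure, which fits the rule that all teachers strive to score exactly the same. A team could loosen it to "within one level" if that suits its rubric better.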
2. Math is not as objective as one may think:
Math teachers must follow a process similar to the one for performance assessments to ensure inter-rater reliability. Teachers must answer questions about what is more important:
a. Some teachers believe that only correct answers deserve credit;
b. Other teachers believe that the steps in the process are most important;
c. While others believe that a combination of the above should be used.
d. When real-world (such as science), multi-level contextualized problems are required, teachers may disagree on the correct process steps and how to reward creativity and process efficiency.
The team must resolve these opinions and adopt common criteria for scoring and assigning developmental benchmarks.
The discussion may be as heated as such discussions in the humanities, or even more so. However, once consensus is achieved, it is usually easier for each cooperating teacher to understand what to do, and teachers are less likely to revert. Math teachers can usually know the range of problems that will be taught, while humanities teachers must constantly sort through contextual differences from assignment to assignment, which may cause legitimate confusion.
3. Using the same rubric for multiple levels:
In ESL, the district has one performance rubric for three levels of ESL. The rubric has only 4 benchmark levels. When the data was first collected, teachers marked the ESL 1s doing poorly and ESL 3s doing much better. Most teachers thought that the 1-4 levels were universal for all students. The district thought mastery was defined differently at each ESL level. Since there was no district guidance, the rubrics were misinterpreted by teachers.
The problem can be solved either by creating a larger rubric, such as 12 levels (4 levels x 3 program courses), across which the district can float mastery, or by choosing a single mastery level and creating progress goals (instead of mastery) for the early courses in the program sequence.
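To make the larger-rubric option concrete, here is a rough sketch of how a score on the 4-level rubric in each of the three ESL courses could be placed on one 12-level scale. It assumes the course levels stack end to end with no overlap (ESL 1 covers 1-4, ESL 2 covers 5-8, ESL 3 covers 9-12). That stacking rule and the function name are assumptions for illustration, not district policy.

```python
LEVELS_PER_COURSE = 4  # the district rubric has 4 benchmark levels
COURSES = 3            # ESL 1, ESL 2, ESL 3


def combined_level(course, level):
    """Place a (course, rubric level) pair on a single 1-12 scale.

    ASSUMPTION: courses stack end to end without overlap, so level 4 in
    ESL 1 sits just below level 1 in ESL 2.
    """
    if not 1 <= course <= COURSES:
        raise ValueError("course must be 1, 2, or 3")
    if not 1 <= level <= LEVELS_PER_COURSE:
        raise ValueError("level must be between 1 and 4")
    return (course - 1) * LEVELS_PER_COURSE + level


print(combined_level(1, 2))  # an ESL 1 student scored at rubric level 2
print(combined_level(3, 2))  # an ESL 3 student scored at rubric level 2
```

On a scale like this, the same rubric score of 2 lands in different places for an ESL 1 and an ESL 3 student, which is the distinction the original 4-level rubric could not show.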
Question 2: Can you give an example of a School-Wide Performance Goal?
We are still designing. However, we are close. Presently, we have Explore scores for all entering freshmen, because we use Explore as our baseline screener. We can reduce the 1-25 Explore scores to 4 ranges (as we do with the ACT, which reduces to ranges 1-6 because it has two ranges higher than Explore). We plan to label each student 1-4 to create four demographic groups. Then we can track how well each group achieves in our programs, using the existing program benchmark (rubric) system.
By doing this, we do not need the unknowable socio-economic backgrounds of our students such as father’s income and mother’s education. We will know which students entered according to four levels of preparedness (regardless of cause or correlation). Then we can measure student progress according to group, just like we do with gender and race. Since the groups are originally labeled by Explore ranges, it becomes easy to compare the group to the ACT/CRSS benchmarks in program standard rubrics.
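The grouping and tracking described above might be sketched as follows. The cut points between the four Explore ranges are placeholders invented for this example (the real ranges are not given here), and the student records are made up. Each record holds an Explore score, a starting benchmark level, and an ending benchmark level.

```python
from bisect import bisect_right
from collections import defaultdict

# HYPOTHETICAL cut points, for illustration only:
# 1-12 -> group 1, 13-15 -> group 2, 16-19 -> group 3, 20-25 -> group 4
CUTS = [13, 16, 20]


def explore_group(score):
    """Label an Explore score (1-25) with a preparedness group 1-4."""
    if not 1 <= score <= 25:
        raise ValueError("Explore scores run from 1 to 25")
    return bisect_right(CUTS, score) + 1


def mean_growth_by_group(students):
    """Average benchmark-level growth for each Explore group.

    students is a list of (explore_score, start_level, end_level) tuples.
    """
    growth = defaultdict(list)
    for explore, start, end in students:
        growth[explore_group(explore)].append(end - start)
    return {g: sum(v) / len(v) for g, v in sorted(growth.items())}


# Made-up records: (Explore score, starting level, ending level)
students = [(12, 1, 3), (10, 1, 2), (14, 2, 4), (21, 3, 4)]
print(mean_growth_by_group(students))
```

Because the group label depends only on the entering Explore score, the same report can be run for any program standard without collecting background data on students.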
Example possibilities:
We may make a set of school goals, which may look something like a variation on this:
At RMHS, each Explore group will grow 2 ACT benchmark levels by the end of sophomore year in each measured program standard.*
PLTs can then match their goals with language such as:
Sophomore Social Science will cause 60% of Group 2 students to achieve mastery in writing standards.*
*Please understand the above are NOT actual goals, but purely examples of goal language construction.
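A PLT goal worded like the example above could be checked with a few lines once the group's rubric levels are in hand. This is only an illustration of the arithmetic: the 60% target comes from the sample goal language, while the mastery level of 3 and the student levels are invented for the example.

```python
def mastery_rate(levels, mastery_level):
    """Fraction of students at or above the mastery level."""
    if not levels:
        raise ValueError("need at least one student")
    return sum(1 for level in levels if level >= mastery_level) / len(levels)


def goal_met(levels, mastery_level, target=0.60):
    """True if the share of students at mastery reaches the target."""
    return mastery_rate(levels, mastery_level) >= target


# Made-up rubric levels for Group 2 students on one writing standard.
group_2_levels = [3, 4, 2, 3]
MASTERY_LEVEL = 3  # HYPOTHETICAL mastery level on a 4-level rubric

print(mastery_rate(group_2_levels, MASTERY_LEVEL))
print(goal_met(group_2_levels, MASTERY_LEVEL))
```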