9 Levels - Episode #7
I’m still working on fixing the video embedding; in the meantime, the link is below.
Episode #7 of the 9 Levels for PLT development and utilization for Response to Intervention.
We have finished the webinar series. I’m not sure why, but we’re having difficulty embedding Vimeo videos in WordPress. I’ll work on it when my schedule slows down. In the meantime, here is the link for Webinar #6.
In our presentations, the 5 Whys seem to generate a great deal of interest. This is a good thing. The 5 Whys is a simple technique for analyzing problems. Those who have tried it know that a few key subtleties can make a considerable difference in how well the exercise succeeds. Gemba Panta Rei just put out a great little post on the 5 Whys and on when to shift to the Hows. There is also a nice paragraph about why we need to stay away from the Whos.
Empowered High School Model Level 5 from Illinois Principals Association on Vimeo.
Empowered High School Model Level 3 - “Summative Assessments” from Illinois Principals Association on Vimeo.
The Empowered High School Model for 21st Century Schools Level 2 - Standards and Benchmarks from Illinois Principals Association on Vimeo.
The Empowered High School Model for 21st Century Schools Level 1 - Model Overview: Managing Continuous Program Improvement from Illinois Principals Association on Vimeo.
Lately, I have been asked some very sophisticated questions about why benchmarking is important, when it is required, and how we can measure performances that depend on unobservable thinking processes.
When standards matter:
The most common reason to benchmark is that common assessments are not always the same as summative assessments. Summative assessments are designed to measure a student’s ability to meet one or more standards through subject content. Common assessments often measure only content. We need developmental performance data, standard by standard. This is explained in more detail in our presentations and videos.
See: http://www.empoweredhighschools.com/DdC-Summative%20Assessment.html
When strategies vary:
However, there is a more sophisticated problem. When the strategy used to master a skill or process is obvious and common, the scores on the assessment may predict how a student will perform on an external test or on an interim test aligned to the external test. For instance, the number of math problems a student solves correctly on a summative assessment may predict performance on other tests. In that case, benchmarking is not strictly required. (However, benchmarking once or twice a year may help administrators track school-wide progress in a way that all parties can understand.) Music is an even clearer example. If a student can masterfully perform a piece of music of known difficulty, we know exactly how well the student can play. Whatever rating system is used need not be translated into benchmarks, at least not for the music teachers.
Measurement becomes difficult when we try to track the developmental growth of a student’s writing ability or reading comprehension. First, objective measures do not work as well as performance measures in these cases. Having a student answer multiple-choice questions about writing rules is not as predictive of the student’s ability to write as having the student write and rewrite an essay. Reading comprehension is even harder to measure. Since we cannot yet look into a student’s brain, we must instead observe the strategies the student is using. What we can measure is the student’s proficiency at using a reading comprehension or writing strategy.
The measurement process looks like this:
Standard -> strategy -> formative scores -> summative benchmark scores -> interim assessment -> external test
In this case, the subject teacher can teach and measure the student’s formative ability to use the strategy with a uniform scoring system agreed to by the PLT or school-wide PLC. After formative practice and re-looping, the student demonstrates the ability to use the strategy on a uniform, summative performance assessment. The summative assessment is benchmarked to the standard, and the PLT gets a performance report on how well each student is progressing by benchmark. It is important to understand that this rests on an assumption: being able to use the strategy means the student can perform the skill or meet the standard. That is not certain, but it is the best we can do. When the student is assessed on an interim measure, the data can be used to determine whether the correct strategy was used. (How reliably the interim assessment predicts the external test score is a problem for the vendor, the administration, or the school-wide, interdisciplinary PLC.)
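The "summative benchmark scores" step can be pictured as a lookup from a uniform rubric total to a benchmark level, followed by a per-student report for the PLT. A minimal sketch, assuming a 0-100 rubric total and four benchmark levels; the cut points, student names, and the `to_benchmark` and `benchmark_report` helpers are hypothetical, since a real PLT would set its own cut points by consensus.

```python
from collections import Counter

# Hypothetical cut points mapping a 0-100 summative rubric total to benchmark levels 1-4.
# Checked from the top down; the numbers are illustrative only.
BENCHMARK_CUTS = [(85, 4), (70, 3), (50, 2), (0, 1)]

def to_benchmark(score):
    """Map one summative rubric total to a developmental benchmark level."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, level in BENCHMARK_CUTS:
        if score >= floor:
            return level

def benchmark_report(summative_scores):
    """Return each student's benchmark level and how many students sit at each level."""
    levels = {student: to_benchmark(score) for student, score in summative_scores.items()}
    counts = Counter(levels.values())
    return levels, {lvl: counts.get(lvl, 0) for lvl in (1, 2, 3, 4)}

scores = {"Student A": 92, "Student B": 74, "Student C": 48, "Student D": 71}
levels, counts = benchmark_report(scores)
print(levels)  # per-student benchmark level
print(counts)  # number of students at each benchmark level
```

Keeping the cut points in one table makes it easy for the team to revise them later without touching the rest of the report.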
Last, educators have asked, “Can a local summative performance assessment actually be a better measure than standardized testing for understanding how well a student can carry out very complex cognitive performances?” My opinion is that highly trained, experienced educators can cause students to perform at high levels. Sports, art, and drama coaches do it all the time. However, no one is looking over their shoulders. Can you prove that your performance assessments are better measures? Politically, I do not know if we will ever have a choice. A school must have some external measure to validate what it does. You cannot ask the public to believe you if there is no evidence that you are right. However, I hope that over time our country can develop complex performance assessments, such as those used by the International Baccalaureate or by the Westinghouse Scholars and its national science scholarship contest. Westinghouse Scholars uses a true complex performance assessment to identify the best science students in the country. Some schools build six-year programs around these measures. I am sure that the science teachers who coach these students could teach us how to measure rigorously using performance assessment. IB offers an even more practical solution for performance assessment.
This is a very short explanation. I hope it helps a little.
Thanks for reading,
Howard McMackin, Ph.D.
Senior Partner,
Empowered High Schools
This may be one of the most important posts I’ve written on any of my blogs in a very long time. Howard and I have spent a great deal of time in our presentations preaching about the need for educators to adopt a problem-solving mentality: a scientific approach to scanning for problems, digging into them for root causes, and then creating solutions with feedback loops for improvement. Our audiences seem very interested in this line of thinking.
Realizing that I might need some places to refer people for additional professional development, I did a Google search and was quickly overwhelmed. Then, by accident, while reading an RSS feed from 800-CEO-READ, I found a reference to Problem-Solving 101 by Ken Watanabe. I ordered the book right away and began reading it as soon as it arrived. It’s a must-read! It is a great resource for training teams in problem-solving. Normally, I buy this sort of material from Harvard Business Press and then have to spend hours working out how to translate it into educationese. That will not be necessary with Problem-Solving 101. The content is easy to adapt for teams and very straightforward. You will not overwhelm your team leaders with an overabundance of detail.
Now, don’t miss his website. It’s awesome. He has a number of videos that you might want to preview before buying the book or use as teaching aids.
Thank you for reading.
Charles
Thanks to all the people we met at our presentation at the Ohio ACT Conference in January!
One of the principals asked the following two questions. Below are my answers, but there is so much more our contributors can add.
So if you can help out, please add your ideas. Thanks, Howard McMackin
—————————————————————-
Question 1: What examples of inter-rater reliability problems do you have?
Here are two types of inter-rater reliability concerns that we know must be addressed as part of our processes, and a third problem that we have only recently encountered:
1. Inter-rater Process for Performance Assessments (papers, speeches, using a method or strategy, thinking skill processes, etc.);
a. This process begins with each team teacher scoring a sample of performances from different students. Then the team discusses how each teacher scored those samples. After teachers have clarified their interpretations of the rubric, the team follows a consensus-building process to establish team rules for scoring. All teachers must strive to score exactly alike. It is more important that everyone agrees than that the scores align to an external, validated performance norm; that can come later. (If team teachers use different criteria or interpret the criteria differently, the data in performance reports cannot be compared.)
b. After scoring has been discussed and agreements established, all the teachers should test their understanding and agreement by scoring one more performance. This time, all teachers score the same performance. After comparing their scores, they need to discuss any differences and readjust. Once all the team’s teachers can score a test performance uniformly, they are ready to begin scoring and producing reports.
c. Because teachers forget or change their minds, it is important to retest inter-rater reliability periodically by pulling individual performances, removing names (if possible), and having each member rescore and discuss them. Over time, the team will get better, but we have seen dramatic setbacks when this step is skipped.
d. Although teachers may fight this process as a “waste of time”, they are usually surprised by the actual score disparities. Gradually, they learn what inter-rater reliability means. The term enters their vocabulary, and sensitivity to inter-rater problems becomes routine. For instance, one teacher recently was getting much higher scores than other team members. The team’s first thought was a possible inter-rater problem. (However, the team discovered that the teacher had a better strategy.)
2. Math is not as objective as one may think:
Math teachers must follow a process similar to the one for performance assessments to ensure inter-rater reliability. Teachers must answer questions about what matters most:
a. Some teachers believe that only correct answers deserve credit;
b. Other teachers believe that the steps in the process are most important;
c. Still others believe that a combination of the above should be used;
d. When real-world, multi-level, contextualized problems (such as those in science) are required, teachers may disagree on the correct process steps and on how to reward creativity and process efficiency.
The team must resolve these opinions and adopt common criteria for scoring and assigning developmental benchmarks.
The discussion may be as heated as similar discussions in the humanities, or even more so. However, once consensus is achieved, it is usually easier for each cooperating teacher to understand what to do, and teachers are less likely to revert. Math teams usually know the range of problems that will be taught, whereas the humanities teacher must constantly sort through contextual differences from assignment to assignment, which may cause legitimate confusion.
3. Using same rubric for multiple levels:
In ESL, the district has one performance rubric for three levels of ESL. The rubric has only 4 benchmark levels. When the data was first collected, teachers marked ESL 1 students as doing poorly and ESL 3 students as doing much better. Most teachers thought the 1-4 levels were universal for all students, while the district thought mastery was defined differently at each ESL level. Since there was no district guidance, teachers misinterpreted the rubrics.
The problem can be solved by creating a larger rubric, such as one with 12 levels (4 levels x 3 program courses). Then the district can either float mastery across the standard or choose a single mastery level and create progress goals (instead of mastery goals) for the early courses in the program sequence.
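For the calibration rounds in steps b and c of the first item, a team can flag every performance where teachers disagreed and, for each pair of teachers, compute Cohen's kappa, a standard statistic for agreement between two raters that corrects for agreement expected by chance. A minimal sketch, assuming a 1-4 rubric and that every teacher scored the same anonymized performances in the same order; the teacher names, scores, and helper functions are hypothetical.

```python
from itertools import combinations

# Hypothetical calibration round: each teacher's 1-4 rubric score on the same
# five anonymized performances, listed in the same order.
calibration = {
    "Teacher 1": [3, 2, 4, 1, 3],
    "Teacher 2": [3, 2, 3, 1, 3],
    "Teacher 3": [3, 3, 4, 1, 3],
}

def disagreements(scores_by_teacher):
    """Return indexes of performances on which the teachers did not all give the same score."""
    columns = zip(*scores_by_teacher.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]

def cohens_kappa(a, b, categories=(1, 2, 3, 4)):
    """Cohen's kappa for two raters: observed agreement corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same, non-empty set of performances")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1:
        return 1.0  # both raters gave one identical category every time; treat as perfect agreement
    return (observed - expected) / (1 - expected)

print("Discuss performances:", disagreements(calibration))
for t1, t2 in combinations(calibration, 2):
    print(t1, "vs", t2, "kappa =", round(cohens_kappa(calibration[t1], calibration[t2]), 2))
```

The same check applies to math: if teachers score by answer only, by steps only, or by a mix, the disagreement list will show it until the team adopts common criteria.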
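The 12-level ESL rubric (4 levels x 3 courses) amounts to placing each course's 1-4 score on one continuous scale. A minimal sketch, assuming each course's band sits directly above the previous course's band and that the district picks one mastery level on that scale; the `continuum_level` and `status` helpers and the mastery level of 10 are hypothetical choices for illustration.

```python
LEVELS_PER_COURSE = 4  # the existing 1-4 rubric
COURSES = 3            # ESL 1, ESL 2, ESL 3

def continuum_level(course, rubric_score):
    """Place a course-level rubric score (1-4) on a single 1-12 continuum.

    Assumes ESL 2's band starts right after ESL 1's, and ESL 3's right after ESL 2's.
    """
    if not 1 <= course <= COURSES or not 1 <= rubric_score <= LEVELS_PER_COURSE:
        raise ValueError("course must be 1-3 and rubric score 1-4")
    return (course - 1) * LEVELS_PER_COURSE + rubric_score

# Hypothetical district choice: a single mastery level on the 12-level scale.
MASTERY_LEVEL = 10

def status(course, rubric_score):
    """Report mastery, or progress toward it, on the shared continuum."""
    level = continuum_level(course, rubric_score)
    return "mastery" if level >= MASTERY_LEVEL else f"progressing ({level}/{MASTERY_LEVEL})"

print(status(1, 4))  # a top ESL 1 score is still early on the continuum
print(status(3, 2))
```

Under this mapping, an ESL 1 student scoring 4 is doing well for the course but is plainly not at district mastery, which is the distinction the teachers and the district were missing.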
Question 2: Can you give an example of school-wide performance goals?
We are still designing, but we are close. Presently, we have Explore scores for all entering freshmen, because we use Explore as our baseline screener. We can reduce the 1-25 Explore scores to 4 ranges (as we do with ACT, which uses 1-6 because it has two ranges above Explore). We plan to label each student 1-4 to create four demographic groups. Then we can track how well each group achieves in our programs, using the existing program benchmark (rubric) system.
By doing this, we do not need the unknowable socio-economic backgrounds of our students such as father’s income and mother’s education. We will know which students entered according to four levels of preparedness (regardless of cause or correlation). Then we can measure student progress according to group, just like we do with gender and race. Since the groups are originally labeled by Explore ranges, it becomes easy to compare the group to the ACT/CRSS benchmarks in program standard rubrics.
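Grouping students by Explore range and tracking each group's benchmark results can be sketched in a few lines. A minimal sketch, assuming one Explore composite (1-25) and one program benchmark level (1-4) per student; the post does not give the actual range boundaries, so the cut points below are a hypothetical, roughly even split, and `explore_group` and `group_progress` are illustrative names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cut points splitting the 1-25 Explore composite into four groups:
# 1-6 -> 1, 7-12 -> 2, 13-18 -> 3, 19-25 -> 4. The real ranges are a school decision.
GROUP_FLOORS = [(19, 4), (13, 3), (7, 2), (1, 1)]

def explore_group(composite):
    """Label a student 1-4 by entering Explore composite."""
    if not 1 <= composite <= 25:
        raise ValueError(f"Explore composite out of range: {composite}")
    for floor, group in GROUP_FLOORS:
        if composite >= floor:
            return group

def group_progress(students):
    """students: list of (explore_composite, benchmark_level) pairs.

    Returns the mean program benchmark level for each entry group.
    """
    by_group = defaultdict(list)
    for composite, benchmark in students:
        by_group[explore_group(composite)].append(benchmark)
    return {group: mean(levels) for group, levels in sorted(by_group.items())}

sample = [(5, 1), (9, 2), (11, 3), (15, 3), (20, 4), (24, 4)]
print(group_progress(sample))
```

Because the groups come only from entering scores, the same report can be run each term without collecting any family background data.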
Example possibilities:
We may set school goals that look something like a variation on this:
At RMHS, each Explore group will grow 2 ACT benchmark levels by the end of sophomore year in each measured program standard.*
PLTs can then match their goals with language such as:
Sophomore Social Science will cause 60% of Group 2 students to achieve mastery on writing standards.*
*Please understand the above are NOT actual goals, but purely examples of goal language construction.
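In the same spirit as the example language, checking whether a goal of this shape was met is simple arithmetic. A minimal sketch, assuming benchmark levels recorded at the start and end of a period for one group and one standard, and reading "each group will grow 2 levels" as mean growth of at least 2; that reading, the mastery level, the data, and the helper names are all hypothetical.

```python
from statistics import mean

def mean_growth(records):
    """records: list of (start_level, end_level) pairs for one group and one standard."""
    if not records:
        raise ValueError("no records for this group")
    return mean(end - start for start, end in records)

def growth_goal_met(records, required=2):
    """Assumes the goal is met when the group's mean growth reaches the required levels."""
    return mean_growth(records) >= required

def mastery_goal_met(levels, mastery_level, target_share=0.60):
    """Return (met, share): whether the share of students at or above mastery reaches the target."""
    if not levels:
        raise ValueError("no students in this group")
    share = sum(level >= mastery_level for level in levels) / len(levels)
    return share >= target_share, share

# Hypothetical Group 2 data for one writing standard.
growth_records = [(1, 3), (2, 4), (2, 3)]
writing_levels = [3, 4, 2, 4, 3]
print(growth_goal_met(growth_records))
print(mastery_goal_met(writing_levels, mastery_level=3))
```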