Attribute Agreement Analysis

By Robert Minardi

There have been amazing advancements in freeform lens designs. They can provide optics and cosmetic appeal superior to their counterparts of just a few years back.

We want to provide these lenses to our customers at the highest quality possible. To do that, we must ensure our labs’ inspection process is, first and foremost, accurate, and in line with industry standards. Also, we must certify that our labs are meeting these standards consistently and with negligible variance among inspectors.

To say this another way, our inspection methods should be Repeatable and Reproducible. These two terms can be defined as follows:

Repeatability – Can an individual inspector arrive at the same results, testing the same parts with the same equipment?

Ex: If we ask our inspector, Ben, to inspect 10 lenses, three times each, with the same lensometer, will he always pass or fail the same lenses? Will he be able to repeat the same results every time?

Reproducibility – Can multiple inspectors arrive at the same result as a baseline value, using the same lenses and same equipment?

Ex: If we ask Roxy, Dan and Bob to inspect 10 lenses, will they arrive at the same results as our most seasoned inspector Brian? Can our quality standards be reproduced across all inspectors?

How do we verify repeatability and reproducibility? I’m glad you asked. We’ll do an Attribute Agreement Analysis (AAA). One thing we need to be clear about from the start: An AAA is used to verify the measurement system and not the lenses. The quality of the lens doesn’t matter. What matters is that your inspectors agree on what’s good or bad: with themselves, with each other and with established quality standards. Also, the AAA we’re doing considers only binary data, meaning pass/fail or good/bad.

Setting up the analysis

First, create a matrix with the items to be tested on the vertical axis and the inspectors on the horizontal axis. For simplicity’s sake, we’re just going to use five lenses and two trials. When you do this for real, you should use at least 10 lenses and three trials. The more lenses used and trials performed, the more useful and accurate the analysis becomes.

Notice the numbers following our inspectors’ names. These stand for the first and second trials: the same five lenses are inspected twice to test the repeatability of each inspector’s results. To test for reproducibility, multiple inspectors (Jeff and Stephanie) are compared against our expert, a highly qualified inspector whose judgment you can count on.
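If you’d rather build the blank matrix in code than in a spreadsheet, here’s a minimal sketch. The column labels (“Expert,” “Jeff 1,” “Jeff 2” and so on) and the function names are my own assumptions for illustration; use whatever layout suits your lab.

```python
# Hypothetical layout of the study matrix: lenses down the side, one column
# for the expert's baseline and one per inspector per trial.
LENSES = ["A", "B", "C", "D", "E"]
INSPECTORS = ["Jeff", "Stephanie"]
TRIALS = [1, 2]


def build_matrix(lenses, inspectors, trials):
    """Return {lens: {column: None}} with every cell still blank."""
    columns = ["Expert"] + [f"{name} {t}" for name in inspectors for t in trials]
    return {lens: {col: None for col in columns} for lens in lenses}


def render(matrix):
    """Format the matrix as plain text, showing '_' for cells not yet filled."""
    columns = list(next(iter(matrix.values())))
    lines = ["Lens  " + "  ".join(f"{c:<12}" for c in columns)]
    for lens, cells in matrix.items():
        row = "  ".join(f"{(cells[c] or '_'):<12}" for c in columns)
        lines.append(f"{lens:<6}{row}")
    return "\n".join(lines)


matrix = build_matrix(LENSES, INSPECTORS, TRIALS)
print(render(matrix))
```

Each blank cell later gets a P or an F. For a real study, lengthen LENSES to at least 10 entries and add a third trial.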

Also, if these lenses are relatively new to your lab, you may or may not have bad lenses for the analysis. You shouldn’t do this study with all good lenses, so you’re going to have to fib a little.

I recommend creating another table that simply has the lens aliases (A to E) and the Rxs listed like so:

There are a couple of reasons for this. First, if you don’t have bad lenses, you can adjust the numbers in the Rx and make them “bad.” Remember, we’re not worried about whether the lenses are actually good or bad, but whether our inspection process is reliable. Second, it helps mask the identity of the lenses. If an inspector is using a work ticket, they could plainly see they’re inspecting Mark Stevenson’s Rx again, and they’re more likely to agree with themselves across trials.
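The masking step can be as simple as shuffling the jobs before handing out letters, so an alias gives no hint about which work ticket it came from. This is only a sketch: the job IDs and the assign_aliases function are made up for illustration, and only the person running the study should keep the resulting key.

```python
import random
import string


def assign_aliases(job_ids, seed=None):
    """Shuffle the jobs, then label them A, B, C, ... in the shuffled order.

    Returns the alias -> job key. Pass a seed only if you need to
    regenerate the same key later.
    """
    if len(job_ids) > len(string.ascii_uppercase):
        raise ValueError("this sketch only supports up to 26 lenses")
    rng = random.Random(seed)
    shuffled = list(job_ids)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))


# Hypothetical job IDs standing in for real work tickets.
jobs = ["job-1001", "job-1002", "job-1003", "job-1004", "job-1005"]
alias_key = assign_aliases(jobs, seed=7)
for alias, job in alias_key.items():
    print(alias, "->", job)
```

Inspectors see only the alias and the Rx to verify against, never the work ticket.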

At this point, the expert will have evaluated the lenses and determined which are good and which are not.

Perform the analysis

Now that we have everything set up, have the inspectors verify the lenses, using the same lensometer as the expert, and put a P (for pass) or F (for fail) in the appropriate box for each trial. You’ll end up with something like this:

So, what can we learn from this?

To start, the measurement system yielded an overall accuracy of 70%, because 14 of the 20 inspections (14/20 x 100) agreed with the expert’s evaluation.

Starting with Jeff, he has inconsistent results for lenses C and D across his two trials. On both lenses, he disagreed with the expert on his first trial and agreed on the second. This means that out of his 10 inspections, his results agreed with the expert 80% (8/10 x 100) of the time and with himself 80% of the time.

Looking at Stephanie’s results, we see that she’s in agreement with herself 100% of the time, as she passed and failed the same lenses across both of her trials. On the other hand, she’s only in agreement with the expert 60% (6/10 x 100) of the time. She’s obviously doing something wrong, but hey, she’s doing it consistently!

Okay, so this tells us something. Jeff’s results aren’t bad (80% agreement and above is generally accepted as satisfactory), but Stephanie may need some additional training.
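Once the study grows to 10 or more lenses and three trials, tallying by hand gets tedious. Here’s a rough sketch of the arithmetic above. The P/F values are hypothetical: I picked them only so they reproduce the percentages in this example, and the function names are my own.

```python
# Hypothetical data: the P/F verdicts below are assumptions chosen only to
# reproduce the percentages discussed in the text.
EXPERT = {"A": "P", "B": "P", "C": "F", "D": "F", "E": "P"}
RESULTS = {  # each string holds one inspector's verdicts for trial 1, trial 2
    "Jeff": {"A": "PP", "B": "PP", "C": "PF", "D": "PF", "E": "PP"},
    "Stephanie": {"A": "PP", "B": "PP", "C": "PP", "D": "PP", "E": "PP"},
}
SATISFACTORY_PCT = 80  # the rule of thumb quoted in the text


def expert_matches(verdicts, expert):
    """Count inspections that match the expert. Returns (matches, total)."""
    matches = sum(
        v == expert[lens] for lens, trials in verdicts.items() for v in trials
    )
    total = sum(len(trials) for trials in verdicts.values())
    return matches, total


def is_satisfactory(matches, total, threshold_pct=SATISFACTORY_PCT):
    # Integer comparison avoids floating-point surprises at exactly 80%.
    return matches * 100 >= threshold_pct * total


per_inspector = {name: expert_matches(v, EXPERT) for name, v in RESULTS.items()}
overall = (
    sum(m for m, _ in per_inspector.values()),
    sum(t for _, t in per_inspector.values()),
)

for name, (m, t) in per_inspector.items():
    status = "satisfactory" if is_satisfactory(m, t) else "may need training"
    print(f"{name}: {m}/{t} = {100 * m / t:.0f}% ({status})")
print(f"Overall: {overall[0]}/{overall[1]} = {100 * overall[0] / overall[1]:.0f}%")
```

Swap in your own verdicts and the same three numbers fall out: each inspector’s agreement with the expert, who clears the 80% mark, and the overall accuracy.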

Also, there’s another pattern in our results. Notice how lenses A, B and E are agreed upon across the board, yet lenses C and D have the most error in relation to the expert? This could point towards another training discrepancy. What’s different about those lenses that’s causing confusion and incorrect results? This warrants investigation for sure.
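To spot this kind of pattern without eyeballing the table, tally the results by lens instead of by inspector. As before, this is a sketch: the P/F values are hypothetical ones chosen to match the example, and the function names are my own.

```python
# Hypothetical data: the P/F verdicts below are assumptions chosen only to
# match the pattern described in the text.
EXPERT = {"A": "P", "B": "P", "C": "F", "D": "F", "E": "P"}
RESULTS = {  # each string holds one inspector's verdicts for trial 1, trial 2
    "Jeff": {"A": "PP", "B": "PP", "C": "PF", "D": "PF", "E": "PP"},
    "Stephanie": {"A": "PP", "B": "PP", "C": "PP", "D": "PP", "E": "PP"},
}


def matches_by_lens(results, expert):
    """For each lens, count the inspections (all inspectors, all trials)
    that match the expert. Returns {lens: (matches, total)}."""
    summary = {}
    for lens, truth in expert.items():
        verdicts = [v for by_lens in results.values() for v in by_lens[lens]]
        summary[lens] = (sum(v == truth for v in verdicts), len(verdicts))
    return summary


def inconsistent_lenses(by_lens):
    """Lenses on which one inspector's verdict changed between trials."""
    return [lens for lens, trials in by_lens.items() if len(set(trials)) > 1]


for lens, (m, t) in matches_by_lens(RESULTS, EXPERT).items():
    note = "" if m == t else "  <- investigate"
    print(f"Lens {lens}: {m}/{t} agree with expert{note}")

for name, by_lens in RESULTS.items():
    print(f"{name} changed verdicts on: {inconsistent_lenses(by_lens) or 'none'}")
```

Any lens that falls short of full agreement is a candidate for a closer look at what makes it hard to judge.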

In our example, we used pass/fail for simplicity, but you could do an AAA any number of ways. For example, create a matrix like we did here, but instead of pass/fail, have your inspectors verify a selection of lenses and evaluate the entire Rx. Have a fancy new AR coating? Set up a matrix and evaluate how well your inspectors spot flaws.

I’m confident you can see the benefits of doing this type of analysis. Fewer remakes, fewer unhappy customers and, by proxy, lower blood pressure (yours!) should be reason enough to give this a whirl. ■

Robert Minardi, ABO-AC, is currently a Software Engineer at Ocuco Ltd. He’s been in manufacturing for about 25 years, and is a Lean Six Sigma Black Belt with a background in quality control.