There are many subjects that we cover regularly here at Quality Digest. Chief among these are standards (ISO 9001 or IATF 16949, for example), methodologies (such as lean, Baldrige, or Six Sigma), and test and measurement systems (such as laser trackers or micrometers). One topic, however, consistently tops the list when it comes to audience popularity: industrial statistics, including statistical process control (SPC).
It’s no secret why statistics hold such a place of honor among QD readers. Quite simply, without exquisite measurement and the ability to understand whether a process is in or out of control, continuous improvement is impossible. Statistical analysis is the very underpinning of all things quality, and it touches on everything from proper management to understanding how best to leverage technological innovation.
With that in mind, I recently had the opportunity to exchange thoughts with Neil Polhemus, Ph.D., the chief technology officer for Statgraphics Technologies Inc. Late last year he released the book Process Capability Analysis: Estimating Quality. A slightly edited version of our conversation follows.
Mike Richman: What are the costliest mistakes people make when it comes to industrial statistical analysis?
Neil Polhemus: I think the costliest mistake is confusing quantity of data with quality of data. Statisticians learned long ago that all data are not created equal. I remember attending a class on designed experiments in which J. Stuart Hunter, Ph.D., contrasted practical accumulated records calculation (PARC) analysis with design of experiments (DOE). A PARC analysis relies on gathering together whatever data happen to be available, entering them into the computer, running some analyses, and hoping that useful models emerge. In a statistical DOE, by contrast, the data to be collected are carefully planned so that their analysis yields the maximum amount of information. Dr. Hunter made it crystal clear to all of us that statistical thinking is even more important before the data are collected than afterward. Optimizing the sampling plan has a huge effect on how successful one’s analytical efforts are likely to be.
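To make the planned alternative concrete, here is a minimal Python sketch of one classic designed experiment: a two-level full factorial design in three factors, in which every combination of low and high settings is run once. The coded -1/+1 levels, the factor labels A, B, and C, and the noise-free `hypothetical_process` response are illustrative assumptions, not data from any real study. The point is that, because the design is balanced, each main effect is estimated from all eight runs.

```python
from itertools import product


def full_factorial(k: int) -> list[tuple[int, ...]]:
    """All 2**k runs of a two-level full factorial design in coded units (-1 = low, +1 = high)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    k = len(design[0])
    effects = []
    for j in range(k):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Hypothetical, noise-free process: factor A raises the response, B lowers it,
# and C has no effect. A real experiment would measure y rather than compute it.
def hypothetical_process(a, b, c):
    return 50 + 3 * a - 2 * b + 0 * c


design = full_factorial(3)  # 8 planned runs covering every A/B/C combination
responses = [hypothetical_process(*run) for run in design]
print(main_effects(design, responses))  # [6.0, -4.0, 0.0]
```

Because each factor is set high in exactly half the runs and the design columns are orthogonal, C's effect comes out as exactly zero rather than absorbing part of A's or B's influence. Happenstance records, where factor settings often move together, offer no such guarantee.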
MR: How would you describe the difference between “data” and “information”?
NP: Data are all around us. They consist of everything that we observe. Our eyes collect data, our ears collect data, and so do our other senses. Yet those raw data are not useful until our brain puts them into a context where they can be applied to solve some practical problem, such as keeping our car in the center of the proper lane on a highway as we drive to work. Data analytics is all about taking data, some structured but much unstructured, and extracting inferences from them that can be used to make correct predictions. When we fit a statistical model, our goal is to take a set of observed values and extract enough information to make predictions about situations that have not yet been observed. Of course, all of this occurs in a dynamic, multivariate context, where whatever we are trying to model has already changed by the time we’ve observed it.
MR: What about so-called “Big Data”? Does it offer specific opportunities and limitations in helping manufacturers improve?
NP: Collecting large amounts of data from industrial processes is now the rule rather than the exception. Most manufacturers have large databases from which they can examine the performance of their processes in myriad ways. Structuring those data so that the relevant portions can be accessed in a timely fashion is a huge challenge. I recently gave a talk at the Symposium on Data Science and Statistics, and it was clear there that getting data into the proper format is often much harder than analyzing them. With Big Data, we also need to replace the idea of “statistical significance” with that of “practical significance.” With millions of observations, nearly every p-value you calculate will be essentially 0. The small-sample tests we all learned about in Stat 101 are no longer useful. The primary concern becomes whether the data have been collected in a manner that eliminates any sampling biases.
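A minimal sketch of the statistical-versus-practical-significance point, using a textbook two-sample z-test with a known, common standard deviation (an assumption chosen so the example needs only the Python standard library). The hypothetical shift of 0.01 units on a process whose standard deviation is 1 unit is practically negligible, yet the p-value falls from about 0.94 with 100 observations per group to roughly 1.5e-12 with a million.

```python
import math


def two_sample_z_pvalue(mean_diff: float, sigma: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference between two means, assuming a known
    common standard deviation and equal group sizes (two-sample z-test)."""
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical scenario: the process mean shifts by 1% of one standard deviation.
mean_diff, sigma = 0.01, 1.0
effect_size = mean_diff / sigma  # does not depend on the sample size

for n in (100, 10_000, 1_000_000):
    p = two_sample_z_pvalue(mean_diff, sigma, n)
    print(f"n per group = {n:>9,}  effect size = {effect_size:.2f}  p-value = {p:.3g}")
```

The effect size stays at 0.01 standard deviations throughout; only the sample size changes. With very large data sets, the more useful question is whether an estimated difference is big enough to matter, not whether its p-value is small.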
MR: That’s an interesting take. Can you offer a few examples of biases that you might run into when collecting data sets?
NP: Suppose you wish to compare a new method for producing a product with the method you’ve been using for many years. So you set up a pilot line to generate samples using the new method and compare the results with samples that you’re currently shipping. Is that really a fair comparison? Are you using the same operators? Are they doing things the same way on the pilot line as they would on a real production line? Does someone have a vested interest in demonstrating that the new method is better (or worse) than the current method? It’s always hard to collect data in which everything remains the same except what you’re trying to test. Dr. Hunter used to refer to “lurking variables” that affect a response without your knowledge. Unless we protect against those unseen effects, we can easily make the wrong decision.
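Randomizing the order of runs is the standard statistical protection against lurking variables that drift over time. The Python sketch below assumes, hypothetically, that the two methods are truly identical and that the response simply creeps upward by 0.1 units per run (think tool wear or ambient temperature). Running all current-method samples first and all new-method samples second makes the drift masquerade as a 2-unit improvement; randomizing the same 40 runs typically leaves a far smaller spurious difference. The seed value and drift rate are arbitrary choices for illustration.

```python
import random
from statistics import mean


def apparent_difference(labels, responses):
    """Mean response under 'new' minus mean response under 'current'."""
    new = [y for lab, y in zip(labels, responses) if lab == "new"]
    cur = [y for lab, y in zip(labels, responses) if lab == "current"]
    return mean(new) - mean(cur)


n_runs = 40
# Hypothetical lurking variable: a slow drift over the course of the study.
# The methods are assumed identical, so any difference "found" comes from the drift.
responses = [0.1 * t for t in range(n_runs)]

# Plan 1: every current-method sample first, then every new-method sample.
sequential = ["current"] * (n_runs // 2) + ["new"] * (n_runs // 2)

# Plan 2: the same number of runs per method, in a randomized order.
randomized = sequential.copy()
random.Random(2024).shuffle(randomized)

print("sequential plan:", round(apparent_difference(sequential, responses), 3))
print("randomized plan:", round(apparent_difference(randomized, responses), 3))
```

Randomization does not remove the drift; it spreads it across both methods so that it no longer lines up with the comparison. It cannot fix the other problems raised above, such as different operators on the pilot line, which have to be handled when the study is planned.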
MR: Do you think that Six Sigma, which of course is so widely in use now, still has things to teach us about performance improvement? How do you separate the PR side of the methodology from its nuts-and-bolts statistical analyses?
NP: Any methodology that gets practitioners to look at their data rather than simply filing them away is clearly useful. Although it’s easy to criticize specific Six Sigma assumptions, the general DMAIC approach is sound. You define the problem, make sure you can adequately measure what you need to measure, collect data to analyze, improve the process if possible, and institute controls to be sure that improvements are sustained over time. Nothing to argue with there. Was it a radical change from what industrial statisticians had been saying for decades? I don’t think so. But it reached the ears of upper management in such a way that institutional changes were made. Sometimes the promises may have been greater than what could really be delivered, and sometimes it was applied in ways it was not designed for, but overall I believe that Six Sigma was beneficial. However, in the era of Big Data, major changes need to be made to the methodologies normally associated with Six Sigma. Methods based on small-sample statistics break down when the data sets get large.
MR: Finally, what do you see as the biggest technological changes that will affect quality and statistics in, say, the next 10 or 20 years? Is it automation, the Internet of Things, quantum computing, or something else entirely?
NP: I’m not great at forecasting the future, but it’s always seemed to me that the goal we should be working toward is removing drudgery from our lives. A lot of people go to work each day, do a repetitive job that they don’t enjoy, and come home too tired to give their loved ones the attention they deserve. The more we automate these types of jobs, the more time people will have for the activities they enjoy and the things that really count in life. How do we do this? By building quality into products, by constructing statistical algorithms that are as good as or better than expert humans, by linking all our devices to an intelligent control system, and by turning data into useful real-time information. A brave new world is coming very quickly. Hopefully our social systems can handle the change.