What is the Box Cox Transformation?
A Box Cox Transformation is a simple calculation that may help your data set follow a normal distribution. The transformation was developed by two British statisticians, George Box and Sir David Cox.
When the assumption that the data are normally distributed is violated, or when the relationship between the dependent and independent variables in a linear model is not linear, a transformation may help the data set follow a normal distribution. Box Cox is one such transformation method.
The basic assumptions of Box-Cox are that the data must be strictly positive (no zero or negative values) and that the data should be continuous.
What Does Box Cox have to do with Multiple Regression Analysis?
Box-Cox transformation is a basic tool in Multiple Regression Analysis. Any linear model assumes that the relationship between the response variable Y and the predictor variable X is linear. This is not always true. When the relationship is not linear and you still wish to fit a linear model to the data, consider a Box-Cox transformation. You transform the predictor variable or the response variable and then fit a linear model to the data to study the predictor variable’s effect on the transformed response.
Linear models also assume normally distributed error terms. A significant violation of this assumption increases the risk of committing a type I or type II error.
In addition, the benefits of Box-Cox transformation include less skewness, a maintained linear relationship between the response variable Y and the predictor variables X, and a more nearly equal spread.
The Box Cox Equation
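The original form of the Box Cox transformation is given by
y(Lambda) = (y ^ Lambda - 1) / Lambda when Lambda is not 0, and y(Lambda) = ln(y) when Lambda = 0.
In their 1964 paper, Box and Cox also proposed an extended, two-parameter form of the transformation, which shifts the data by a second parameter so that values that are not strictly positive can be handled:
y(Lambda) = ((y + Lambda2) ^ Lambda1 - 1) / Lambda1 when Lambda1 is not 0, and y(Lambda) = ln(y + Lambda2) when Lambda1 = 0, where y + Lambda2 must be positive.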
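As a rough illustration, the one-parameter and two-parameter forms can be sketched in a few lines of Python. The function names box_cox and box_cox_two_param and the sample values are hypothetical, chosen only for this sketch.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)              # limiting case as lam approaches 0
    return (y ** lam - 1) / lam

def box_cox_two_param(y, lam1, lam2):
    """Two-parameter form: shift y by lam2 first, then transform with lam1."""
    return box_cox(y + lam2, lam1)

# Made-up values for illustration only.
data = [0.5, 1.0, 2.0, 4.0]
transformed = [box_cox(y, 0.5) for y in data]

# The shift lam2 = 1.0 lets the transform accept zero and small negative values.
shifted = [box_cox_two_param(y, 0.0, 1.0) for y in [-0.5, 0.0, 3.0]]
```

The shift in the two-parameter form is what makes non-positive observations usable; the one-parameter function rejects them outright.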
When would you use this transformation during the DMAIC process?
Process capability studies are performed during the Measure phase of DMAIC. The first step in a process capability study is to check whether the data follow a normal distribution (this check also matters for parametric tests such as ANOVA).
The Box-Cox method helps to address non-normally distributed data by transforming the data toward normality. However, there is no guarantee that the transformed data will be normal, because the method does not actually test for normality.
Instead, the Box-Cox method looks for the Lambda that gives the smallest standard deviation of the transformed data. It is therefore always advisable to check the transformed data for normality using a probability plot or a Q-Q (Quantile-Quantile) plot.
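One way to put a number on what a Q-Q plot shows is the correlation between the sorted data and the matching theoretical normal quantiles: the closer it is to 1, the straighter the plot. The sketch below uses only the Python standard library. The function name qq_correlation, the plotting positions (i - 0.5) / n, and the sample values are all assumptions made for illustration, and a log transform (Lambda = 0) stands in for whatever Lambda the software would pick.

```python
import math
from statistics import NormalDist

def qq_correlation(values):
    """Correlation between sorted data and theoretical normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(ordered) / n
    mean_q = sum(quantiles) / n
    cov = sum((x - mean_x) * (q - mean_q) for x, q in zip(ordered, quantiles))
    var_x = sum((x - mean_x) ** 2 for x in ordered)
    var_q = sum((q - mean_q) ** 2 for q in quantiles)
    return cov / math.sqrt(var_x * var_q)

# Made-up, right-skewed measurements.
raw = [1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.2, 6.0, 9.5, 18.0]
logged = [math.log(y) for y in raw]     # Lambda = 0 corresponds to the natural log

r_raw = qq_correlation(raw)
r_logged = qq_correlation(logged)
print(f"raw: r = {r_raw:.3f}, transformed: r = {r_logged:.3f}")
```

A higher correlation after the transformation suggests the Q-Q plot is closer to a straight line, but it is still worth looking at the plot itself.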
How to use Box Cox to calculate Process capability for non-normal data
Calculating process capability directly from non-normal raw data may give inaccurate results. The data should first be transformed toward normality. Various data transformation methods exist, such as the log, power, exponential, and reciprocal transformations.
Some data analysis may be required to choose the right transformation method. One of the foremost power transformation methods is the Box-Cox method.
The formula is y' = y ^ Lambda
Here the power Lambda must be determined in order to transform the data; Lambda is usually assumed to lie between -5 and 5. The best Lambda is the one that maximizes the likelihood of the transformed data being normally distributed, which corresponds to the smallest standard deviation of the transformed data.
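The sketch below shows one way the capability calculation might look once a Lambda has been chosen. The key point is that the specification limits go through the same transformation as the data. The Lambda of 0.5, the spec limits, the measurements, and the function names power_transform and cpk are all hypothetical. The index uses the overall sample standard deviation, so strictly speaking it is closer to Ppk.

```python
import math
from statistics import mean, stdev

def power_transform(y, lam):
    """Simple power form y ** lam, with ln(y) used when lam == 0."""
    return math.log(y) if lam == 0 else y ** lam

def cpk(values, lsl, usl):
    """Capability index from the sample mean and overall standard deviation."""
    m, s = mean(values), stdev(values)
    return min((usl - m) / (3 * s), (m - lsl) / (3 * s))

lam = 0.5                       # assumed to have been chosen already, e.g. by software
data = [2.1, 2.8, 3.5, 4.0, 4.9, 5.6, 6.8, 8.1, 10.2, 14.4]   # made-up measurements
lsl, usl = 1.0, 25.0            # hypothetical spec limits in original units

t_data = [power_transform(y, lam) for y in data]
# Spec limits must be transformed with the same Lambda as the data.
# Note: a negative Lambda reverses the ordering, so the limits would swap roles.
t_lsl, t_usl = power_transform(lsl, lam), power_transform(usl, lam)

capability = cpk(t_data, t_lsl, t_usl)
print(f"Capability on the transformed scale: {capability:.2f}")
```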
Most Common Box-Cox Transformations
Example: if the Lambda is 2 then y ^ Lambda = y ^ 2
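For reference, the Lambda values most often quoted and the familiar transformation each one corresponds to can be listed in a short Python sketch. The dictionary name COMMON_TRANSFORMS and the test value are assumptions for illustration, and Lambda = 0 is taken by convention to mean the natural log.

```python
import math

# Commonly quoted Lambda values and the transformation each one gives
# under the simple power form (Lambda = 0 is taken to mean the natural log).
COMMON_TRANSFORMS = {
    -2.0: ("1 / y^2", lambda y: 1 / y ** 2),
    -1.0: ("1 / y", lambda y: 1 / y),
    -0.5: ("1 / sqrt(y)", lambda y: 1 / math.sqrt(y)),
    0.0: ("ln(y)", math.log),
    0.5: ("sqrt(y)", math.sqrt),
    1.0: ("y (no change)", lambda y: y),
    2.0: ("y^2", lambda y: y ** 2),
}

y = 4.0  # made-up value
for lam, (label, func) in COMMON_TRANSFORMS.items():
    print(f"Lambda = {lam:>4}: {label:<14} -> {func(y):.4f}")
```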
An Example of a Box Cox Transformation by Hand
Box Cox transformations in practice are typically done by leveraging software that can try many different variations of Box Cox transforms very quickly.
Doing it by hand in practice is time-consuming and error-prone. Imagine trying many values of lambda by hand until you have tried them all or run out of patience!
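To give a feel for what the software is doing, here is a minimal sketch of a grid search over Lambda from -5 to 5 that keeps the value with the highest Box-Cox log-likelihood. It is an illustration under stated assumptions, not how any particular package is implemented: the function names, the step size of 0.1, and the data are all made up.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of one positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def log_likelihood(data, lam):
    """Profile log-likelihood of Lambda, up to an additive constant."""
    n = len(data)
    t = [box_cox(y, lam) for y in data]
    m = sum(t) / n
    var = sum((v - m) ** 2 for v in t) / n
    return (lam - 1) * sum(math.log(y) for y in data) - n / 2 * math.log(var)

def best_lambda(data, low=-5.0, high=5.0, step=0.1):
    """Try every Lambda on the grid and return the most likely one."""
    steps = round((high - low) / step)
    grid = [round(low + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))

# Made-up, right-skewed measurements.
data = [1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.2, 6.0, 9.5, 18.0]
lam = best_lambda(data)
print(f"Best Lambda on the grid: {lam}")
```

In practice a library routine such as scipy.stats.boxcox, which estimates Lambda by maximum likelihood, would normally replace this loop.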
“But what about on a Six Sigma exam?” I can hear you say. “I won’t have Minitab or RStudio available! What will I do?”
Not to worry.
In my experience, the questions on the exam are rather simple. You usually just have to do or understand the following:
- Sometimes your data doesn’t appear to be normal, but if you transform it, you can achieve normality, which then opens up a bunch of other properties and tools for you (or at least easier tools ;)).
- While Box-Cox is complex, questions on Six Sigma exams are usually very simple. Just substitute variables into the following equation:
- X(transform) = X ^ Lambda
Example: if the Lambda is 2 then y' = y ^ 2
Replace each original data value with the result of this equation, using a lambda of 2.
As the example chart here shows, all you’d have to do is just square the original value.
“Old measure” 2 now becomes “New measure” 4 because we are simply substituting into X(transform) = X ^ Lambda as follows: X(transform) = 2 ^ 2 = 4.
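The same substitution, applied to a whole column of values, is a one-liner in Python. The list of old measures is made up for illustration.

```python
lam = 2
old_measures = [1, 2, 3, 4, 5]                  # made-up values
new_measures = [x ** lam for x in old_measures]  # X(transform) = X ^ Lambda

print(list(zip(old_measures, new_measures)))
# prints [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```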
An Example of a Box-Cox Transformation Using Minitab
A Box Cox Transformation can be performed in Minitab, the Excel Analysis ToolPak, or other statistical software tools. These tools automatically calculate an appropriate power transformation.
Example: Raw data
Step 1: Perform the normality test to see whether the data follows a normal distribution or not
From the above graph, the p-value is less than 0.005; hence the data does not follow a normal distribution, and the histogram clearly shows that the data is skewed to one side.
Step 2: Transform the data using Box-Cox Transformation
Transformed data
Step 3: Test for normality again
From the above graph, the p-value is greater than 0.05; hence there is no evidence against normality and we can treat the transformed data as normally distributed. The histogram also shows that the data is now roughly symmetric and bell-shaped.
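For readers without Minitab, the same three steps can be sketched in Python with NumPy and SciPy. This is only an approximation of the workflow above: it swaps in the Shapiro-Wilk test where Minitab's graph would typically show an Anderson-Darling result, and it uses simulated, right-skewed data rather than the raw data from the example. The seed, sample size, and distribution parameters are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Simulated right-skewed data (an assumption; not the data from the example above).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=0.8, size=100)

# Step 1: test the raw data for normality (Shapiro-Wilk used here).
_, p_before = stats.shapiro(raw)

# Step 2: Box-Cox transform; with no Lambda given, SciPy estimates it
# by maximum likelihood and returns it along with the transformed data.
transformed, lam = stats.boxcox(raw)

# Step 3: test the transformed data for normality again.
_, p_after = stats.shapiro(transformed)

print(f"p-value before: {p_before:.4f}")
print(f"estimated Lambda: {lam:.2f}")
print(f"p-value after: {p_after:.4f}")
```

A p-value above 0.05 after the transformation means normality is not rejected; it is still worth confirming with a probability plot.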
What Do You Need to Know for Your Six Sigma Exam?
Green Belt
The IASSC Six Sigma Green Belt BOK requires this topic as part of the Improve Phase.
Black Belt
The IASSC Six Sigma Black Belt BOK requires this topic as part of the Improve Phase.
The ASQ Six Sigma Black Belt BOK requires the following:
Process capability for non-normal data
Identify non-normal data and determine when it is appropriate to use Box-Cox or other transformation techniques. (Apply)
Helpful Videos
This first video has poor audio but gives a good overview.
This second video shows a great practical example using RStudio. You’re unlikely to have to go into this level of detail on an exam. I include it because its plots help you visualize how a transformation can move your data analysis forward and lead to viable conclusions.