Training Systems Using Python Statistical Modeling

Testing with two samples

If we assume that our data were drawn from normal distributions, the t-test can be used. For this test, we can use the statsmodels function ttest_ind(). This is one of the more stable functions in the package, and it uses a different interface. Here, we are going to test whether two samples share a common mean.

Let's assume that your company has decided to stop outsourcing resistor production and is experimenting with different methods so that it can start producing resistors in-house. There are two candidates, process A and process B, and you are asked to test whether the mean resistance is the same for both processes or whether it differs. You feel safe assuming, again, that the resistance of resistors is normally distributed regardless of the manufacturing process employed, but you do not assume that the two processes have the same standard deviation. The appropriate test statistic is therefore the two-sample t statistic that does not pool the variances.
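
Assuming the standard unequal-variance (Welch) form of the two-sample t statistic, and writing $\bar{x}$, $s$, and $n$ for each process's sample mean, sample standard deviation, and sample size (notation assumed here, not taken from the text), the statistic can be written as:

$$
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
$$

Under the null hypothesis that the two processes have the same mean resistance, this quantity approximately follows a t distribution, so values of $t$ far from zero count as evidence against a common mean.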

So, let's use this test statistic to perform the test:

Our first step is to load in the data, as follows:

Our next step is to import the ttest_ind function and run the test, as follows:
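
A minimal sketch of both steps might look like the following. The arrays res_A and res_B are hypothetical names filled with simulated values, not the dataset used in the text, so the printed numbers will not match the p value quoted here. The choice of usevar="unequal" is an assumption that follows from not assuming equal standard deviations.

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

# Hypothetical resistance measurements, simulated for illustration only.
# These are NOT the data used in the text.
rng = np.random.default_rng(0)
res_A = rng.normal(loc=0.125, scale=0.002, size=75)  # process A
res_B = rng.normal(loc=0.125, scale=0.003, size=60)  # process B

# usevar="unequal" requests the version of the test that does not
# assume a common variance; the default is usevar="pooled".
t_stat, p_value, dof = ttest_ind(
    res_A, res_B, alternative="two-sided", usevar="unequal"
)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}, df = {dof:.1f}")
```

The function returns the test statistic, the p value, and the degrees of freedom used for the t distribution, in that order.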

This will give us a p value. In this case, the p value is 0.659, which is very large. It suggests that we should not reject the null hypothesis: the data are consistent with the two processes producing resistors with the same mean level of resistance.
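
To make the link between the test statistic and the p value concrete, the calculation can also be sketched by hand. The helper name welch_t_test and the sample values below are hypothetical. The degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption about the variant of the test intended here.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Two-sided, unequal-variance two-sample t-test (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Estimated variance of each sample mean: s^2 / n.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size

    # Test statistic: difference in means over its standard error.
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

    # Two-sided p value: probability of a statistic at least this extreme.
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, p_value, dof


# Hypothetical measurements, for illustration only.
sample_a = [0.125, 0.127, 0.124, 0.126, 0.128, 0.123]
sample_b = [0.126, 0.124, 0.129, 0.122, 0.127]

t_stat, p_value, dof = welch_t_test(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, df = {dof:.1f}")
```

A p value above the chosen significance level (0.05 is a common choice) means the null hypothesis of a common mean is not rejected, which is the situation described above.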