Estimates of repeatability and reproducibility are given in Table 1. The standard deviations are also shown in Figure 1.
These estimates were calculated after excluding the data for Laboratory B at all three levels, because its laboratory averages were outliers at every level. The data for Laboratory H in Level 2 were excluded because both its laboratory average and its between-test-specimen standard deviation were outliers in this level. The data for Laboratory I in Level 3 were excluded because its between-test-result range was an outlier in this level.
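The report does not state here which outlier tests were applied, but precision experiments of this type (ISO 5725-2) normally screen laboratory averages with Grubbs' test and within-laboratory variances with Cochran's test. A minimal sketch of the Grubbs calculation, on made-up laboratory averages (not the data from this experiment):

```python
import math

def grubbs_statistic(values):
    """Grubbs' statistic G for the single value farthest from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return max(abs(x - mean) for x in values) / sd

# Hypothetical laboratory averages (FTV, %) at one level; one laboratory
# sits far from the rest, as Laboratory B did here.
averages = [1.1, 1.2, 0.9, 1.0, 3.8, 1.3, 1.1, 1.0, 1.2]
G = grubbs_statistic(averages)
print(round(G, 2))
```

G is then compared with the tabulated critical value for the number of laboratories; a value above the 1 % critical point would mark the extreme average as an outlier, as was evidently concluded for Laboratories B, H and I.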
In Figure 1 it can be seen that the between-test-specimen, repeatability, and reproducibility standard deviations all approximately follow straight lines, so that it is possible to summarise the results by fitting the functional relations shown in the figure.
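The functional relations themselves are given in the figure, but fitting a straight line s = a + b·m to the standard deviations at each level is a simple least-squares calculation. The numbers below are illustrative only, not the values from Table 1, and an ordinary (unweighted) fit is shown for simplicity:

```python
def fit_line(ms, ss):
    """Ordinary least-squares fit of s = a + b*m through
    (level mean, standard deviation) pairs."""
    n = len(ms)
    mbar = sum(ms) / n
    sbar = sum(ss) / n
    b = (sum((m - mbar) * (s - sbar) for m, s in zip(ms, ss))
         / sum((m - mbar) ** 2 for m in ms))
    a = sbar - b * mbar
    return a, b

# Illustrative values only: level means (FTV, %) and reproducibility
# standard deviations at the three levels.
levels = [1.0, 2.0, 4.0]
s_R = [0.3, 0.5, 0.9]
a, b = fit_line(levels, s_R)
print(a, b)
```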
The draft CEN Standard quotes values of r = 0.04 and R = 0.26 for an 8/16 mm homogeneous aggregate, but does not state the level (i.e. the average FTV) at which these values were obtained. From Figure 1 it can be seen that the precision of the test depends very much on the level, so it is not possible to compare the results obtained in this experiment with those given in the draft Standard.
It has been argued (Jørck, Sym and Powell, 1994. A study of mechanical tests of aggregates. Green Land Reclamation Ltd Report GLR 3036/03a.) that the reproducibility standard deviation of a mechanical test, when expressed as a coefficient of variation, should be no more than about 8% if the test method is to be used to assess the compliance of aggregates with specifications. Because specified Freeze/Thaw values, like those for mechanical tests, are upper limits, the same criterion may be applied to the Freeze/Thaw test. The results in Table 3 show that the reproducibility of the Freeze/Thaw test fails to meet this criterion by a wide margin.
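The criterion is easy to apply once a level mean and a reproducibility standard deviation are available. The figures below are illustrative, not taken from Table 3:

```python
def reproducibility_cv(mean_ftv, s_R):
    """Reproducibility standard deviation expressed as a
    coefficient of variation (%) of the level mean."""
    return 100.0 * s_R / mean_ftv

# Illustrative numbers: a level mean FTV of 2.0 % with s_R = 0.5 %
# gives a coefficient of variation of 25 %, far above the 8 % criterion.
cv = reproducibility_cv(2.0, 0.5)
print(cv, cv <= 8.0)
```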
It is possible that the reproducibility found in this experiment is poorer than is achieved with the DIN test method when it is used by experienced German laboratories. If this is so, then proficiency testing as recommended above will enable laboratories in other countries to measure and improve their performance.
The draft CEN test method contains requirements for the temperature inside a can in the middle of the freezing cabinet. However, it is surprising that the method contains no other calibration requirements for the cabinet. It is possible for the temperature at other points in the cabinet to differ significantly from that at the middle if the air is not circulated effectively. This could be checked by placing thermocouples (in air) at several points around the interior. It would also add confidence to the results if laboratories were required to report the cooling curve achieved in the central can with every set of results.
The result of a freeze/thaw test is likely to be sensitive to details of the way the test is carried out, such as the rates of cooling at various points in the cooling cycle, and the minimum temperatures achieved. It would be of interest to carry out ruggedness trials to establish the sensitivity of the test results to variations in these details. Such trials would show where tolerances in the method specification need to be tightened up to improve the reproducibility of the test.
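One common layout for a ruggedness trial of this kind is an eight-run Plackett-Burman design, in which up to seven candidate factors (cooling rates at various points in the cycle, minimum temperature, and so on) are varied together between "high" and "low" settings. The sketch below constructs the standard design matrix; the choice of factors and settings for the Freeze/Thaw method is of course for the trial organisers to decide:

```python
def plackett_burman_8():
    """Standard eight-run Plackett-Burman design for up to seven
    two-level factors.

    Rows 1-7 are cyclic shifts of the usual generator; row 8 sets
    every factor low.  +1 = high setting, -1 = low setting.
    """
    gen = [+1, +1, +1, -1, +1, -1, -1]
    rows = [gen[i:] + gen[:i] for i in range(7)]
    rows.append([-1] * 7)
    return rows

design = plackett_burman_8()
for row in design:
    print(row)
```

Each factor's main effect is then estimated from the difference between the average results at its high and low settings; factors with large effects are those whose tolerances most need tightening.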
The reproducibility of a test method cannot be smaller than its repeatability. The repeatability coefficients of variation in Table 3 all exceed 8%, so it will not be possible to achieve reproducibility coefficients of variation below 8% without improving the repeatability of the test. The repeatability of the Freeze/Thaw test has two components: the between-test-specimen variation (which can be checked using the critical range Wc) and the between-run variation (which can be checked using the repeatability limit r1). Methods for applying these checks in practice are outlined in the general section on repeatability checks. If the application of these checks does not produce the improvement that is needed, it will be necessary to consider steps such as increasing the number of test specimens, or tightening the tolerances in the method so as to reduce the run-to-run variability.
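As a rough sketch of how such checks work in practice: the exact definitions of Wc and r1 are given elsewhere in the report, but the generic ISO 5725-6 forms are a repeatability limit r = 2.8·s for the difference of two results, and a critical range f(n)·s for the spread of n test-specimen results. The data below are illustrative:

```python
def repeatability_limit(s_r):
    """Repeatability limit r = 2.8 * s_r: the 95 % limit for the
    absolute difference of two results under repeatability conditions."""
    return 2.8 * s_r

def critical_range(s, f_n):
    """Critical range f(n) * s for the spread of n results, where f(n)
    is the tabulated critical-range factor (ISO 5725-6 gives roughly
    f(2) = 2.8, f(3) = 3.3, f(4) = 3.6 at the 95 % level)."""
    return f_n * s

# Illustrative check: between-test-specimen s = 0.2 % FTV,
# four specimens measured in one run.
specimens = [2.1, 2.4, 1.9, 2.3]
spread = max(specimens) - min(specimens)
ok = spread <= critical_range(0.2, 3.6)
print(spread, ok)
```

A spread exceeding the critical range signals excessive between-test-specimen variation in that run, and the run would be investigated or repeated.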
Note that the importance of reproducibility will depend on the level of the results given by an aggregate. Thus, if an upper FTV limit of 1.0% is imposed then, from the histogram of laboratory averages in Figure 1, it can be seen that the aggregate used as Level 1 in this experiment will comply whichever laboratory does the test. Likewise, if an upper limit of 2% were applied to the aggregate used for Level 2, it again does not matter which laboratory does the test (provided it is not Laboratory B or H). However, if an upper limit of 4% is applied to the aggregate used for Level 3, then it matters very much where the aggregate is tested. When the reproducibility is as poor as that found in this experiment, producers will have to supply materials that give results well below the specification limit, so that the risk of a laboratory reporting a non-complying result is small.
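The margin a producer would need can be sketched by treating a single laboratory's result as roughly normal about the material's true mean with standard deviation sR. The limit and sR below are illustrative, not values from this experiment:

```python
import math

def p_noncomply(true_mean, s_R, upper_limit):
    """Probability that a single laboratory's result exceeds the limit,
    assuming results are approximately normal with standard deviation s_R."""
    z = (upper_limit - true_mean) / s_R
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Illustrative: upper limit 4.0 % FTV, s_R = 0.9 %.  A material whose true
# mean sits on the limit fails half the time; only a mean two s_R below
# the limit brings the risk down to a few per cent.
for mu in (4.0, 3.1, 2.2):
    print(mu, round(p_noncomply(mu, 0.9, 4.0), 3))
```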