Estimates of the repeatability and reproducibility of the Schlagversuch are shown in Table 1 and in Figure 1. From the estimates of r1 it is possible to calculate a linear functional relationship between level and either the repeatability or the repeatability standard deviation. These functions, given in Table 2, could be quoted in a CEN Standard as a statement of the repeatability of the Schlagversuch. Figure 1 shows, however, that for this test there is no linear relationship between reproducibility and level: the reproducibility standard deviation for the low-strength aggregate (Level 3) is much higher than for the other two levels. This may be because the test gives an unusually high reproducibility standard deviation with the particular aggregate used for Level 3, or with low-strength aggregates generally. By contrast, the corresponding graph of reproducibility standard deviation against level for the Los Angeles test showed a reasonably straight-line relationship.
The draft CEN method for the Schlagversuch states that the repeatability (r1) of the test is 0.82 and its reproducibility (R1) is 2.11, for SZ values between 10 and 25. Table 1 shows that the repeatability given in the draft CEN method is somewhat larger than the values obtained in this experiment. For reproducibility, the single value given in the draft method cannot agree with the values obtained in this experiment, because these vary so much between levels.
In this experiment, the participants prepared six specimens from each laboratory sample, so the notion of a "test portion" is only a theoretical one. If Sspec is the between-specimen standard deviation within laboratory samples, then the critical range Wc and the repeatability standard deviation Sr1 are related to it as follows:
$$W_c = 3.3\,S_{\mathrm{spec}}$$
$$S_{r1} = S_{\mathrm{spec}}/\sqrt{3}$$
$$W_c = 3.3\sqrt{3}\,S_{r1} \approx 5.7\,S_{r1}$$
Hence, when the specimens are prepared as in this experiment, the critical range is directly proportional to the repeatability standard deviation (by a factor of about 5.7), and it can be used by laboratories to check the repeatability of the test method.
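The relations above can be turned into a simple laboratory check. This is a sketch under our own reading of the factors: 3.3 is taken to be the 95 % critical-range factor for three results, and the √3 to imply that each test result is the mean of three specimens, so the check is applied to the range of those three specimen results. The function names and the value of Sr1 in the example are hypothetical.

```python
import math

# Critical-range factor for three results (our reading of W_c = 3.3 * S_spec).
F_RANGE_3 = 3.3


def critical_range(s_r1):
    """Critical range W_c = 3.3 * sqrt(3) * S_r1, i.e. about 5.7 * S_r1."""
    s_spec = s_r1 * math.sqrt(3)  # inverts S_r1 = S_spec / sqrt(3)
    return F_RANGE_3 * s_spec


def range_within_critical(specimen_results, s_r1):
    """True if the range of the specimen results does not exceed W_c."""
    observed_range = max(specimen_results) - min(specimen_results)
    return observed_range <= critical_range(s_r1)


# Hypothetical example: S_r1 = 0.3 gives W_c of about 1.71.
print(round(critical_range(0.3), 2))                    # 1.71
print(range_within_critical([27.1, 28.0, 27.6], 0.3))   # True (range 0.9)
print(range_within_critical([26.0, 28.0, 27.0], 0.3))   # False (range 2.0)
```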
It has been argued (J»rck, Sym and Powell, 1994. A study of mechanical tests of aggregates. Green Land Reclamation Ltd Report GLR 3036/03a.) that, if a mechanical test method is to be used to assess the compliance of aggregates with specifications, its reproducibility standard deviation, expressed as a coefficient of variation, should be no more than about 8 %. This applies when the specification imposes an upper limit on the test results. This experiment shows that the coefficient of variation of reproducibility for the Schlagversuch is much lower than 8 %, so by this criterion its reproducibility is satisfactory.
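The 8 % criterion amounts to a one-line calculation. The sketch below assumes the coefficient of variation is the reproducibility standard deviation divided by the mean at that level, expressed as a percentage; the function names and the figures in the example are hypothetical.

```python
def reproducibility_cv(s_R, level_mean):
    """Coefficient of variation of reproducibility, in percent."""
    return 100.0 * s_R / level_mean


def satisfies_cv_criterion(s_R, level_mean, max_cv_percent=8.0):
    """Apply the 'no more than about 8 %' criterion for upper-limit specifications."""
    return reproducibility_cv(s_R, level_mean) <= max_cv_percent


# Hypothetical values: s_R = 1.2 SZ units at a level mean of 27.5.
print(f"{reproducibility_cv(1.2, 27.5):.1f} %")  # 4.4 %
print(satisfies_cv_criterion(1.2, 27.5))          # True
```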
The proposed CEN specification for aggregates for concrete (CEN, 1994c. Proposed draft CEN standard: Aggregates for concrete including those for use in roads and pavements. CEN/TC 154/SC 2 committee paper N135, November, 1994.) allows aggregates to be assigned to one of several categories according to the results they give in the Schlagversuch. The table of upper limits from that specification is reproduced here as Table A, below. If a specification contains two limits that are close together compared with the reproducibility of the test method, difficulties may arise in practice, because different laboratories testing the same aggregate may then place it in different categories.
The problem is illustrated by Figure B, below. The results from most of the participants indicate that the aggregate used for Level 3 in this experiment complies with the upper limit for Category S3. However, three of the sixteen laboratories gave laboratory averages above that limit, so with this aggregate there is a risk of disagreement as to whether or not it falls within Category S3. Had this aggregate been a little stronger, giving results a few percentage points lower, the dispersion of the results is such that some laboratories would have found that it fell within Category S2 while others would have found that it did not.
The dispersion of the results obtained at Levels 1 and 2 is less than that obtained at Level 3, and the interval between the upper limits for the S1 and S2 categories is greater than that between the S2 and S3 categories, so for stronger aggregates the risk of disagreements will be smaller.
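The risk of disagreement can be put in rough numbers. The sketch below assumes, purely for illustration, that laboratory averages are normally distributed about the aggregate's true value with a between-laboratory standard deviation s_lab; the function names and the values of s_lab, the true value and the limit are hypothetical and are not read from this experiment.

```python
from statistics import NormalDist


def prob_exceeds_limit(true_value, s_lab, upper_limit):
    """Probability that a single laboratory average exceeds an upper limit.

    Assumes laboratory averages ~ Normal(true_value, s_lab**2).
    """
    return 1.0 - NormalDist(mu=true_value, sigma=s_lab).cdf(upper_limit)


def expected_labs_over_limit(true_value, s_lab, upper_limit, n_labs=16):
    """Expected number of the n_labs laboratories reporting an average above the limit."""
    return n_labs * prob_exceeds_limit(true_value, s_lab, upper_limit)


# Hypothetical example: s_lab = 1.2 SZ units and an upper limit of 25.0.
for true_value in (22.6, 23.8, 25.0):
    p = prob_exceeds_limit(true_value, 1.2, 25.0)
    n = expected_labs_over_limit(true_value, 1.2, 25.0)
    print(f"true value {true_value}: P = {p:.2f}, expected labs over limit = {n:.1f}")
```

When the true value lies exactly on a limit, each laboratory has an even chance of falling on either side of it. The closer two adjacent limits are compared with s_lab, the larger the share of aggregates in the category between them whose classification is uncertain.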
So as not to give the impression that this risk of disagreement is peculiar to the Schlagversuch, Figure C shows the corresponding specification limits for the Los Angeles test, together with the "European" results from the cross-testing experiment on that test. The risk of disagreement clearly exists with the Los Angeles test as well.
|S4||No requirement||No requirement|
Figure B. Specification limits for aggregates for concrete and the reproducibility of the Schlagversuch.
SZ     30.0 ¦
value       ¦  J
       29.0 ¦  D G ------------------------ Category S3 upper limit
            ¦  A I P
       28.0 ¦  B N O            Level 3 laboratory averages
            ¦  C
       27.0 ¦  E H L
            ¦  K
       26.0 ¦  F
            ¦  M
       25.0 ¦ ---------------------------- Category S2 upper limit
            ¦
       24.0 ¦
            ¦
       23.0 ¦
            ¦  D G
       22.0 ¦  B E J L N O
            ¦  A F H P          Level 2 laboratory averages
       21.0 ¦  I K
            ¦  C
       20.0 ¦  M
            ¦
       19.0 ¦
            ¦
       18.0 ¦ ---------------------------- Category S1 upper limit
            ¦
       17.0 ¦
            ¦
       16.0 ¦
            ¦
       15.0 ¦
            ¦
       14.0 ¦
            ¦
       13.0 ¦
            ¦  G J
       12.0 ¦
            ¦  D F N O
       11.0 ¦  A B E H K L      Level 1 laboratory averages
            ¦  C I M P
       10.0 ¦
Figure C. Specification limits for aggregates for concrete and the reproducibility of the Los Angeles test.
LA     46.0 ¦
value       ¦
       44.0 ¦
            ¦  b
       42.0 ¦
            ¦  T a
       40.0 ¦  X Y Z ----------------- Category S3 upper limit
            ¦  V
       38.0 ¦  L O P Q S
            ¦  H I M N R U W        Level 3 laboratory averages
       36.0 ¦  E F J K
            ¦  G
       34.0 ¦  B C
            ¦  D
       32.0 ¦
            ¦  A
       30.0 ¦ --------------------------- Category S2 upper limit
            ¦
       28.0 ¦
            ¦
       26.0 ¦
            ¦
       24.0 ¦  W b
            ¦  U V Y Z a
       22.0 ¦  J K N O P Q R S X    Level 2 laboratory averages
            ¦  D E G I L M
       20.0 ¦  A C F H T --------------- Category S1 upper limit
            ¦  B
       18.0 ¦
            ¦
       16.0 ¦
            ¦
       14.0 ¦
            ¦
       12.0 ¦
            ¦
       10.0 ¦
            ¦  D G K M N R W a b
        8.0 ¦  A B C F H I J L O P Q S T U V X Y Z
            ¦  E                    Level 1 laboratory averages
        6.0 ¦
Comparing the sensitivity ratios for the Schlagversuch in Table 4 with the corresponding values for the Los Angeles test shows that the Schlagversuch gives higher "Level 1 - Level 2" sensitivity ratios than the Los Angeles test, for both repeatability and reproducibility. This suggests that the Schlagversuch is better at discriminating between the "high" strength and "medium" strength aggregates. The situation is reversed for the "Level 3 - Level 2" comparison, where the Los Angeles test gives the higher sensitivity ratios, so the Los Angeles test appears to be better at discriminating between the "medium" and "low" strength aggregates. This latter observation suggests that the Los Angeles test might be the better choice for assessing the mechanical strength of recycled aggregates, whose strengths are lower than those of the natural aggregates generally used for road construction or structural concrete.
The estimates of reproducibility are substantially larger than the corresponding values of repeatability (compare the values of r1 and R1 in Table 1). Also, some laboratories show consistently large laboratory biases (see the Mandel plots). These observations indicate that one or more factors are having a detrimental effect on the reproducibility. It is therefore proposed that the appropriate follow-up to this cross-testing experiment would be for the laboratories to take part in regular proficiency tests, so that laboratories with consistently large biases can be made aware of their position, and so that inexperienced laboratories can be checked.
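A proficiency scheme of the kind proposed could flag such laboratories automatically. The sketch below assumes that the Mandel plots referred to show Mandel's h statistic as defined in ISO 5725-2: the deviation of a laboratory average from the grand mean at a level, divided by the standard deviation of the laboratory averages at that level. The flagging rule, the threshold of 0.8, the function names and the data are hypothetical illustrations, not ISO 5725 critical values.

```python
from statistics import mean, stdev


def mandel_h(lab_averages):
    """Mandel's h for each laboratory at one level.

    h_i = (y_i - grand mean) / s, where s is the standard deviation
    (n - 1 divisor) of the laboratory averages at that level.
    """
    values = list(lab_averages.values())
    grand_mean, s = mean(values), stdev(values)
    return {lab: (y - grand_mean) / s for lab, y in lab_averages.items()}


def consistently_biased(levels, threshold):
    """Laboratories whose h has the same sign, and |h| >= threshold, at every level.

    The threshold is a user choice here, not an ISO 5725 critical value.
    """
    h_by_level = [mandel_h(level) for level in levels]
    flagged = []
    for lab in levels[0]:
        h_values = [h_level[lab] for h_level in h_by_level]
        same_sign = all(v > 0 for v in h_values) or all(v < 0 for v in h_values)
        if same_sign and all(abs(v) >= threshold for v in h_values):
            flagged.append(lab)
    return flagged


# Hypothetical laboratory averages at two levels.
level_1 = {"A": 11.0, "B": 11.2, "C": 10.8, "D": 12.0}
level_2 = {"A": 21.5, "B": 21.6, "C": 21.0, "D": 22.5}
print(consistently_biased([level_1, level_2], threshold=0.8))  # ['C', 'D']
```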
The Schlagversuch is unusual in that its result is calculated as the average of the percentages passing a series of five sieves. Simple statistical considerations suggest that such an average should give better precision than the result of a test on a single sieve: when measurements are statistically independent, the standard deviation of the average of n measurements is smaller than that of the individual measurements by a factor of √n. If this were the case, the precision of other mechanical tests could perhaps be improved by adopting a procedure similar to that used in the Schlagversuch. This possibility is examined here by using the data from this experiment to calculate repeatabilities and reproducibilities for the results obtained on each sieve used in the Schlagversuch, and comparing them with those for the Schlagversuch result itself.
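The √n argument can be checked by simulation. This sketch assumes independent, normally distributed results on each of five sieves with a common standard deviation; both are idealising assumptions, not properties of the Schlagversuch data, and the function name is hypothetical.

```python
import math
import random
from statistics import pstdev


def simulated_sd_ratio(n_sieves=5, sigma=1.0, n_trials=20000, seed=1):
    """Estimate sd(average of n_sieves results) / sd(single result) by simulation.

    Each trial draws n_sieves independent Normal(0, sigma) results.
    """
    rng = random.Random(seed)
    singles, averages = [], []
    for _ in range(n_trials):
        results = [rng.gauss(0.0, sigma) for _ in range(n_sieves)]
        singles.append(results[0])
        averages.append(sum(results) / n_sieves)
    return pstdev(averages) / pstdev(singles)


ratio = simulated_sd_ratio()
print(f"simulated {ratio:.3f}, theoretical {1 / math.sqrt(5):.3f}")
```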
The results are shown in Figures D to I, below. The points representing the precision of the Schlagversuch generally lie a little below the lines representing the precision of tests on single sieves, so the "average over five sieves" gives only marginally better precision than a single sieve, far less than the factor of √5 (about 2.2) that independent results would give. The likely explanation is that the results on different sieves are not statistically independent but are correlated. It may therefore be concluded that the precision of other mechanical tests is unlikely to be improved by deriving the test result as an average over several sieves.
Figure D. Repeatabilities of tests on single sieves, Level 1.
Figure E. Reproducibilities of tests on single sieves, Level 1.
Figure F. Repeatabilities of tests on single sieves, Level 2.
Figure G. Reproducibilities of tests on single sieves, Level 2.
Figure H. Repeatabilities of tests on single sieves, Level 3.
Figure I. Reproducibilities of tests on single sieves, Level 3.
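The explanation offered above, that results on different sieves are correlated, can be illustrated with a simple model. Assuming, for illustration only, that the results on every pair of sieves have the same correlation coefficient ρ and the same standard deviation σ, the standard deviation of the average of n results is σ·√((1 + (n − 1)ρ)/n). The function name is hypothetical.

```python
import math


def sd_ratio_of_mean(n, rho):
    """sd(mean of n results) / sd(one result), for n equicorrelated results.

    Assumes equal standard deviations and a common pairwise correlation rho.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > 1 and not (-1.0 / (n - 1) <= rho <= 1.0):
        raise ValueError("rho is not a valid common correlation for n results")
    return math.sqrt((1.0 + (n - 1) * rho) / n)


for rho in (0.0, 0.5, 0.9):
    print(f"rho = {rho}: ratio = {sd_ratio_of_mean(5, rho):.3f}")
# prints ratios 0.447, 0.775 and 0.959
```

With ρ = 0 this reduces to the independent case (a factor of √5, about 2.2, for five sieves); with ρ = 0.9, averaging over five sieves reduces the standard deviation by only about 4 %, which is the kind of marginal gain described above.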