Definition of the upper reference limit for thyroglobulin antibodies according to the National Academy of Clinical Biochemistry guidelines: comparison of eleven different automated methods

Abstract
Purpose In the last two decades, thyroglobulin autoantibodies (TgAb) measurement has progressively switched from a marker of thyroid autoimmunity to a test associated with thyroglobulin (Tg) to verify the presence or absence of TgAb interference in the follow-up of patients with differentiated thyroid cancer. Of note, TgAb measurement is cumbersome: despite standardization against the International Reference Preparation MRC 65/93, several studies demonstrated high inter-method variability and wide variation in limits of detection and in reference intervals. Taking into account the above considerations, the main aim of the present study was the determination of the TgAb upper reference limit (URL), according to the National Academy of Clinical Biochemistry guidelines, through the comparison of eleven commercial automated immunoassay platforms.
Methods The sera of 120 healthy males, selected from a population survey in the province of Verona, Italy, were tested for TgAb concentration using eleven IMA applied on as many automated analyzers: AIA-2000 (AIA) and AIA-CL2400 (CL2), Tosoh Bioscience; Architect (ARC), Abbott Diagnostics; Advia Centaur XP (CEN) and Immulite 2000 XPi (IMM), Siemens Healthineers; Cobas 6000 (COB), Roche Diagnostics; Kryptor (KRY), Thermo Fisher Scientific BRAHMS; Liaison XL (LIA), Diasorin; Lumipulse G (LUM), Fujirebio; Maglumi 2000 Plus (MAG), Snibe; and Phadia 250 (PHA), Phadia AB, Thermo Fisher Scientific. All assays were performed according to manufacturers' instructions in six different laboratories in the Friuli-Venezia Giulia and Veneto regions of Italy [Lab 1 (AIA), Lab 2 (CL2), Lab 3 (ARC, COB and LUM), Lab 4 (CEN, IMM, KRY and MAG), Lab 5 (LIA) and Lab 6 (PHA)]. Since TgAb values were not normally distributed, the experimental URL (e-URL) was established at the 97.5th percentile according to the non-parametric method.
Results TgAb e-URLs showed a significant inter-method variability. Considering the same method, e-URL was much lower than that suggested by manufacturers (m-URL), except for ARC and MAG. Correlation and linear regression were unsatisfactory. Consequently, the agreement between methods was poor, with significant bias in the Bland-Altman plot.
Conclusions Despite the efforts for harmonization, TgAb methods cannot be used interchangeably. Therefore, additional effort is required to improve analytical performance taking into consideration approved protocols and guidelines. Moreover, TgAb URL should be used with caution in the management of differentiated thyroid carcinoma patients since the presence and/or the degree of TgAb interference in Tg measurement has not yet been well defined.

Introduction
Human thyroglobulin (Tg) is a high molecular weight (660 kDa) soluble glycoprotein, typically stored within the follicular colloid of the thyroid, acting as the substrate for thyroid hormones (triiodothyronine, T3 and thyroxine, T4). As Tg is produced and utilized entirely by benign or differentiated malignant thyroid cells, it is considered a good tumor marker for patients with differentiated thyroid carcinoma (DTC) [1,2] after removal of benign and malignant thyroid tissue by surgery and 131I ablation. Over the years, advances in assay technologies have led to important improvements in the analytical performance of Tg immunometric assays (IMAs); above all, the functional sensitivity (FS) of Tg IMAs has greatly improved, from 0.5-1.0 μg/L for the first-generation IMAs to 0.05-0.10 μg/L for the second-generation (2G) IMAs [3]. Nevertheless, the major limitation of 2G IMA testing is interference by serum Tg autoantibodies (TgAb), which as a rule causes underestimation of Tg results and may mask disease recurrence [4-6]. It has been hypothesized that the complex between free Tg and endogenous TgAb prevents free Tg from binding to the capture and/or signal monoclonal antibody reagents or, alternatively, that endogenous TgAb binding to free Tg masks the epitopes recognized by the monoclonal antibody reagents [5,7].
Serum TgAb are reported to be present in about 25-30% of DTC patients, depending on the assay used and the cut-off employed to classify samples as positive or negative [1,7]. They are more frequent in females [8] and are also present in about 60% of patients with autoimmune thyroid disease (AITD) [9]. On the basis of these considerations, the role of TgAb measurement has evolved from a marker of thyroid autoimmunity [10,11] to a test associated with Tg to investigate TgAb interference [12]. Consequently, serum TgAb have evolved into a surrogate tumor marker, replacing Tg determination by IMAs in cases of analytical interference from TgAb [13,14].
The manufacturers' upper reference limit (URL) for TgAb, set up to identify patients with AITD but misleading for evaluation of TgAb interference in Tg assay, is another aspect to consider. Reference intervals are the most widely used tool for the interpretation of clinical laboratory results. The Clinical and Laboratory Standards Institute (CLSI) Expert Panel on Reference Values has provided guidelines for the determination of reliable reference intervals (EP28-A3c) [24]. They recommended the use of the direct method, which implies the enrolment of a healthy population of at least 120 individuals and the determination of the 2.5th and 97.5th percentiles for the lower reference limit and the URL, respectively. As regards thyroid antibodies (thyroid peroxidase antibodies, TPOAb, and TgAb) for AITD diagnosis, the 2003 proposal of the National Academy of Clinical Biochemistry (NACB) recommends the use of a direct method and a reference group composed of 120 men younger than 30 years, biochemically euthyroid [i.e., with serum thyroid-stimulating hormone (TSH) concentrations between 0.5 and 2.0 mIU/L], and without risk parameters (goiter, family history of AITD, or other autoimmune diseases) [25].
However, the definition of the TgAb URL remains a matter of debate, because of the problems in enrolling the appropriate reference group [25] and in the determination of a TgAb cut-off suitable for the identification of assay interference and consequently for the use of TgAb as a surrogate marker in the follow-up of DTC [12]. Taking into account the above considerations, the main aim of the present study was the determination of the TgAb URL, according to the NACB guidelines, by the use of eleven commercial automated IMA platforms. A further aim of the study was to compare the analytical performances of the methods used, in an attempt to evaluate, whenever possible, their effectiveness in detecting TgAb interference.
Table 1 footnotes: (c) Precision defined by the modified NCCLS protocol EP5-A2 [27]. (d) LoD and LoQ defined by the CLSI protocol EP17-A [28]. (e) Functional sensitivity defined as the TgAb concentration with total CV ≤20%, determined over a period of two days using one lot of reagents and testing, on four instruments, multiple samples from normal patients.

Materials and methods
One hundred and twenty male subjects were selected from a population survey in the province of Verona, Italy, according to the NACB criteria [25]. All of them gave informed consent for their participation in the study. Their sera were tested for TgAb concentration by using eleven IMA methods applied on as many automated analyzers (Table 1): AIA-2000 (AIA) and AIA-CL2400 (CL2), Tosoh Bioscience; Architect (ARC), Abbott Diagnostics; Advia Centaur XP (CEN) and Immulite 2000 XPi (IMM), Siemens Healthineers; Cobas 6000 (COB), Roche Diagnostics; Kryptor (KRY), Thermo Fisher Scientific BRAHMS; Liaison XL (LIA), Diasorin; Lumipulse G (LUM), Fujirebio; Maglumi 2000 Plus (MAG), Snibe; and Phadia 250 (PHA), Phadia AB, Thermo Fisher Scientific. All methods are standardized against the reference preparation (IRP MRC 65/93) and use International Units (IU), except for CEN and KRY, whose results were initially expressed in Arbitrary Units and then converted to IU (Table 1). The normality of the distribution was assessed using the Shapiro-Wilk test. Since TgAb values were not normally distributed, the experimental URL (e-URL) was established at the 97.5th percentile according to the non-parametric percentile method (CLSI guideline EP28-A3c) [24]. Moreover, the non-parametric Kruskal-Wallis test and Dunn's multiple comparison test were used for comparing the median values of the eleven groups.
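The non-parametric percentile estimate described above can be sketched as follows. This is a minimal illustration, assuming the rank-based rule r = p(n + 1) with linear interpolation between adjacent order statistics; the function name `nonparametric_url` and the synthetic input values are hypothetical and are not study data.

```python
import math

def nonparametric_url(values, percentile=97.5):
    """Rank-based percentile estimate (assumed rule: rank = p/100 * (n + 1)).

    Returns the value at the requested percentile, interpolating linearly
    between the two neighbouring order statistics.
    """
    if not values:
        raise ValueError("values must not be empty")
    xs = sorted(values)
    n = len(xs)
    rank = percentile / 100.0 * (n + 1)  # 1-based, possibly fractional rank
    lower = math.floor(rank)
    frac = rank - lower
    if lower < 1:        # rank falls below the smallest observation
        return xs[0]
    if lower >= n:       # rank falls at or above the largest observation
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

# Synthetic example only: 120 evenly spaced values standing in for
# the results of 120 reference subjects.
synthetic = [float(v) for v in range(1, 121)]
url = nonparametric_url(synthetic)
```

With 120 subjects the rank is 0.975 × 121 = 117.975, so the estimate lies between the 117th and 118th ordered results, which is why a reference group of at least 120 individuals gives a usable non-parametric limit.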
The inter-method variability was assessed considering the interquartile range (25th and 75th percentiles). To compare the eleven methods, ARC was regarded as the reference assay since it showed a satisfactory combination of LoD and assay imprecision (Table 1). Correlation between assays was assessed by the Spearman rank correlation coefficient (r_s); Passing-Bablok regression was applied to verify the linear association between methods, while agreement between assays was analyzed by Bland-Altman plot considering the difference between ARC and the other ten methods (AIA, CEN, CL2, COB, IMM, KRY, LIA, LUM, MAG and PHA). The difference between the manufacturer's URL (m-URL) and the e-URL was expressed as a percentage of the m-URL (Δ% = |m-URL − e-URL| / m-URL × 100). A two-sided value of p < 0.05 was considered statistically significant. Statistical analyses were performed with GraphPad Prism software, version 4.0 (San Diego, CA, USA) and MedCalc software, version 11.6 (Ostend, Belgium).
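The percentage difference and the Bland-Altman comparison described above can be illustrated with a short sketch. It assumes the conventional Bland-Altman summary (mean difference with limits of agreement at ±1.96 standard deviations), which the text does not spell out; the function names and the numbers in the usage lines are hypothetical and are not study results.

```python
import statistics

def delta_percent(m_url, e_url):
    """Delta% = |m-URL - e-URL| / m-URL * 100, as defined in the text."""
    if m_url == 0:
        raise ValueError("m_url must be non-zero")
    return abs(m_url - e_url) / m_url * 100.0

def bland_altman(reference, comparison):
    """Mean bias and limits of agreement for paired results.

    Differences are taken as reference minus comparison, mirroring the
    'ARC minus other method' direction described in the text. The
    +/-1.96 SD limits are an assumption (the usual convention).
    """
    if len(reference) != len(comparison):
        raise ValueError("paired series must have the same length")
    if len(reference) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [r - c for r, c in zip(reference, comparison)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical values for illustration only.
example_delta = delta_percent(100.0, 25.0)
example_bias, example_low, example_high = bland_altman(
    [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.5, 3.0]
)
```

A mean bias far from zero, or limits of agreement that are wide relative to the reference limit, would indicate that two assays cannot be substituted for each other.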
Results
A statistically significant difference between medians was observed for all methods except for 11 pairs of the 45 combinations analyzed (Fig. 1, Table 3).
e-URLs differed from one method to the other. Of note, within the same method, e-URL was much lower than m-URL, except for ARC and MAG, which showed similar values for both ( Table 4).

Discussion
The determination of the cut-off for the definition of TgAb positivity is an important and controversial issue.
In this study, we determined the TgAb URL in a reference group of male individuals, meticulously defined as being free of thyroid diseases, by eleven IMA methods currently used in autoimmunology laboratories, and compared the methods with each other. To our knowledge, no similar data are available in the literature: in the past, other studies addressed the same topic but with small numbers of different analytical methods, most of which are no longer in use [9,15-22,29].
The first relevant result of the present study was the demonstration of differences between the TgAb URLs claimed in the package insert (m-URL) and those obtained in the male reference sample (e-URL): with the exception of the ARC and MAG methods, e-URLs were lower than those proposed by the manufacturers, the difference ranging from 2.33 to 88.85%. These results were similar to those described in two previous studies dealing with the definition of TPOAb reference limits, determined by several current IMA platforms [30,31]. In our opinion, these discrepancies could be related to the lack of strict criteria in the selection of the subjects for the reference group. Specifically, racial differences could play some role, as most of the studies sponsored by manufacturers were performed in the geographical area of the production line and are consequently difficult to reproduce in other settings. Moreover, the use of non-stringent criteria in the choice of subjects could have led to the enrolment of individuals with subclinical AITD, resulting in relatively high levels of TgAb that raise the 97.5th percentile of the reference value distribution [32-37].
The second relevant consideration that emerged from the present study was the variation of e-URLs according to the method used. The e-URL ranged from 2.25 (CL2) to 41.15 IU/mL (COB), an approximately 18-fold variation, consistent with a previous paper which reported the same magnitude of variation using five IMA methods distinct from those considered in the present study [18]. The difference between e-URLs supports concerns regarding inter-method variation [38]. Specifically, there were relevant differences between methods in terms of medians (31-fold) (p < 0.05, Kruskal-Wallis test) and interquartile ranges. These discrepancies were not expected and are not easily explained; in fact, in recent decades, there have been significant improvements in harmonization between methods [39], resulting from the high level of automation of analytical procedures and the use of the same reference preparation (IRP MRC 65/93). Moreover, analytical imprecision does not seem to contribute to the above differences, as the values declared by the individual manufacturers were essentially overlapping (although obtained with different protocols, some of them standardized, some others not) and in general lower than 10% for both intra- and inter-assay imprecision (Table 1). Such discordance between TgAb assays could be attributed to various factors, including: (1) TgAb heterogeneity, which is often independent of standardization efforts and which implies different specificity for the Tg antigen; (2) Tg interference; and (3) differences in assay reagents, including solid phase material and the preparations of the antigen (Tg), which could affect the proper exposure of the immunodominant epitopes. Another important aspect to consider in explaining inter-method variability is the diverse assay structures of the eleven IMA methods, leading to different LoDs (Table 1) ranging from 0.005 to 12 IU/mL. In particular, a clear-cut discrepancy was apparent between methods with a LoD lower than 0.2 IU/mL (ARC, AIA and CL2) and methods with a LoD equal to or higher than 2 IU/mL.
To better evaluate the relationship between methods, ARC was chosen as the reference method on the basis of the best combination of LoD and imprecision (Table 1): the correlation of ARC with the other methods was not satisfactory, in line with the variability of the results broadly described above. Passing-Bablok regression did not show a satisfactory agreement between assays. Furthermore, consistent with the regression results, the Bland-Altman plots highlighted statistically significant positive or negative mean biases.
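For readers unfamiliar with Passing-Bablok regression, the point estimates can be sketched as below. This is a simplified illustration of the general technique (shifted median of pairwise slopes, median-based intercept), not the MedCalc implementation used in the study; confidence intervals and the linearity test are omitted, and the function name and example data are hypothetical.

```python
import math
import statistics

def passing_bablok(x, y):
    """Point estimates of slope and intercept by Passing-Bablok regression."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series of length >= 2")
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slopes.append(math.copysign(math.inf, dy))  # vertical pair
                continue
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are discarded
            slopes.append(s)
    if not slopes:
        raise ValueError("no usable pairwise slopes")
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if m % 2 == 1:
        idx = (m - 1) // 2 + k
        if idx >= m:
            raise ValueError("too many slopes below -1 for a valid estimate")
        slope = slopes[idx]
    else:
        idx = m // 2 + k
        if idx >= m:
            raise ValueError("too many slopes below -1 for a valid estimate")
        slope = 0.5 * (slopes[idx - 1] + slopes[idx])
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept

# Hypothetical paired results lying exactly on y = 2x + 1.
slope, intercept = passing_bablok([1.0, 2.0, 3.0, 4.0, 5.0],
                                  [3.0, 5.0, 7.0, 9.0, 11.0])
```

Two interchangeable assays would be expected to give a slope close to 1 and an intercept close to 0; systematic departures from these values correspond to proportional and constant differences, respectively.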
The lack of acceptable agreement between methods has relevant practical implications: clinicians have to use the same method to monitor TgAb concentration in the follow-up of DTC, while laboratories must promptly inform users of any modification in the TgAb method to simplify re-baselining.
Although the data showed satisfactory analytical performance for some methods in terms of LoD, which were able to measure even low levels of TgAb with adequate precision, the main limitation of this study is that it contributed only indirectly to the debated question of TgAb analytical interference. In fact, the results did not prove, but only suggested, the advantage of choosing the more sensitive and accurate latest-generation methods for measuring TgAb, to better detect false-negative results even in patients with TgAb levels lower than the cut-off (the so-called "negative patient"). Therefore, according to these considerations, two different cut-offs for TgAb could be proposed, one for the diagnosis of AITD and one for the effects of TgAb on Tg measurement.

Conclusions
In spite of the attempts at harmonization, quantitative agreement between methods was generally not satisfactory and the methods could not be used interchangeably.
Therefore, additional standardization efforts are required to improve analytical performance, and manufacturers are strongly invited to re-evaluate their assays taking into consideration CLSI-approved protocols and guidelines. Finally, as long as the relationship between TgAb concentration and interference in Tg measurement is not clearly defined, the TgAb URL must be used with caution, taking into account that it is usually set for the diagnosis of AITD and not for the identification of potential interference in the Tg assay.