Hello. I am interested in collating examples where FDA clearly delineates between training and validation data. One example comes from FDA's Good Machine Learning Practice for Medical Device Development: Guiding Principles:
"Training and test datasets are selected and maintained to be appropriately independent of one another. All potential sources of dependence, including patient, data acquisition, and site factors, are considered and addressed to assure independence."
A similar approach to distinguishing these data sets can be inferred from Guidance for Industry and FDA Staff - Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests, where comparative results should not combine results from multiple methods or use the outcomes of the new test:
"FDA believes it is potentially misleading to establish the performance of a new test by comparing it to a procedure that incorporates the same new test. Any non-reference standard created in this manner will likely be biased in favor of the new test; that is, it will tend to produce overestimates of agreement of the new test with the non-reference standard."
Are there other sources you would recommend citing or including to help clarify the importance of independence between these data sets? Thank you for your input and help.
------------------------------
Andrew Hadd
Director of Regulatory Affairs
Natera
Austin TX
United States
------------------------------