By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.

44

Updated on Feb 6th 2024 based on the version endorsed by the Coreper I on Feb 2nd

High quality data and access to high quality data plays a vital role in providing structure and in ensuring the performance of many AI systems, especially when techniques involving the training of models are used, with a view to ensure that the high-risk AI system performs as intended and safely and it does not become a source of discrimination prohibited by Union law. High quality datasets for training, validation and testing require the implementation of appropriate data governance and management practices. Datasets for training, validation and testing, including the labels, should be relevant, sufficiently representative, and to the best extent possible free of errors and complete in view of the intended purpose of the system. In order to facilitate compliance with EU data protection law, such as Regulation (EU) 2016/679, data governance and management practices should include, in the case of personal data, transparency about the original purpose of the data collection, The datasets should also have the appropriate statistical properties, including as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used, with specific attention to the mitigation of possible biases in the datasets, that are likely to affect the health and safety of persons, negatively impact fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations (‘feedback loops’) . Biases can for example be inherent in underlying datasets, especially when historical data is being used, or generated when the systems are implemented in real world settings. Results provided by AI systems could be influenced by such inherent biases that are inclined to gradually increase and thereby perpetuate and amplify existing discrimination, in particular for persons belonging to certain vulnerable groups including racial or ethnic groups. The requirement for the datasets to be to the best extent possible complete and free of errors should not affect the use of privacy-preserving techniques in the context of the development and testing of AI systems. In particular, datasets should take into account, to the extent required by their intended purpose, the features, characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting which the AI system is intended to be used. The requirements related to data governance can be complied with by having recourse to third parties that offer certified compliance services including verification of data governance, data set integrity, and data training, validation and testing practices, as far as compliance with the data requirements of this Regulation are ensured.

[Previous version]

Implementing Data Governance and Management Practices

High data quality is essential for the performance of many AI systems, especially when techniques involving the training of models are used, with a view to ensure that the high-risk AI system performs as intended and safely and it does not become the source of discrimination prohibited by Union law. High quality training, validation and testing data sets require the implementation of appropriate data governance and management practices. Training, validation and testing data sets should be sufficiently relevant, representative and have the appropriate statistical properties, including as regards the persons or groups of persons on which the high-risk AI system is intended to be used. These datasets should also be as free of errors and complete as possible in view of the intended purpose of the AI system, taking into account, in a proportionate manner, technical feasibility and state of the art, the availability of data and the implementation of appropriate risk management measures so that possible shortcomings of the datasets are duly addressed. The requirement for the datasets to be complete and free of errors should not affect the use of privacy-preserving techniques in the context of the the development and testing of AI systems. Training, validation and testing data sets should take into account, to the extent required by their intended purpose, the features, characteristics or elements that are particular to the specific geographical, behavioural or functional setting or context within which the AI system is intended to be used. In order to protect the right of others from the discrimination that might result from the bias in AI systems, the providers should be able to process also special categories of personal data, as a matter of substantial public interest within the meaning of Article 9(2)(g) of Regulation (EU) 2016/679 and Article 10(2)g) of Regulation (EU) 2018/1725, in order to ensure the bias monitoring, detection and correction in relation to high-risk AI systems.

Report error

Report error

Please keep in mind that this form is only for feedback and suggestions for improvement.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.