Appendix A: Weighting of Questions

This appendix accompanies PLUTO, a structured assessment tool which assesses the public value of data use. The tool consists of a questionnaire, in which each answer has been assigned a specific weight which contributes to the user’s public value score. In this document, we outline the weighting behind each answer as well as our reasoning for setting the weights as such.

Answers which are scored along the y axis represent foreseeable benefits of data use. A higher y score will contribute to a higher public value score. Answers which are scored along the x axis represent risk. A lower x score represents reduced risk, and therefore increases the public value score.
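A minimal sketch of the aggregation described above follows. It assumes that each selected answer contributes its listed weight to either the benefit (y) total or the risk (x) total, and that each total is a simple sum. This appendix does not specify how PLUTO combines x and y into a single public value score, so the sketch stops at the two axis totals. The `Answer` type, the `axis_totals` function and the sample answers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Literal, Tuple


@dataclass(frozen=True)
class Answer:
    question: str                       # hypothetical identifier, e.g. "Q2"
    impact: Literal["benefit", "risk"]  # the question's "Impact:" label
    weight: int                         # weight listed for the chosen answer


def axis_totals(answers: Iterable[Answer]) -> Tuple[int, int]:
    """Return (x, y): the summed risk weights and the summed benefit weights."""
    answers = list(answers)  # allow generators, which can only be iterated once
    x = sum(a.weight for a in answers if a.impact == "risk")
    y = sum(a.weight for a in answers if a.impact == "benefit")
    return x, y


example = [
    Answer("Q2", "benefit", 1),   # Researcher
    Answer("Q9", "benefit", 2),   # Research (primarily scientific, medical)
    Answer("Q14", "risk", 2),     # Moderate foreseeable likelihood
    Answer("Q22", "risk", -2),    # Harms are monitored
]
print(axis_totals(example))  # (0, 3)
```

In this example, the risk added by a moderate likelihood of harm (+2) is offset by harm monitoring (-2), so x is 0. The two benefit answers give y = 3.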

We, as the developers of this tool, are conscious that any attempt to score a complex concept such as public value involves trade-offs. We based the weighting scheme below on international precedents, our own research and considerations that arose over the course of the project. In the interest of transparency, we set out our approach to this assessment here.

If you have any comments on how the weightings could be improved, we would love to hear from you.

Finally, please note that the weights listed are subject to adjustment. A record of historical weightings is also available.

Information About the Applicant

Q1. Is this assessment about you (or your organisation) using data or is it about someone else using your (or another person’s) data?

This question is designed to capture the point of view of the user, and does not impact their public value score.

Impact: benefit

This is about me (or my organisation) using data
0
This is about someone else using my (or another person’s) data (note: you might not be able to answer some questions. You can assess the data use's public value, but it may not be as accurate. For more information, contact the organisation using the data)
0

Explanation

This question is designed to capture the point of view of the user and does not affect their public value score.

Q2. Which of the answers below best describes the data user?

Who will be using the data? If you are unsure, or if more than one category applies, please specify via the “Other” option.

Impact: benefit

Individual citizen
+1
Researcher
+1
Public organisation
+1
Private company
0
Charity
+1
Other (please specify)
0

Explanation

As our assessment of public value includes underlying motivations, a privately owned company will, all things being equal, be less likely to produce value for the public (0) than entities committed to the public good (+1). Users of our tool can specify a different category, or multiple categories, via the “Other” option. However, because such an answer is an unknown, it is scored at 0.

Q3. Which of the answers below best describes the primary activity of the data user?

This question refers to the primary activity of the data user. If the primary activity is better described by a different category (or spans two or more categories), please specify via the “None of the above” option. If you are unsure, please select “I don’t know.”

Impact: benefit

Conduct research
+1
Fulfil a government task
0
Improve public services
+1
Sell a product/service
0
None of the above (please specify)
0
I don’t know
0

Explanation

Data use that is primarily aimed at selling a product or service is typically less likely to produce public value (0). However, such data use does not preclude the creation of public value, so other questions in the questionnaire capture further aspects that bear on it. Data use aimed at producing research (e.g., scientific, medical) will generally have a higher public value, as will data use aimed at improving public services (+1).

Q4. What is the size of the data user: is it a large, medium-sized, or small entity?

This question aims to capture the size of the data user relative to the public. A large data user (e.g. a multinational company) will typically have more resources at its disposal, and is therefore held to a higher standard regarding its ability to limit the risks and maximise the benefits of its data use. Moreover, its data uses are more likely to affect a greater number of people.

Please choose “small entity” if the data user has an annual turnover (or balance sheet total) of less than €10 million, or the equivalent in another currency. Choose “large entity” if this figure exceeds €50 million per year, or the equivalent in another currency. For anything in between, please choose “medium-sized entity”.

If none of the provided options are applicable, please specify via the "None of the above" option. If you are unsure, please select "I don't know."

Impact: benefit

Small entity
+2
Medium-sized entity
+1
Large entity
0
None of the above (please specify)
0
I don't know
0

Explanation

A large data user (e.g., a multinational company) typically has more resources at its disposal. It is therefore held to a higher standard regarding its ability to limit the risks and maximise the benefits of its data use. Its data uses are also more likely to affect a greater number of people. Large entities therefore score 0, medium-sized entities +1 and small entities +2.
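The turnover thresholds in Q4 can be expressed as a small lookup. This is a minimal sketch. It assumes that the figure has already been converted to euros, and it reads "anything in between" as including values of exactly €10 million and €50 million. The function and constant names are hypothetical.

```python
from typing import Tuple

SMALL_LIMIT_EUR = 10_000_000  # below this: "Small entity"
LARGE_LIMIT_EUR = 50_000_000  # above this: "Large entity"


def size_category(annual_turnover_eur: float) -> Tuple[str, int]:
    """Map annual turnover (or balance sheet total), in euros, to the Q4 answer and its weight."""
    if annual_turnover_eur < SMALL_LIMIT_EUR:
        return "Small entity", 2
    if annual_turnover_eur > LARGE_LIMIT_EUR:
        return "Large entity", 0
    # Everything from 10M to 50M inclusive is treated as "in between" (assumption).
    return "Medium-sized entity", 1
```

For example, a data user with an annual turnover of €25 million falls between the two thresholds. It would be classed as a medium-sized entity (+1).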

Q5. What is the data user’s primary source of funding?

This question aims to capture how the data user finances its operations. A data user with more financial resources is held to a higher standard regarding its ability to limit risks and maximise benefits of its data use.

If none of the provided options are applicable, please specify via the "None of the above" option. If you are unsure, please select "I don't know."

Impact: benefit

Predominantly financed through income from own activities or parent/partner company
-1
Predominantly financed by governmental or public bodies (national or international)
0
Predominantly financed by philanthropic foundations
+1
Predominantly financed by large donations
-1
Predominantly financed by small donations
+1
Predominantly financed by a political party or interest organisation
0
None of the above (please specify)
0
I don’t know
0

Explanation

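As with organisation size, data users with more resources (financial, staffing, etc.) are held to a higher standard in terms of public value than smaller and medium-sized organisations. In this question, data users financed predominantly through income from their own activities or a parent/partner company, or through large donations, score -1. Data users financed predominantly by philanthropic foundations or small donations score +1. Funding from governmental or public bodies, or from a political party or interest organisation, is scored neutrally (0).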

Q6. Which of the answers below best describes the data user's public reporting requirements?

This question aims to understand whether the data user shares information about its activities with the public. The more regularly and publicly the data user reports on its activities, the greater its transparency and, in turn, the higher the public value created.

If none of the provided options are applicable, please select "None of the above" and specify your unique reporting requirements. If you are unsure, please select "I don't know."

Impact: benefit

No reporting requirements
-1
Annual report on key financials to financial institution
0
Annual public report on activities to government
+2
Annual non-public report on activities to the funder (or other entities)
0
None of the above (please specify)
0
I don’t know
0

Explanation

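In order to ensure trust and enable informed decisions by the public, data use with a high public value tends to be undertaken in a way that is transparent to the public. An annual public report on activities to government therefore scores highest (+2), while having no reporting requirements scores -1.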

Q7. Which of the answers below best describes the data user's legal obligations to provide information about its data use activities to the public?

In many jurisdictions, public actors must disclose information when someone files a Freedom of Information (FOI) request. For data users that do not fall within the scope of FOI laws (e.g., privately owned businesses), their track record in providing information when requested by citizens or public bodies is taken into account.

If none of the provided options are applicable, select "None of the above" and specify the data user's obligations. If you are unsure, choose "I don't know."

Impact: risk

No legal options to compel disclosure of information about data use activities to the public beyond civil or criminal proceedings (e.g., as a result of GDPR obligations)
+3
Can be legally compelled to disclose information about its data use activities (e.g., via freedom of information laws)
-2
None of the above (please specify)
0
I don’t know
0

Explanation

Data use with a high public value tends to be undertaken in a way that is transparent to the public. In many jurisdictions, public actors must disclose information when someone files a Freedom of Information (FOI) request. Data users that can be legally compelled to disclose information about their data use activities are more accountable to the public (-2). Data users that do not fall within the scope of FOI laws (e.g., privately owned businesses) are less accountable for their activities to the public (+3).

Q8. Does the data user have a procedure in place to respond to public requests for information on its activities (e.g. freedom of information requests, other legal obligations, or voluntarily)?

This question assesses whether the data user has an established procedure for responding to requests to provide information to the public about its activities. The absence of a procedure reduces the public value of an instance of data use, as it signals no commitment to transparency. An established procedure, by contrast, implies greater transparency and thus a higher public value.

If none of the provided options are applicable, please specify via the "None of the above" option. If you are unsure, please select "I don't know."

Impact: risk

Procedure in place to respond to public requests for information
-2
No procedure in place to respond to public requests for information
+2
None of the above (please specify)
0
I don’t know
0

Explanation

Whether or not the data user responds to public requests for information matters when assessing public value. Having a procedure in place is more likely to lead to data use with a high public value, due to increased public accountability (-2). Having no procedure is less likely to lead to public value (+2).

Benefits of the Applicant’s Activity

Q9. Which of the answers below best describes the data user's motivation for the data use?

This question aims to capture the user’s motivation for the data use. For example, data use aimed at improving medical treatments is of high value to the public, provided that risks are minimised.

If none of the provided options are applicable, select "Other" and specify the motivation underlying the data use in question. If you are unsure, please select "I don't know."

Impact: benefit

Business and commerce
0
Improving public services
+2
Research (primarily scientific, medical)
+2
Research (primarily industry)
+1
Other (please specify)
0
I don’t know
0

Explanation

Advancing scientific research, improving medical treatments, combating climate change and combating inequality are all widely agreed-upon goals for society. Data use that aims to progress towards these goals will therefore likely be of higher public value (+2 for primarily scientific or medical research). Similarly, improving public services creates value for the public (+2). Primarily industry research scores +1. Data use motivated by commercial profit or increasing revenue does not inherently reduce or increase public value, but it is less likely to lead to high public value (0).

Q10. Is it plausible to assume that the data use will benefit people in low and middle income countries (LMICs)?

This question aims to understand whether the data use will benefit individuals and/or communities in low- and middle-income countries (LMICs, or “the Global South”). Multinational corporations frequently generate considerable profits in high-income countries from digital activity in LMICs while paying no tax there, which adversely affects the public value they generate. Data use that helps reduce global inequality by ensuring benefits are shared equitably will be of higher public value.

If none of the provided options are applicable, select "Not applicable" and specify the data user’s record of distributing benefits to LMICs. If you are unsure, please select "I don't know."

Impact: benefit

Data use will plausibly benefit people in LMICs
+2
Data use will not plausibly benefit people in LMICs
-2
Not applicable (please specify)
0
I don’t know
0

Explanation

Data use that helps reduce global inequality by ensuring benefits are shared equitably will be of higher public value. Data use that will plausibly benefit people in low- and middle-income countries (LMICs) receives a higher score (+2). Data use that will not plausibly benefit them scores lower (-2), as it is less likely to contribute to reducing global inequality.

Q11. Which of the answers below best describes the track record of the data user in strengthening the benefits for marginalised groups?

Certain marginalised groups are afforded special protection under anti-discrimination laws. The German Constitution (Basic Law), for instance, prohibits discrimination on the basis of sex, gender, origin, ethnicity, language, belief, religious and political views, and disability (Article 3 Basic Law). This question aims to assess the likelihood that a given data use will benefit such groups. Data use that helps reduce inequality and benefits these groups will be of higher public value. This question should be answered with reference to the groups protected in the relevant jurisdiction or, where no such protection exists, under international human rights law.

“Some track record” indicates a few instances of strengthening the benefits for marginalised groups. A “strong track record” indicates consistent and sustained efforts to ensure the benefits of data use are felt by such groups. If you are unsure, please select "I don't know."

Impact: benefit

No track record of strengthening benefits for groups that are granted protection by way of anti-discrimination laws
-2
Some track record of strengthening benefits for groups that are granted protection by way of anti-discrimination laws
+1
Strong track record of strengthening benefits for groups that are granted protection by way of anti-discrimination laws
+2
I don’t know
0

Explanation

Data use which helps reduce inequality and which benefits groups protected by anti-discrimination laws will be of a higher public value. A strong track record (i.e., consistent and sustained efforts) receives a higher score (+2) than “some track record” (i.e., some instances) (+1). No track record scores lower (-2), as the data use is less likely to contribute to reducing inequality.

Q12. Which of the answers below best describes the data use’s benefits for future generations?

Future generations are likely to suffer from inequities in the distribution of wealth and income, as well as water scarcity, global warming and migration. Data use that serves a growing societal need, or is likely to benefit future generations in other ways, will have a higher public value.

If you are unsure, please select "I don't know."

Impact: benefit

Benefits primarily cover issues of current and future relevance
+1
Benefits primarily cover issues of relevance for future generations
+1
Benefits are likely to increase in the future
+2
I don’t know
0

Explanation

Public value in data solidarity is greater if benefits will foreseeably increase in the future and extend to future generations. Data use whose benefits are likely to increase in the future therefore receives the highest score in this category (+2). Benefits that primarily cover issues of current and future relevance, or issues of relevance for future generations, score +1.

Risks of the Applicant’s Activity

Q13. Which of the answers below best describes the type(s) of foreseeable risk to (any) individuals or groups arising from the data use?

A key component of assessing the public value of data use is the (foreseeable) risk to individuals or groups, both direct and indirect. Note that the list below is not exhaustive, so please use the “Other” option to specify additional risks. Examples of each risk type include:

Financial risk: Sensitive financial data (e.g., bank details, tax information) carries a financial risk to data subjects. An indirect financial risk would be individuals being denied insurance or loans on the basis of predictions made via the data use.

Health risk: If data use involves managing health records, for example, mishandling the data could lead to misdiagnosis or incorrect medical treatment, potentially causing physical harm to individuals. Data misuse can also harm the mental health of individuals.

Informational risk: Informational risk is the potential for harm from the disclosure of information about an identified individual. All sensitive personal data falls under this category, as does anonymised data that could conceivably be re-identified.

Political risk: Data that includes sensitive political indicators (e.g., voting preference, party affiliation, union membership) carries a political risk if it is mishandled or misused.

Social risk: Data use carries a social risk if it threatens the social well-being of individuals or communities and/or includes sensitive social descriptors, such as sexual orientation.

Impact: risk

Financial risk
+3
Health risk
+3
Informational risk
+3
Political risk
+3
Social risk
+3
Other (please specify)
+3
I don’t know
0

Explanation

A key component of assessing the public value of data use is the (foreseeable) risk to individuals or groups, both direct and indirect. Risks affect the public value score negatively and are measured cumulatively here. However, this effect may be offset if sufficient harm mitigation measures are in place (see the Institutional Safeguards section).
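Because several risk types can apply at once, the cumulative scoring described above can be sketched as a sum over the selected options. This is a minimal illustration. It assumes that Q13 is a multi-select question and that selecting the same option twice counts only once. The dictionary and function names are hypothetical, and the offset from harm mitigation measures is not modelled.

```python
from typing import Dict, Iterable

# Q13 weights as listed above; each selected risk type adds to the x (risk) total.
Q13_WEIGHTS: Dict[str, int] = {
    "Financial risk": 3,
    "Health risk": 3,
    "Informational risk": 3,
    "Political risk": 3,
    "Social risk": 3,
    "Other (please specify)": 3,
    "I don't know": 0,
}


def q13_risk_score(selected: Iterable[str]) -> int:
    """Sum the weights of all distinct selected risk types (cumulative scoring)."""
    # An unknown label raises KeyError rather than being silently ignored.
    return sum(Q13_WEIGHTS[choice] for choice in set(selected))
```

For instance, a data use that carries both a financial and a health risk would add 3 + 3 = 6 to the risk axis under these assumptions.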

Q14. Independent of the type of risk, how likely is it that the risk will materialise and create harm?

This question aims to capture the foreseeable likelihood of risk occurring to individuals and communities as a result of the data use. Data use with a higher relative risk will be of lower public value without sufficient harm mitigation measures in place.

Impact: risk

Low foreseeable likelihood
0
Moderate foreseeable likelihood
+2
High foreseeable likelihood
+4

Explanation

This question aims to capture the foreseeable likelihood of risk occurring to individuals and communities as a result of the data use. A high foreseeable likelihood scores +4, compared with +2 for a moderate likelihood and 0 for a low likelihood. The higher the risk score, the lower the public value.

Q15. Does the data user take measures to limit the negative impact of their data use on the natural environment (e.g., using renewable energy sources)?

Data use in which the negative impact on the environment is limited will be of higher public value.

Relevant aspects here are: does the energy used to store and process the data come from fully renewable sources? Does the data use itself further climate protection (e.g., climate research)? If you are unsure, please select "I don't know."

Impact: risk

No measures to limit environmental impact
+5
Some measures to limit environmental impact
-2
Extensive measures to limit environmental impact
-5
I don’t know
0

Explanation

Data use that has a positive impact on the environment is of higher public value, as it contributes to the widely shared goal of protecting the environment and mitigating climate change. Data users that have no measures in place to limit their environmental impact reduce the public value of their activities (+5). Some measures increase public value (-2), and extensive measures increase it further (-5).

Q16. Which of the answers below best describes the possible elevated risks for marginalised groups? An example of financial risk would be a loan application refused because the applicant belongs to an ethnic minority group.

In most jurisdictions, certain groups in society are afforded special protection under anti-discrimination laws. The German Constitution (Basic Law), for instance, prohibits discrimination on the basis of sex, gender, origin, ethnicity, language, belief, religious and political views, and disability (Article 3 Basic Law). Adverse effects on groups afforded special protection under the law may also arise incidentally and unintentionally (such as a loan application being denied). This question should be answered with reference to the groups protected in the relevant jurisdiction or, where no such protection exists, under international human rights law.

This question aims to assess the foreseeable risk that an instance of data use poses to such groups. It asks about "elevated" risk because it is impossible to remove all risk.

Impact: risk

Data use foreseeably entails elevated risk for protected groups
+5
Data use foreseeably entails no elevated risk for protected groups
0
Other (please specify)
0
I don’t know
0

Explanation

Data use that carries risks (direct or indirect) for groups given special protection by law is much less likely to generate public value without significant harm mitigation measures in place (+5). Data use that foreseeably entails no elevated risk scores neutrally (0).

Q17. Which of the answers below best describes the information given by the data user to the public about the risks of their data use?

This question aims to capture whether or not the data user informs the public about risks related to their data use.

Impact: risk

No information is provided
+2
Information about risks provided to the public at some point during the data use
+1
Information about risks provided to the public prior to commencement of data use
-2
None of the above (please specify)
0
I don’t know
0

Explanation

Data use in which potential risks are openly and transparently communicated prior to commencement will be of higher public value (-2), as it allows individuals and communities to make informed decisions. Communication only after the commencement of data use contributes to a lower public value score (+1), as the public is unable to make the same informed decisions. Data use in which no information is provided to the public scores even lower (+2), as there is no chance for informed decision making.

Q18. Has the data user taken steps to limit the likelihood that the data use creates negative, unintended consequences?

This question concerns the risk of dual use. Dual use, in this instance, refers to the potential for data to be misused by a third party for unintended or malicious ends. For example, medical data collected for research could be de-anonymised and used for targeted advertising, or data collected for a consumer survey could be used for illegal phishing scams. A useful guiding question is: which other uses come to mind when examining the data user’s planned data use, how many are there, and how likely are they?

Impact: risk

Steps have been taken to limit negative unintended consequences
0
No steps have been taken to limit negative unintended consequences
+4
None of the above (please specify)
0
I don’t know
0

Explanation

Data use for which no steps have been taken to limit negative unintended consequences scores +4, as it poses foreseeable risks to the public. The risk is reduced (0) if such steps have been taken.

Institutional Safeguards

Q19. Which of the answers below best describes the data user's way of assessing risk?

This question aims to capture whether the data user routinely assesses risks emerging from their data use, and the format in which such assessment is conducted.

Impact: risk

No risk assessment in place
+1
Ad-hoc risk assessment
0
Mandatory risk assessment (for each data use)
-1
I don’t know
0

Explanation

Data users which have no risk assessments in place are less likely to discover problems before they negatively impact individuals and communities (and therefore public value) (+1). Ad-hoc assessments may discover some but miss others (0), whereas a mandatory risk assessment is much more likely to protect the public from significant undue harm, therefore increasing the public value score (-1).

Q20. If there is a risk assessment procedure in place, how are its results used?

This question aims to capture how the data user responds to risk assessments, if they are in place. For instance, whether the results of risk assessments are non-binding recommendations (information), or whether data use may only proceed after having responded to the findings of the risk assessment (actions).

Impact: risk

No risk assessment in place
0
Findings which result in information
-1
Findings which result in actions
-2
None of the above (please specify)
0
I don’t know
0

Explanation

A risk assessment process that results in actions (i.e., harm mitigation) is likely to reduce risk to individuals and communities and therefore improves the public value score the most (-2). Risk assessments that only result in information (e.g., non-binding recommendations) improve it slightly less (-1).

Q21. Which of the answers below best describes those involved in the process of assessing risks?

This question captures the background of the people involved in any risk assessment. A person who is familiar with ethics may be aware of, and highlight, specific risks entailed in a particular instance of data use, thereby reducing overall risk and increasing its public value. Consulting those with knowledge of data ethics increases the likelihood that as many risks as possible are identified during the risk assessment. Other professionals (e.g., lawyers, computer scientists) can also highlight important issues that pertain to public value.

If none of the provided options are applicable, please specify via the “None of the above” option. If you are unsure, please select "I don't know."

Impact: risk

No risk assessment in place
0
Trained ethicists are not included in the risk assessment
+1
Trained ethicists are included in the risk assessment
-4
Other professionals (e.g. legal, data) involved in risk assessment
-2
None of the above (please specify)
0
I don’t know
0

Explanation

A person who is familiar with ethics may be particularly well placed to highlight risks emerging from a specific instance of data use and to suggest responses, thereby reducing overall risk and increasing its public value (-4). Consulting those with knowledge of data ethics increases the likelihood that as many risks as possible are identified during the risk assessment. Other professionals (legal experts, data scientists) will also be able to highlight specific issues (-2).

Q22. Which of the answers below best describes the process of harm monitoring?

This question aims to capture whether the data user monitors harm emerging from their data use. For example, if a company uses personal data obtained from its website to improve its service, does it monitor and review potential harms to any individuals or communities affected by this data use?

Impact: risk

No monitoring of harms
+2
Harms are monitored
-2
None of the above (please specify)
0
I don’t know
0

Explanation

Monitoring of harms is more likely to reduce risks (-2) as it may capture some harms before or during the data use. If harm is not monitored, the risk to the public is increased (+2), and therefore the public value score is lower.

Q23. Which of the answers below best describes the ability of the data user to end their activity once unexpected harms occur?

This question aims to understand the data user’s ability to end their activity once unexpected harms occur. For example, if in the course of a medical study it is found that a patient’s data has been leaked, can the study be stopped?

Impact: risk

Harm-causing activity cannot be ended immediately
+3
Harm-causing activity can be ended immediately
-1
None of the above (please specify)
0
I don’t know
0

Explanation

A key aspect of harm mitigation is the ability to stop a process if and when harm occurs. If an instance of data use (e.g., a medical study) cannot be ended immediately when harm occurs, the risk to the public is significantly higher (+3), and therefore the public value score is lower. If it can be ended immediately, the risk is reduced (-1).

Q24. Which of the answers below best describes the complaint procedures available to individuals?

This question aims to capture what complaint procedures are in place for individuals to report harm from data use. For example, if someone were to experience financial harm in the form of being denied a loan on the basis of a predictive algorithm, are they able to submit a complaint to the data user?

Impact: risk

No complaint procedure in place
+3
Complaint procedure is easily accessed (e.g. online)
-1
Complaint procedure designed in an accessible manner (e.g. it is inclusive of users with disabilities)
-2
Guaranteed response times (under 4 weeks)
-2
I don’t know
0

Explanation

Public value is lower when individuals and communities have little or no way to voice their complaint if and when harm occurs (+3). Complaint procedures which are easy to access (-1, -2) and from which a prompt response time can be expected (-2) receive a higher score, as this implies greater accountability of the data user.

Q25. What happens with any complaints?

This question aims to capture how the data user responds to complaints from the public. A prompt response procedure which results in immediate action will improve accountability and trust, and therefore public value.

Impact: risk

Complaint results in immediate action
-3
Complaint results in eventual action
-1
Complaint is acknowledged
0
No complaint procedure in place
+3
I don’t know
0

Explanation

Public value is higher when a complaints procedure results in action, and harm is therefore mitigated (-3 for immediate action, -1 for eventual action). When there is no procedure, public value is lowered due to the increase in risk (+3).