Update taxonomy of social values #12

Open
Heloisa-Candello opened this issue Dec 11, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@Heloisa-Candello

The current social value taxonomy requires a review.
Please consider the following taxonomy for the recommended social values.
taxonomy_leaves.docx

@santanavagner
Member

Hi @Heloisa-Candello, thank you for the material.

We are going to perform a mapping between existing values and the taxonomy you shared ASAP, so we can discuss the next steps.

Thank you for the contribution!

@santanavagner
Member

santanavagner commented Dec 12, 2024

Hi @Heloisa-Candello,

Here's the initial mapping considering the proposed taxonomy and the values we have today.

For some values I was not able to find a match in the new taxonomy.

For the cases where I was not able to find a direct mapping, how do you see the current values mapping to the proposed ones?

I'm asking because some terms in the proposed taxonomy have a fine granularity (e.g., prompt priming, jailbreaking) while others don't (e.g., trust).

How do you see these different levels of granularity being combined?

PS: @seb-brAInethics I'd love to get your input as well, since you helped us define the initial list of social values.

| Sentences count | Current positive social value | New positive value |
|---|---|---|
| 11 | accountability | Accountability |
| 12 | accuracy | Productivity? |
| 9 | advice | |
| 14 | agreement | |
| 10 | appropriate | |
| 14 | awareness | |
| 10 | collaboration | Productivity? |
| 8 | commitment | Accountability? |
| 26 | community and stakeholders | Cultural diversity? |
| 4 | compliance | Compliance |
| 3 | control | Compliance? |
| 31 | copyright, right to ownership | Copyright |
| 5 | dedication | |
| 7 | duty | |
| 25 | education | |
| 15 | effective and efficiency | |
| 9 | expertise | |
| 26 | explainability | Explainability |
| 15 | fairness | Fairness |
| 9 | family | |
| 9 | flexible | |
| 19 | forthright and honesty | |
| 24 | impact | |
| 34 | inclusion and diversity | Fairness |
| 8 | indelible | |
| 8 | integrity | |
| 32 | integrity, compliance, trust, ethics, and dedication | Compliance |
| 7 | leadership | |
| 14 | measurability | |
| 8 | money | |
| 10 | moral | |
| 9 | openness | |
| 21 | participation | |
| 0 | personal | |
| 10 | positivity | |
| 5 | power | |
| 32 | privacy | Privacy |
| 14 | proactive | |
| 10 | professional | |
| 12 | progress | |
| 10 | reputation | |
| 11 | resolution | |
| 13 | respect and social norms | Cultural diversity? |
| 22 | responsibility | |
| 12 | robustness | Safety? |
| 15 | safety | Safety |
| 14 | scale | |
| 8 | security | |
| 14 | success | |
| 11 | transformation | |
| 17 | transparency | Transparency |
| 12 | trust | Trust |
| 11 | trust, compliance, and integrity | Trust |
| 9 | uniformity and indivisibility | |
| 10 | universal | |
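
To get a feel for how much of the dataset each kind of mapping covers, the table can be summarized by status. The snippet below is only a sketch: it uses a handful of rows copied from the table above, and the `Row` type, the `status` helper, and the three status names are hypothetical, not something that exists in the project. A trailing "?" is read as a tentative match and an empty third column as unmapped.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    sentences: int
    current: str
    proposed: Optional[str]  # None when no match was found in the new taxonomy


# A few rows copied from the table above (not the full mapping).
ROWS = [
    Row(11, "accountability", "Accountability"),
    Row(12, "accuracy", "Productivity?"),
    Row(9, "advice", None),
    Row(26, "community and stakeholders", "Cultural diversity?"),
    Row(25, "education", None),
    Row(32, "privacy", "Privacy"),
]


def status(row: Row) -> str:
    """Classify a row as confirmed, tentative (ends with '?'), or unmapped."""
    if row.proposed is None:
        return "unmapped"
    return "tentative" if row.proposed.endswith("?") else "confirmed"


def sentences_by_status(rows):
    """Sum the sentence counts of the rows falling under each status."""
    totals = {"confirmed": 0, "tentative": 0, "unmapped": 0}
    for row in rows:
        totals[status(row)] += row.sentences
    return totals


if __name__ == "__main__":
    print(sentences_by_status(ROWS))
```

Running the same summary over the full table would show how many sentences still depend on values without a direct match.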

@santanavagner
Member

santanavagner commented Dec 17, 2024

Btw, what do you think about updating our taxonomy according to Llama-guard-2?

  • S1: Violent Crimes
  • S2: Non-Violent Crimes
  • S3: Sex-Related Crimes
  • S4: Child Sexual Exploitation
  • S5: Specialized Advice
  • S6: Privacy
  • S7: Intellectual Property
  • S8: Indiscriminate Weapons
  • S9: Hate
  • S10: Suicide & Self-Harm
  • S11: Sexual Content

Ref: https://huggingface.co/meta-llama/Meta-Llama-Guard-2-8B

Beyond those, Zhaoqing proposed an extension as part of her summer internship:

  • S12: Misinformation & Deception
  • S13: Impoliteness
  • S14: immorality

@santanavagner
Member

santanavagner commented Dec 18, 2024

Here's a first attempt considering llama-guard-2-8b values.

Let me know your thoughts.

| Sentences count | Current negative social value | New negative value |
|---|---|---|
| 95 | abuse, exploitation, and vulnerability | S1: Violent Crimes |
| 14 | arm trafficking | S1: Violent Crimes |
| 29 | bigamy, polygamy, and adultery | S12: Misinformation & Deception |
| 1 | blasphemy | S9: Hate |
| 2 | conflict and dissensus | S12: Misinformation & Deception |
| 0 | criticality | |
| 185 | deception, lure, coercion, and persuasion | S12: Misinformation & Deception |
| 12 | digital piracy | S2: Non-Violent Crimes |
| 3 | distrust | S12: Misinformation & Deception |
| 17 | drug dealing, drug use, and drug abuse | S2: Non-Violent Crimes |
| 15 | embezzlement | S2: Non-Violent Crimes |
| 0 | failure | |
| 55 | falsification and misinformation | S12: Misinformation & Deception |
| 53 | fraud and forgery | S2: Non-Violent Crimes |
| 12 | gambling | S2: Non-Violent Crimes |
| 45 | hacking, cracking, phishing, phreaking, and identity theft | S2: Non-Violent Crimes |
| 8 | harassment | S3: Sex-Related Crimes |
| 58 | harm (inflicting or planning) | S1: Violent Crimes |
| 6 | harmful bias | S9: Hate |
| 8 | money laundering | S2: Non-Violent Crimes |
| 37 | murder | S1: Violent Crimes |
| 0 | negativity | |
| 2 | opaqueness | S12: Misinformation & Deception |
| 3 | perjury | S2: Non-Violent Crimes |
| 23 | pickpocketing | S2: Non-Violent Crimes |
| 6 | prompt hacking | S2: Non-Violent Crimes |
| 8 | racism and stereotypes | S9: Hate |
| 0 | retaliation | |
| 35 | smuggling | S2: Non-Violent Crimes |
| 25 | tax evasion | S2: Non-Violent Crimes |
| 1 | technocentrism | S14: immorality |
| 77 | terrorism, arson, and poisoning | S8: Indiscriminate Weapons |
| 50 | theft | S2: Non-Violent Crimes |
| 27 | traffic of influence, bribing, ransom, and payola | S2: Non-Violent Crimes |
| 2 | unsafety | S12: Misinformation & Deception / S2: Non-Violent Crimes |
| 17 | vandalism | S1: Violent Crimes |
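
Since several current values collapse into the same new category, it may help to look at how large each merged category becomes. Here is a minimal sketch using six rows copied from the table above; the `MAPPING` list and the `cluster_sizes` function are made-up names for illustration only.

```python
from collections import Counter

# (sentences count, current negative value, new negative value),
# copied from a few rows of the table above (not the full mapping).
MAPPING = [
    (95, "abuse, exploitation, and vulnerability", "S1: Violent Crimes"),
    (185, "deception, lure, coercion, and persuasion", "S12: Misinformation & Deception"),
    (55, "falsification and misinformation", "S12: Misinformation & Deception"),
    (53, "fraud and forgery", "S2: Non-Violent Crimes"),
    (37, "murder", "S1: Violent Crimes"),
    (50, "theft", "S2: Non-Violent Crimes"),
]


def cluster_sizes(mapping):
    """Sum the sentence counts of all current values merged into each new value."""
    sizes = Counter()
    for count, _current, new in mapping:
        sizes[new] += count
    return sizes


if __name__ == "__main__":
    for new_value, total in cluster_sizes(MAPPING).most_common():
        print(f"{total:4d}  {new_value}")
```

Even with this small sample, the merged categories end up holding sentences about quite different kinds of harm.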

@seb-brAInethics

seb-brAInethics commented Dec 18, 2024 via email

@santanavagner
Member

Hi @seb-brAInethics, thanks for the feedback.

Here's an initial cluster analysis considering llama-guard with our extension.

[Image: negative_values-20241218]

Here are the ones under "violent":

[Image: violent_crimes-20241218]

And here the ones under "non-violent":

[Image: non_violent_crimes-20241218]

I feel that we may need to explore each of the groupings from these two major clusters to identify the sentences referring to psychological violence. However, can we measure/grade such a thing? I mean, things that are simple for some people may result in psychological violence for others. 🤔

@santanavagner
Member

santanavagner commented Dec 18, 2024

Btw, granite guardian currently considers the following risks at prompting time:

  • Harm (harm): content considered generally harmful
  • Social Bias (social_bias): prejudice based on identity or characteristics
  • Jailbreaking (jailbreak): deliberate instances of manipulating AI to generate harmful, undesired, or inappropriate content
  • Violence (violence): content promoting physical, mental, or sexual harm
  • Profanity (profanity): use of offensive language or insults
  • Sexual Content (sexual_content): explicit or suggestive material of a sexual nature
  • Unethical Behavior (unethical_behavior): actions that violate moral or legal standards

Source: https://ollama.com/library/granite3-guardian

So, here's an updated mapping now considering also granite-guardian risks:

| Sentences count | Current negative social value | Llama-guard-2 (extended) | Granite-guardian |
|---|---|---|---|
| 95 | abuse, exploitation, and vulnerability | S1: Violent Crimes | Violence |
| 14 | arm trafficking | S1: Violent Crimes | Violence |
| 29 | bigamy, polygamy, and adultery | S12: Misinformation & Deception | Unethical Behavior |
| 1 | blasphemy | S9: Hate | Profanity |
| 2 | conflict and dissensus | S12: Misinformation & Deception | Harm |
| 0 | criticality | | |
| 185 | deception, lure, coercion, and persuasion | S12: Misinformation & Deception | Unethical Behavior |
| 12 | digital piracy | S2: Non-Violent Crimes | Unethical Behavior |
| 3 | distrust | S12: Misinformation & Deception | Unethical Behavior |
| 17 | drug dealing, drug use, and drug abuse | S2: Non-Violent Crimes | Harm |
| 15 | embezzlement | S2: Non-Violent Crimes | Unethical Behavior |
| 0 | failure | | |
| 55 | falsification and misinformation | S12: Misinformation & Deception | Harm |
| 53 | fraud and forgery | S2: Non-Violent Crimes | Unethical Behavior |
| 12 | gambling | S2: Non-Violent Crimes | Unethical Behavior |
| 45 | hacking, cracking, phishing, phreaking, and identity theft | S2: Non-Violent Crimes | Harm |
| 8 | harassment | S3: Sex-Related Crimes | Violence |
| 58 | harm (inflicting or planning) | S1: Violent Crimes | Violence |
| 6 | harmful bias | S9: Hate | Social Bias |
| 8 | money laundering | S2: Non-Violent Crimes | Unethical Behavior |
| 37 | murder | S1: Violent Crimes | Violence |
| 0 | negativity | | |
| 2 | opaqueness | S12: Misinformation & Deception | Unethical Behavior |
| 3 | perjury | S2: Non-Violent Crimes | Harm |
| 23 | pickpocketing | S2: Non-Violent Crimes | Harm |
| 6 | prompt hacking | S2: Non-Violent Crimes | Jailbreaking? |
| 8 | racism and stereotypes | S9: Hate | Social Bias |
| 0 | retaliation | | |
| 35 | smuggling | S2: Non-Violent Crimes | Harm |
| 25 | tax evasion | S2: Non-Violent Crimes | Unethical Behavior |
| 1 | technocentrism | S14: immorality | Unethical Behavior |
| 77 | terrorism, arson, and poisoning | S8: Indiscriminate Weapons | Violence |
| 50 | theft | S2: Non-Violent Crimes | Harm |
| 27 | traffic of influence, bribing, ransom, and payola | S2: Non-Violent Crimes | Unethical Behavior |
| 2 | unsafety | S12: Misinformation & Deception / S2: Non-Violent Crimes | Harm |
| 17 | vandalism | S1: Violent Crimes | Violence |
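
One way to compare the granularity of the two taxonomies is to check how many Llama-guard categories end up under each Granite-guardian risk. The sketch below does that for eight rows copied from the table above; the `ROWS` list and the `llama_codes_per_granite_risk` function are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# (current negative value, Llama-guard-2 extended code, Granite-guardian risk),
# copied from a few rows of the table above (not the full mapping).
ROWS = [
    ("abuse, exploitation, and vulnerability", "S1", "Violence"),
    ("harassment", "S3", "Violence"),
    ("terrorism, arson, and poisoning", "S8", "Violence"),
    ("digital piracy", "S2", "Unethical Behavior"),
    ("deception, lure, coercion, and persuasion", "S12", "Unethical Behavior"),
    ("technocentrism", "S14", "Unethical Behavior"),
    ("theft", "S2", "Harm"),
    ("falsification and misinformation", "S12", "Harm"),
]


def llama_codes_per_granite_risk(rows):
    """Group the Llama-guard codes that fall under each Granite-guardian risk."""
    grouped = defaultdict(set)
    for _value, llama_code, granite_risk in rows:
        grouped[granite_risk].add(llama_code)
    return dict(grouped)


if __name__ == "__main__":
    for risk, codes in llama_codes_per_granite_risk(ROWS).items():
        print(f"{risk}: {len(codes)} Llama-guard categories")
```

A risk that spans several Llama-guard codes is, for these rows at least, the coarser label of the two.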

@seb-brAInethics

seb-brAInethics commented Dec 18, 2024 via email

@santanavagner
Member

santanavagner commented Dec 18, 2024

Thank you. I also feel that the set of risks from granite-guardian is too coarse for our use case. Maybe our extended version of the llama-guard taxonomy is a good compromise.

On a side note, I'm running into something I had already expected: with less cohesive clusters (i.e., bigger clusters encompassing different, more granular types of harm), the thresholds used previously in the recommendation algorithm no longer work in the same way and need to be updated as well.

For instance, removal recommendations that were working with a lower_threshold of 0.3 for all-minilm-l6-v2 now need it set to 0.0 in order to retrieve things from these uber clusters.
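
To illustrate why a bigger cluster can fall below a threshold that worked for a granular one, here is a toy sketch. It assumes that the recommendation compares an input embedding with a cluster centroid using cosine similarity and keeps clusters scoring above a lower threshold; that is an assumption for illustration, not a description of the actual recommendation code. The 2-D vectors stand in for real sentence embeddings, and `cosine`, `centroid`, and `matching_clusters` are made-up helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def matching_clusters(query, clusters, lower_threshold):
    """Names of clusters whose centroid similarity to the query exceeds the threshold."""
    return [
        name
        for name, vectors in clusters.items()
        if cosine(query, centroid(vectors)) > lower_threshold
    ]


# Toy 2-D "embeddings" (assumed values): a cohesive, granular cluster versus
# a merged cluster mixing sentences that point in different directions.
GRANULAR = {"theft": [[1.0, 0.0], [0.9, 0.1]]}
MERGED = {"S2: Non-Violent Crimes": [[1.0, 0.0], [0.0, 1.0], [-0.8, 0.6]]}

QUERY = [1.0, 0.0]  # an input sentence about theft

if __name__ == "__main__":
    print(matching_clusters(QUERY, GRANULAR, lower_threshold=0.3))
    print(matching_clusters(QUERY, MERGED, lower_threshold=0.3))
    print(matching_clusters(QUERY, MERGED, lower_threshold=0.0))
```

In this toy setup the merged centroid drifts away from any single kind of harm, so the same query only matches once the threshold is lowered.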

@cassiasamp cassiasamp added the enhancement New feature or request label Dec 23, 2024
@santanavagner
Member

@seb-brAInethics I moved the discussion about negative values to a new issue:
#14

This way, we continue the discussion here only for the positive social values.

@seb-brAInethics

seb-brAInethics commented Jan 8, 2025 via email

4 participants