The NSA has been directed by an executive order from the President to remove internal and external content containing any of the following 27 words.
- Anti-Racism
- Racism
- Allyship
- Bias
- DEI
- Diversity
- Diverse
- Confirmation Bias
- Equity
- Equitableness
- Feminism
- Gender
- Gender Identity
- Inclusion
- Inclusive
- All-Inclusive
- Inclusivity
- Injustice
- Intersectionality
- Prejudice
- Privilege
- Racial Identity
- Sexuality
- Stereotypes
- Pronouns
- Transgender
- Equality
This could pose an issue for technical content: several of these words, such as "bias" and "privilege", are everyday terms in cybersecurity and technology.
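A minimal sketch of how that can happen, assuming the filter is a plain case-insensitive substring match. The sample sentences and the `flag_lines` helper are made up for illustration; nothing here comes from the actual directive.

```shell
# Hypothetical helper: print every input line containing a listed word.
# $1 is a file with one banned word per line; text is read from stdin.
flag_lines() {
  # -i: ignore case, -F: fixed strings (no regex), -f: read patterns from file
  grep -iFf "$1"
}

demo_dir=$(mktemp -d)
printf '%s\n' 'bias' 'privilege' 'dei' > "$demo_dir/wordlist.txt"

# Made-up technical sentences; only the last one should pass the filter.
printf '%s\n' \
  'Patch the privilege escalation bug in the driver' \
  'Measure the DC bias voltage of the amplifier' \
  'Call deinit() before unloading the module' \
  'Rotate the TLS certificates monthly' \
  | flag_lines "$demo_dir/wordlist.txt"
```

The first three lines are flagged even though none of them is about policy; `dei` matches inside `deinit`.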
I imagine this is how it is going down:
grep -rlFf wordlist.txt . | xargs rm -f
Alternatively, log every match as a file:word pair and do something with that list, such as deriving the unique files from it.
grep -rFof wordlist.txt . >files_to_remove.txt
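Before wiring anything to `rm`, a dry run is cheap. The sketch below is an assumption about how one might do it, not a description of any real process; `list_flagged` is a hypothetical helper, `-i` is added to match the case-insensitive kernel scan, and `-Z` with `xargs -0 -r` assumes GNU grep and GNU xargs.

```shell
# Hypothetical dry run: list the files that would be removed, delete nothing.
# $1: word list (one word per line), $2: directory to scan.
list_flagged() {
  # -r recurse, -l print only file names, -Z end each name with NUL so
  # names containing spaces or newlines survive the pipe to xargs -0.
  grep -rlZiF -f "$1" -- "$2" | xargs -0 -r printf '%s\n'
}

# Made-up sample tree; the word list lives outside the scanned directory.
scan_dir=$(mktemp -d)
words=$(mktemp)
printf '%s\n' 'bias' 'privilege' > "$words"
printf 'DC bias voltage\n' > "$scan_dir/analog notes.txt"
printf 'nothing to see\n' > "$scan_dir/clean.txt"

list_flagged "$words" "$scan_dir"
```

Swapping `printf '%s\n'` for `rm -f --` would turn the dry run into the real thing.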
What about the Linux Kernel?
So, let’s see what the Linux Kernel would need to delete.
grep -rFi -of wordlist.txt linux-6.13.2/ >files_to_remove.txt
awk -F: '{print tolower($2)}' files_to_remove.txt | awk '{count[$1]++} END {for (word in count) print word, count[word]}' | sort -k2 -nr > word_frequencies.txt
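One caveat with the pipeline above: the second `awk` splits on whitespace, so `count[$1]` would file a multi-word match such as `gender identity` under `gender`. A possible variant that counts the whole phrase is sketched below; `word_frequencies` is a hypothetical name, and the sketch assumes that neither paths nor matches contain a colon.

```shell
# Hypothetical variant: tally whole matched phrases from `grep -o` output.
# stdin: lines of the form path:match; stdout: phrase<TAB>count, largest first.
word_frequencies() {
  awk -F: '
    { count[tolower($2)]++ }                      # key on the full phrase
    END { for (p in count) printf "%s\t%d\n", p, count[p] }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Made-up sample input in the same shape as files_to_remove.txt.
printf '%s\n' \
  'drivers/a.c:Bias' \
  'drivers/a.c:bias' \
  'Documentation/b.rst:Gender Identity' \
  | word_frequencies
```

The tab separator keeps the count in a fixed second column even when the phrase contains spaces, so the numeric sort still works.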
Get unique files:
cut -d':' -f1 files_to_remove.txt | sort | uniq > uniq_files.txt
Count unique files to delete:
cut -d':' -f1 files_to_remove.txt | sort | uniq | wc -l
Results
Unique files to delete: 5923 of 87174 total files.
Top 10 Filetypes to be Deleted
| Extension | Count | Percentage of Unique Files |
|---|---|---|
| .c | 2764 | 46.7% |
| .h | 1472 | 24.9% |
| .dts | 423 | 7.1% |
| .dtsi | 335 | 5.7% |
| .rst | 301 | 5.1% |
| .yaml | 240 | 4.0% |
| .S | 75 | 1.3% |
| .txt | 62 | 1.0% |
| .sh | 55 | 0.9% |
| .json | 36 | 0.6% |
| Other | 160 | 2.7% |
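The post does not show how this table was produced. One way to derive it from `uniq_files.txt` is sketched below, as an assumption rather than the method actually used; `extension_counts` is a hypothetical helper that treats everything after the last dot in the file name as the extension.

```shell
# Hypothetical helper: tally file extensions from a list of paths.
# stdin: one path per line; stdout: extension, count, share of all paths.
extension_counts() {
  awk -F/ '
    {
      n = split($NF, parts, ".")          # $NF is the file name
      if (n > 1 && !(n == 2 && parts[1] == ""))
        ext = "." parts[n]
      else
        ext = "(none)"                    # Makefile, .gitignore, ...
      count[ext]++
    }
    END {
      for (e in count)
        printf "%s %d %.1f%%\n", e, count[e], 100 * count[e] / NR
    }
  ' | LC_ALL=C sort -k2,2nr -k1,1
}

# Made-up sample paths; on the real data: extension_counts < uniq_files.txt
printf '%s\n' 'drivers/a.c' 'drivers/b.c' 'kernel/c.c' 'include/x.h' \
  | extension_counts
```

Piping the result through `head -n 10` would give a top-10 list like the one in the table.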
- C-related files (`.c`, `.h`) dominate, making up about 71.5% of the unique files.
- Device Tree Source files (`.dts`, `.dtsi`) make up 12.8%, as expected for a kernel tree that describes embedded hardware.
- Configuration and documentation files (`.rst`, `.yaml`, `.txt`, `.json`) total ~10.8%, showing structured data and documentation usage.
- Shell scripts (`.sh`) and assembly (`.S`) make up a smaller portion (2.2%); the shell scripts are likely build or automation helpers.
Categorized Files
- Code Files (`.c`, `.h`, `.S`, `.sh`): 4,366 files (73.7%)
- Configuration (`.yaml`, `.json`, `.dts`, `.dtsi`): 1,034 files (17.5%)
- Documentation (`.rst`, `.txt`): 363 files (6.1%)
- Other/Miscellaneous: 160 files (2.7%)
Word frequencies across all files:
bias 33121
dei 9472
privilege 3899
diversity 837
inclusive 724
inclusion 232
equality 88
diverse 50
gender 31
prejudice 17
pronouns 2
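Counts like these are probably inflated by substring matching: with `-F` and no `-w`, `dei` also matches inside identifiers such as `deinit`, and `bias` matches inside `unbiased`. A small sketch of the difference, using made-up sample lines and two hypothetical helpers (the numbers it prints say nothing about the kernel tree itself):

```shell
# Hypothetical helpers: count occurrences of a word in the text on stdin.
count_substring()  { grep -oiF  -e "$1" | wc -l | tr -d ' '; }   # match anywhere
count_whole_word() { grep -oiwF -e "$1" | wc -l | tr -d ' '; }   # -w: whole words only

# Made-up sample lines.
sample=$(printf '%s\n' \
  'ret = foo_deinit(dev);' \
  'DEI policy update' \
  'an unbiased estimator' \
  'set the bias current')

printf 'dei:  substring=%s whole-word=%s\n' \
  "$(printf '%s\n' "$sample" | count_substring dei)" \
  "$(printf '%s\n' "$sample" | count_whole_word dei)"
printf 'bias: substring=%s whole-word=%s\n' \
  "$(printf '%s\n' "$sample" | count_substring bias)" \
  "$(printf '%s\n' "$sample" | count_whole_word bias)"
```

Adding `-w` to the original `grep` would be one way to see how many of the kernel hits are standalone words rather than fragments of identifiers.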