Deleting Files from the Linux Kernel

An executive order from the President has directed the NSA to remove internal and external content containing any of the following 27 words:

  • Anti-Racism
  • Racism
  • Allyship
  • Bias
  • DEI
  • Diversity
  • Diverse
  • Confirmation Bias
  • Equity
  • Equitableness
  • Feminism
  • Gender
  • Gender Identity
  • Inclusion
  • Inclusive
  • All-Inclusive
  • Inclusivity
  • Injustice
  • Intersectionality
  • Prejudice
  • Privilege
  • Racial Identity
  • Sexuality
  • Stereotypes
  • Pronouns
  • Transgender
  • Equality

This could pose an issue for technical content on cybersecurity, technology, etc., where words such as "bias" and "privilege" have meanings that have nothing to do with the order.

I imagine this is how it is going down:

grep -rlF --null -f wordlist.txt . | xargs -0 rm -f

You could also dump every match to a file (one file:match line per hit), then derive the list of unique files from it and do something with that instead.

grep -rFof wordlist.txt . >files_to_remove.txt
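To see what these commands select without deleting anything, here is a minimal dry-run sketch on a throwaway directory. The file names, their contents, and the three-word list are made up for illustration; only the grep flags come from the commands above.

```shell
#!/bin/sh
# Dry run on a scratch directory: list matching files instead of deleting them.
# File names, contents, and the shortened word list are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/tree"
printf '%s\n' 'bias' 'privilege' 'diversity' > "$demo/wordlist.txt"
printf 'set the bias voltage\n'       > "$demo/tree/adc.c"
printf 'drop privilege before exec\n' > "$demo/tree/init.c"
printf 'nothing to see here\n'        > "$demo/tree/README"

# -r recurse, -l print only file names, -F fixed strings,
# -f read one pattern per line from the word list
matches=$(cd "$demo/tree" && grep -rlF -f ../wordlist.txt . | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only adc.c and init.c are listed; README contains none of the words, so it would survive.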

What about the Linux Kernel?

So, let’s see what the Linux Kernel would need to delete.

grep -rFi -of wordlist.txt linux-6.13.2/ >files_to_remove.txt
awk -F: '{print tolower($2)}' files_to_remove.txt \
  | awk '{count[$1]++} END {for (word in count) print word, count[word]}' \
  | sort -k2 -nr > word_frequencies.txt
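The counting step is easier to follow on a toy input. This sketch uses two made-up files and a two-word list, and folds the two awk stages into one; it illustrates the same idea and is not the exact pipeline run against the kernel.

```shell
#!/bin/sh
# Toy input standing in for the kernel tree; names and contents are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/src"
printf '%s\n' 'bias' 'privilege' > "$demo/wordlist.txt"
printf 'Bias current, bias voltage\n' > "$demo/src/adc.c"
printf 'BIAS and privilege level\n'   > "$demo/src/cpu.h"

# -o prints one "path:match" line per hit. With -i the match keeps its
# original casing, so lower-case it before counting.
# $NF is whatever follows the last colon on the line.
freq=$(grep -rFi -o -f "$demo/wordlist.txt" "$demo/src" |
  awk -F: '{ count[tolower($NF)]++ } END { for (w in count) print w, count[w] }' |
  sort -k2,2nr)
printf '%s\n' "$freq"

rm -rf "$demo"
```

Here "Bias", "bias", and "BIAS" all land in the same bucket, giving bias 3 and privilege 1.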

Get unique files:


cut -d':' -f1 files_to_remove.txt | sort -u > uniq_files.txt

Count unique files to delete:

cut -d':' -f1 files_to_remove.txt | sort -u | wc -l

Results

Unique files to delete: 5923 of 87174 total files.
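As a quick sanity check, the share of the tree that would go can be computed from those two numbers. The find command in the comment is an assumption about how the total could be obtained; the post does not say how it was counted.

```shell
#!/bin/sh
# Counts reported above.
flagged=5923
total=87174   # assumption: something like `find linux-6.13.2 -type f | wc -l`

# The shell only does integer arithmetic, so let awk handle the division.
pct=$(awk -v f="$flagged" -v t="$total" 'BEGIN { printf "%.1f", 100 * f / t }')
echo "$pct% of files would be deleted"
```

That works out to roughly 6.8% of all files in the tree.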

Top 10 Filetypes to be Deleted

Extension   Count   Percentage of Unique Files
.c          2764    46.7%
.h          1472    24.9%
.dts         423     7.1%
.dtsi        335     5.7%
.rst         301     5.1%
.yaml        240     4.0%
.S            75     1.3%
.txt          62     1.0%
.sh           55     0.9%
.json         36     0.6%
Other        160     2.7%

  • C source and header files (.c, .h) dominate, making up about 71.6% of the unique files.
  • Device Tree Source files (.dts, .dtsi) make up 12.8%; these describe hardware, largely for embedded boards.
  • Configuration and documentation files (.rst, .yaml, .txt, .json) total about 10.8% (639 files).
  • Shell scripts (.sh) and assembly sources (.S) make up a small portion (2.2%): largely tooling scripts and low-level architecture code, respectively.
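A breakdown like this can be derived from the list of unique files. The following is a minimal sketch, not necessarily how the table above was produced; the five paths are made up, and files with no dot in their name are grouped under a hypothetical "(none)" label.

```shell
#!/bin/sh
# Hypothetical stand-in for uniq_files.txt.
demo=$(mktemp -d)
printf '%s\n' 'drivers/a.c' 'drivers/b.c' 'include/a.h' \
  'arch/arm/boot.S' 'Makefile' > "$demo/uniq_files.txt"

ext_counts=$(awk -F/ '
  {
    name = $NF                          # basename: last path component
    n = split(name, parts, ".")         # pieces separated by dots
    ext = (n > 1) ? "." parts[n] : "(none)"
    count[ext]++
  }
  END {
    # NR is the total number of paths read
    for (e in count) printf "%s %d %.1f%%\n", e, count[e], 100 * count[e] / NR
  }
' "$demo/uniq_files.txt" | sort -k2,2nr)
printf '%s\n' "$ext_counts"

rm -rf "$demo"
```

On this toy list .c comes out on top with 2 of 5 files (40.0%), and the remaining extensions have one file each.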

Categorized Files

  • Code Files (.c, .h, .S, .sh): 4,366 files (73.7%)
  • Configuration (.yaml, .json, .dts, .dtsi): 1,034 files (17.5%)
  • Documentation (.rst, .txt): 363 files (6.1%)
  • Other/Miscellaneous: 160 files (2.7%)

Word frequencies across all files:

bias 33121
dei 9472
privilege 3899
diversity 837
inclusive 724
inclusion 232
equality 88
diverse 50
gender 31
prejudice 17
pronouns 2
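Note that grep -F without -w matches substrings, so a pattern like "dei" also hits identifiers that merely contain those letters. The kernel numbers above were produced without -w. A small sketch of the difference, using a made-up two-line sample:

```shell
#!/bin/sh
# Hypothetical sample: one identifier that contains "dei", one standalone use.
demo=$(mktemp -d)
printf 'nodeid = 0;\nthe DEI policy\n' > "$demo/sample.txt"

# Substring matching, as in the commands used for the kernel run.
substring_hits=$(grep -Fio 'dei' "$demo/sample.txt" | grep -c .)

# -w only accepts matches that are not adjacent to other word characters.
word_hits=$(grep -Fiow 'dei' "$demo/sample.txt" | grep -c .)

echo "substring: $substring_hits, whole word: $word_hits"

rm -rf "$demo"
```

The substring search counts both lines, while the whole-word search counts only the second.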