feat: antispam scoring package by alexiscolin · Pull Request #5178 · gnolang/gno

alexiscolin · 2026-02-20T13:55:56Z

⚠️ WIP - Primarily Proof of Concept

Needs: boards2 integration for testing, threshold tuning, and weight feedback before merge.

Companion to #5185

SpamAssassin-style scoring for Gno realms. Multi-rule engine (19 rules) combining content checks, rate limiting, reputation, Bayesian filter, duplicate detection, and blocklists. Scores accumulate - above 5 might hide, above 8 might reject. Multiple signals make the system harder to game.
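The accumulation described above can be sketched roughly as follows. This is only an illustration: `RuleHit`, `Total`, `Classify`, and the `Action` values are hypothetical names for this sketch, and the thresholds 5 and 8 are the tentative figures from the description ("might hide", "might reject"), not tuned values.

```go
package main

// RuleHit is a hypothetical record of one triggered rule and its weight.
type RuleHit struct {
	Rule   string
	Weight int
}

// Action is what a calling realm might do with a scored post.
type Action int

const (
	Accept Action = iota
	Hide
	Reject
)

// Tentative thresholds taken from the PR description; assumed, not tuned.
const (
	hideThreshold   = 5
	rejectThreshold = 8
)

// Total sums the weights of all triggered rules.
func Total(hits []RuleHit) int {
	total := 0
	for _, h := range hits {
		total += h.Weight
	}
	return total
}

// Classify maps an accumulated score to an action: strictly above 8
// rejects, strictly above 5 hides, anything else is accepted.
func Classify(score int) Action {
	switch {
	case score > rejectThreshold:
		return Reject
	case score > hideThreshold:
		return Hide
	default:
		return Accept
	}
}
```

Because several independent signals contribute to the total, a single weak signal on its own stays below both thresholds in this sketch.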

Two components:

p/gnoland/antispam - Pure scoring library (import + customize)
r/gnoland/antispam - Shared realm with defaults (21 scam patterns, 400 keywords)

Detection Strategies

Content heuristics: all caps, punctuation spam, repeated chars, link-heavy posts, short posts that are just a URL
Unicode tricks: zalgo text, invisible chars, cyrillic-in-latin homoglyphs
Rate limiting: half-weight for moderate bursts, full weight for flooding
Account reputation: age, balance, username, flag/ban history
Naive Bayesian filter: probabilistic text classifier that needs 3+ spam-heavy tokens to fire
MinHash duplicate detection: catches copy-paste (duplicate) spam waves across realms
Regex blocklists: common scam formats ("send X tokens", email addresses...) and address blocklist (instant blocked)
Keyword co-occurrence: multiple spam keywords together trigger, single words don't. Leet-speak normalization ("fr33" → "free"), weighted 1-3
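As an illustration of the MinHash duplicate detection listed above, here is a minimal sketch. It assumes word 3-gram shingles, 16 hash slots, and seeded FNV-1a hashing; none of these choices are confirmed by the PR, and `Fingerprint` and `Similarity` are hypothetical names.

```go
package main

import (
	"hash/fnv"
	"strings"
)

// numHashes is the signature length; an arbitrary choice for this sketch.
const numHashes = 16

// shingles splits text into lowercase word k-grams. Texts shorter than
// k words yield a single shingle containing all of their words.
func shingles(text string, k int) []string {
	words := strings.Fields(strings.ToLower(text))
	if len(words) == 0 {
		return nil
	}
	if len(words) < k {
		return []string{strings.Join(words, " ")}
	}
	out := make([]string, 0, len(words)-k+1)
	for i := 0; i+k <= len(words); i++ {
		out = append(out, strings.Join(words[i:i+k], " "))
	}
	return out
}

// hashWithSeed hashes s with FNV-1a after mixing in a per-slot seed.
func hashWithSeed(s string, seed uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(seed), byte(seed >> 8), byte(seed >> 16), byte(seed >> 24)})
	h.Write([]byte(s))
	return h.Sum32()
}

// Fingerprint returns a MinHash signature: for each slot, the smallest
// seeded hash over all shingles. Empty input yields nil.
func Fingerprint(text string) []uint32 {
	sh := shingles(text, 3)
	if len(sh) == 0 {
		return nil
	}
	sig := make([]uint32, numHashes)
	for i := range sig {
		lowest := ^uint32(0)
		for _, s := range sh {
			if h := hashWithSeed(s, uint32(i)); h < lowest {
				lowest = h
			}
		}
		sig[i] = lowest
	}
	return sig
}

// Similarity estimates Jaccard similarity as the fraction of matching slots.
func Similarity(a, b []uint32) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	match := 0
	for i := range a {
		if a[i] == b[i] {
			match++
		}
	}
	return float64(match) / float64(len(a))
}
```

A realm could compare a new post's signature against recently stored ones and treat a high similarity as a copy-paste signal; the cut-off for "high" would need tuning.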

Architecture

Package (p/gnoland/antispam)

Pure functions, no state, no side effects
Realms own state (Corpus, FingerprintStore, Blocklist, KeywordDict)
Customize weights, thresholds, patterns per realm

Realm (r/gnoland/antispam)

Shared on-chain state (pre-trained corpus, blocklist, keywords)
Admin-only training and pattern management
Per-address reputation tracking (flags, bans, accepted posts)
Any realm can Score(), only registered realms can RecordAccepted/Flag/Ban

Score() is read-only: it does NOT auto-update reputation. The calling realm must record moderation decisions manually, which prevents false positives from poisoning shared data (admin only).

See pure README and realm README for planned detailed usage, gas optimization, and integration patterns.

Gno2D2 · 2026-02-20T13:56:21Z

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):

IGNORE the bot requirements for this PR (force green CI check)

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:

Fix any issues flagged by automated checks.
Follow the Contributor Checklist to ensure your PR is ready for review.
- Add new tests, or document why they are unnecessary.
- Provide clear examples/screenshots, if necessary.
- Update documentation, if required.
- Ensure no breaking changes, or include BREAKING CHANGE notes.
- Link related issues/PRs, where applicable.

☑️ Reviewer Actions:

Complete manual checks for the PR, including the guidelines and additional checks if applicable.

📚 Resources:

Debug

Automated Checks
Maintainers must be able to edit this pull request (more info)
If
🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 The pull request was created from a fork (head branch repo: alexiscolin/gno)
Then
🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)
If
🟢 Condition met
└── 🟢 On every pull request
Can be checked by

Any user with comment edit permission

codecov · 2026-02-20T14:18:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

jeronimoalbi

I still have to go through the code, lots to cover. It's interesting though 👍

jeronimoalbi · 2026-02-23T15:39:51Z

+// When EarlyExitAt is set to a positive threshold, each earlyExit
+// check can short-circuit before reaching costlier rules.
+// Use EarlyExitDisabled (0, the default) to evaluate all rules.
+func Score(in ScoreInput) SpamScore {


Just a thought, but it might be worth exploring a Score() implementation built on a rule-scoring interface. If each scoring rule follows an interface, I think it would probably be possible to implement Score() as basically a loop that iterates over the rules and updates the SpamScore instance. Maybe something like ScoringRule.Score(ScoreInput) (score int)?

Thanks for the suggestion! I actually chose the procedural approach to keep the gas optimization strategy explicit. Rules are ordered from cheap to expensive, with earlyExit checks between groups, so obvious spam never hits regex/Bayes/fingerprints. An interface pipeline would hide that pattern and make it harder to maintain.

That said, no rule plugins are planned right now, but what if that becomes a need down the line? Would refactoring to an interface then be the right move (wagni or not?), or do you see a better approach?

The interface pipeline could also be ordered and a comment could be used to describe the gas optimization strategy.

Another lighter approach might be to just use a scoring function type:

type ScoringFunc func(ScoreInput) []RuleHit

// ...

rules := []ScoringFunc{
	ScoreBlocklist,
	ScoreRate,
	ScoreReputation,
	// ...
}

In any case, it's just a recommendation, if it makes sense, to avoid defining the inline functions and the earlyExit() calls and to reduce the function's body.
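To make the function-type idea concrete, here is a minimal sketch of an ordered rule loop that keeps the early-exit behaviour. `ScoreInput` and `RuleHit` are simplified stand-ins for the PR's types, and `RunRules` is a hypothetical helper, not something defined in the PR.

```go
package main

// ScoreInput and RuleHit are simplified stand-ins for the PR's types.
type ScoreInput struct {
	Content string
}

type RuleHit struct {
	Weight int
	Rule   string
}

// ScoringFunc is the function type proposed in the review.
type ScoringFunc func(ScoreInput) []RuleHit

// RunRules evaluates rules in slice order, so callers list the cheapest
// rules first. After each rule it stops if the running total has reached
// earlyExitAt. A value of 0 or less disables early exit.
func RunRules(in ScoreInput, rules []ScoringFunc, earlyExitAt int) (total int, hits []RuleHit) {
	for _, rule := range rules {
		for _, h := range rule(in) {
			total += h.Weight
			hits = append(hits, h)
		}
		if earlyExitAt > 0 && total >= earlyExitAt {
			break
		}
	}
	return total, hits
}
```

In this shape the cheap-to-expensive ordering lives in the slice literal, so the gas strategy stays visible in one place. Whether checking after every rule rather than between rule groups costs more gas than it saves is not something this sketch measures.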


jeronimoalbi · 2026-03-05T10:36:00Z

+	// Truncate oversized input to cap gas cost.
+	if len(in.Content) > MaxInputLength {
+		in.Content = in.Content[:MaxInputLength]
+	}


Truncating the content might leave relevant pieces out of the scoring. I don't think truncating it within the package is a good idea 🤔; devs can truncate it when calling Score() from a realm.
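If truncation moves to the caller, one detail worth noting is that a plain byte slice such as in.Content[:MaxInputLength] can cut a multi-byte UTF-8 character in half. A minimal sketch of a caller-side helper that avoids this, where `TruncateUTF8` is a hypothetical name and not part of the PR:

```go
package main

import "unicode/utf8"

// TruncateUTF8 returns s cut to at most limit bytes without splitting
// a multi-byte rune. It backs up from the cut point until the byte at
// that position starts a rune.
func TruncateUTF8(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}
```

A realm could call this on the content before passing it to Score(), choosing its own limit.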

jeronimoalbi · 2026-03-05T10:39:01Z

+func (bl *Blocklist) AllowAddress(addr string) {
+	bl.allowed.Set(addr, true)
+}


Minor improvement, it could be applied to the other cases too:

Suggested change

-func (bl *Blocklist) AllowAddress(addr string) {
-	bl.allowed.Set(addr, true)
-}
+func (bl *Blocklist) AllowAddress(addr string) {
+	bl.allowed.Set(addr, struct{}{})
+}
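For context on the suggestion: the empty struct occupies no storage, so membership is carried by the key alone. A minimal sketch using a plain map as a stand-in for the container behind bl.allowed (the `AddressSet` type is hypothetical):

```go
package main

// AddressSet is a hypothetical map-backed stand-in for the allow list.
// The value type struct{} has zero size; only key presence matters.
type AddressSet map[string]struct{}

// Add marks addr as a member of the set.
func (s AddressSet) Add(addr string) {
	s[addr] = struct{}{}
}

// Has reports whether addr is a member of the set.
func (s AddressSet) Has(addr string) bool {
	_, ok := s[addr]
	return ok
}
```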

jeronimoalbi · 2026-03-05T10:45:15Z

+// rebuildCombined compiles all patterns into a single alternation regex.
+// Called when patterns are added or removed. Compilation cost is paid once
+// at admin time, not per Score() call.
+func (bl *Blocklist) rebuildCombined() {


Why not use individual expressions instead of combining them? Compiling a big expression, or even evaluating it, might end up being more expensive. Also, with individual ones you could potentially stop early.

With that change you could probably also remove the limit of 30 expressions.
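A minimal sketch of the per-pattern alternative, assuming each pattern is compiled once when it is added and matching stops at the first hit. `PatternList`, `Add`, and `FirstMatch` are hypothetical names, and the relative gas cost against a single combined alternation is not measured here.

```go
package main

import "regexp"

// PatternList holds individually compiled patterns instead of one
// combined alternation.
type PatternList struct {
	patterns []*regexp.Regexp
}

// Add compiles expr once and appends it. Invalid expressions are
// rejected and leave the list unchanged.
func (pl *PatternList) Add(expr string) error {
	re, err := regexp.Compile(expr)
	if err != nil {
		return err
	}
	pl.patterns = append(pl.patterns, re)
	return nil
}

// FirstMatch returns the index of the first pattern matching content,
// or -1 if none match. Later patterns are not evaluated after a hit.
func (pl *PatternList) FirstMatch(content string) int {
	for i, re := range pl.patterns {
		if re.MatchString(content) {
			return i
		}
	}
	return -1
}
```

Returning the index also tells the caller which pattern fired, which a single combined expression does not expose as directly.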

jeronimoalbi · 2026-03-05T11:52:47Z

+// ReputationData holds caller-provided account context for an author.
+type ReputationData struct {
+	AccountAgeDays int
+	Balance        int64


Maybe:

Suggested change

-	Balance        int64
+	Balance        chain.Coins

If so then later on you could do Balance.AmountOf(string) int64, which could be used in the future to improve the reputation implementation.

jeronimoalbi · 2026-03-05T11:54:47Z

+	AccountAgeDays int
+	Balance        int64
+	FlaggedCount   int
+	TotalAccepted  int


Maybe something like ValidContentCount or AcceptedContentCount? In any case some field documentation would be handy to better understand the field 🙏

jeronimoalbi · 2026-03-05T12:01:17Z

+}
+
+const (
+	repMinAgeDays   = 1          // accounts younger than this are "new"


WDYT about using 15 or 30 days instead of 1? The reasoning is that within that period the account is still new and might just be starting its first interactions.

jeronimoalbi · 2026-03-05T13:49:54Z

+
+	// Ban history penalty
+	if rep.BanCount > 0 {
+		penalty := rep.BanCount * repBanPenalty


Right now repBanPenalty is 1, so it could be removed 🤔

jeronimoalbi · 2026-03-05T14:01:46Z

+				HasUsername:    false,
+				BanCount:       0,
+			},
+			wantMin: 3,


Test would benefit from using the constants:

Suggested change

-			wantMin: 3,
+			wantMin: WeightNewAccount + WeightNoUsername + WeightLowBalance,

jeronimoalbi · 2026-03-05T14:52:48Z

+import antispamr "gno.land/r/gnoland/antispam"
+import engine "gno.land/p/gnoland/antispam"


Suggested change

-import antispamr "gno.land/r/gnoland/antispam"
-import engine "gno.land/p/gnoland/antispam"
+import (
+	antispamr "gno.land/r/gnoland/antispam"
+	engine "gno.land/p/gnoland/antispam"
+)

jeronimoalbi · 2026-03-06T08:50:43Z

+	if corpus == nil || corpus.Size() < bayesMinCorpusSize {
+		return 0, ""
+	}
+	if len(tokens) == 0 {
+		return 0, ""
+	}


Could be joined:

Suggested change

-	if corpus == nil || corpus.Size() < bayesMinCorpusSize {
-		return 0, ""
-	}
-	if len(tokens) == 0 {
-		return 0, ""
-	}
+	if corpus == nil || corpus.Size() < bayesMinCorpusSize || len(tokens) == 0 {
+		return 0, ""
+	}

jeronimoalbi · 2026-03-06T08:52:44Z

+	bayesMinCorpusSize = 10
+
+	// bayesSpamThresholdPct is the spam ratio threshold (percentage).
+	// Tokens appearing in spam more than this% of the time are considered spam indicators.


Typo

Suggested change

-	// Tokens appearing in spam more than this% of the time are considered spam indicators.
+	// Tokens appearing in spam more than this % of the time are considered spam indicators.

jeronimoalbi · 2026-03-06T09:37:18Z

+// new observation. This decay gives recent observations more weight and
+// prevents corpus poisoning from being permanent - essential for
+// auto-training scenarios where moderation actions feed the corpus.
+func (c *Corpus) Train(content string, isSpam bool) {


Out of curiosity, wouldn't some ham tokens potentially be considered spam if they are used a lot when content is trained as spam?
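The concern can be illustrated with a deliberately simplified corpus. This sketch ignores the decay mechanism and is not the PR's implementation; `NewCorpus` and `SpamPct` are hypothetical, and tokenization is plain whitespace splitting. It only shows how a per-token spam ratio reacts when a common word appears in content trained as spam.

```go
package main

import "strings"

// Corpus is a hypothetical, decay-free stand-in for the PR's Corpus type.
type Corpus struct {
	spam map[string]int
	ham  map[string]int
}

// NewCorpus returns an empty corpus.
func NewCorpus() *Corpus {
	return &Corpus{spam: map[string]int{}, ham: map[string]int{}}
}

// Train counts every lowercased, whitespace-separated token of content
// as either a spam or a ham observation.
func (c *Corpus) Train(content string, isSpam bool) {
	counts := c.ham
	if isSpam {
		counts = c.spam
	}
	for _, tok := range strings.Fields(strings.ToLower(content)) {
		counts[tok]++
	}
}

// SpamPct returns the share (0-100) of a token's observations that came
// from spam. ok is false when the token has never been seen.
func (c *Corpus) SpamPct(token string) (pct int, ok bool) {
	s, h := c.spam[token], c.ham[token]
	if s+h == 0 {
		return 0, false
	}
	return s * 100 / (s + h), true
}
```

In this simplified model, a word seen only in spam training sits at 100% until ham examples containing it are trained, so the balance of the training set matters. A percentage threshold such as bayesSpamThresholdPct and the "3+ spam-heavy tokens" requirement from the description would presumably limit the effect of a single such token.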

jeronimoalbi · 2026-03-06T10:35:44Z

+// normalizeLeet converts common leet speak digit substitutions back to letters.
+// Only handles digit-based leet (0->o, 1->i, 3->e, 4->a, 5->s, 7->t) since
+// symbol-based leet (@, $) is stripped during tokenization.
+func normalizeLeet(s string) string {


Could be public, it might be handy for other packages 👍
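A minimal sketch of what an exported version could look like, using only the digit mapping listed in the doc comment above (0->o, 1->i, 3->e, 4->a, 5->s, 7->t). The use of strings.NewReplacer is an assumption of this sketch; the PR's actual implementation is not shown in this thread.

```go
package main

import "strings"

// leetReplacer maps the digit substitutions named in the doc comment
// back to letters. Other digits are left untouched.
var leetReplacer = strings.NewReplacer(
	"0", "o",
	"1", "i",
	"3", "e",
	"4", "a",
	"5", "s",
	"7", "t",
)

// NormalizeLeet converts digit-based leet speak back to letters.
func NormalizeLeet(s string) string {
	return leetReplacer.Replace(s)
}
```

Note that this version rewrites every mapped digit, so a genuine number such as "100" also changes; callers outside the antispam package would need to decide whether that is acceptable.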

jeronimoalbi · 2026-03-13T13:57:55Z

+	if dict == nil || dict.Size() == 0 {
+		return 0, ""
+	}
+	if len(tokens) == 0 {
+		return 0, ""
+	}


Also possible:

Suggested change

-	if dict == nil || dict.Size() == 0 {
-		return 0, ""
-	}
-	if len(tokens) == 0 {
-		return 0, ""
-	}
+	if dict == nil || dict.Size() == 0 || len(tokens) == 0 {
+		return 0, ""
+	}

jeronimoalbi · 2026-03-13T15:12:43Z

+		if c.Size() < 4 {
+			t.Errorf("expected size >= 4, got %d", c.Size())
+		}


Why not?

Suggested change

-		if c.Size() < 4 {
-			t.Errorf("expected size >= 4, got %d", c.Size())
-		}
+		if c.Size() != 6 {
+			t.Errorf("expected size 6, got %d", c.Size())
+		}

jeronimoalbi · 2026-03-13T15:16:03Z

+		if c.Size() != 9 {
+			t.Fatalf("test setup: expected corpus size 9, got %d", c.Size())
+		}


Could be removed, already tested in TestCorpus():

Suggested change

-		if c.Size() != 9 {
-			t.Fatalf("test setup: expected corpus size 9, got %d", c.Size())
-		}

Size checks could be removed from other Bayes related tests to keep them simpler

jeronimoalbi · 2026-03-13T15:26:10Z

+	return corpus, dict
+}
+
+func TestCryptoContentFalsePositives(t *testing.T) {


Using table tests here would keep the tests DRY

jeronimoalbi · 2026-03-13T15:29:17Z

+		if len(fp1) == 0 {
+			t.Fatal("expected non-empty fingerprint")
+		}


This could be removed, there is already a test case that checks zero length:

Suggested change

-		if len(fp1) == 0 {
-			t.Fatal("expected non-empty fingerprint")
-		}

jeronimoalbi · 2026-03-13T16:03:10Z

+	if s.urlCount > linkMaxCount {
+		hits = append(hits, RuleHit{WeightLinkHeavy, RuleLinkHeavy})
+	}


I'm wondering if it makes sense to count links: tutorials or other types of Markdown content are valid and could be link-heavy. It would be better to consider the number of links relative to the rest of the content, or maybe just remove the rule.
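One way to read the "links versus the rest of the content" idea is a density ratio instead of an absolute count. A minimal sketch, assuming whitespace tokenization and a simple http:// or https:// substring check; `LinkDensity` is a hypothetical helper and any cut-off on its result would need tuning:

```go
package main

import "strings"

// LinkDensity returns the percentage (0-100) of whitespace-separated
// tokens that contain an http:// or https:// URL.
func LinkDensity(content string) int {
	words := strings.Fields(content)
	if len(words) == 0 {
		return 0
	}
	links := 0
	for _, w := range words {
		if strings.Contains(w, "http://") || strings.Contains(w, "https://") {
			links++
		}
	}
	return links * 100 / len(words)
}
```

Under this measure a long tutorial with many links and plenty of surrounding prose scores low, while a post consisting mostly of URLs scores high.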

jeronimoalbi · 2026-03-13T16:56:47Z

I think we would need more eyes on the PR; it's quite big and has a lot of specifics to check. Having others' opinions would also be helpful.

Maybe the realm could be part of another PR, for easier review and merge.

cc @Kouteki

lbrown2007 · 2026-05-11T00:27:42Z

@alexiscolin do we need to add more reviewers to this? Is this in the next 2-week cycle, or should I push it back another cycle?

alexiscolin self-assigned this Feb 20, 2026

github-project-automation Bot added this to 💪 Bounties & Worx and 🧙‍♂️Gno.land development Feb 20, 2026

github-project-automation Bot moved this to Triage in 🧙‍♂️Gno.land development Feb 20, 2026

github-actions Bot added the 🧾 package/realm Tag used for new Realms or Packages. label Feb 20, 2026

alexiscolin changed the title ~~feat: antispam scoring~~ feat: antispam scoring package Feb 20, 2026

Kouteki moved this from Triage to In Progress in 🧙‍♂️Gno.land development Feb 21, 2026

alexiscolin marked this pull request as ready for review February 22, 2026 11:19

alexiscolin marked this pull request as draft February 22, 2026 11:21

alexiscolin requested review from gfanton and jeronimoalbi and removed request for jeronimoalbi February 22, 2026 11:23

jeronimoalbi reviewed Feb 23, 2026

jaekwon force-pushed the master branch from 8759cbe to 6545790 Compare February 24, 2026 01:38

jaekwon and others added 13 commits February 24, 2026 14:33

Create tictac.md

ea25e37

Do not remove this file. It is important information, and a canary file.

Added file that proves Reddit censors truthful physics discoveries.

9dd3d69

Of all places, /r/UFOResearch should not censor this kind of information. DO NOT DELETE THIS FILE, DO NOT TRUST ANYONE WHO DELETES THIS FILE.

Update tictac.md

625f749

added law of spinning mass.

first commit of whitepaper; draft; DO NOT SHARE, deadman switch

0965c4c

feat(antispam): add on-chain anti-spam scoring package and realm

5c48a60

refactor: consolidate scoring input structure and enhance scoring logic

7cf1de6

refactor: enhance scoring logic and introduce individual rule tracking

7cbfbde

fix: apply gofmt formatting to antispam test files

45f6708

chore: improve comments

093e262

chore: delete gnomod

caa7b95

chore: add comprehensive test cases for spam detection algorithms

0ec981d

chore: fmt

5b3d478

feat: enhance spam detection with unicode abuse checks and leet speak…

855f967

… normalization

alexiscolin added 13 commits February 24, 2026 14:33

feat: introduce REPEATED_CHARS rule for spam detection, enhancing sco…

af373f0

…ring with new tests and updating documentation for clarity

feat: implement token decay mechanism in corpus training to mitigate …

c8e2193

…spam poisoning and enhance training accuracy, add tests for decay behavior and keyword dictionary size limits

refactor: improve function signature formatting in GHFetcher and expa…

94777d9

…nd README with detailed auto-training safety guidelines and new features

docs: update readme

ea2d63d

feat: add per-address reputation tracking to antispam package, auto-p…

b5725d4

…opulating moderation data in Score function and updating documentation for clarity

refactor: replace TotalPosts with TotalAccepted in reputation trackin…

b8d0bf4

…g and scoring functions, ensuring consistency in antispam package; update tests and documentation accordingly

docs: enhance README documentation for clarity on reputation scoring

b26b13d

feat: enhance keyword detection logic by introducing length-scaled mi…

7b55602

…nimum matches to prevent false positives in long content; update tests and documentation accordingly

docs: improve docs

a7bb786

docs: improve readability

c8a34b0

fix: tests format

3cdbc2d

fix: update test case name

24e9ce4

fix: standardize comment formatting in test files

97020f8

alexiscolin force-pushed the feat/antispam-scoring branch from 2cd0dfa to 97020f8 Compare February 24, 2026 05:34

refactor: standardize ScoreInput structure and update related tests a…

60b6252

…nd doc

alexiscolin mentioned this pull request Feb 24, 2026

feat(gnoweb): content filter extension #5185

Open

alexiscolin requested a review from jeronimoalbi February 24, 2026 08:29

alexiscolin added 2 commits February 24, 2026 17:34

docs: update ScoreInput field names in the readme file

bfbb55e

Merge branch 'master' into feat/antispam-scoring

671329a

alexiscolin marked this pull request as ready for review February 26, 2026 06:17

alexiscolin added 3 commits March 2, 2026 22:08

Merge branch 'master' into feat/antispam-scoring

13b33a3

Merge branch 'master' into feat/antispam-scoring

18ad265

chore: sync branch with upstream

fdd6c99

jeronimoalbi reviewed Mar 5, 2026

Merge branch 'master' into feat/antispam-scoring

be81eec

jeronimoalbi reviewed Mar 13, 2026

alexiscolin added the a/ux User experience, product, marketing community, developer experience team label Apr 7, 2026

nemanjantic moved this from In Progress to In Review in 🧙‍♂️Gno.land development Apr 22, 2026
