feat: antispam scoring package #5178

Open
alexiscolin wants to merge 40 commits into gnolang:master from alexiscolin:feat/antispam-scoring

@alexiscolin
Member

@alexiscolin alexiscolin commented Feb 20, 2026

⚠️ WIP - Primarily Proof of Concept

Needs: boards2 integration for testing, threshold tuning, and weight feedback before merge.

Companion to #5185

SpamAssassin-style scoring for Gno realms: a multi-rule engine (19 rules) combining content checks, rate limiting, reputation, a Bayesian filter, duplicate detection, and blocklists. Rule weights add up to a total score: above 5, a realm might hide the post; above 8, it might reject it. Requiring multiple signals makes the system harder to game.
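A minimal sketch of the accumulate-then-threshold idea, in plain Go (Gno is a Go dialect). The `Action` type, the `Decide` function, and the rule names are hypothetical. The PR only says that a post above 5 might be hidden and one above 8 rejected, so the strict `>` comparisons are an assumption.

```go
package main

import "fmt"

// Action is a hypothetical moderation outcome.
type Action string

const (
	ActionAllow  Action = "allow"
	ActionHide   Action = "hide"
	ActionReject Action = "reject"

	hideThreshold   = 5 // assumed: hide when the total is above 5
	rejectThreshold = 8 // assumed: reject when the total is above 8
)

// RuleHit pairs a rule weight with a rule name (assumed field names).
type RuleHit struct {
	Weight int
	Rule   string
}

// Decide sums the weights of all hits and maps the total to an action.
func Decide(hits []RuleHit) (int, Action) {
	total := 0
	for _, h := range hits {
		total += h.Weight
	}
	switch {
	case total > rejectThreshold:
		return total, ActionReject
	case total > hideThreshold:
		return total, ActionHide
	default:
		return total, ActionAllow
	}
}

func main() {
	hits := []RuleHit{{2, "ALL_CAPS"}, {3, "NEW_ACCOUNT"}, {2, "LINK_HEAVY"}}
	total, action := Decide(hits)
	fmt.Println(total, action) // 7 hide
}
```

No single weak signal in this example crosses a threshold on its own; only the combination does.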

Two components:

  • p/gnoland/antispam - Pure scoring library (import + customize)
  • r/gnoland/antispam - Shared realm with defaults (21 scam patterns, 400 keywords)

Detection Strategies

  • Content heuristics: all caps, punctuation spam, repeated chars, link-heavy posts, short posts that are just a URL
  • Unicode tricks: zalgo text, invisible chars, cyrillic-in-latin homoglyphs
  • Rate limiting: half-weight for moderate bursts, full weight for flooding
  • Account reputation: age, balance, username, flag/ban history
  • Naive Bayesian filter: probabilistic text classifier that needs 3+ spam-heavy tokens to fire
  • MinHash duplicate detection: catches copy-paste (duplicate) spam waves across realms
  • Regex blocklists: common scam formats ("send X tokens", email addresses...) and an address blocklist (instant block)
  • Keyword co-occurrence: multiple spam keywords together trigger the rule; single words don't. Includes leet-speak normalization ("fr33" → "free"); keywords are weighted 1-3
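The co-occurrence rule can be sketched as follows, in plain Go. This is a hypothetical illustration, not the package's code: the dictionary entries, `KeywordScore`, and `minDistinctMatches = 2` are assumptions, and leet-speak normalization is left out for brevity. Each distinct keyword counts once, and a lone keyword scores 0.

```go
package main

import (
	"fmt"
	"strings"
)

// keywordWeights is a made-up dictionary using the 1-3 weight range.
var keywordWeights = map[string]int{
	"free":     1,
	"airdrop":  2,
	"giveaway": 2,
	"wallet":   1,
	"seed":     3,
}

// minDistinctMatches is an assumed co-occurrence threshold: a single
// keyword never fires the rule on its own.
const minDistinctMatches = 2

// KeywordScore sums the weights of distinct dictionary keywords found in
// content, but only when at least minDistinctMatches of them co-occur.
func KeywordScore(content string) int {
	seen := map[string]bool{}
	sum := 0
	for _, tok := range strings.Fields(strings.ToLower(content)) {
		tok = strings.Trim(tok, ".,!?:;\"'()")
		w, ok := keywordWeights[tok]
		if !ok || seen[tok] {
			continue
		}
		seen[tok] = true
		sum += w
	}
	if len(seen) < minDistinctMatches {
		return 0
	}
	return sum
}

func main() {
	fmt.Println(KeywordScore("Is this wallet safe?"))                // 0: single keyword
	fmt.Println(KeywordScore("FREE airdrop! Send your seed phrase")) // 6: free + airdrop + seed
}
```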

Architecture

Package (p/gnoland/antispam)

  • Pure functions, no state, no side effects
  • Realms own state (Corpus, FingerprintStore, Blocklist, KeywordDict)
  • Customize weights, thresholds, patterns per realm

Realm (r/gnoland/antispam)

  • Shared on-chain state (pre-trained corpus, blocklist, keywords)
  • Admin-only training and pattern management
  • Per-address reputation tracking (flags, bans, accepted posts)
  • Any realm can Score(), only registered realms can RecordAccepted/Flag/Ban

Score() is read-only: it does NOT auto-update reputation. The calling realm must record moderation decisions manually (this prevents false positives from poisoning shared data). Admin only.

See the pure package README and the realm README for planned detailed usage, gas optimization, and integration patterns.
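To illustrate the read-only contract, here is a plain Go sketch with a hypothetical in-memory `Store`. The realm's real `Score`, `RecordAccepted`, and `Flag` functions are not reproduced, and the weights are made up. Scoring never creates or mutates reputation entries; only the explicit recording calls do.

```go
package main

import "fmt"

// Reputation and Store are hypothetical stand-ins for the realm's state.
type Reputation struct {
	Accepted int
	Flags    int
}

type Store struct {
	reps map[string]*Reputation
}

func NewStore() *Store { return &Store{reps: map[string]*Reputation{}} }

// entry returns the record for addr, creating it; only writers call it.
func (s *Store) entry(addr string) *Reputation {
	r, ok := s.reps[addr]
	if !ok {
		r = &Reputation{}
		s.reps[addr] = r
	}
	return r
}

// Score reads reputation but never writes it.
func (s *Store) Score(addr, content string) int {
	score := 0
	if r, ok := s.reps[addr]; ok {
		score += r.Flags // assumed weight: one point per past flag
	}
	if len(content) < 5 {
		score++ // assumed "too short" heuristic, for illustration only
	}
	return score
}

// RecordAccepted and Flag are called explicitly by the consuming realm
// after its own moderation decision.
func (s *Store) RecordAccepted(addr string) { s.entry(addr).Accepted++ }
func (s *Store) Flag(addr string)           { s.entry(addr).Flags++ }

func main() {
	s := NewStore()
	fmt.Println(s.Score("g1alice", "hello world"), len(s.reps)) // 0 0
	s.Flag("g1alice")                                             // a moderator's decision, recorded separately
	fmt.Println(s.Score("g1alice", "hello world"))                // 1
}
```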

@Gno2D2
Collaborator

Gno2D2 commented Feb 20, 2026

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 The pull request was created from a fork (head branch repo: alexiscolin/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@alexiscolin alexiscolin changed the title feat: antispam scoring feat: antispam scoring package Feb 20, 2026
@codecov

codecov Bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@Kouteki Kouteki moved this from Triage to In Progress in 🧙‍♂️Gno.land development Feb 21, 2026
@alexiscolin alexiscolin marked this pull request as ready for review February 22, 2026 11:19
@alexiscolin alexiscolin marked this pull request as draft February 22, 2026 11:21
@alexiscolin alexiscolin requested review from gfanton and jeronimoalbi and removed request for jeronimoalbi February 22, 2026 11:23
Member

@jeronimoalbi jeronimoalbi left a comment

I still have to go through the code; there's a lot to cover. It's interesting though 👍

Comment thread examples/gno.land/p/gnoland/antispam/antispam.gno
Comment thread examples/gno.land/p/gnoland/antispam/antispam.gno
// When EarlyExitAt is set to a positive threshold, each earlyExit
// check can short-circuit before reaching costlier rules.
// Use EarlyExitDisabled (0, the default) to evaluate all rules.
func Score(in ScoreInput) SpamScore {
Member

Just a thought, but it might be worth exploring a Score() implementation based on a rule-scoring interface. If each scoring rule implements an interface, Score() could probably be written as a loop that iterates over the rules and updates the SpamScore instance. Maybe something like ScoringRule.Score(ScoreInput) (score int)?

Member Author

Thanks for the suggestion! I chose the procedural approach to keep the gas optimization strategy explicit. Rules are ordered from cheap to expensive, with earlyExit checks between groups, so obvious spam never reaches the regex, Bayes, or fingerprint rules. An interface pipeline would hide that pattern and make it harder to maintain.

That said, no rule plugins are planned right now. But what if that becomes a need down the line? Would refactoring to an interface then be the right move (YAGNI or not?), or do you see a better approach?

Member

The interface pipeline could also be ordered and a comment could be used to describe the gas optimization strategy.

Another, lighter approach might be to just use a scoring function type:

type ScoringFunc func(ScoreInput) []RuleHit

// ...

rules := []ScoringFunc{
  ScoreBlocklist,
  ScoreRate,
  ScoreReputation,
  // ...
}

In any case, it's just a recommendation: if it makes sense, it would avoid defining the inline functions and the earlyExit() calls, and shrink the function's body.
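A plain Go sketch combining both suggestions: an ordered `[]ScoringFunc` (cheap to expensive) with a single early-exit check in the loop instead of per-group `earlyExit()` calls. `ScoreInput`, `RuleHit`, the three rules, and the `expensiveCalls` counter are simplified hypothetical stand-ins; only the `ScoringFunc` signature comes from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified, hypothetical stand-ins for the package's types.
type ScoreInput struct {
	Content     string
	EarlyExitAt int // 0 disables early exit
}

type RuleHit struct {
	Weight int
	Rule   string
}

// ScoringFunc follows the signature suggested in the review.
type ScoringFunc func(ScoreInput) []RuleHit

func scoreAllCaps(in ScoreInput) []RuleHit {
	if len(in.Content) > 10 && in.Content == strings.ToUpper(in.Content) {
		return []RuleHit{{2, "ALL_CAPS"}}
	}
	return nil
}

func scoreLinks(in ScoreInput) []RuleHit {
	if strings.Count(strings.ToLower(in.Content), "http") > 3 {
		return []RuleHit{{3, "LINK_HEAVY"}}
	}
	return nil
}

// expensiveCalls only exists to make early exit observable in this sketch.
var expensiveCalls int

// scoreExpensive stands in for regex, Bayes or fingerprint work.
func scoreExpensive(in ScoreInput) []RuleHit {
	expensiveCalls++
	return nil
}

// rules run in slice order: cheap checks first, expensive ones last.
var rules = []ScoringFunc{scoreAllCaps, scoreLinks, scoreExpensive}

// Score applies each rule in order and stops as soon as the running total
// reaches EarlyExitAt (when EarlyExitAt > 0).
func Score(in ScoreInput) (int, []RuleHit) {
	total := 0
	var hits []RuleHit
	for _, rule := range rules {
		for _, h := range rule(in) {
			total += h.Weight
			hits = append(hits, h)
		}
		if in.EarlyExitAt > 0 && total >= in.EarlyExitAt {
			break
		}
	}
	return total, hits
}

func main() {
	in := ScoreInput{Content: "CLICK HTTP://A HTTP://B HTTP://C HTTP://D", EarlyExitAt: 5}
	total, hits := Score(in)
	fmt.Println(total, len(hits), expensiveCalls) // 5 2 0
}
```

In this sketch the slice order is the evaluation order, so the gas strategy stays documented in one place, and adding a rule means appending a function rather than editing `Score()`.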

Comment thread examples/gno.land/p/gnoland/antispam/antispam_test.gno Outdated
Comment thread examples/gno.land/r/gnoland/antispam/antispam.gno
…ring with new tests and updating documentation for clarity
…spam poisoning and enhance training accuracy, add tests for decay behavior and keyword dictionary size limits
…nd README with detailed auto-training safety guidelines and new features
…opulating moderation data in Score function and updating documentation for clarity
…g and scoring functions, ensuring consistency in antispam package; update tests and documentation accordingly
…nimum matches to prevent false positives in long content; update tests and documentation accordingly
@alexiscolin alexiscolin force-pushed the feat/antispam-scoring branch from 2cd0dfa to 97020f8 February 24, 2026 05:34
@alexiscolin alexiscolin marked this pull request as ready for review February 26, 2026 06:17
Comment on lines +156 to +159
// Truncate oversized input to cap gas cost.
if len(in.Content) > MaxInputLength {
	in.Content = in.Content[:MaxInputLength]
}
Member

Truncating the content might leave relevant pieces out of the scoring. I don't think truncating within the package is a good idea 🤔; devs can truncate the content when calling Score() from a realm.

Comment on lines +64 to +66
func (bl *Blocklist) AllowAddress(addr string) {
	bl.allowed.Set(addr, true)
}
Member

Minor improvement, it could be applied to the other cases too:

Suggested change
func (bl *Blocklist) AllowAddress(addr string) {
	bl.allowed.Set(addr, true)
}
func (bl *Blocklist) AllowAddress(addr string) {
	bl.allowed.Set(addr, struct{}{})
}

// rebuildCombined compiles all patterns into a single alternation regex.
// Called when patterns are added or removed. Compilation cost is paid once
// at admin time, not per Score() call.
func (bl *Blocklist) rebuildCombined() {
Member

Why not use individual ones instead of combining them? Compiling a big expression, or even evaluating it, might end up being more expensive. Also, with individual ones you could potentially stop early.

With that change you could probably also remove the limit of 30 expressions.
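A plain Go sketch of the individual-pattern alternative using the standard `regexp` package: patterns are compiled once at admin time, and `FirstMatch` stops at the first hit. `PatternList` and the two sample patterns are hypothetical. Whether this is cheaper in gas than one combined alternation depends on the Gno regexp implementation and on content length, so it would need measuring.

```go
package main

import (
	"fmt"
	"regexp"
)

// PatternList is a hypothetical list of individually compiled patterns.
type PatternList struct {
	patterns []*regexp.Regexp
}

// Add compiles a pattern once and stores it; compile errors go back to the admin caller.
func (p *PatternList) Add(expr string) error {
	re, err := regexp.Compile(expr)
	if err != nil {
		return err
	}
	p.patterns = append(p.patterns, re)
	return nil
}

// FirstMatch returns the index of the first matching pattern, or -1.
// It stops scanning as soon as one pattern matches.
func (p *PatternList) FirstMatch(content string) int {
	for i, re := range p.patterns {
		if re.MatchString(content) {
			return i
		}
	}
	return -1
}

func main() {
	var pl PatternList
	_ = pl.Add(`(?i)send \d+ tokens`)
	_ = pl.Add(`[\w.+-]+@[\w-]+\.\w+`)
	fmt.Println(pl.FirstMatch("please send 100 tokens now"))    // 0
	fmt.Println(pl.FirstMatch("contact me at bob@example.com")) // 1
	fmt.Println(pl.FirstMatch("hello"))                         // -1
}
```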

// ReputationData holds caller-provided account context for an author.
type ReputationData struct {
AccountAgeDays int
Balance int64
Member

Maybe:

Suggested change
Balance int64
Balance chain.Coins

If so, then later on you could use Balance.AmountOf(string) int64, which could help improve the reputation implementation in the future.

AccountAgeDays int
Balance int64
FlaggedCount int
TotalAccepted int
Member

Maybe something like ValidContentCount or AcceptedContentCount? In any case some field documentation would be handy to better understand the field 🙏

}

const (
repMinAgeDays = 1 // accounts younger than this are "new"
Member

WDYT about using 15 or 30 days instead of 1? The reasoning is that within that period the account is new and might just be starting its first interactions.


// Ban history penalty
if rep.BanCount > 0 {
	penalty := rep.BanCount * repBanPenalty
Member

Right now repBanPenalty is 1, so it could be removed 🤔

HasUsername: false,
BanCount: 0,
},
wantMin: 3,
Member

Test would benefit from using the constants:

Suggested change
wantMin: 3,
wantMin: WeightNewAccount + WeightNoUsername + WeightLowBalance,
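A plain Go sketch of the reputation rule written against named constants, so tests like the one above can assert on `WeightNewAccount + WeightNoUsername + WeightLowBalance` instead of a literal. Weights of 1 each are assumed only to match the `wantMin: 3` in the test. `repMinAgeDays = 30` is one of the reviewer's suggested values, `repMinBalance` is made up, and the ban penalty is applied without a multiplier, as suggested earlier.

```go
package main

import "fmt"

// Hypothetical values; the real ones live in p/gnoland/antispam.
const (
	WeightNewAccount = 1
	WeightNoUsername = 1
	WeightLowBalance = 1

	repMinAgeDays = 30      // reviewer's suggestion (15 or 30 days)
	repMinBalance = 1000000 // assumed "low balance" cut-off
)

// ReputationData uses field names that appear in the PR excerpts.
type ReputationData struct {
	AccountAgeDays int
	Balance        int64
	HasUsername    bool
	BanCount       int
}

// ScoreReputation adds one weight per weak signal plus one point per ban.
func ScoreReputation(rep ReputationData) int {
	score := 0
	if rep.AccountAgeDays < repMinAgeDays {
		score += WeightNewAccount
	}
	if !rep.HasUsername {
		score += WeightNoUsername
	}
	if rep.Balance < repMinBalance {
		score += WeightLowBalance
	}
	score += rep.BanCount // penalty of 1 per ban, so no multiplier constant
	return score
}

func main() {
	fresh := ReputationData{}
	fmt.Println(ScoreReputation(fresh) == WeightNewAccount+WeightNoUsername+WeightLowBalance) // true
}
```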

Comment on lines +10 to +11
import antispamr "gno.land/r/gnoland/antispam"
import engine "gno.land/p/gnoland/antispam"
Member

Suggested change
import antispamr "gno.land/r/gnoland/antispam"
import engine "gno.land/p/gnoland/antispam"
import (
	antispamr "gno.land/r/gnoland/antispam"
	engine "gno.land/p/gnoland/antispam"
)

Comment on lines +192 to +197
if corpus == nil || corpus.Size() < bayesMinCorpusSize {
	return 0, ""
}
if len(tokens) == 0 {
	return 0, ""
}
Member

Could be joined:

Suggested change
if corpus == nil || corpus.Size() < bayesMinCorpusSize {
	return 0, ""
}
if len(tokens) == 0 {
	return 0, ""
}
if corpus == nil || corpus.Size() < bayesMinCorpusSize || len(tokens) == 0 {
	return 0, ""
}

bayesMinCorpusSize = 10

// bayesSpamThresholdPct is the spam ratio threshold (percentage).
// Tokens appearing in spam more than this% of the time are considered spam indicators.
Member

Typo

Suggested change
// Tokens appearing in spam more than this% of the time are considered spam indicators.
// Tokens appearing in spam more than this % of the time are considered spam indicators.
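For readers less familiar with the Bayesian rule, here is a simplified token-ratio check in its spirit, in plain Go. It is not a full naive Bayes posterior. `bayesMinCorpusSize = 10` matches the excerpt above. `bayesSpamThresholdPct = 80`, counting corpus size in documents, and the `Corpus` shape are assumptions. Per the PR description, at least 3 spam-heavy tokens are required before the rule fires.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	bayesMinCorpusSize    = 10 // value from the PR excerpt; counted in documents here (assumed)
	bayesSpamThresholdPct = 80 // assumed threshold
	bayesMinSpamTokens    = 3  // "needs 3+ spam-heavy tokens to fire"
)

// Corpus keeps per-token document counts for spam and ham (hypothetical shape).
type Corpus struct {
	spam, ham map[string]int
	docs      int
}

func NewCorpus() *Corpus {
	return &Corpus{spam: map[string]int{}, ham: map[string]int{}}
}

// uniqueTokens lower-cases content and returns its distinct words.
func uniqueTokens(content string) []string {
	seen := map[string]bool{}
	var out []string
	for _, tok := range strings.Fields(strings.ToLower(content)) {
		if !seen[tok] {
			seen[tok] = true
			out = append(out, tok)
		}
	}
	return out
}

// Train counts each distinct token once per document.
func (c *Corpus) Train(content string, isSpam bool) {
	for _, tok := range uniqueTokens(content) {
		if isSpam {
			c.spam[tok]++
		} else {
			c.ham[tok]++
		}
	}
	c.docs++
}

// SpamTokens returns how many distinct tokens have a spam share above
// bayesSpamThresholdPct, or 0 if the corpus is too small, the content has
// no tokens, or fewer than bayesMinSpamTokens tokens qualify.
func (c *Corpus) SpamTokens(content string) int {
	tokens := uniqueTokens(content)
	if c == nil || c.docs < bayesMinCorpusSize || len(tokens) == 0 {
		return 0
	}
	n := 0
	for _, tok := range tokens {
		s, h := c.spam[tok], c.ham[tok]
		if s+h > 0 && s*100/(s+h) > bayesSpamThresholdPct {
			n++
		}
	}
	if n < bayesMinSpamTokens {
		return 0
	}
	return n
}

func main() {
	c := NewCorpus()
	for i := 0; i < 5; i++ {
		c.Train("free airdrop claim now", true)
		c.Train("gno realm package review", false)
	}
	fmt.Println(c.SpamTokens("claim your free airdrop")) // 3
	fmt.Println(c.SpamTokens("free gno review"))         // 0
}
```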

// new observation. This decay gives recent observations more weight and
// prevents corpus poisoning from being permanent - essential for
// auto-training scenarios where moderation actions feed the corpus.
func (c *Corpus) Train(content string, isSpam bool) {
Member

Out of curiosity, wouldn't some ham tokens potentially be considered spam if they are used a lot when content is trained as spam?
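The decay described in the `Train` doc comment can be sketched in plain Go as shown below. The 90% retention factor, the ×100 fixed-point scaling, and the whole-vocabulary decay on every call are assumptions; a realm would likely prefer lazy, per-token decay to bound gas. The example shows one mislabelled spam observation fading as ham observations of the same token arrive.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	decayNum = 9 // assumed: keep 90% of every count per observation
	decayDen = 10
	unit     = 100 // fixed-point scale so integer decay keeps some precision
)

// DecayCorpus is a hypothetical corpus with exponentially decaying counts.
type DecayCorpus struct {
	spam, ham map[string]int
}

func NewDecayCorpus() *DecayCorpus {
	return &DecayCorpus{spam: map[string]int{}, ham: map[string]int{}}
}

// decayAll scales every count down and drops tokens that reach zero.
func decayAll(m map[string]int) {
	for k, v := range m {
		v = v * decayNum / decayDen
		if v == 0 {
			delete(m, k)
			continue
		}
		m[k] = v
	}
}

// Train decays existing counts, then adds the new observation.
func (c *DecayCorpus) Train(content string, isSpam bool) {
	decayAll(c.spam)
	decayAll(c.ham)
	for _, tok := range strings.Fields(strings.ToLower(content)) {
		if isSpam {
			c.spam[tok] += unit
		} else {
			c.ham[tok] += unit
		}
	}
}

// SpamPct returns the token's spam share in percent, or -1 if never seen.
func (c *DecayCorpus) SpamPct(tok string) int {
	s, h := c.spam[tok], c.ham[tok]
	if s+h == 0 {
		return -1
	}
	return s * 100 / (s + h)
}

func main() {
	c := NewDecayCorpus()
	c.Train("hello", true) // one mislabelled spam observation
	for i := 0; i < 5; i++ {
		c.Train("hello", false)
	}
	fmt.Println(c.SpamPct("hello")) // 12 (it would be 16 without decay)
}
```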

// normalizeLeet converts common leet speak digit substitutions back to letters.
// Only handles digit-based leet (0->o, 1->i, 3->e, 4->a, 5->s, 7->t) since
// symbol-based leet (@, $) is stripped during tokenization.
func normalizeLeet(s string) string {
Member

Could be public, it might be handy for other packages 👍
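If `normalizeLeet` were exported, it could be as small as a `strings.Replacer` over the substitutions listed in its doc comment. The sketch below is plain Go. `NormalizeLeet` is a hypothetical name, and this version also rewrites digits in plain numbers ("2017" becomes "2oit"), which the package may handle differently.

```go
package main

import (
	"fmt"
	"strings"
)

// leetReplacer covers the digit substitutions from the doc comment:
// 0->o, 1->i, 3->e, 4->a, 5->s, 7->t.
var leetReplacer = strings.NewReplacer("0", "o", "1", "i", "3", "e", "4", "a", "5", "s", "7", "t")

// NormalizeLeet is a hypothetical exported wrapper around the replacer.
func NormalizeLeet(s string) string {
	return leetReplacer.Replace(s)
}

func main() {
	fmt.Println(NormalizeLeet("fr33 41rdr0p")) // free airdrop
}
```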

Comment on lines +204 to +209
if dict == nil || dict.Size() == 0 {
	return 0, ""
}
if len(tokens) == 0 {
	return 0, ""
}
Member

Also possible:

Suggested change
if dict == nil || dict.Size() == 0 {
	return 0, ""
}
if len(tokens) == 0 {
	return 0, ""
}
if dict == nil || dict.Size() == 0 || len(tokens) == 0 {
	return 0, ""
}

Comment on lines +154 to +156
if c.Size() < 4 {
	t.Errorf("expected size >= 4, got %d", c.Size())
}
Member

Why not?

Suggested change
if c.Size() < 4 {
	t.Errorf("expected size >= 4, got %d", c.Size())
}
if c.Size() != 6 {
	t.Errorf("expected size 6, got %d", c.Size())
}

Comment on lines +282 to +284
if c.Size() != 9 {
	t.Fatalf("test setup: expected corpus size 9, got %d", c.Size())
}
Member

Could be removed, already tested in TestCorpus():

Suggested change
if c.Size() != 9 {
	t.Fatalf("test setup: expected corpus size 9, got %d", c.Size())
}

Member

Size checks could be removed from the other Bayes-related tests to keep them simpler.

return corpus, dict
}

func TestCryptoContentFalsePositives(t *testing.T) {
Member

Using table tests here would keep the tests DRY

Comment on lines +13 to +15
if len(fp1) == 0 {
	t.Fatal("expected non-empty fingerprint")
}
Member

This could be removed, there is already a test case that checks zero length:

Suggested change
if len(fp1) == 0 {
	t.Fatal("expected non-empty fingerprint")
}
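Since the fingerprint implementation itself is not shown here, the following is a generic MinHash sketch in plain Go for context. It uses 3-word shingles, one seeded FNV-1a hash per signature slot, and estimates similarity as the fraction of matching slots. `numHashes = 32`, the shingle size, and the function names are assumptions. Content with fewer than three words yields a nil fingerprint.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

const numHashes = 32 // assumed signature length

// shingles returns the set of 3-word shingles of the lower-cased content.
func shingles(content string) map[string]bool {
	words := strings.Fields(strings.ToLower(content))
	set := map[string]bool{}
	for i := 0; i+3 <= len(words); i++ {
		set[strings.Join(words[i:i+3], " ")] = true
	}
	return set
}

// Fingerprint computes a MinHash signature: for each seeded hash function,
// keep the minimum hash over all shingles. Returns nil when there are none.
func Fingerprint(content string) []uint64 {
	sh := shingles(content)
	if len(sh) == 0 {
		return nil
	}
	sig := make([]uint64, numHashes)
	for i := range sig {
		sig[i] = ^uint64(0)
	}
	for s := range sh {
		for i := 0; i < numHashes; i++ {
			h := fnv.New64a()
			h.Write([]byte{byte(i)}) // the slot index seeds each hash function
			h.Write([]byte(s))
			if v := h.Sum64(); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// Similarity estimates Jaccard similarity as the share of equal slots.
func Similarity(a, b []uint64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	eq := 0
	for i := range a {
		if a[i] == b[i] {
			eq++
		}
	}
	return float64(eq) / float64(len(a))
}

func main() {
	fp := Fingerprint("claim your free airdrop tokens right now at this link")
	fmt.Println(len(fp), Similarity(fp, fp))     // 32 1
	fmt.Println(Fingerprint("too short") == nil) // true
}
```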

Comment on lines +67 to +69
if s.urlCount > linkMaxCount {
	hits = append(hits, RuleHit{WeightLinkHeavy, RuleLinkHeavy})
}
Member

I'm wondering if it makes sense to count links: tutorials or other types of Markdown content are valid and could be link-heavy. It would be better to consider the number of links relative to the rest of the content, or maybe just remove the rule.
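One way to implement the ratio-based variant, sketched in plain Go: require both a minimum number of links and a minimum share of words that are links. The thresholds and `LinkHeavy` are hypothetical. Counting any token that contains `http://` or `https://` also catches Markdown `[text](url)` links.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical thresholds: both must be exceeded for the rule to fire.
const (
	linkMinCount    = 3
	linkMaxRatioPct = 50 // links may be at most 50% of whitespace-separated words
)

// isLink treats any token containing a URL scheme as a link, which also
// covers Markdown [text](url) tokens.
func isLink(tok string) bool {
	return strings.Contains(tok, "http://") || strings.Contains(tok, "https://")
}

// LinkHeavy reports whether links dominate the content.
func LinkHeavy(content string) bool {
	words := strings.Fields(content)
	if len(words) == 0 {
		return false
	}
	links := 0
	for _, w := range words {
		if isLink(w) {
			links++
		}
	}
	return links >= linkMinCount && links*100/len(words) > linkMaxRatioPct
}

func main() {
	spam := "https://a.io https://b.io https://c.io https://d.io buy"
	tutorial := "Step one: install the tool from https://a.io then read https://b.io and the guide at https://c.io before you continue with the rest of this tutorial"
	fmt.Println(LinkHeavy(spam), LinkHeavy(tutorial)) // true false
}
```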

@jeronimoalbi
Member

jeronimoalbi commented Mar 13, 2026

I think we would need more eyes on the PR, it's quite big and has a lot of specifics to check. Also having others opinions would be helpful.

Maybe the realm could be part of another PR, for easier review and merge.

cc @Kouteki

@alexiscolin alexiscolin added the a/ux User experience, product, marketing community, developer experience team label Apr 7, 2026
@nemanjantic nemanjantic moved this from In Progress to In Review in 🧙‍♂️Gno.land development Apr 22, 2026
@lbrown2007

@alexiscolin do we need to add more reviewers to this? Is this in the next two-week cycle, or should I push it back another cycle?

Labels

a/ux User experience, product, marketing community, developer experience team 🧾 package/realm Tag used for new Realms or Packages.

Projects

Status: No status
Status: In Review

Development

Successfully merging this pull request may close these issues.

7 participants