Building Quality Software with AI or: How Test-from Risk to Reliability

The AI Development and Its Hidden Costs
Historical Context
LLMs: Capabilities and Fundamental Limitations
Philosophical Foundation: Why TDD Principles Are Critical in the AI Era
Test-Driven Generation (TDG)
Beyond TDD: Specification-Driven Development with GitHub Spec Kit
The Path Forward
Resources
Appendix A: The Risk Assessment Framework: Making Informed Decisions

The AI Development and Its Hidden Costs

We are living through the most significant shift in software development since the advent of high-level programming languages. Large Language Models (LLMs) and the tools built on them, such as GPT, Claude, and GitHub Copilot, have democratised code generation, enabling developers to produce functionality at unprecedented speed. A single well-crafted prompt can generate hundreds of lines of working code in seconds. Features that once took days to implement can be scaffolded in minutes.

Yet this remarkable capability comes with a profound challenge: speed without quality is just fast failure. Early adopters of generative AI in software development have discovered that while these tools excel at producing syntactically correct code, they struggle with the deeper aspects of software craftsmanship: maintainability, security, performance optimisation, and alignment with business requirements.

The fundamental issue isn't technical but philosophical. Most AI-assisted development approaches focus on generation first, validation second. Developers write prompts describing what they want, review the generated code, and iterate until it "looks right." This approach inherently inverts the quality-first principles that underpin reliable software engineering.

Historical Context

To understand why generative AI presents both opportunity and risk, we need to examine the evolution of software development practices. The history of programming is essentially the history of managing complexity at scale.

The Era of Individual Craftsmanship (1950s-1970s)

Early programming was highly individual. Programmers worked directly with machine code, then assembly language, crafting each instruction by hand. Quality control was implicit: you understood every line because you wrote every line. But this approach couldn't scale beyond individual developers working on relatively simple systems.

The Rise of Engineering Discipline (1980s-1990s)

As software systems grew more complex, the industry developed engineering disciplines: structured programming, modular design, code reviews, and systematic testing. These practices emerged from hard-learned lessons about what happens when software quality breaks down at scale: cost overruns, security breaches, and system failures that could impact entire organisations.

Agile and Test-Driven Practices (2000s-2010s)

The agile movement brought renewed focus on rapid feedback cycles and quality-first development. Test-Driven Development (TDD) emerged as a particularly powerful practice, not just for testing but for driving design decisions through the discipline of writing tests first. Kent Beck's insight was profound: when you write the test before the implementation, you're forced to think clearly about what the code should do before you think about how it should do it.

The AI Generation Challenge (2020s-Present)

Generative AI tools represent a return to rapid, individual code generation, but at a scale and speed that makes traditional quality control mechanisms inadequate. A developer can generate more code in an hour with AI assistance than they might normally write in a week. Traditional code review processes, designed around human-scale development velocity, become bottlenecks rather than safety nets.

LLMs: Capabilities and Fundamental Limitations

To use generative AI effectively in software development, we must understand what these tools actually are and are not. This understanding forms the foundation for any quality-driven approach to AI-assisted development.

The Anthropomorphization Problem

Martin Fowler's essay "Who is LLM?" highlights a critical cognitive bias in AI interaction: our tendency to attribute human-like qualities to these systems. When ChatGPT responds to a coding question with confidence and apparent expertise, it's natural to treat it as a knowledgeable colleague. When GitHub Copilot suggests elegant solutions to complex problems, we might assume it "understands" our code base the way an experienced developer would.

This anthropomorphization is more than just a philosophical curiosity; it has practical consequences. When we treat LLMs as thinking entities, we unconsciously adjust our verification standards. We might accept explanations that seem reasonable without checking implementation details, or trust suggested refactoring because the AI "seems confident" about its benefits.

Probabilistic Nature and Pattern Matching

LLMs generate responses through sophisticated statistical pattern matching based on their training data. They identify patterns in text (including code) and generate outputs that statistically resemble those patterns. This process can produce remarkably good results, but it operates fundamentally differently from human reasoning.

Consider a human developer implementing a payment processing function. They think about business rules, edge cases, security requirements, and integration points. They might consult documentation, consider error scenarios, and design with future maintenance in mind.

An LLM implementing the same function operates by pattern matching: it recognizes that "payment processing" typically involves certain code structures, library calls, and error handling patterns. It generates code that statistically resembles payment processing implementations in its training data. The result might be functionally correct, but it lacks the contextual reasoning that guides human implementation decisions.

The "Garbage In, Garbage Out" Amplification

LLMs amplify the quality of their inputs. Vague requirements produce vague implementations. Incomplete specifications lead to incomplete solutions. Missing context results in code that works in isolation but fails when integrated with existing systems.

This amplification effect is particularly dangerous because LLMs can make poor specifications look good. A superficial prompt like "create a user authentication system" might generate hundreds of lines of professional-looking code that handles basic login flows but completely ignores security best practices, scalability concerns, or integration requirements.

The Detectability Challenge

One of the most insidious aspects of AI-generated code is that errors aren't always obvious. Syntax errors are rare; LLMs excel at generating syntactically correct code. The problems typically lie in:

Logic errors: Code that compiles and runs but doesn't handle edge cases correctly
Security vulnerabilities: Implementations that work under normal conditions but expose attack vectors
Performance issues: Solutions that work with test data but fail under production load
Integration problems: Code that works in isolation but breaks when combined with existing systems
Maintainability issues: Solutions that solve immediate problems but create technical debt

These issues often surface weeks or months after implementation, making them expensive to fix and potentially damaging to system reliability.
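To make the "logic errors" category concrete, here is a minimal sketch of a function that compiles, runs, and looks right on a quick check, yet mishandles edge cases. The function names and the pagination scenario are assumptions chosen for illustration; they do not come from the sample repository.

```go
package main

import (
	"errors"
	"fmt"
)

// naivePageCount looks plausible and gives the right answer for 100 items
// with a page size of 10, but it drops the final partial page and panics
// with a division by zero when pageSize is 0.
func naivePageCount(totalItems, pageSize int) int {
	return totalItems / pageSize
}

// pageCount states the edge cases explicitly: invalid inputs are rejected
// and a partial last page still counts as a page.
func pageCount(totalItems, pageSize int) (int, error) {
	if pageSize <= 0 {
		return 0, errors.New("page size must be positive")
	}
	if totalItems < 0 {
		return 0, errors.New("total items must not be negative")
	}
	// Round up using integer arithmetic.
	return (totalItems + pageSize - 1) / pageSize, nil
}

func main() {
	fmt.Println(naivePageCount(100, 10)) // 10, looks right
	fmt.Println(naivePageCount(101, 10)) // 10, although 11 pages are needed
	n, err := pageCount(101, 10)
	fmt.Println(n, err) // 11 <nil>
}
```

Nothing in the naive version would be flagged by a compiler or by a casual review; only a test that names the partial-page and zero-size cases exposes the gap.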

Philosophical Foundation: Why TDD Principles Are Critical in the AI Era

Understanding TDD Beyond Testing

Test-Driven Development represents one of the most profound shifts in how we think about software construction. Despite its name, TDD is not primarily about testing: it's about design thinking, requirement clarification, and quality assurance built into the development process itself.

The traditional understanding of TDD focuses on the mechanical process: Red-Green-Refactor. Write a failing test, make it pass, clean up the code. But this misses the deeper philosophical insight that makes TDD powerful: tests are executable specifications of intent.

When we write a test before writing implementation code, we're forced to answer fundamental questions:

What exactly should this code do?
What inputs should it accept?
What outputs should it produce?
How should it behave in edge cases?
What constitutes failure, and how should failures be handled?

These questions become exponentially more important when the implementation will be generated by an AI system that operates through pattern matching rather than intentional reasoning.

The Imagination vs. Implementation Model

Mark Winteringham's model from "Software Testing with Generative AI" provides crucial insight into why TDD principles matter for AI-assisted development. He describes two overlapping circles:

Imagination Circle: What we want our software to do: our expectations, requirements, and intended behaviours, both explicit and implicit.
Implementation Circle: What our software actually does: its real behaviour under various conditions, including edge cases and failure modes.

Quality emerges from the alignment between these circles. The more they overlap, the higher our confidence that we're building the right thing correctly.

In traditional development, humans work in both circles simultaneously. We imagine what we want, then implement it with that imagination guiding our choices. AI-assisted development breaks this connection; the AI implements without imagination, relying instead on pattern matching from its training data.

TDD can restore this connection by forcing us to fully develop the imagination circle before the implementation circle. Our tests become the bridge between human intention and AI generation.

The Risk Amplification Problem

Without disciplined approaches like TDD, AI-assisted development amplifies existing software development risks:

Specification Drift: In traditional development, vague requirements lead to implementation guesswork. With AI generation, vague requirements lead to implementations that might work correctly by accident, or fail catastrophically in edge cases that weren't considered.
Technical Debt Accumulation: Human developers naturally consider maintainability as they code because they know they'll have to live with their decisions. AI systems optimise for "works now" without considering long-term consequences.
Security Vulnerabilities: Security requires thinking about what attackers might try to do: scenarios typically not covered in training data patterns. AI-generated code tends to handle happy paths well but often misses security considerations.
Integration Challenges: Real systems are complex networks of interacting components. AI systems excel at creating individual components but struggle with the subtle integration requirements that experienced developers intuitively understand.

TDD as a Communication Protocol with AI

In AI-assisted development, tests serve as more than quality assurance: they become our primary means of communicating intent to the AI system. This transforms the role of tests from validation tools to specification languages.

Consider these two approaches to implementing user authentication:

Approach 1 - Prompt-Driven:

Create a user authentication system with login and registration functionality.

Approach 2 - Test-Driven:

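package auth // hypothetical package name; RegisterUser is assumed to live here too

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestUserRegistration(t *testing.T) {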
	tests := []struct {
		name        string
		username    string
		email       string
		password    string
		expectError bool
		errorType   string
	}{
		{
			name:        "valid registration",
			username:    "john_doe",
			email:       "john@example.com",
			password:    "SecureP@ssw0rd!",
			expectError: false,
		},
		{
			name:        "duplicate email",
			username:    "jane_doe",
			email:       "john@example.com", // Same email as above
			password:    "AnotherP@ssw0rd!",
			expectError: true,
			errorType:   "ErrDuplicateEmail",
		},
		{
			name:        "weak password",
			username:    "weak_user",
			email:       "weak@example.com",
			password:    "123",
			expectError: true,
			errorType:   "ErrWeakPassword",
		},
		{
			name:        "invalid email",
			username:    "invalid_user",
			email:       "not-an-email",
			password:    "ValidP@ssw0rd!",
			expectError: true,
			errorType:   "ErrInvalidEmail",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result, err := RegisterUser(tt.username, tt.email, tt.password)

			if tt.expectError {
				// Guard the message check: calling err.Error() on a nil error would panic.
				if assert.Error(t, err) {
					assert.Equal(t, tt.errorType, err.Error())
				}
				assert.Nil(t, result)
			} else {
				assert.NoError(t, err)
				assert.NotNil(t, result)
				assert.Equal(t, tt.username, result.Username)
				assert.Equal(t, tt.email, result.Email)
				assert.NotEmpty(t, result.ID)
			}
		})
	}
}

The first approach leaves everything to AI interpretation. The second approach creates an unambiguous specification of exactly what registration should do under various conditions.
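To show how tightly such a test constrains the result, here is a minimal sketch of an implementation that the tests above would accept. Everything in it beyond the test's own names is an assumption: the in-memory store, the eight-character password threshold, the use of net/mail for email validation, and the error messages (chosen to match the strings the test compares against). Password hashing and persistence are deliberately left out, so this is not a complete registration system.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"net/mail"
	"sync"
)

// Error messages mirror the errorType strings used in the test table.
var (
	ErrDuplicateEmail = errors.New("ErrDuplicateEmail")
	ErrWeakPassword   = errors.New("ErrWeakPassword")
	ErrInvalidEmail   = errors.New("ErrInvalidEmail")
)

// User carries the fields the test asserts on.
type User struct {
	ID       string
	Username string
	Email    string
}

// An in-memory store keyed by email (assumption: a real service would use
// a database with a unique constraint).
var (
	mu      sync.Mutex
	byEmail = map[string]*User{}
)

// RegisterUser validates the input and stores a new user.
func RegisterUser(username, email, password string) (*User, error) {
	// net/mail also accepts forms like "Name <a@b.c>"; a stricter check
	// may be needed in practice.
	if _, err := mail.ParseAddress(email); err != nil {
		return nil, ErrInvalidEmail
	}
	// Assumed policy: at least 8 characters. The test only requires that
	// "123" is rejected.
	if len(password) < 8 {
		return nil, ErrWeakPassword
	}

	mu.Lock()
	defer mu.Unlock()
	if _, exists := byEmail[email]; exists {
		return nil, ErrDuplicateEmail
	}

	idBytes := make([]byte, 8)
	if _, err := rand.Read(idBytes); err != nil {
		return nil, err
	}
	u := &User{ID: hex.EncodeToString(idBytes), Username: username, Email: email}
	byEmail[email] = u
	return u, nil
}

func main() {
	u, err := RegisterUser("alice", "alice@example.com", "SecureP@ssw0rd!")
	fmt.Println(u.Username, u.Email, err)
	_, err = RegisterUser("alice2", "alice@example.com", "AnotherP@ssw0rd!")
	fmt.Println(err) // ErrDuplicateEmail
}
```

The gaps in this sketch (no hashing, a simplistic password rule) are exactly the kind of thing the tests do not yet express, which is why further test cases or a human review step remain necessary.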

Test-Driven Generation (TDG)

Test-Driven Generation represents a fundamental shift in how we approach AI-assisted development. Instead of generating code first and validating second, TDG puts quality constraints first and uses AI generation as a tool to satisfy those constraints.

The TDG Philosophy

TDG operates on three core principles:

Tests as Executable Specifications: Tests define exactly what the code should do, serving as unambiguous communication between human intent and AI generation.
Constraint-Driven Generation: Rather than asking AI "what should this code do?", TDG asks "how can this code satisfy these specific constraints?"
Continuous Validation: Every generated component is immediately validated against comprehensive tests, creating a tight feedback loop that catches problems early.

The TDG Cycle

The TDG process expands the traditional Red-Green-Refactor cycle:

Red: Write tests that specify desired behaviour, including edge cases and error conditions
Generate: Use AI to create an implementation that attempts to satisfy the tests
Validate: Run tests against the generated implementation
Iterate: If tests fail, refine the generation prompt and repeat
Green: When all tests pass, the basic functionality is complete
Review: Human review for non-functional requirements (security, performance, maintainability)
Refactor: Improve the implementation while maintaining test coverage
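The Generate, Validate, and Iterate steps of this cycle can be sketched as a small loop. The Generator and Validator types below are hypothetical stand-ins for a model call and a test run; they are not part of any real tool, and the stubs in demo exist only to make the sketch runnable.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Generator stands in for a call to an AI model (hypothetical signature).
type Generator func(prompt string) (code string, err error)

// Validator stands in for running the test suite against generated code and
// returns the names of the failing tests (hypothetical signature).
type Validator func(code string) (failing []string)

// ErrBudgetExhausted reports that no attempt passed within the attempt limit.
var ErrBudgetExhausted = errors.New("no generated candidate passed the tests")

// RunCycle performs Generate, Validate, and Iterate until the tests pass
// (Green) or maxAttempts is used up. Review and Refactor stay with humans.
func RunCycle(gen Generator, validate Validator, prompt string, maxAttempts int) (string, int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		code, err := gen(prompt)
		if err != nil {
			return "", attempt, fmt.Errorf("generation failed: %w", err)
		}
		failing := validate(code)
		if len(failing) == 0 {
			return code, attempt, nil // Green
		}
		// Iterate: feed the specific failures back into the next prompt.
		prompt += "\nThe previous attempt failed: " + strings.Join(failing, ", ")
	}
	return "", maxAttempts, ErrBudgetExhausted
}

// demo wires RunCycle to stubs: the fake generator only produces passing
// code once the prompt mentions the failing test.
func demo() (string, int, error) {
	gen := func(prompt string) (string, error) {
		if strings.Contains(prompt, "malformed authorization header") {
			return "v2: checks Bearer prefix", nil
		}
		return "v1: ignores Bearer prefix", nil
	}
	validate := func(code string) []string {
		if strings.HasPrefix(code, "v1") {
			return []string{"malformed authorization header"}
		}
		return nil
	}
	return RunCycle(gen, validate, "Implement JWT middleware that passes the tests.", 3)
}

func main() {
	code, attempts, err := demo()
	fmt.Println(code, attempts, err)
}
```

Capping the number of attempts is a design choice of this sketch: it keeps a failing generation loop from running indefinitely and hands the problem back to a human.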

TDG in Practice: Building a Secure API Handler

Let's see how TDG can be applied with an example: generating JWT authentication middleware for a Go web service. Instead of starting with a prompt like "Create JWT middleware for Gin that validates tokens and extracts user information", we begin with tests that specify exactly what secure authentication should do.

Step 1: Write Security-Focused Tests

See this commit

Step 2: Generate Implementation to Satisfy Tests

With tests in place, we can now prompt an AI system (we used Claude) with specific constraints:

Prompt:

Implement a JWT middleware function in authentication/auth.go for Gin that satisfies these test requirements:
1. Must validate Bearer token format in Authorization header
2. Must verify token signature using provided signing key and method
3. Must check token expiration
4. Must validate required claims (user_id)
5. Must handle all error cases specified in tests
6. Must set user_id in Gin context for valid tokens

Here are the failing tests: authentication/auth_test.go

Generate an implementation that makes all tests pass.

This approach forces the AI to address each security requirement explicitly rather than relying on pattern matching from potentially insecure training examples.

See this commit
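For readers without access to the repository, the following is a rough sketch of middleware that addresses the six requirements in the prompt. It is not the code from the commit: to stay dependency-free it swaps Gin for net/http and a JWT library for a hand-rolled HS256 check, and every identifier in it (JWTMiddleware, verify, makeToken, statusFor, userIDKey) is an assumption. Production code would normally rely on a maintained JWT library instead.

```go
package main

import (
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

type ctxKey string
const userIDKey ctxKey = "user_id" // context key for the authenticated user

// claims holds the only two fields this sketch validates.
type claims struct {
	UserID string `json:"user_id"`
	Exp    int64  `json:"exp"`
}

// sign returns the base64url-encoded HMAC-SHA256 signature of input.
func sign(input string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(input))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// makeToken mints an HS256 token; it exists only for the demo and checks.
func makeToken(c claims, key []byte) string {
	enc := base64.RawURLEncoding
	body, _ := json.Marshal(c)
	input := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + enc.EncodeToString(body)
	return input + "." + sign(input, key)
}

// verify checks the signing method, signature, expiry, and user_id claim.
func verify(token string, key []byte, now time.Time) (c claims, err error) {
	var hdr struct{ Alg string }
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("malformed token")
	}
	rawHdr, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil || json.Unmarshal(rawHdr, &hdr) != nil || hdr.Alg != "HS256" {
		return c, errors.New("unsupported signing method")
	}
	// hmac.Equal compares in constant time.
	if !hmac.Equal([]byte(sign(parts[0]+"."+parts[1], key)), []byte(parts[2])) {
		return c, errors.New("invalid signature")
	}
	rawClaims, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil || json.Unmarshal(rawClaims, &c) != nil {
		return c, errors.New("malformed claims")
	}
	if now.Unix() >= c.Exp || c.UserID == "" {
		return c, errors.New("token expired or user_id claim missing")
	}
	return c, nil
}

// JWTMiddleware rejects requests without a valid Bearer token and stores the user ID in the context.
func JWTMiddleware(key []byte, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, "Bearer ") {
			http.Error(w, "malformed authorization header", http.StatusUnauthorized)
			return
		}
		c, err := verify(strings.TrimPrefix(auth, "Bearer "), key, time.Now())
		if err != nil {
			http.Error(w, err.Error(), http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), userIDKey, c.UserID)))
	})
}

// statusFor sends one request through the middleware and returns the status.
func statusFor(authHeader string, key []byte) int {
	h := JWTMiddleware(key, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Context().Value(userIDKey))
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("Authorization", authHeader)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	key := []byte("demo-key")
	tok := makeToken(claims{UserID: "42", Exp: time.Now().Add(time.Hour).Unix()}, key)
	fmt.Println(statusFor("Bearer "+tok, key), statusFor(tok, key)) // 200 401
}
```

Each rejection path in verify corresponds to one numbered requirement in the prompt, which is the practical effect of driving generation from tests: every constraint has a visible place in the code.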

Step 3: Iterative Refinement

If the first generated implementation doesn't pass all tests, the failing tests provide specific feedback for refinement. For example, if the AI generates code that doesn't handle the Bearer prefix correctly, the test failure makes this immediately obvious, and we can refine the prompt. (In our sample repo we didn't need this step, because the first iteration made all tests pass.)

Refined Prompt:

The implementation failed the 'malformed authorization header' test. The code must specifically check for 'Bearer '
prefix and reject tokens that don't use this format. Update the implementation.

Step 4: Human Review for Non-Functional Requirements

Once all tests pass, human review focuses on aspects that tests might not capture:

Code readability and maintainability
Performance characteristics
Security best practices beyond functional requirements
Integration with existing systems
Error logging and monitoring

See this commit

Benefits of the TDG Approach

Explicit Security Requirements: Security constraints are encoded in tests rather than assumed to be implicit in AI generation.
Comprehensive Edge Case Coverage: Tests force consideration of failure modes and edge cases that AI might otherwise miss.
Immediate Feedback: Failed tests provide specific, actionable feedback for improving generated code.
Documentation: Tests serve as living documentation of exactly what the code should do.
Regression Prevention: As the codebase evolves, tests ensure that AI-generated modifications don't break existing functionality.
Reduced Review Burden: Human reviewers can focus on high-level design and non-functional requirements rather than basic correctness.

Beyond TDD: Specification-Driven Development with GitHub Spec Kit

While Test-Driven Generation provides a solid foundation for quality AI-assisted development, we can extend these principles further with comprehensive specification-driven approaches. GitHub's Spec Kit represents an evolution beyond traditional TDD, emphasising complete requirements specification before any code generation begins. This shift has a cost: it can lose the small feedback loops and rapid iterations that TDD and TDG naturally encourage, and large specifications can generate expansive scaffolding, and with it code bloat, before anything is validated.

The Limitations of Test-Only Approaches

TDD and TDG are powerful, but they have inherent limitations when applied to complex systems:

Component-Level Focus: Tests naturally focus on individual functions or classes. System-level behaviour and integration patterns are harder to capture in unit tests.
Implementation Bias: Even well-written tests can inadvertently bias toward particular implementation approaches, potentially limiting AI exploration of alternative solutions.
Context Gaps: Tests capture what code should do, but they don't always capture why it should do it, or how it fits into broader business objectives.
Non-Functional Requirements: Performance, scalability, maintainability, and other architectural concerns are difficult to encode in traditional tests.

Specification-Driven Development Philosophy

Specification-Driven Development (SDD) addresses these limitations by establishing comprehensive requirements documentation before any implementation work begins. In the context of AI-assisted development, this creates a more complete "imagination circle" that guides AI generation toward solutions that satisfy both functional and non-functional requirements.

SDD operates on several levels:

Constitutional Principles: Fundamental values and constraints that should guide all development decisions
Behavioural Specifications: Detailed descriptions of how the system should behave under various conditions
Technical Constraints: Performance, security, scalability, and integration requirements
User Journey Mapping: End-to-end workflows that show how individual components combine to create user value

The Spec Kit Workflow

GitHub's Spec Kit provides a structured approach to specification-driven development that works with AI assistance. In our example, we are using Spec Kit to implement a Kafka producer in Go.

Establish project principles (`/speckit.constitution`)

We use /speckit.constitution to create the development guidelines that will guide all subsequent development:

Prompt:

Create principles focused on code quality, testing standards, user experience consistency, and performance
requirements

See this commit

This commit updates the project's constitution file (.specify/memory/constitution.md) to establish governing principles and development guidelines for the project. It restructures and expands the constitution focusing on code quality standards, testing requirements, and performance criteria that will guide all future development work. The produced constitution defines core principles (e.g. code quality first, testing standards, user experience consistency, performance requirements), specific performance benchmarks, development workflow requirements including mandatory code reviews and testing gates, and governance processes for maintaining constitutional compliance across all future development.

Create the spec (`/speckit.specify`)

Then we use /speckit.specify to describe what we want to build. We focus on the what and why, not the tech stack:

Prompt:

Build a service that publishes messages to specified Kafka topics, handling serialization, partitioning, and delivery
acknowledgments to ensure reliable and efficient data transmission. It encapsulates configuration for retries,
batching, and error handling to guarantee message delivery semantics.

See this commit

This commit creates the complete feature specification for a Kafka producer service by adding two new files in a new feature branch: a detailed specification document (specs/001-kafka-producer/spec.md) and a requirements checklist (specs/001-kafka-producer/checklists/requirements.md). The specification defines a service that publishes messages to Kafka topics with reliable delivery guarantees, focusing on the "what and why" rather than technical implementation. It includes three prioritised user stories (basic publishing, error handling, performance optimisation), functional requirements covering message handling and delivery semantics, measurable success criteria, and comprehensive edge cases for production scenarios. The accompanying checklist validates specification completeness and confirms readiness for the planning phase.

Create a technical implementation plan (`/speckit.plan`)

We use /speckit.plan to provide the tech stack and architecture choices:

Prompt:

The app uses Golang with minimal number of libraries. Use Golang's standard library as much as possible instead of
3rd party libraries.

See this commit

This commit creates the complete implementation plan for the Kafka producer service using Golang with minimal dependencies by adding comprehensive documentation files. The plan includes a detailed technical architecture (plan.md) specifying Go 1.21+ with only essential Kafka client libraries (shopify/sarama or confluent-kafka-go) and standard library usage, constitutional compliance verification against code quality and performance requirements, and a project structure following Go conventions. Supporting materials include API contracts (contracts/api.yaml), data models (data-model.md), a quick-start guide (quickstart.md), and research documentation (research.md).

Break down into tasks (`/speckit.tasks`)

Then we use /speckit.tasks to create an actionable task list from the implementation plan.

See this commit

This commit creates a comprehensive actionable task list (specs/001-kafka-producer/tasks.md) that breaks down the Kafka producer service implementation into specific, executable tasks organised across multiple phases. The task list follows Test-Driven Development principles with tests written first before implementation, organises work by user story priority (P1: Basic Publishing, P2: Error Handling, P3: Performance Optimisation), includes detailed dependency tracking and parallel execution opportunities marked with [P] tags, provides multiple implementation strategies (MVP-first, incremental delivery, parallel team approach), and ensures each user story can be independently developed and tested. The tasks cover everything from initial Go module setup through final deployment documentation, with clear checkpoints for validating each user story's functionality before proceeding to the next phase.

Execute implementation (`/speckit.implement`)

We use /speckit.implement to execute all tasks and build the feature according to the plan:

Claude output:

The MVP (Minimum Viable Product) foundation is complete! User Story 1 can now be:
- Deployed independently - All components functional
- Tested independently - Full test suite available
- Demonstrated - API endpoints operational
- Extended - Ready for User Stories 2 & 3

See this commit

This commit implements the complete MVP Kafka producer service by executing all planned tasks and creating Go code. The implementation includes the full project structure with the main application entry point (cmd/kafka-producer/main.go), internal modules for configuration, health checks, metrics, message models, producer logic, serialisation, and HTTP server handling, Docker containerisation, Go module setup with required dependencies, and a test suite covering unit tests, integration tests with embedded Kafka, and contract tests for API endpoints. We then apply /speckit.implement in subsequent iterations to implement other user stories. See the corresponding commits in the repository.
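The retry behaviour that the specification asks for can be illustrated with a small, dependency-free sketch. It is not taken from the repository: the Sender interface is a hypothetical abstraction over whichever Kafka client is used, and RetryingProducer, flakySender, publishWithFlaky, and the "orders" topic are names invented for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Message is a minimal stand-in for a Kafka record.
type Message struct {
	Topic string
	Key   []byte
	Value []byte
}

// Sender abstracts the underlying Kafka client. It is a hypothetical
// interface for this sketch, not the API of any real client library.
type Sender interface {
	Send(ctx context.Context, msg Message) error
}

// RetryingProducer retries failed sends with exponential backoff.
type RetryingProducer struct {
	Sender      Sender
	MaxAttempts int           // values below 1 are treated as 1
	BaseBackoff time.Duration // wait before the second attempt; doubles on each retry
}

// Publish returns the number of attempts made and the final error, if any.
func (p *RetryingProducer) Publish(ctx context.Context, msg Message) (int, error) {
	attempts := p.MaxAttempts
	if attempts < 1 {
		attempts = 1
	}
	backoff := p.BaseBackoff
	var lastErr error
	for i := 1; i <= attempts; i++ {
		if lastErr = p.Sender.Send(ctx, msg); lastErr == nil {
			return i, nil
		}
		if i == attempts {
			break // no point in waiting after the final attempt
		}
		select {
		case <-ctx.Done():
			return i, ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	return attempts, fmt.Errorf("publish to %q failed after %d attempts: %w", msg.Topic, attempts, lastErr)
}

// flakySender is a test double that fails its first `failures` sends.
type flakySender struct{ failures, calls int }

func (f *flakySender) Send(ctx context.Context, msg Message) error {
	f.calls++
	if f.calls <= f.failures {
		return errors.New("broker unavailable")
	}
	return nil
}

// publishWithFlaky runs one Publish against a flakySender and reports the
// attempts used, the sends observed, and the resulting error.
func publishWithFlaky(failures, maxAttempts int) (attempts, calls int, err error) {
	s := &flakySender{failures: failures}
	p := &RetryingProducer{Sender: s, MaxAttempts: maxAttempts, BaseBackoff: time.Millisecond}
	attempts, err = p.Publish(context.Background(), Message{Topic: "orders", Value: []byte("hello")})
	return attempts, s.calls, err
}

func main() {
	fmt.Println(publishWithFlaky(2, 3)) // 3 3 <nil>
	fmt.Println(publishWithFlaky(5, 3)) // error after 3 attempts
}
```

Hiding the client behind an interface is what makes the retry rule testable without a broker; the integration tests mentioned above would still be needed to check behaviour against a real Kafka instance.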

Advantages of Specification-Driven AI Development

Holistic System Thinking: Specifications force consideration of how components interact within larger systems, leading to better AI-generated integration code.
Business Alignment: Constitutional principles ensure that AI-generated solutions align with business objectives, not just technical requirements.
Quality Gates: Multiple specification levels create quality gates that catch problems before they reach implementation.
Better Prompts: Comprehensive specifications enable much more targeted and effective AI prompts.
Stakeholder Communication: Specifications serve as communication tools between technical and business stakeholders, ensuring alignment before implementation begins.

Combining SDD with TDG

The most effective approach combines specification-driven planning with test-driven generation:

Constitutional Definition: Establish principles and constraints
Behavioural Specification: Define comprehensive requirements
Technical Planning: Create implementation architecture
Test Generation: Convert specifications into comprehensive test suites
AI Implementation: Generate code to satisfy tests and specifications
Human Review: Validate alignment with constitutional principles

This combined approach provides both the comprehensive planning benefits of SDD and the quality assurance benefits of TDG.

The Path Forward

The integration of generative AI into software development represents the most significant shift in our industry since the advent of high-level programming languages. The potential for productivity gains is enormous: teams can generate working functionality at unprecedented speed and explore implementation alternatives that would have been cost-prohibitive to develop manually.

However, this potential comes with corresponding risks. Without appropriate quality controls, AI-generated code can introduce subtle bugs, security vulnerabilities, and maintainability challenges that compound over time. The question facing development teams is not whether to adopt AI assistance, but how to adopt it responsibly.

The Quality-First Imperative

Teams that prioritise generation speed over code quality create technical debt faster than they can resolve it. Teams that establish quality-first processes (e.g. Test-Driven Generation or Specification-Driven Development) can achieve both speed and reliability.

This shift requires recognising that AI tools are inference engines, not compilers. They generate probabilistic outputs based on pattern matching, not deterministic results based on formal specifications. This fundamental difference means that traditional quality assurance approaches, designed around predictable human development patterns, are insufficient for AI-generated code.

The Evolution of Developer Skills

AI-assisted development doesn't diminish the importance of developer expertise; it transforms it. Instead of spending time on syntactic code construction, developers focus on:

Requirement Specification: Writing comprehensive, unambiguous descriptions of what software should do
Quality Design: Creating test suites and specifications that constrain AI generation toward correct solutions
Risk Assessment: Evaluating when AI assistance is appropriate and what level of oversight is required
System Integration: Ensuring AI-generated components work correctly within larger systems
Architectural Thinking: Making high-level design decisions that AI tools cannot make independently

These skills represent an evolution toward higher-level thinking about software systems. Developers become architects and quality orchestrators rather than code typists.

The Organisational Transformation

Successful AI adoption requires more than individual skill development: it demands organisational commitment to quality-first processes. This includes:

Cultural Change: Moving from "ship fast, fix later" to "specify completely, generate correctly"
Process Evolution: Establishing TDG and SDD workflows that constrain AI output toward quality
Measurement Systems: Tracking quality and velocity metrics that provide empirical feedback on AI effectiveness
Continuous Learning: Building organisational capability to evolve AI practices as tools improve

Looking Ahead

The AI development landscape continues to evolve rapidly. New models, better training techniques, and improved tooling will increase AI capability and reduce error rates. However, fundamental challenges such as specification quality, risk assessment, and human oversight will remain central to responsible AI adoption.

The teams that establish disciplined approaches to AI-assisted development now will be best positioned to leverage future improvements. They will have the processes, skills, and organisational culture needed to harness more powerful AI tools while maintaining the quality standards that customers and businesses depend on.

The future of software development lies not in choosing between human expertise and AI capability, but in combining both through disciplined, quality-first processes. Test-Driven Development principles provide the foundation for this combination, ensuring that as AI tools become more powerful, the software they help create becomes more reliable.

The code may be generated by AI, but the responsibility for its quality, security, and impact remains firmly in human hands. TDD and specification-driven approaches can ensure we're equipped to handle that responsibility effectively, transforming AI from a source of risk into a tool for building better software faster.

Resources

Appendix A: The Risk Assessment Framework: Making Informed Decisions

Not all development tasks carry the same risk when AI-assisted. Birgitta Böckeler's three-dimensional risk framework provides a practical approach to calibrating our quality processes:

Probability: How likely is AI to make mistakes?

Low Probability Scenarios (Trust with light oversight):
- Standard CRUD operations following well-established patterns
- Simple data transformations with clear input/output specifications
- Boilerplate code generation (handlers, models, basic APIs)
- Test fixture creation and mock data generation

Medium Probability Scenarios (Moderate oversight required):
- Business logic implementation with complex rules
- Integration code between well-understood systems
- Performance optimisation of existing algorithms
- Error handling and logging implementation

High Probability Scenarios (Heavy oversight required):
- Security-sensitive operations (authentication, authorisation, cryptography)
- Complex algorithms with subtle correctness requirements
- Novel integration patterns or experimental approaches
- Code involving financial calculations or regulatory compliance

Impact: What happens if AI gets it wrong?

Low Impact (Learning opportunities):
- Development tooling and internal scripts
- Non-customer-facing features in development environments
- Prototype code and proof-of-concepts
- Documentation and internal process automation

Medium Impact (Quality gates required):
- Customer-facing features with graceful degradation
- Performance improvements to existing systems
- New functionality with comprehensive rollback plans
- Internal tools used by team members

High Impact (Multiple validation layers):
- Customer data processing and storage
- Financial transactions and billing logic
- Security and authentication systems
- Core business logic affecting revenue or compliance

Detectability: How easily can problems be spotted?

High Detectability (Faster feedback loops):
- Compilation errors and type mismatches
- Test failures and obvious runtime exceptions
- Performance regressions caught by benchmarks
- Integration failures in development environments

Medium Detectability (Standard review processes):
- Logic errors caught by comprehensive test suites
- API contract violations detected by integration tests
- Code quality issues identified by static analysis
- Performance issues visible under realistic load

Low Detectability (Enhanced scrutiny required):
- Subtle security vulnerabilities
- Race conditions and concurrency bugs
- Memory leaks and resource management issues
- Business logic errors that manifest only in edge cases
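
One way to make the three dimensions actionable is to rate a task on each and derive a suggested oversight level from the ratings. The sketch below is only an illustration: the type names, the `Recommend` function, the additive scoring rule, and the thresholds are all assumptions made for this example and are not part of the framework itself. Teams would likely tune the rule to their own risk appetite.

```go
package main

// Level is a coarse three-point rating applied to each dimension.
type Level int

const (
	Low Level = iota
	Medium
	High
)

// Assessment rates a task on the three dimensions described above.
type Assessment struct {
	Probability   Level // how likely the AI is to make mistakes
	Impact        Level // how costly it is if the AI gets it wrong
	Detectability Level // how easily problems are spotted (High is good)
}

// Oversight is the review effort suggested for a task.
type Oversight string

const (
	LightOversight    Oversight = "light oversight"
	ModerateOversight Oversight = "moderate oversight"
	HeavyOversight    Oversight = "heavy oversight"
)

// Recommend maps an assessment to a suggested oversight level.
// Assumption: the additive score and the thresholds below are
// illustrative choices, not something the framework prescribes.
func Recommend(a Assessment) Oversight {
	// Invert detectability so that a larger number always means more risk.
	hardToDetect := High - a.Detectability
	score := int(a.Probability) + int(a.Impact) + int(hardToDetect)

	switch {
	case score >= 4:
		return HeavyOversight
	case score >= 2:
		return ModerateOversight
	default:
		return LightOversight
	}
}
```

Under this rule, an internal script built from boilerplate (low probability, low impact, high detectability) scores 0 and gets light oversight, while authentication code (high probability, high impact, low detectability) scores 6 and gets heavy oversight. Inverting detectability keeps every dimension pointing the same way, so a task that is hard to verify is pushed towards more review even when the other two ratings are modest.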
