Future Improvements

Potential enhancements to the TableSleuth CDK infrastructure.

Priority 1: Cost Optimization

1. Spot Instance Support

Current Status: The ec2.Instance L2 construct in AWS CDK does not support spot instances directly.

Issue: When attempting to use spot instances, the following error occurs:

AttributeError: module 'aws_cdk.aws_ec2' has no attribute 'SpotInstanceOptions'

Root Cause: The ec2.Instance construct is designed for on-demand instances only. AWS CDK does not provide a SpotInstanceOptions class or spot instance support for the high-level ec2.Instance construct.

Current Implementation: The CDK stack uses on-demand instances only. The use_spot configuration option is preserved in cdk.json but currently has no effect. A warning is printed if use_spot is set to true.

Improvement: Refactor to use Launch Templates with Auto Scaling Groups

# Create launch template
launch_template = ec2.LaunchTemplate(
    self,
    "TablesleuthLaunchTemplate",
    instance_type=ec2.InstanceType(self.config.instance_type),
    machine_image=machine_image,
    security_group=self.security_group,
    role=self.instance_role,
    user_data=ec2.UserData.custom(self._get_user_data()),
    block_devices=[
        ec2.BlockDevice(
            device_name="/dev/xvda",
            volume=ec2.BlockDeviceVolume.ebs(
                volume_size=30,
                encrypted=True,
                delete_on_termination=True,
            ),
        )
    ],
)

# Create Auto Scaling Group with spot
asg = autoscaling.AutoScalingGroup(
    self,
    "TablesleuthASG",
    vpc=self.vpc,
    launch_template=launch_template,
    min_capacity=1,
    max_capacity=1,
    desired_capacity=1,
    spot_price="0.05" if self.config.use_spot else None,
)

Alternative Approaches:

Option 1: EC2 Fleet (L1 Construct)

fleet = ec2.CfnEC2Fleet(
    self,
    "SpotFleet",
    launch_template_configs=[
        ec2.CfnEC2Fleet.FleetLaunchTemplateConfigRequestProperty(
            launch_template_specification=ec2.CfnEC2Fleet.FleetLaunchTemplateSpecificationRequestProperty(
                launch_template_id=launch_template.launch_template_id,
                version="$Latest",
            )
        )
    ],
    target_capacity_specification=ec2.CfnEC2Fleet.TargetCapacitySpecificationRequestProperty(
        total_target_capacity=1,
        default_target_capacity_type="spot",
    ),
    spot_options=ec2.CfnEC2Fleet.SpotOptionsRequestProperty(
        allocation_strategy="lowestPrice",
        instance_interruption_behavior="terminate",
    ),
)

Option 2: AWS Batch For workloads that can tolerate interruptions:

compute_env = batch.FargateComputeEnvironment(
    self,
    "SpotComputeEnv",
    vpc=self.vpc,
    spot=True,
)

Cost Comparison:

Instance Type	On-Demand	Spot	Savings
t3.small	~~$0.02/hour (~~$15/month)	~~$0.006/hour (~~$4.50/month)	70%
m4.xlarge	~~$0.20/hour (~~$144/month)	~~$0.06/hour (~~$43/month)	70%

Benefits:

70% cost savings for dev/staging environments
Configurable via useSpot in cdk.json
Maintains same functionality

Recommendation: For the TableSleuth use case (development/testing environment), the current on-demand implementation is acceptable because:

Simplicity - The ec2.Instance construct is simpler and more straightforward
Stability - No risk of spot instance interruptions during analysis
Cost - Using t3.small for dev environment is already very affordable (~$15/month)
Stop When Not in Use - Users can stop the instance when not in use to save costs

If cost optimization is critical, implement Option 1 (Launch Templates with Auto Scaling Group) in a future version.

References:

Priority 2: Security Enhancements

1. AWS Secrets Manager Integration

Current: Passwords in environment variables

Improvement:

# Store password in Secrets Manager
password = secretsmanager.Secret(
    self,
    "GizmoSQLPassword",
    description="GizmoSQL server password",
    generate_secret_string=secretsmanager.SecretStringGenerator(
        exclude_punctuation=True,
        password_length=32,
    ),
)

# Reference in user data
user_data.add_commands(
    f"export GIZMOSQL_PASSWORD=$(aws secretsmanager get-secret-value "
    f"--secret-id {password.secret_arn} --query SecretString --output text)"
)

# Grant read access
password.grant_read(instance.role)

Benefits:

Automatic password generation
Rotation support
Audit trail via CloudTrail
No passwords in environment variables

2. CloudWatch Alarms

Add monitoring and alerting:

# CPU utilization alarm
cpu_alarm = cloudwatch.Alarm(
    self,
    "HighCPUAlarm",
    metric=instance.metric_cpu_utilization(),
    threshold=80,
    evaluation_periods=2,
    alarm_description="Alert when CPU exceeds 80%",
)

# Disk space alarm
disk_alarm = cloudwatch.Alarm(
    self,
    "LowDiskSpaceAlarm",
    metric=cloudwatch.Metric(
        namespace="CWAgent",
        metric_name="disk_used_percent",
        dimensions_map={"path": "/"},
    ),
    threshold=85,
    evaluation_periods=2,
    comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
)

# Status check alarm
status_alarm = cloudwatch.Alarm(
    self,
    "StatusCheckAlarm",
    metric=instance.metric_status_check_failed(),
    threshold=1,
    evaluation_periods=2,
)

3. Systems Manager Session Manager

Replace SSH with Session Manager:

# Add SSM permissions to role
role.add_managed_policy(
    iam.ManagedPolicy.from_aws_managed_policy_name(
        "AmazonSSMManagedInstanceCore"
    )
)

# No key pair needed
# Connect via: aws ssm start-session --target <instance-id>

Benefits:

No SSH keys to manage
Better audit trail
No open ports required
Session recording

4. AWS Config Rules

Add compliance monitoring:

# Ensure EBS encryption
config.ManagedRule(
    self,
    "EBSEncryptionRule",
    identifier=config.ManagedRuleIdentifiers.EC2_EBS_ENCRYPTION_BY_DEFAULT,
)

# Ensure VPC Flow Logs
config.ManagedRule(
    self,
    "VPCFlowLogsRule",
    identifier=config.ManagedRuleIdentifiers.VPC_FLOW_LOGS_ENABLED,
)

Priority 2: Operational Excellence

5. Unit Tests

Add infrastructure tests:

# tests/test_stack.py
from aws_cdk import assertions as assertions
import pytest

def test_vpc_created():
    app = cdk.App()
    stack = TablesleuthStack(app, "TestStack", config=test_config)
    template = assertions.Template.from_stack(stack)

    template.resource_count_is("AWS::EC2::VPC", 1)
    template.has_resource_properties("AWS::EC2::VPC", {
        "CidrBlock": "10.10.0.0/16"
    })

def test_ebs_encryption_enabled():
    app = cdk.App()
    stack = TablesleuthStack(app, "TestStack", config=test_config)
    template = assertions.Template.from_stack(stack)

    template.has_resource_properties("AWS::EC2::Instance", {
        "BlockDeviceMappings": [
            {
                "Ebs": {
                    "Encrypted": True
                }
            }
        ]
    })

def test_iam_least_privilege():
    app = cdk.App()
    stack = TablesleuthStack(app, "TestStack", config=test_config)
    template = assertions.Template.from_stack(stack)

    # Ensure no wildcard actions
    template.has_resource_properties("AWS::IAM::Role", {
        "Policies": assertions.Match.array_with([
            assertions.Match.object_like({
                "PolicyDocument": {
                    "Statement": assertions.Match.array_with([
                        assertions.Match.object_like({
                            "Action": assertions.Match.not_(
                                assertions.Match.string_like_regexp(".*\\*.*")
                            )
                        })
                    ])
                }
            })
        ])
    })

6. CDK Aspects for Validation

Automated compliance checks:

from aws_cdk import IAspect, Annotations
import jsii

@jsii.implements(IAspect)
class SecurityAspect:
    def visit(self, node):
        # Check S3 buckets have encryption
        if isinstance(node, s3.CfnBucket):
            if not node.bucket_encryption:
                Annotations.of(node).add_error(
                    "S3 buckets must have encryption enabled"
                )

        # Check EC2 instances have encrypted EBS
        if isinstance(node, ec2.CfnInstance):
            if not node.block_device_mappings:
                Annotations.of(node).add_warning(
                    "EC2 instances should have encrypted EBS volumes"
                )

        # Check security groups don't allow 0.0.0.0/0 SSH
        if isinstance(node, ec2.CfnSecurityGroup):
            for rule in node.security_group_ingress or []:
                if rule.get("FromPort") == 22 and rule.get("CidrIp") == "0.0.0.0/0":
                    Annotations.of(node).add_error(
                        "Security groups should not allow SSH from 0.0.0.0/0"
                    )

# Apply to app
Aspects.of(app).add(SecurityAspect())

7. CDK Nag

Automated best practices validation:

from cdk_nag import AwsSolutionsChecks, NagSuppressions

# Apply AWS Solutions checks
Aspects.of(app).add(AwsSolutionsChecks(verbose=True))

# Suppress specific rules with justification
NagSuppressions.add_stack_suppressions(
    stack,
    [
        {
            "id": "AwsSolutions-IAM4",
            "reason": "Managed policies acceptable for this use case"
        }
    ]
)

8. CloudWatch Dashboard

Add monitoring dashboard:

dashboard = cloudwatch.Dashboard(
    self,
    "TablesleuthDashboard",
    dashboard_name=f"tablesleuth-{environment}",
)

dashboard.add_widgets(
    cloudwatch.GraphWidget(
        title="CPU Utilization",
        left=[instance.metric_cpu_utilization()],
    ),
    cloudwatch.GraphWidget(
        title="Network Traffic",
        left=[
            instance.metric_network_in(),
            instance.metric_network_out(),
        ],
    ),
    cloudwatch.SingleValueWidget(
        title="Instance Status",
        metrics=[instance.metric_status_check_failed()],
    ),
)

Priority 3: Architecture Improvements

9. Separate Stateful Stack

Split VPC into separate stack:

# network_stack.py
class NetworkStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        self.vpc = ec2.Vpc(
            self,
            "TablesleuthVPC",
            # ... configuration
        )

# compute_stack.py
class ComputeStack(Stack):
    def __init__(self, scope, construct_id, vpc, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        self.instance = ec2.Instance(
            self,
            "TablesleuthInstance",
            vpc=vpc,
            # ... configuration
        )

# app.py
network_stack = NetworkStack(app, "TablesleuthNetwork-dev")
compute_stack = ComputeStack(
    app,
    "TablesleuthCompute-dev",
    vpc=network_stack.vpc,
)

Benefits:

Independent lifecycle management
VPC can be shared across stacks
Easier to update compute without touching network

10. Custom AMI with EC2 Image Builder

Pre-bake Python and dependencies:

# Create image pipeline
image_recipe = imagebuilder.CfnImageRecipe(
    self,
    "TablesleuthRecipe",
    name="tablesleuth-recipe",
    version="1.0.0",
    parent_image="ami-xxxxx",  # Amazon Linux 2023
    components=[
        {
            "componentArn": python_component.attr_arn,
        },
        {
            "componentArn": gizmosql_component.attr_arn,
        },
    ],
)

# Use custom AMI
instance = ec2.Instance(
    self,
    "TablesleuthInstance",
    machine_image=ec2.MachineImage.generic_linux({
        "us-east-2": custom_ami_id,
    }),
)

Benefits:

Faster instance startup (~1 minute vs ~5 minutes)
Consistent environment
Easier updates

11. Auto Scaling Group

Support multiple instances:

asg = autoscaling.AutoScalingGroup(
    self,
    "TablesleuthASG",
    vpc=vpc,
    instance_type=instance_type,
    machine_image=machine_image,
    min_capacity=1,
    max_capacity=5,
    desired_capacity=1,
    user_data=user_data,
)

# Add scaling policies
asg.scale_on_cpu_utilization(
    "ScaleOnCPU",
    target_utilization_percent=70,
)

12. Application Load Balancer

Add load balancer for multiple instances:

lb = elbv2.ApplicationLoadBalancer(
    self,
    "TablesleuthLB",
    vpc=vpc,
    internet_facing=True,
)

listener = lb.add_listener(
    "Listener",
    port=443,
    certificates=[certificate],
)

listener.add_targets(
    "Targets",
    port=8080,
    targets=[asg],
    health_check=elbv2.HealthCheck(
        path="/health",
        interval=cdk.Duration.seconds(30),
    ),
)

Priority 4: Advanced Features

13. Backup and Recovery

Automated EBS snapshots:

backup_plan = backup.BackupPlan(
    self,
    "BackupPlan",
    backup_plan_name="tablesleuth-backup",
)

backup_plan.add_rule(
    backup.BackupPlanRule(
        backup_vault=backup_vault,
        rule_name="DailyBackup",
        schedule_expression=events.Schedule.cron(
            hour="2",
            minute="0",
        ),
        delete_after=cdk.Duration.days(7),
    )
)

backup_plan.add_selection(
    "Selection",
    resources=[
        backup.BackupResource.from_ec2_instance(instance),
    ],
)

14. Cost Optimization

Scheduled start/stop:

# Lambda function to stop instance at night
stop_function = lambda_.Function(
    self,
    "StopInstance",
    runtime=lambda_.Runtime.PYTHON_3_11,
    handler="index.handler",
    code=lambda_.Code.from_inline("""
import boto3
def handler(event, context):
    ec2 = boto3.client('ec2')
    ec2.stop_instances(InstanceIds=[event['instance_id']])
    return {'statusCode': 200}
"""),
)

# EventBridge rule to trigger at 6 PM
events.Rule(
    self,
    "StopInstanceRule",
    schedule=events.Schedule.cron(hour="18", minute="0"),
    targets=[targets.LambdaFunction(stop_function)],
)

15. Multi-Region Deployment

Deploy to multiple regions:

# app.py
for region in ["us-east-2", "us-west-2", "eu-west-1"]:
    TablesleuthStack(
        app,
        f"TablesleuthStack-{environment}-{region}",
        config=config,
        env=cdk.Environment(
            account=os.getenv("CDK_DEFAULT_ACCOUNT"),
            region=region,
        ),
    )

16. CI/CD Pipeline

Automated deployment:

pipeline = pipelines.CodePipeline(
    self,
    "Pipeline",
    synth=pipelines.ShellStep(
        "Synth",
        input=pipelines.CodePipelineSource.git_hub(
            "user/repo",
            "main",
        ),
        commands=[
            "npm install -g aws-cdk",
            "pip install -r requirements.txt",
            "cdk synth",
        ],
    ),
)

# Add deployment stages
pipeline.add_stage(
    TablesleuthStage(app, "Dev", config=dev_config)
)
pipeline.add_stage(
    TablesleuthStage(app, "Prod", config=prod_config),
    pre=[
        pipelines.ManualApprovalStep("PromoteToProd"),
    ],
)

Implementation Priority

Phase 1: Security (Next Sprint)

Secrets Manager integration
CloudWatch alarms
Systems Manager Session Manager

Phase 2: Testing (Following Sprint)

Unit tests
CDK Aspects
CDK Nag

Phase 3: Operations (Future)

CloudWatch Dashboard
Separate stateful stack
Backup and recovery

Phase 4: Advanced (Long-term)

Estimated Effort

Improvement	Effort	Impact	Priority
Secrets Manager	2 hours	High	P1
CloudWatch Alarms	1 hour	Medium	P1
Session Manager	1 hour	High	P1
Unit Tests	4 hours	High	P2
CDK Aspects	2 hours	Medium	P2
CDK Nag	1 hour	Medium	P2
Dashboard	2 hours	Low	P3
Separate Stacks	3 hours	Medium	P3
Custom AMI	8 hours	Medium	P4
Auto Scaling	4 hours	Low	P4
Load Balancer	4 hours	Low	P4
CI/CD Pipeline	8 hours	High	P4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Future Improvements

Priority 1: Cost Optimization

1. Spot Instance Support

Priority 2: Security Enhancements

1. AWS Secrets Manager Integration

2. CloudWatch Alarms

3. Systems Manager Session Manager

4. AWS Config Rules

Priority 2: Operational Excellence

5. Unit Tests

6. CDK Aspects for Validation

7. CDK Nag

8. CloudWatch Dashboard

Priority 3: Architecture Improvements

9. Separate Stateful Stack

10. Custom AMI with EC2 Image Builder

11. Auto Scaling Group

12. Application Load Balancer

Priority 4: Advanced Features

13. Backup and Recovery

14. Cost Optimization

15. Multi-Region Deployment

16. CI/CD Pipeline

Implementation Priority

Phase 1: Security (Next Sprint)

Phase 2: Testing (Following Sprint)

Phase 3: Operations (Future)

Phase 4: Advanced (Long-term)

Estimated Effort

References

FilesExpand file tree

FUTURE_IMPROVEMENTS.md

Latest commit

History

FUTURE_IMPROVEMENTS.md

File metadata and controls

Future Improvements

Priority 1: Cost Optimization

1. Spot Instance Support

Priority 2: Security Enhancements

1. AWS Secrets Manager Integration

2. CloudWatch Alarms

3. Systems Manager Session Manager

4. AWS Config Rules

Priority 2: Operational Excellence

5. Unit Tests

6. CDK Aspects for Validation

7. CDK Nag

8. CloudWatch Dashboard

Priority 3: Architecture Improvements

9. Separate Stateful Stack

10. Custom AMI with EC2 Image Builder

11. Auto Scaling Group

12. Application Load Balancer

Priority 4: Advanced Features

13. Backup and Recovery

14. Cost Optimization

15. Multi-Region Deployment

16. CI/CD Pipeline

Implementation Priority

Phase 1: Security (Next Sprint)

Phase 2: Testing (Following Sprint)

Phase 3: Operations (Future)

Phase 4: Advanced (Long-term)

Estimated Effort

References