KMC multi-samples kmer count #249

@SunJJ230120

Hi,
I'm working on a project that requires k-mer frequency analysis of multiple samples with KMC. Each sample contains approximately 60-70 GB of paired-end sequencing data. I would like some advice on the relative merits of two possible strategies:

1. passing all samples to a single KMC run through one input list file (input_files.txt), or
2. running KMC on each sample individually and then merging the results with kmc_tools.

How do these approaches compare in terms of computational efficiency (memory requirements and processing time), accuracy of the results, and flexibility for downstream analysis? Any insights or recommendations on optimizing this workflow for data of this size would be greatly appreciated.
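
To make the two strategies concrete, here is a rough sketch of the commands I have in mind. It only builds and prints the command lines and does not run anything. The sample names, list files, output prefixes, `tmp` directory, and the values for `-k`, `-m` (RAM in GB), and `-t` (threads) are placeholders I made up. The `@list_file` input syntax and the `kmc_tools simple <db1> <db2> union <out>` form reflect my reading of the KMC documentation, so please correct me if I have them wrong.

```python
import shlex


def kmc_count_cmd(list_file, out_prefix, tmp_dir, k=31, mem_gb=32, threads=16):
    """Build a `kmc` command that counts k-mers over every file named in list_file.

    The k, mem_gb and threads defaults are placeholders, not recommendations.
    """
    return [
        "kmc",
        f"-k{k}",
        f"-m{mem_gb}",
        f"-t{threads}",
        f"@{list_file}",
        out_prefix,
        tmp_dir,
    ]


def kmc_union_cmds(db_prefixes, out_prefix):
    """Chain pairwise `kmc_tools simple ... union` calls over per-sample databases."""
    if len(db_prefixes) < 2:
        raise ValueError("need at least two databases to merge")
    cmds = []
    acc = db_prefixes[0]
    for i, db in enumerate(db_prefixes[1:], start=1):
        is_last = i == len(db_prefixes) - 1
        # Intermediate results get a hypothetical ".partN" suffix.
        out = out_prefix if is_last else f"{out_prefix}.part{i}"
        cmds.append(["kmc_tools", "simple", acc, db, "union", out])
        acc = out
    return cmds


# Hypothetical sample names; each "<sample>.txt" would list that sample's R1/R2 files.
samples = ["sampleA", "sampleB", "sampleC"]

# Strategy 1: one run over a list file naming the reads of all samples.
pooled = kmc_count_cmd("input_files.txt", "all_samples", "tmp")

# Strategy 2: one run per sample, then merge the per-sample databases.
per_sample = [kmc_count_cmd(f"{s}.txt", s, "tmp") for s in samples]
merge = kmc_union_cmds(samples, "merged")

for cmd in [pooled, *per_sample, *merge]:
    print(shlex.join(cmd))
```

As far as I can tell, strategy 1 leaves me with a single pooled database, while strategy 2 keeps one database per sample in addition to the merged one. That is part of why I am asking about flexibility for downstream analysis.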
Thanks.
