KMC multi-samples kmer count

Hi,
I'm working on a project that requires k-mer frequency analysis of multiple samples with KMC. Each sample has approximately 60-70 GB of paired-end sequencing data. I'd like some expert advice on the relative merits of two strategies: (1) counting all samples in a single KMC run by listing every file in one input list (input_files.txt), versus (2) running KMC on each sample individually and then merging the results with kmc_tools.

In particular, I'd like to understand how the two approaches compare in terms of computational efficiency (memory requirements and processing time), accuracy of the results, and flexibility for downstream analysis. Any recommendations for optimizing this workflow for large-scale data would be greatly appreciated.
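To make the comparison concrete, here is a rough sketch of the two workflows as I understand them. It only builds the command lines and does not run anything. The values for k, memory, and threads, the directory layout, the per-sample `<name>.lst` list files, and the pairwise `union` merge are my own assumptions for illustration, and the flags should be checked against the installed KMC version.

```python
def kmc_count_cmd(file_list, out_db, tmp_dir, k=31, mem_gb=64, threads=16):
    """Build one KMC counting command for a list file of FASTQ inputs.

    Assumed flags: -k (k-mer length), -m (max RAM in GB), -t (threads),
    -fq (FASTQ input), and @<file> for a text file listing the inputs.
    """
    return [
        "kmc", f"-k{k}", f"-m{mem_gb}", f"-t{threads}", "-fq",
        f"@{file_list}", out_db, tmp_dir,
    ]


def per_sample_plan(samples, work_dir, **kmc_opts):
    """Strategy 2: count each sample on its own, then merge pairwise.

    Each sample is assumed to have a hypothetical list file
    <work_dir>/<name>.lst naming its paired-end FASTQ files.
    """
    cmds, dbs = [], []
    for name in samples:
        db = f"{work_dir}/{name}"
        cmds.append(
            kmc_count_cmd(f"{work_dir}/{name}.lst", db, f"{work_dir}/tmp", **kmc_opts)
        )
        dbs.append(db)

    # Fold the per-sample databases together two at a time with
    # `kmc_tools simple <db1> <db2> union <out>`.
    merged = dbs[0]
    for i, db in enumerate(dbs[1:], start=1):
        out = f"{work_dir}/union_{i}"
        cmds.append(["kmc_tools", "simple", merged, db, "union", out])
        merged = out
    return cmds


if __name__ == "__main__":
    # Strategy 1: a single run over every file listed in input_files.txt.
    print(" ".join(kmc_count_cmd("input_files.txt", "all_samples", "tmp")))

    # Strategy 2: per-sample runs followed by merging.
    for cmd in per_sample_plan(["sampleA", "sampleB", "sampleC"], "work"):
        print(" ".join(cmd))
```

As far as I can tell, strategy 1 yields one pooled database without per-sample counts, while strategy 2 keeps a database per sample in addition to the merged one.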
Thanks.

KMC multi-samples kmer count #249