comparison accuracy of "spacedust clustersearch" decreases significantly when the targetdb increases #15

@DUANZIHAOO

Hi, I found that when the querydb is much smaller than the targetdb, Spacedust's performance seems to drop significantly.

My three .faa files:

  1. cluster.faa (a selected set of determined conserved gene clusters, 4 Mb)
  2. target.faa (some selected prokaryotic genomes, 1 Gb)
  3. test_target.faa (subset of target.faa for test, with some conserved gene clusters, 1 Mb)

I built test_target.faa this way to check how well Spacedust recovers known clusters.
My commands are below.

foldseek createdb test_target.faa test_targetDB --prostt5-model weights --threads 4 --gpu 1
spacedust createsetdb test_targetDB test_targetSetDB tmpFolder
The same two steps were run for cluster.faa and target.faa to build clusterSetDB and targetSetDB.
spacedust clustersearch clusterSetDB test_targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4

The result looks great. Some pre-set conserved gene clusters in test_target.faa were detected by spacedust clustersearch.
Then I used the same pipeline on target.faa.

spacedust clustersearch clusterSetDB targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4

However, the result looks discouraging. Even some gene clusters that were detected in the subset (test_target) were missed in the full set (target).
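
To make the loss concrete, the two runs can be compared pair by pair. Below is a minimal sketch, not part of Spacedust itself. It assumes the two runs were written to separate files (the commands above both write to result.tsv), that the files are tab-separated, that lines starting with `#` are cluster summary lines, and that on the remaining lines column 1 is the query ID (possibly prefixed with `>`) and column 2 is the target ID. The function names and file names are hypothetical, and the column layout should be checked against the actual output before trusting the result.

```sh
#!/bin/sh
# Hypothetical helper: list (query, target) pairs found in the subset run
# but absent from the full run.
# Assumed layout (verify against your own files): tab-separated, '#' lines
# are summaries, column 1 = query ID (optional '>' prefix), column 2 = target ID.

extract_pairs() {
    # $1 = result file; prints unique "query<TAB>target" pairs, sorted
    awk -F '\t' '
        /^#/ || NF < 2 { next }            # skip summary and malformed lines
        { sub(/^>/, "", $1); print $1 "\t" $2 }
    ' "$1" | LC_ALL=C sort -u
}

missing_pairs() {
    # $1 = subset result file, $2 = full result file
    sub_pairs=$(mktemp) || return 1
    full_pairs=$(mktemp) || { rm -f "$sub_pairs"; return 1; }
    extract_pairs "$1" > "$sub_pairs"
    extract_pairs "$2" > "$full_pairs"
    # comm -23 keeps lines that occur only in the first sorted file
    LC_ALL=C comm -23 "$sub_pairs" "$full_pairs"
    rm -f "$sub_pairs" "$full_pairs"
}
```

With the subset run saved as, say, subset_result.tsv and the full run as full_result.tsv (both names are placeholders), `missing_pairs subset_result.tsv full_result.tsv` prints the pairs that were lost when the target database grew.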

I'm not sure what the reason is. Can you give me some advice? I hope Spacedust continues to improve. Thanks!
