Determining sample-specific anchors and targets | SATC file analysis #53

@cprovido

I've run the SPLASH pipeline to generate the satc.dump bin files below, and I'm wondering what this file represents.
I am assuming each line follows the format counts | extendor | sample_id, where the first 31 nucleotides of the extendor are the anchor (--anchor_len was left at its default) and the remaining nucleotides are the varying target.

head -n 10 bin15.satc.dump
29 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGAGGTGGTGTGTGCCTGGAGTCTCAGCAACT sample100_r1
1 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGATAGTGGAGCAGGCCTGTAATCCCAGCTAC sample101_r1
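
To make the assumed layout concrete, here is a minimal Python sketch that splits each dump line into count, anchor, target, and sample. It is not part of SPLASH: the names `DumpRecord` and `parse_dump_line` are hypothetical, and the whitespace-separated three-column layout and the 31-nucleotide anchor are the unconfirmed assumptions stated above.

```python
from typing import NamedTuple

ANCHOR_LEN = 31  # assumption: the anchor length stated in this post


class DumpRecord(NamedTuple):
    count: int
    anchor: str
    target: str
    sample_id: str


def parse_dump_line(line: str, anchor_len: int = ANCHOR_LEN) -> DumpRecord:
    """Split one line, assuming 'count extendor sample_id' separated by whitespace."""
    count, extendor, sample_id = line.split()
    # Assumed: the first anchor_len bases are the anchor, the rest is the target.
    return DumpRecord(int(count), extendor[:anchor_len], extendor[anchor_len:], sample_id)


# The two lines shown in the dump above.
lines = [
    "29 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGAGGTGGTGTGTGCCTGGAGTCTCAGCAACT sample100_r1",
    "1 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGATAGTGGAGCAGGCCTGTAATCCCAGCTAC sample101_r1",
]
records = [parse_dump_line(line) for line in lines]

for rec in records:
    print(rec.sample_id, rec.count, rec.anchor, rec.target)
```

If the assumption holds, both lines above would share one anchor and differ only in target and sample.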

Please confirm whether this is the correct format. I also have some additional questions:

  1. Do the satc.dump files contain all detected extendors, before any statistical testing for valid extendors?
  2. Is OASIS then performed to derive the statistically significant extendors?
  3. Are satc.dump files ordered by anchor, such that each run of n consecutive lines shares the same anchor and the lines differ only in the target or sample?
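
Question 3 can also be checked empirically. The sketch below is a hypothetical helper (`anchors_are_contiguous` is not a SPLASH function) that reports whether all lines sharing an anchor appear as one consecutive run. It assumes the same three-column layout and a 31-nucleotide anchor, and it tests only grouping, not sorted order.

```python
def anchors_are_contiguous(lines, anchor_len=31):
    """Return True if every anchor occupies a single consecutive run of lines.

    Assumes each line is 'count extendor sample_id' and that the anchor is
    the first anchor_len bases of the extendor (both unconfirmed).
    """
    seen = set()
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        anchor = fields[1][:anchor_len]
        if anchor != previous:
            if anchor in seen:
                return False  # this anchor already ended an earlier run
            seen.add(anchor)
            previous = anchor
    return True


# Toy data with a 2-base "anchor" to keep the example readable.
grouped = ["1 AAC s1", "2 AAG s2", "1 CCT s1"]
scattered = ["1 AAC s1", "1 CCT s1", "2 AAG s2"]
print(anchors_are_contiguous(grouped, anchor_len=2))
print(anchors_are_contiguous(scattered, anchor_len=2))
```

Because the function only iterates over lines, an open file handle for a dump such as bin15.satc.dump could be passed in directly instead of a list.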
