I've run the SPLASH pipeline to generate the satc.dump bin files below, and I'm wondering what these files represent.
I am assuming each line follows the format counts | extendor | sample_id, where the first 31 nucleotides of the extendor are the anchor (--anchor_len was left at its default) and the remaining nucleotides are the varying target.
head -n 10 bin15.satc.dump
29 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGAGGTGGTGTGTGCCTGGAGTCTCAGCAACT sample100_r1
1 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGATAGTGGAGCAGGCCTGTAATCCCAGCTAC sample101_r1
Please confirm whether this is the format. I also have some additional questions below.
- Do the satc.dump files contain every detected extendor, i.e. all sequences before any statistical testing for valid extendors?
- Is OASIS performed to derive statistically significant extendors?
- Are satc.dump files ordered by anchor, such that each run of n consecutive lines shares the same anchor and the lines differ only in target or sample?
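
To make my assumption concrete, here is a minimal sketch of how I would parse the dump if the format is what I described. The function names and the `ANCHOR_LEN = 31` constant are my own and are not part of SPLASH, and the three-column layout (count, extendor, sample_id, whitespace separated) is exactly the assumption I am asking you to confirm.

```python
from collections import defaultdict

# Assumption: anchor length of 31 nt, as described above (not read from SPLASH itself).
ANCHOR_LEN = 31


def parse_satc_dump_line(line, anchor_len=ANCHOR_LEN):
    """Split one dump line into (count, anchor, target, sample_id).

    Assumes three whitespace-separated columns: count, extendor, sample_id.
    """
    count, extendor, sample_id = line.split()
    return int(count), extendor[:anchor_len], extendor[anchor_len:], sample_id


def group_by_anchor(lines, anchor_len=ANCHOR_LEN):
    """Collect (target, sample_id, count) records under each anchor."""
    table = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        count, anchor, target, sample_id = parse_satc_dump_line(line, anchor_len)
        table[anchor].append((target, sample_id, count))
    return dict(table)


# The two lines from the head output above.
example = [
    "29 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGAGGTGGTGTGTGCCTGGAGTCTCAGCAACT sample100_r1",
    "1 AAAAAAACAAAAAAACAAAAATTAGCCGGGTGATAGTGGAGCAGGCCTGTAATCCCAGCTAC sample101_r1",
]
grouped = group_by_anchor(example)

for anchor, records in grouped.items():
    for target, sample_id, count in records:
        print(anchor, target, sample_id, count)
```

Under this reading, both example lines fall under a single anchor and differ only in target and sample, which is what prompts my question about ordering.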