…I since these are generally never needed outside that context. feat: fully functional zero-copy splicing mechanics. fix: bug in rev and rev comp causing garbage output.
Ok got it, that makes sense. Yeah I guess documentation is the best way to go for now. Thanks for the thorough explanation!

Hey Brian, this looks good! Thanks to @BradBalderson I just caught and fixed a bug in the reverse complement function. I think it would make sense that ~half of the sequences are way off with that bug. Can you try again with the latest commits?
Without VCF normalisation
Using the VCF directly to create the GVL db, without any normalisation with

Nucleotide-level
MUCH better seq sim for nuc level.

Amino acid-level
Unfortunately, still lots of stop codons:
And actually the AA seq sim is a bit lower than before:

With VCF normalisation
[placeholder]







Closes #24.
docs: fix version format to be vX.Y.Z
feat: initial prototype for splicing.
Splice regions together
Allow a different definition of an overlapping variant: one that is fully exonic and does not overlap splice sites, a la Haplosaurus.
Update Dataset API (or maybe a new class) to reflect different shape and definition of a row.
Tests against Haplosaurus on 1kGP chr22 @bschilder
Performance issues, possibly from slow RC
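
For readers following along, here is a minimal sketch of what "splice regions together" plus reverse complement means at the string level. It is not the implementation in this PR: the names `reverse_complement` and `splice` are hypothetical, it copies data rather than being zero-copy, and it assumes exons are given as 0-based half-open `(start, end)` intervals on the forward strand.

```python
# Illustrative only: hypothetical helpers, not GenVarLoader's API.
# Assumes 0-based, half-open exon coordinates on the forward strand.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice(chrom_seq: str, exons: list[tuple[int, int]], strand: str = "+") -> str:
    """Concatenate exon slices in genomic order.

    For a minus-strand transcript, the concatenated forward-strand
    sequence is reverse complemented once at the end.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    spliced = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


if __name__ == "__main__":
    chrom = "AAACCCGGGTTT"
    exons = [(0, 3), (6, 9)]
    print(splice(chrom, exons, "+"))
    print(splice(chrom, exons, "-"))
```

A cheap sanity check for the kind of bug mentioned above is that reverse complementing twice should give back the original sequence.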