| if identifier.startswith("GCF_"): | ||
| return f"insdc.gcf:{identifier}" |
Let's also add in:
if identifier.startswith("GCA_"):
    return f"insdc.gca:{identifier}"
| @@ -0,0 +1,710 @@ | |||
| import json | |||
Before you start doing any refactoring, can you add in an integration test that checks the results of parsing the JSON data into all 8 CDM tables? Let me know when you have done that so I can take a look.
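On the accession-prefix suggestion earlier in this conversation (`GCF_` plus the requested `GCA_`), a table-driven version might look like the sketch below. Only the two `startswith` checks and the `insdc.gcf:` / `insdc.gca:` output strings come from the thread. The function name `to_curie`, the `ACCESSION_PREFIXES` mapping, and returning `None` for an unrecognised identifier are assumptions made for illustration, since the surrounding function is not shown.

```python
from typing import Optional

# Accession prefix -> CURIE prefix. Both pairs appear in the review thread;
# the dict itself is a hypothetical way of organising them.
ACCESSION_PREFIXES = {
    "GCF_": "insdc.gcf",
    "GCA_": "insdc.gca",
}


def to_curie(identifier: str) -> Optional[str]:
    """Return a prefixed identifier for a known accession type.

    Returning None for anything else is an assumption; the real function's
    fallback behaviour is not visible in the diff.
    """
    for accession_prefix, curie_prefix in ACCESSION_PREFIXES.items():
        if identifier.startswith(accession_prefix):
            return f"{curie_prefix}:{identifier}"
    return None
```

Keeping the pairs in one mapping means a further accession type becomes a one-line change rather than another `if` branch.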
16aa4cf to eab27c9
06c5508 to a84db46
| expected_tables = [ | ||
| "contig", | ||
| "contig_x_contigcollection", | ||
| "contigcollection_x_feature", | ||
| "contigcollection_x_protein", | ||
| "feature", | ||
| "feature_x_protein", | ||
| "identifier", | ||
| "name", | ||
| ] |
You also need the protein table -- it looks like the parser is not capturing the protein information any more.
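One way to make a missing table fail loudly is to diff the expected names against whatever the parser actually wrote. The sketch below is plain Python and assumes the full list is the eight names in the diff plus `protein`. The helper name `missing_tables` is hypothetical and is not part of the PR.

```python
# The eight names from the diff plus "protein", as requested in the review.
EXPECTED_TABLES = [
    "contig",
    "contig_x_contigcollection",
    "contigcollection_x_feature",
    "contigcollection_x_protein",
    "feature",
    "feature_x_protein",
    "identifier",
    "name",
    "protein",
]


def missing_tables(available):
    """Return the expected table names absent from `available`, sorted."""
    return sorted(set(EXPECTED_TABLES) - set(available))
```

In the integration test, `available` could be built from the Spark catalog, for example `[t.name for t in spark.catalog.listTables(TEST_NS)]`, assuming `TEST_NS` is the database the parser writes to. The test would then assert that `missing_tables(available) == []`.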
5d4a64c to 0301f1b
| # Load NCBI dataset from NCBI API | ||
| sample_api_response = test_data_dir / "refseq" / "annotation_report.json" | ||
| dataset = json.load(sample_api_response.open()) | ||
|
| # Run parse function | ||
| parse_annotation_data(spark, [dataset], TEST_NS) |
You need to load the annotation_report.parsed.json file here and use that to populate expected_df.
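A rough sketch of that comparison step, kept free of Spark so it runs on its own, is shown below. The layout of `annotation_report.parsed.json` is not shown in the thread, so the sketch assumes it maps each table name to a list of row dictionaries. The helper names `load_expected`, `canonical`, and `assert_rows_match` are hypothetical.

```python
import json
from pathlib import Path


def load_expected(path: Path) -> dict:
    """Load the expected output; assumed shape: {table_name: [row_dict, ...]}."""
    with path.open() as handle:  # context manager closes the file handle
        return json.load(handle)


def canonical(rows):
    """Serialise rows with sorted keys, then sort, so that neither row order
    nor key order affects the comparison."""
    return sorted(json.dumps(row, sort_keys=True) for row in rows)


def assert_rows_match(expected_rows, actual_rows):
    """Raise AssertionError if the two collections of rows differ."""
    assert canonical(expected_rows) == canonical(actual_rows)
```

For a Spark table, `actual_rows` could come from `[row.asDict() for row in df.collect()]`, which is reasonable for fixture-sized data. If `expected_df` has to be a DataFrame, the same loaded rows could be passed to `spark.createDataFrame` with the table's schema.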
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #71 +/- ##
===========================================
- Coverage 52.69% 50.56% -2.14%
===========================================
Files 63 66 +3
Lines 3241 3787 +546
===========================================
+ Hits 1708 1915 +207
- Misses 1533 1872 +339
9b5e4d0 to 6039dc6
eba4b50 to fe8695b