Only annotate
Created by: hoelzer
I want to run the Nextflow pipeline without the virus detection part. Say I have a FASTA file of virus sequences that I just want to annotate, without filtering for putative virus signals.
This is now implemented in the Nextflow pipeline via the flag --onlyannotate. If the flag is set, the detection sub-workflow is skipped. Renaming and length filtering are still applied.
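The control flow described above can be sketched in Python. This sketch assumes the pipeline stages can be modeled as plain functions over a {name: sequence} dict. The names (rename_contigs, filter_by_length, run_pipeline), the renaming scheme and the min_length default are hypothetical placeholders, not the actual Nextflow processes or defaults.

```python
from typing import Callable, Dict, List

# Contigs modeled as {name: sequence}; a simplification, not the pipeline's format.
Contigs = Dict[str, str]


def rename_contigs(contigs: Contigs) -> Contigs:
    # Hypothetical renaming scheme: contig_1, contig_2, ...
    return {f"contig_{i}": seq for i, seq in enumerate(contigs.values(), start=1)}


def filter_by_length(contigs: Contigs, min_length: int) -> Contigs:
    # Keep only contigs that are at least min_length bases long.
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


def run_pipeline(
    contigs: Contigs,
    detect: Callable[[Contigs], Contigs],
    annotate: Callable[[Contigs], List[str]],
    only_annotate: bool = False,
    min_length: int = 1000,  # placeholder threshold, not the pipeline's default
) -> List[str]:
    # Renaming and length filtering are always applied.
    contigs = filter_by_length(rename_contigs(contigs), min_length)
    # With --onlyannotate, the detection sub-workflow is skipped entirely.
    if not only_annotate:
        contigs = detect(contigs)
    return annotate(contigs)


if __name__ == "__main__":
    sample = {"seqA": "A" * 1500, "seqB": "C" * 200}
    # A toy detector that drops everything, to show that it is bypassed.
    print(run_pipeline(sample, detect=lambda c: {}, annotate=lambda c: sorted(c), only_annotate=True))
```

The short contig is still removed by the length filter, while the detection step never runs when only_annotate is set.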
I also added --viphog_version and --meta_version because Guillermo and I are currently testing the different model versions we have. With these flags, I can switch more easily between the different tables (this is mainly a development feature).
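The version switch can be sketched as a lookup that validates each flag and picks the matching resources. The allowed values and their descriptions come from the help text in this merge request. The file names and the VersionConfig/resolve_versions names are hypothetical. The only real tool option referenced is hmmsearch's --cut_ga, which uses the GA thresholds stored in the HMM file.

```python
from dataclasses import dataclass
from typing import Dict

# Allowed values and descriptions, taken from the help text in this MR.
VIPHOG_VERSIONS: Dict[str, str] = {
    "v1": "no additional bit score filter (--cut_ga not applied, just e-value filtered)",
    "v2": "--cut_ga, min score used as sequence-specific GA, 3 bit trimmed for domain-specific GA",
    "v3": "--cut_ga, like v2 but seq-specific GA trimmed by 3 bits if second best score is 'nan'",
}
META_VERSIONS: Dict[str, str] = {
    "v1": "older version of the meta data table using an outdated NCBI virus taxonomy",
    "v2": "2020 version of NCBI virus taxonomy",
}


@dataclass(frozen=True)
class VersionConfig:
    hmm_db: str          # hypothetical ViPhOG HMM database file name
    metadata_table: str  # hypothetical metadata table file name
    use_cut_ga: bool     # whether hmmsearch is run with --cut_ga


def resolve_versions(viphog_version: str, meta_version: str) -> VersionConfig:
    """Map --viphog_version / --meta_version to the files and options to use."""
    if viphog_version not in VIPHOG_VERSIONS:
        raise ValueError(
            f"unknown --viphog_version {viphog_version!r}, expected one of {sorted(VIPHOG_VERSIONS)}"
        )
    if meta_version not in META_VERSIONS:
        raise ValueError(
            f"unknown --meta_version {meta_version!r}, expected one of {sorted(META_VERSIONS)}"
        )
    return VersionConfig(
        # Hypothetical naming scheme: one file per version, selected by the flag.
        hmm_db=f"viphogs_{viphog_version}.hmm",
        metadata_table=f"viphog_metadata_{meta_version}.tsv",
        # v1 is only e-value filtered; v2 and v3 rely on the GA thresholds.
        use_cut_ga=viphog_version != "v1",
    )


if __name__ == "__main__":
    print(resolve_versions("v3", "v2"))
```

Validating the flag values up front makes a typo such as "v4" fail immediately rather than silently falling back to another table.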
I thought you might be interested in the changes, so please have a look. If you are fine with them, feel free to merge this into master.
         --sankey          select the x taxa with highest count for sankey plot, try and error to change plot [default: $params.sankey]
         --chunk           WIP: chunk FASTA files into smaller pieces for parallel calculation [default: $params.chunk]
    +    --onlyannotate    Only annotate the input FASTA (no virus prediction, only contig length filtering) [default: $params.only_annotate]
     
         ${c_yellow}Developing:${c_reset}
    -    --version         define the ViPhOG db version to be used [default: $params.version]
    -                      v1: no additional bit score filter (--cut_ga not applied, just e-value filtered)
    -                      v2: --cut_ga, min score used as sequence-specific GA, 3 bit trimmed for domain-specific GA
    -                      v3: --cut_ga, like v2 but seq-specific GA trimmed by 3 bits if second best score is 'nan'
    +    --viphog_version  define the ViPhOG db version to be used [default: $params.viphog_version]
    +                      v1: no additional bit score filter (--cut_ga not applied, just e-value filtered)
    +                      v2: --cut_ga, min score used as sequence-specific GA, 3 bit trimmed for domain-specific GA
    +                      v3: --cut_ga, like v2 but seq-specific GA trimmed by 3 bits if second best score is 'nan'
    +    --meta_version    define the metadata table version to be used [default: $params.meta_version]
    +                      v1: older version of the meta data table using an outdated NCBI virus taxonomy
    +                      v2: 2020 version of NCBI virus taxonomy

Created by: hoelzer
This is exactly the pain point Guillermo and I are working on. We currently get more taxon assignments with metadata table v1, which is based on the old NCBI virus taxonomy, even though v2 is more complete. We are investigating whether some of the additional v1 assignments are just false positives. And yes, the plan is to simply use the updated v2 of this table.
For now, this flag makes it easier for me to test v1 vs. v2 :)