Only annotate

Merged Martin Beracochea requested to merge only-annotate into master

Created by: hoelzer

I want to run the Nextflow pipeline without the virus detection part. Let's say I have a FASTA file with virus sequences that I just want to annotate, not filter for putative virus signals.

This is now implemented in the nextflow via the flag

--onlyannotate

If the flag is set, the detection sub-workflow is skipped. Renaming and length filtering are still applied.
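The control flow described above can be sketched roughly as follows. This is only an illustration in Python, not the pipeline's Nextflow code: the function names, the renaming scheme, the order of renaming and length filtering, and the `min_len` parameter are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Contigs = Dict[str, str]  # contig name -> nucleotide sequence


def rename(contigs: Contigs) -> Contigs:
    # Hypothetical naming scheme: contig_1, contig_2, ... in input order.
    return {f"contig_{i}": seq for i, seq in enumerate(contigs.values(), start=1)}


def length_filter(contigs: Contigs, min_len: int) -> Contigs:
    # Keep only contigs that are at least min_len bases long.
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


def run(
    contigs: Contigs,
    detect: Callable[[Contigs], Contigs],
    annotate: Callable[[Contigs], List[str]],
    min_len: int,
    only_annotate: bool = False,
) -> List[str]:
    # Renaming and length filtering always run, with or without the flag.
    kept = length_filter(rename(contigs), min_len)
    # The detection step is the only part skipped when only_annotate is set.
    if not only_annotate:
        kept = detect(kept)
    return annotate(kept)
```

With `only_annotate=True`, every contig that passes the length filter reaches `annotate`, even if the (hypothetical) `detect` step would have discarded it.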

I also added --viphog_version and --meta_version because Guillermo and I are currently testing the different model versions we have. With these flags, I can switch between the different tables more easily (this is more of a development thing).
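For orientation, the values these two flags accept (as listed in the help text of the diff) can be captured in a small lookup. The sketch below is a Python illustration under assumptions: the `check_versions` helper is hypothetical, and the pipeline itself may not validate the flags this way.

```python
from typing import Tuple

# Version descriptions paraphrased from the help text in the diff.
VIPHOG_VERSIONS = {
    "v1": "no additional bit score filter (--cut_ga not applied, e-value filtered only)",
    "v2": "--cut_ga, min score as sequence-specific GA, 3 bits trimmed for domain-specific GA",
    "v3": "like v2, but sequence-specific GA trimmed by 3 bits if second best score is 'nan'",
}
META_VERSIONS = {
    "v1": "older metadata table using an outdated NCBI virus taxonomy",
    "v2": "2020 version of the NCBI virus taxonomy",
}


def check_versions(viphog_version: str, meta_version: str) -> Tuple[str, str]:
    # Hypothetical helper: reject values that the help text does not list.
    if viphog_version not in VIPHOG_VERSIONS:
        raise ValueError(f"unknown --viphog_version: {viphog_version}")
    if meta_version not in META_VERSIONS:
        raise ValueError(f"unknown --meta_version: {meta_version}")
    return viphog_version, meta_version
```

Calling `check_versions("v3", "v1")` would correspond to testing the newest ViPhOG filtering against the older metadata table, which is the kind of comparison the flags are meant to make easy.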

I thought you might be interested in the changes, so please have a look, and if you are fine with them, please feel free to merge this into master.

Merge request reports

Merged by (Mar 4, 2025 3:04am UTC)

Activity

  --sankey select the x taxa with highest count for sankey plot, try and error to change plot [default: $params.sankey]
  --chunk WIP: chunk FASTA files into smaller pieces for parallel calculation [default: $params.chunk]
+ --onlyannotate Only annotate the input FASTA (no virus prediction, only contig length filtering) [default: $params.only_annotate]

  ${c_yellow}Developing:${c_reset}
- --version define the ViPhOG db version to be used [default: $params.version]
-     v1: no additional bit score filter (--cut_ga not applied, just e-value filtered)
-     v2: --cut_ga, min score used as sequence-specific GA, 3 bit trimmed for domain-specific GA
-     v3: --cut_ga, like v2 but seq-specific GA trimmed by 3 bits if second best score is 'nan'
+ --viphog_version define the ViPhOG db version to be used [default: $params.viphog_version]
+     v1: no additional bit score filter (--cut_ga not applied, just e-value filtered)
+     v2: --cut_ga, min score used as sequence-specific GA, 3 bit trimmed for domain-specific GA
+     v3: --cut_ga, like v2 but seq-specific GA trimmed by 3 bits if second best score is 'nan'
+ --meta_version define the metadata table version to be used [default: $params.meta_version]
+     v1: older version of the meta data table using an outdated NCBI virus taxonomy
+     v2: 2020 version of NCBI virus taxonomy
  • Review: Approved

    Looks good, I'll give this a try.

  • Martin Beracochea approved this merge request

  • Martin Beracochea @mbc started a thread on commit aa3d1155
    • Created by: hoelzer

      This is exactly the pain point Guillermo and I are working on. We currently get more taxon assignments with metadata table v1, which is based on the old NCBI virus taxonomy, although v2 is more complete. We are investigating whether some of the additional assignments with v1 are just false positives. And yes, the idea is to simply use the updated v2 of this table.

      For now, it's easier for me to test v1 vs v2 with this flag :)

  • Created by: hoelzer

    Looks good, I'll give this a try.

    So I can merge this into master? (I don't want to destroy anything ;) )

  • Martin Beracochea @mbc started a thread on commit aa3d1155
  • Looks good, I'll give this a try.

    So I can merge this into master? (I don't want to destroy anything ;) )

    Go ahead and merge it.

  • Merged by: hoelzer at 2020-05-20 08:27:08 UTC
