This project is mirrored from https://:*****@github.com/Ensembl/ensembl.git.
  1. Jan 08, 2020
      RFAMParser: relax selection criteria on analysis.logic_name · 20d5e0fc
      Marek Szuba authored
      Due to changes in the structure of the production database, since
      release 98 the value of analysis.logic_name corresponding to non-coding
      RNA can be either 'ncrna' (which is what we used before) or
      'ncrna_species_name'. Change the SQL query used to map RFAM IDs to
      Ensembl stable IDs so that it can correctly handle species using the
      latter syntax, i.e. human, mouse and zebrafish.
      
      Issue: ENSINT-402
      20d5e0fc
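The relaxed selection described above can be sketched as follows. This is a minimal illustration, not the query from the commit: the table layout is a simplified, hypothetical stand-in for `analysis`, the sample logic names are invented, and SQLite is used only so the example runs anywhere (the production databases are MySQL, where escaping rules differ).

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the analysis table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, logic_name TEXT)"
)
conn.executemany(
    "INSERT INTO analysis (logic_name) VALUES (?)",
    [("ncrna",), ("ncrna_example_species",), ("ncrnaX",), ("ensembl",)],
)

# Accept both the bare name and the suffixed form. The underscore is escaped
# because it is a single-character wildcard in LIKE patterns.
RELAXED_FILTER = r"logic_name = 'ncrna' OR logic_name LIKE 'ncrna\_%' ESCAPE '\'"


def matching_logic_names(connection):
    """Return the logic names accepted by the relaxed filter, sorted."""
    rows = connection.execute(
        "SELECT logic_name FROM analysis WHERE "
        + RELAXED_FILTER
        + " ORDER BY logic_name"
    )
    return [name for (name,) in rows]


matched = matching_logic_names(conn)
print(matched)
```

Escaping the underscore keeps unrelated names such as the invented `ncrnaX` out of the result while still accepting any suffix after `ncrna_`.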
  2. Jan 02, 2020
  3. Sep 26, 2019
  4. Jul 25, 2019
      ChecksumParser: add a comment about read-only input paths · 0f02729d
      Marek Szuba authored
      See ENSCORESW-3197. Have to think about the correct way of specifying
      where to put that output file though, especially given the parser
      doesn't actually delete it after it is done with it.
      0f02729d
      ChecksumParser: increment checksum_xref_id BEFORE use · 41aa26b1
      Marek Szuba authored
      The initial value of the variable $counter is set to the highest
      checksum_xref.checksum_xref_id found if the table in question is not
      empty, or 0 if it is. This causes problems if $counter is only
      incremented after each use in the input-file loop:
       - for a non-empty table the parser would attempt to re-use an existing
         value of checksum_xref_id for the first entry read from the input
         file. checksum_xref_id is the primary key of checksum_xref so its
         values have to be unique, therefore "LOAD DATA" silently discards the
         offending row;
       - for an empty table we lose one input row as well but it is the SECOND
         rather than the first one. Reason: 0 is not a valid value for
         auto_increment fields in MySQL, resulting in the first row being
         inserted with the first allowed ID value of 1 - which brings us back
         to the previous scenario when "LOAD DATA" attempts to insert the
         second row.
      
      Incrementing $counter before use ought to address both forms of the
      problem.
      41aa26b1
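Both failure modes described above can be reproduced with a toy model. The sketch below is an assumption-laden simulation, not the parser's code: `load_rows` is a hypothetical stand-in that mimics only the two behaviours named in the commit message (an ID of 0 is replaced by the next auto-increment value, and a row with a duplicate primary key is silently dropped).

```python
def load_rows(existing_ids, candidate_ids):
    """Toy model of the loading behaviour described in the commit message.

    A row whose ID is 0 receives the next free auto-increment value; a row
    whose ID already exists is silently dropped. Returns the indices of the
    input rows that were dropped.
    """
    table = set(existing_ids)
    dropped = []
    for index, row_id in enumerate(candidate_ids):
        if row_id == 0:
            row_id = max(table, default=0) + 1
        if row_id in table:
            dropped.append(index)
        else:
            table.add(row_id)
    return dropped


def ids_increment_after_use(start, n):
    """Old behaviour: use the counter, then increment it."""
    counter, ids = start, []
    for _ in range(n):
        ids.append(counter)
        counter += 1
    return ids


def ids_increment_before_use(start, n):
    """Fixed behaviour: increment the counter, then use it."""
    counter, ids = start, []
    for _ in range(n):
        counter += 1
        ids.append(counter)
    return ids


# Non-empty table: the counter starts at the highest existing ID (3).
lost_nonempty_old = load_rows({1, 2, 3}, ids_increment_after_use(3, 3))
lost_nonempty_new = load_rows({1, 2, 3}, ids_increment_before_use(3, 3))

# Empty table: the counter starts at 0.
lost_empty_old = load_rows(set(), ids_increment_after_use(0, 3))
lost_empty_new = load_rows(set(), ids_increment_before_use(0, 3))

print(lost_nonempty_old, lost_nonempty_new, lost_empty_old, lost_empty_new)
```

In this model the old ordering drops the first input row for a non-empty table and the second input row for an empty one, while incrementing before use drops nothing in either case.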
  5. Jul 18, 2019
  6. Jun 17, 2019
  7. Jun 11, 2019
  8. May 14, 2019
  9. Mar 18, 2019
      stable_id_lookup: extract RNAProduct stable IDs from core databases · 502dcfff
      Marek Szuba authored
      Uses the same type of SQL SELECT queries as Translation, which makes
      sense given how similar they are.
      
      Tested on test-genome-DBs/homo_sapiens/core, works without errors.
      
      Aborts upon encountering a core database missing the 'rnaproduct' table
      but that is in my humble opinion very much desired behaviour, as it could
      indicate incomplete application of schema patches in the release this
      will be included in.
      502dcfff
  10. Feb 21, 2019
  11. Feb 13, 2019
  12. Feb 12, 2019
  13. Jan 02, 2019
  14. Dec 19, 2018
  15. Dec 17, 2018
  16. Dec 07, 2018
  17. Dec 06, 2018
  18. Oct 25, 2018
  19. Oct 18, 2018
  20. Oct 15, 2018
      Fix bug: use return instead of next · 4ab71f7e
      Wojtek Bazant authored
      'return' goes back one frame up the stack, whereas 'next' goes back to
      the closest frame on the stack that supports the operation (that is
      close enough in RefSeqGPFFParser alone). It works unless I subclass
      create_xrefs, and then my Hive workers die:
      
      Lost control. Check your Runnable for loose 'next' statements that are
      not part of a loop       WORKER_ERROR
      4ab71f7e
      C. elegans specific parsing of RefSeq_dna file · 7d6346f7
      Wojtek Bazant authored
      - New xref: to a WormBase CDS feature
      - Modify WormbaseCElegansRefSeqGPFFParser to serve both kinds of files
      - extract a utility method from RefSeqGPFFParser
      - xref_config.ini stanza for wormbase_cds
      - tests for new functionality
      7d6346f7
      C. elegans references use WormBase mapping to INSDC protein ids · d66449b6
      Wojtek Bazant authored
      - maintain naming convention: WormBase specific stuff says Wormbase at the front
      - rewrite WormBaseDirectParser
      - WormBaseDirectParser populates protein_ids
      - superclass method to make dependent protein_ids as parent
      - tap into UniProtParser
        + also skip EMBL scaffold ids (we can't reliably assign them)
      - tap into RefSeqGPFFParser
        + extract a method
      - tests for new stuff
        + add %args to parametrise test_parser
      
      Benefits for RefSeqGPFFParser:
      RefSeq proteins have coordinates as part of their identity, so we
      can't reliably sequence-match them: doing so would also pick up all
      paralogs. This change fixes this spurious mapping.
      Benefits for UniProtParser:
      Not the above: UniProt entries are not tied to coordinates so all
      paralogs map to the same entry. We can handle versioning and updates
      a bit better: if WormBase updates an entry and a protein id changes but
      UniProt doesn't reflect this yet, with the change we will still pick up
      the UniProt entry although we can't sequence match any more.
      d66449b6
  21. Oct 01, 2018
      Remove artificial dependency on XML::Simple · ddb71bb1
      Marek Szuba authored
      The only part of the xref-mapping pipeline that depended on the
      long-deprecated module XML::Simple was TAIROntologyParser - which did
      not actually *use* that module for anything. Get rid of the useless
      import, thus making it unnecessary for XML::Simple to be mentioned in
      the cpanfile.
      ddb71bb1
  22. Sep 25, 2018
      create_release_tasks.pl: distinguish between submitter and assignee · 98f8ba25
      Marek Szuba authored
      Useful when the person running the script is not in fact the RelCo for
      the next release, as has already happened before. Saves one having to
      manually reassign all the newly created tickets to the actual RelCo.
      Conversely, if it is the same person, just omit the new argument and
      the RelCo user name will be used to connect to JIRA.
      
      Note that the function validating user names is only applied to RelCo
      ones. This is intentional: JIRA itself will complain if the submitter
      is not authorised to create ENSCORESW tickets.
      98f8ba25
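The fallback described above can be sketched as a small function. This is only an illustration of the stated behaviour: the function name, parameter names and sample user names are hypothetical and do not reflect how create_release_tasks.pl, a Perl script, names its arguments.

```python
from typing import Optional, Tuple


def resolve_users(relco: str, submitter: Optional[str] = None) -> Tuple[str, str]:
    """Return (login_user, assignee) for the tickets to be created.

    Hypothetical helper: tickets are always assigned to the RelCo, while the
    account used to connect falls back to the RelCo when no separate
    submitter is given.
    """
    login_user = submitter if submitter is not None else relco
    return login_user, relco


# Same person runs the script and is the RelCo: omit the submitter.
same_person = resolve_users("relco_user")

# Someone else runs the script on behalf of the RelCo.
different_person = resolve_users("relco_user", submitter="helper_user")

print(same_person, different_person)
```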
  23. Sep 24, 2018
  24. Sep 11, 2018
  25. Sep 07, 2018
  26. Sep 05, 2018