This project is mirrored from https://:*****@github.com/Ensembl/ensembl.git. Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer or owner.
  1. Jan 08, 2020
    • RFAMParser: relax selection criteria on analysis.logic_name · 20d5e0fc
      Marek Szuba authored
      Due to changes in the structure of the production database, since
      release 98 the value of analysis.logic_name corresponding to non-coding
      RNA can be either 'ncrna' (which is what we used before) or
      'ncrna_species_name'. Change the SQL query used to map RFAM IDs to
      Ensembl stable IDs so that it can correctly handle species using the
      latter syntax, i.e. human, mouse and zebrafish.
      
      Issue: ENSINT-402
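
      The relaxed selection described above could look roughly like the
      sketch below. It is not the query used by RFAMParser, which is not
      shown here: the one-column analysis table, the sample logic names
      (including 'ncrna_homo_sapiens') and the helper ncrna_logic_names
      are all assumptions made for illustration, and SQLite stands in for
      the real database.

```python
import sqlite3

# Hypothetical, simplified stand-in for the analysis table; only the column
# named in the commit message is modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, logic_name TEXT)"
)
conn.executemany(
    "INSERT INTO analysis (logic_name) VALUES (?)",
    [("ncrna",), ("ncrna_homo_sapiens",), ("ncrnaseq",), ("ensembl",)],
)

# Accept the old value or anything starting with 'ncrna_'. The underscore is
# escaped because '_' is a single-character wildcard in LIKE.
NCRNA_FILTER = r"logic_name = 'ncrna' OR logic_name LIKE 'ncrna\_%' ESCAPE '\'"


def ncrna_logic_names(connection):
    """Return the logic names accepted by the relaxed filter, sorted."""
    cursor = connection.execute(
        f"SELECT logic_name FROM analysis WHERE {NCRNA_FILTER} ORDER BY logic_name"
    )
    return [row[0] for row in cursor]


matched = ncrna_logic_names(conn)
print(matched)
```

      With this filter both the plain and the species-suffixed values are
      selected, while names that merely begin with 'ncrna' are not.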
  2. Jul 25, 2019
    • ChecksumParser: add a comment about read-only input paths · 0f02729d
      Marek Szuba authored
      See ENSCORESW-3197. Have to think about the correct way of specifying
      where to put that output file though, especially given the parser
      doesn't actually delete it after it is done with it.
    • ChecksumParser: increment checksum_xref_id BEFORE use · 41aa26b1
      Marek Szuba authored
      The initial value of the variable $counter is set to the highest
      checksum_xref.checksum_xref_id found if the table in question is not
      empty, or 0 if it is. This causes problems if $counter is only
      incremented after each use in the input-file loop:
       - for a non-empty table the parser would attempt to re-use an existing
         value of checksum_xref_id for the first entry read from the input
         file. checksum_xref_id is the primary key of checksum_xref so its
         values have to be unique, therefore "LOAD DATA" silently discards the
         offending row;
       - for an empty table we lose one input row as well but it is the SECOND
         rather than the first one. Reason: 0 is not a valid value for
         auto_increment fields in MySQL, resulting in the first row being
         inserted with the first allowed ID value of 1 - which brings us back
         to the previous scenario when "LOAD DATA" attempts to insert the
         second row.
      
      Incrementing $counter before use ought to address both forms of the
      problem.
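
      Both scenarios can be reproduced with a small simulation. This is
      only a sketch under the assumptions stated in the message above: a
      plain dict stands in for checksum_xref, load_rows and assign_ids are
      hypothetical helpers rather than parser code, duplicate IDs are
      silently dropped, and an ID of 0 is replaced by the next free
      auto-increment value.

```python
def load_rows(table, rows):
    """Toy stand-in for a bulk load into a table keyed by checksum_xref_id.

    Assumed behaviour, following the commit message: a row whose ID already
    exists is silently dropped, and an ID of 0 is replaced by the next
    auto-increment value.
    """
    for row_id, payload in rows:
        if row_id == 0:
            row_id = max(table, default=0) + 1
        if row_id in table:
            continue  # duplicate primary key: the row is silently discarded
        table[row_id] = payload


def assign_ids(payloads, highest_existing_id, increment_first):
    """Pair each payload with an ID, incrementing before or after use."""
    counter = highest_existing_id
    rows = []
    for payload in payloads:
        if increment_first:
            counter += 1
            rows.append((counter, payload))
        else:
            rows.append((counter, payload))
            counter += 1
    return rows


# Empty table, increment AFTER use: the second input row is lost.
empty_after = {}
load_rows(empty_after, assign_ids(["a", "b", "c"], 0, increment_first=False))

# Non-empty table, increment AFTER use: the first input row is lost.
filled_after = {5: "old"}
load_rows(filled_after, assign_ids(["a", "b", "c"], 5, increment_first=False))

# Increment BEFORE use: every input row is kept in both cases.
empty_before = {}
load_rows(empty_before, assign_ids(["a", "b", "c"], 0, increment_first=True))
filled_before = {5: "old"}
load_rows(filled_before, assign_ids(["a", "b", "c"], 5, increment_first=True))
```

      In this model the empty table ends up without row "b" and the
      non-empty one without row "a" when the counter is incremented after
      use, whereas incrementing first keeps all three rows in both cases.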
  3. May 14, 2019
  4. Jan 02, 2019
  5. Oct 25, 2018
  6. Oct 18, 2018
  7. Oct 15, 2018
    • Fix bug: use return instead of next · 4ab71f7e
      Wojtek Bazant authored
      return leaves the current subroutine, i.e. goes one frame up the stack.
      next jumps to the closest frame on the stack that supports the
      operation, i.e. the nearest enclosing loop (in RefSeqGPFFParser alone
      that frame is close enough).
      This works unless create_xrefs is subclassed, in which case my Hive
      workers die:
      
      Lost control. Check your Runnable for loose 'next' statements that are
      not part of a loop       WORKER_ERROR
    • C. elegans specific parsing of RefSeq_dna file · 7d6346f7
      Wojtek Bazant authored
      - New xref: to a WormBase CDS feature
      - Modify WormbaseCElegansRefSeqGPFFParser to serve both kinds of files
      - extract a utility method from RefSeqGPFFParser
      - xref_config.ini stanza for wormbase_cds
      - tests for new functionality
    • C. elegans references use WormBase mapping to INSDC protein ids · d66449b6
      Wojtek Bazant authored
      - maintain naming convention: WormBase-specific stuff says Wormbase at the front
      - rewrite WormBaseDirectParser
      - WormBaseDirectParser populates protein_ids
      - superclass method to make dependent protein_ids as parent
      - tap into UniProtParser
        + also skip EMBL scaffold ids (we can't reliably assign them)
      - tap into RefSeqGPFFParser
        + extract a method
      - tests for new stuff
        + add %args to parametrise test_parser
      
      Benefits for RefSeqGPFFParser:
      RefSeq proteins have coordinates as part of their identity, so we
      can't reliably sequence-match them: doing so also picks up all paralogs.
      This change fixes this spurious mapping.
      Benefits for UniProtParser:
      Not the above: UniProt entries are not tied to coordinates so all
      paralogs map to the same entry. We can handle versioning and updates
      a bit better: if WormBase updates an entry and a protein id changes but
      UniProt doesn't reflect this yet, with the change we will still pick up
      the UniProt entry although we can't sequence match any more.
  8. Oct 01, 2018
    • Remove artificial dependency on XML::Simple · ddb71bb1
      Marek Szuba authored
      The only part of the xref-mapping pipeline that depended on the
      long-deprecated module XML::Simple was TAIROntologyParser - which did
      not actually *use* that module for anything. Get rid of the useless
      import, thus making it unnecessary for XML::Simple to be mentioned in
      the cpanfile.
  9. Sep 07, 2018
  10. Sep 05, 2018
  11. Sep 04, 2018
  12. Sep 03, 2018
  13. Aug 30, 2018
  14. Aug 13, 2018
  15. Jul 25, 2018
  16. Jun 28, 2018
  17. Jun 12, 2018
  18. Apr 09, 2018
  19. Apr 06, 2018
    • ENSCORESW-2553 · 5dfe4299
      Matthew Laird authored
      - Update RefSeqCoordinateParser such that it fetches the RefSeq accession either from the stable_id or the display_xref in the otherfeatures database.
  20. Mar 26, 2018
  21. Mar 14, 2018
  22. Feb 08, 2018
  23. Feb 07, 2018
  24. Feb 01, 2018
  25. Jan 31, 2018
  26. Jan 19, 2018
  27. Jan 17, 2018
  28. Jan 10, 2018