Commits · 20d5e0fc88322711323824673e1e87c9d008494f · ensembl-gh-mirror / ensembl

This project is mirrored from https://github.com/Ensembl/ensembl.git.

Jan 08, 2020

RFAMParser: relax selection criteria on analysis.logic_name · 20d5e0fc

Marek Szuba authored 5 years ago

Due to changes in the structure of the production database, since
release 98 the value of analysis.logic_name corresponding to non-coding
RNA can be either 'ncrna' (which is what we used before) or
'ncrna_species_name'. Change the SQL query used to map RFAM IDs to
Ensembl stable IDs so that it can correctly handle species using the
latter syntax, i.e. human, mouse and zebrafish.

Issue: ENSINT-402
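
The relaxed selection described above can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not the query from the commit: the table layout is a simplified assumption, the rows are made-up placeholders, and the function names are hypothetical. It accepts either the bare 'ncrna' value or any value starting with 'ncrna_'.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table with made-up rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, logic_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO analysis (analysis_id, logic_name) VALUES (?, ?)",
        [
            (1, "ncrna"),               # old-style value
            (2, "ncrna_homo_sapiens"),  # placeholder for 'ncrna_species_name'
            (3, "ncrnax"),              # must not match: no underscore separator
            (4, "ensembl"),             # unrelated analysis
        ],
    )
    return conn


def ncrna_analysis_ids(conn):
    """Return analysis_id values whose logic_name is 'ncrna' or 'ncrna_<something>'."""
    # '_' is a single-character wildcard in LIKE, so it is escaped with '!'
    # to require a literal underscore after 'ncrna'.
    query = """
        SELECT analysis_id
          FROM analysis
         WHERE logic_name = 'ncrna'
            OR logic_name LIKE 'ncrna!_%' ESCAPE '!'
         ORDER BY analysis_id
    """
    return [row[0] for row in conn.execute(query)]
```

Escaping the underscore keeps the pattern from matching unrelated names that merely share the 'ncrna' prefix.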

Sep 26, 2019
- Enable VGNC xrefs for callithrix_jacchus and papio_anubis · f1829335
  Marek Szuba authored 5 years ago
  
Jul 25, 2019

ChecksumParser: add a comment about read-only input paths · 0f02729d

Marek Szuba authored 5 years ago

See ENSCORESW-3197. Have to think about the correct way of specifying
where to put that output file though, especially given the parser
doesn't actually delete it after it is done with it.

ChecksumParser: check if we have opened the temporary file for writing · 2c00b430
Marek Szuba authored 5 years ago

ChecksumParser: increment checksum_xref_id BEFORE use · 41aa26b1

Marek Szuba authored 5 years ago

The initial value of the variable $counter is set to the highest
checksum_xref.checksum_xref_id found if the table in question is not
empty, or 0 if it is. This causes problems if $counter is only
incremented after each use in the input-file loop:
 - for a non-empty table the parser would attempt to re-use an existing
   value of checksum_xref_id for the first entry read from the input
   file. checksum_xref_id is the primary key of checksum_xref so its
   values have to be unique, therefore "LOAD DATA" silently discards the
   offending row;
 - for an empty table we lose one input row as well but it is the SECOND
   rather than the first one. Reason: 0 is not a valid value for
   auto_increment fields in MySQL, resulting in the first row being
   inserted with the first allowed ID value of 1 - which brings us back
   to the previous scenario when "LOAD DATA" attempts to insert the
   second row.

Incrementing $counter before use ought to address both forms of the
problem.
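
The difference between the two orderings can be sketched in a few lines. The parser itself is Perl; this Python version is only an illustration with hypothetical function names, and it models just the ID assignment, not the behaviour of "LOAD DATA" or of auto_increment in MySQL.

```python
def ids_incremented_after_use(highest_existing_id, rows):
    """Problematic ordering: use the counter, then increment it."""
    counter = highest_existing_id
    assigned = []
    for row in rows:
        assigned.append((counter, row))  # first row re-uses the existing maximum
        counter += 1
    return assigned


def ids_incremented_before_use(highest_existing_id, rows):
    """Fixed ordering: increment the counter, then use it."""
    counter = highest_existing_id
    assigned = []
    for row in rows:
        counter += 1
        assigned.append((counter, row))  # first row gets maximum + 1
    return assigned
```

With a highest existing ID of 7, the first variant assigns 7 to the first new row, colliding with the existing primary key, while the second starts at 8. With an empty table (counter 0), the first variant starts at the invalid value 0 and the second starts at 1.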

Jun 11, 2019
- Enable VGNC xrefs for felis_catus, macaca_mulatta and microcebus_murinus · aa4155d0
  Marek Szuba authored 5 years ago
  
May 14, 2019
- ENSCORESW-3147 : correctly capture all required fields from file · acb922b0
  Magali Ruffier authored 5 years ago
  
Jan 02, 2019
- Yearly copyright update · a8c451eb
  Tiago Grego authored 6 years ago
  
Oct 25, 2018
- Code review from Tiago · f993d60a
  Wojtek Bazant authored 6 years ago
  
Oct 18, 2018
- Use references instead of copying · e327bc83
  Wojtek Bazant authored 6 years ago
  
  It made recognising incorrect entries needlessly slow
Oct 15, 2018

Fix bug: use return instead of next · 4ab71f7e

Wojtek Bazant authored 6 years ago

return goes back one frame up the stack, whereas next goes back to the
closest frame on the stack that supports the operation (which is close
enough in RefSeqGPFFParser alone). Using next works unless I subclass
create_xrefs, and then my Hive workers die:

Lost control. Check your Runnable for loose 'next' statements that are
not part of a loop       WORKER_ERROR

C. elegans specific parsing of RefSeq_dna file · 7d6346f7

Wojtek Bazant authored 6 years ago

- New xref: to a WormBase CDS feature
- Modify WormbaseCElegansRefSeqGPFFParser to serve both kinds of files
- extract a utility method from RefSeqGPFFParser
- xref_config.ini stanza for wormbase_cds
- tests for new functionality

C. elegans references use WormBase mapping to INSDC protein ids · d66449b6

Wojtek Bazant authored 6 years ago

- maintain naming convention: WormBase specific stuff says Wormbase at the front
- rewrite WormBaseDirectParser
- WormBaseDirectParser populates protein_ids
- superclass method to make dependent protein_ids as parent
- tap into UniProtParser
  + also skip EMBL scaffold ids (we can't reliably assign them)
- tap into RefSeqGPFFParser
  + extract a method
- tests for new stuff
  + add %args to parametrise test_parser

Benefits for RefSeqGPFFParser:
RefSeq proteins have coordinates as part of their identity, so we
can't reliably sequence match them: sequence matching also picks up
all paralogs. This change fixes that spurious mapping.

Benefits for UniProtParser:
The above does not apply: UniProt entries are not tied to coordinates,
so all paralogs map to the same entry. However, we can handle versioning
and updates a bit better: if WormBase updates an entry and a protein id
changes but UniProt doesn't reflect this yet, with the change we will
still pick up the UniProt entry even though we can't sequence match it
any more.

Oct 01, 2018

Remove artificial dependency on XML::Simple · ddb71bb1

Marek Szuba authored 6 years ago

The only part of the xref-mapping pipeline that depended on the
long-deprecated module XML::Simple was TAIROntologyParser - which did
not actually *use* that module for anything. Get rid of the useless
import, thus making it unnecessary for XML::Simple to be mentioned in
the cpanfile.

Sep 11, 2018
- ENSCORESW-2850 : update usage with all options · f5c96618
  Magali Ruffier authored 6 years ago
  
Sep 07, 2018
- tidy up badly initialised values · 8293dad6
  Magali Ruffier authored 6 years ago
  
Sep 05, 2018
- ENSCORESW-2850 : optimised for single species run · 805432c6
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2853 : match Ensembl species casing · ffacd1b1
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : tidy up dependent sources · 77e32653
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : add default priority description · 9bbb4715
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove empty fields · 1388eb12
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove aliases as not useful · 0d09a1a7
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : re-order species by division · 96491a64
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : default sources for protists · 3a2da289
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove leftover metazoa · 65f9bab1
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : default sources for plants · 376814f1
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : set default sources for fungi · f07b01e7
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove some unused sources · 23a12753
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : simplify RefSeq sources · 548b5c8d
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : cionas are considered vertebrates · 933e55fe
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : set generic sources for metazoa · 98dcc546
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove sources with local files · 7cc78354
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove some unused sources · 2a85640f
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2805 : remove UniGene deprecated source · 03f60def
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2553 : retrieve correct accession for peptides · 84352fcf
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2792 : one UCSC parser per species · bbb4a973
  Magali Ruffier authored 6 years ago
  
Sep 04, 2018
- ENSCORESW-2837 : do not import dependent VGNC · 374bef64
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2744 : remove dependent_xref when removing master · 667e4c42
  Magali Ruffier authored 6 years ago
  
- ENSCORESW-2792 : add documentation for future reference · 45632852
  Magali Ruffier authored 6 years ago
  
Sep 03, 2018

XrefParser::Database uses DBConnection · 93ad6e79

Wojtek Bazant authored 6 years ago

+ XrefParser::Database stores a DBConnection, setters delegate to it
+ New test database, for xrefs
+ Setup to test XrefParsers
+ Tests for WormbaseDirectParser
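
The "stores a DBConnection, setters delegate to it" arrangement from the first point can be pictured roughly as below. This is a Python sketch of the delegation pattern only: both classes and their attributes are hypothetical stand-ins, not the Perl classes touched by this commit.

```python
class DBConnection:
    """Hypothetical stand-in for the object that owns the connection settings."""

    def __init__(self, host="localhost", dbname=""):
        self.host = host
        self.dbname = dbname


class XrefDatabase:
    """Hypothetical wrapper: keeps no settings of its own, only a DBConnection."""

    def __init__(self, dbc):
        self.dbc = dbc

    @property
    def host(self):
        return self.dbc.host

    @host.setter
    def host(self, value):
        # Delegate to the stored connection so there is one source of truth.
        self.dbc.host = value

    @property
    def dbname(self):
        return self.dbc.dbname

    @dbname.setter
    def dbname(self, value):
        self.dbc.dbname = value
```

Because the wrapper holds no copies of the settings, a value set through it is immediately visible to anything else that shares the same connection object.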
