Commits · 20d5e0fc88322711323824673e1e87c9d008494f · ensembl-gh-mirror / ensembl

This project is mirrored from https://github.com/Ensembl/ensembl.git.

Jan 08, 2020

RFAMParser: relax selection criteria on analysis.logic_name · 20d5e0fc

Marek Szuba authored 5 years ago

Due to changes in the structure of the production database, since
release 98 the value of analysis.logic_name corresponding to non-coding
RNA can be either 'ncrna' (which is what we used before) or
'ncrna_species_name'. Change the SQL query used to map RFAM IDs to
Ensembl stable IDs so that it can correctly handle species using the
latter syntax, i.e. human, mouse and zebrafish.

Issue: ENSINT-402

20d5e0fc
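
As an illustration of the relaxed criterion described in this commit, the sketch below contrasts an exact match on analysis.logic_name with a prefix match. It is not the query from RFAMParser: the in-memory SQLite table, the example logic name 'ncrna_homo_sapiens' and the use of LIKE are all assumptions made for the sake of a runnable example.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the "analysis" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, logic_name TEXT)"
)
conn.executemany(
    "INSERT INTO analysis (logic_name) VALUES (?)",
    # 'ncrna_homo_sapiens' is an assumed example of the 'ncrna_species_name' form.
    [("ncrna",), ("ncrna_homo_sapiens",), ("ensembl",)],
)

# Old-style criterion: only the bare 'ncrna' logic name is accepted.
STRICT = (
    "SELECT logic_name FROM analysis "
    "WHERE logic_name = 'ncrna' ORDER BY logic_name"
)
# Relaxed criterion (assumed form): accept 'ncrna' and anything starting with it.
RELAXED = (
    "SELECT logic_name FROM analysis "
    "WHERE logic_name LIKE 'ncrna%' ORDER BY logic_name"
)

strict_matches = [row[0] for row in conn.execute(STRICT)]
relaxed_matches = [row[0] for row in conn.execute(RELAXED)]

print("strict: ", strict_matches)
print("relaxed:", relaxed_matches)
```

A prefix match of this kind would also accept any other logic name that happens to start with 'ncrna', so a real query may need a tighter pattern.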

Jul 25, 2019

ChecksumParser: add a comment about read-only input paths · 0f02729d

Marek Szuba authored 5 years ago

See ENSCORESW-3197. Have to think about the correct way of specifying
where to put that output file though, especially given the parser
doesn't actually delete it after it is done with it.

0f02729d

ChecksumParser: check if we have opened the temporary file for writing · 2c00b430

Marek Szuba authored 5 years ago

2c00b430

ChecksumParser: increment checksum_xref_id BEFORE use · 41aa26b1

Marek Szuba authored 5 years ago

The initial value of the variable $counter is set to the highest
checksum_xref.checksum_xref_id found if the table in question is not
empty, or 0 if it is. This causes problems if $counter is only
incremented after each use in the input-file loop:
 - for a non-empty table the parser would attempt to re-use an existing
   value of checksum_xref_id for the first entry read from the input
   file. checksum_xref_id is the primary key of checksum_xref so its
   values have to be unique, therefore "LOAD DATA" silently discards the
   offending row;
 - for an empty table we lose one input row as well but it is the SECOND
   rather than the first one. Reason: 0 is not a valid value for
   auto_increment fields in MySQL, resulting in the first row being
   inserted with the first allowed ID value of 1 - which brings us back
   to the previous scenario when "LOAD DATA" attempts to insert the
   second row.

Incrementing $counter before use ought to address both forms of the
problem.

41aa26b1
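
The two failure modes described in this commit can be reproduced with a small simulation. This is a sketch only: load_rows is a hypothetical helper, not code from ChecksumParser, and it models just the behaviour stated in the message (duplicate primary keys are discarded, and an ID of 0 is replaced by the next auto-increment value).

```python
def load_rows(existing_ids, rows, increment_before_use):
    """Simulate ID assignment followed by a duplicate-discarding bulk load.

    Hypothetical helper for illustration; returns (table, discarded_rows).
    """
    table = {row_id: None for row_id in existing_ids}
    # Highest existing checksum_xref_id, or 0 for an empty table.
    counter = max(existing_ids, default=0)
    discarded = []

    for row in rows:
        if increment_before_use:
            counter += 1
        row_id = counter
        if not increment_before_use:
            counter += 1

        # 0 is not a valid auto_increment value: the next free ID is used.
        if row_id == 0:
            row_id = max(table, default=0) + 1

        # Primary-key clash: the offending row is silently dropped.
        if row_id in table:
            discarded.append(row)
            continue
        table[row_id] = row

    return table, discarded


if __name__ == "__main__":
    for existing in ([], [5]):
        for before in (False, True):
            _, lost = load_rows(existing, ["a", "b", "c"], before)
            print(f"existing={existing} increment_before_use={before} lost={lost}")
```

With increment-after-use the simulation loses the second row for an empty table and the first row for a non-empty one; with increment-before-use no rows are lost in either case.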

May 14, 2019
- ENSCORESW-3147 : correctly capture all required fields from file · acb922b0
  Magali Ruffier authored 5 years ago
  
  acb922b0
Jan 02, 2019
- Yearly copyright update · a8c451eb
  Tiago Grego authored 6 years ago
  
  a8c451eb
Oct 25, 2018
- Code review from Tiago · f993d60a
  Wojtek Bazant authored 6 years ago
  
  f993d60a
Oct 18, 2018
- Use references instead of copying · e327bc83
  Wojtek Bazant authored 6 years ago
  
  It made recognising incorrect entries needlessly slow
  
  e327bc83
Oct 15, 2018

Fix bug: use return instead of next · 4ab71f7e

Wojtek Bazant authored 6 years ago

return goes back one frame up the stack, whereas next goes back to the
closest frame on the stack that supports the operation. That is close
enough within RefSeqGPFFParser alone, and it works unless I subclass
create_xrefs, at which point my Hive workers die with:

Lost control. Check your Runnable for loose 'next' statements that are not part of a loop    WORKER_ERROR

4ab71f7e

C. elegans specific parsing of RefSeq_dna file · 7d6346f7

Wojtek Bazant authored 6 years ago

- New xref: to a WormBase CDS feature
- Modify WormbaseCElegansRefSeqGPFFParser to serve both kinds of files
- extract a utility method from RefSeqGPFFParser
- xref_config.ini stanza for wormbase_cds
- tests for new functionality

7d6346f7

C. elegans references use WormBase mapping to INSDC protein ids · d66449b6

Wojtek Bazant authored 6 years ago

- maintain naming convention: WormBase specific stuff says Wormbase at the front
- rewrite WormBaseDirectParser
- WormBaseDirectParser populates protein_ids
- superclass method to make dependent protein_ids as parent
- tap into UniProtParser
  + also skip EMBL scaffold ids (we can't reliably assign them)
- tap into RefSeqGPFFParser
  + extract a method
- tests for new stuff
  + add %args to parametrise test_parser

Benefits for RefSeqGPFFParser:
RefSeq proteins have coordinates as part of their identity, so we
can't reliably sequence-match them: doing so also picks up all paralogs.
This change fixes this spurious mapping.

Benefits for UniProtParser:
The above does not apply here: UniProt entries are not tied to
coordinates, so all paralogs map to the same entry. Instead, we can
handle versioning and updates a bit better: if WormBase updates an entry
and a protein id changes but UniProt doesn't reflect this yet, with the
change we will still pick up the UniProt entry although we can't
sequence-match any more.

d66449b6

Oct 01, 2018

Remove artificial dependency on XML::Simple · ddb71bb1

Marek Szuba authored 6 years ago

The only part of the xref-mapping pipeline that depended on the
long-deprecated module XML::Simple was TAIROntologyParser - which did
not actually *use* that module for anything. Get rid of the useless
import, thus making it unnecessary for XML::Simple to be mentioned in
the cpanfile.

ddb71bb1

Sep 07, 2018
- tidy up badly initialised values · 8293dad6
  Magali Ruffier authored 6 years ago
  
  8293dad6
Sep 05, 2018
- ENSCORESW-2850 : optimised for single species run · 805432c6
  Magali Ruffier authored 6 years ago
  
  805432c6
- ENSCORESW-2853 : match Ensembl species casing · ffacd1b1
  Magali Ruffier authored 6 years ago
  
  ffacd1b1
- ENSCORESW-2553 : retrieve correct accession for peptides · 84352fcf
  Magali Ruffier authored 6 years ago
  
  84352fcf
- ENSCORESW-2792 : one UCSC parser per species · bbb4a973
  Magali Ruffier authored 6 years ago
  
  bbb4a973
Sep 04, 2018
- ENSCORESW-2837 : do not import dependent VGNC · 374bef64
  Magali Ruffier authored 6 years ago
  
  374bef64
- ENSCORESW-2792 : add documentation for future reference · 45632852
  Magali Ruffier authored 6 years ago
  
  45632852
Sep 03, 2018

XrefParser::Database uses DBConnection · 93ad6e79

Wojtek Bazant authored 6 years ago

+ XrefParser::Database stores a DBConnection, setters delegate to it
+ New test database, for xrefs
+ Setup to test XrefParsers
+ Tests for WormbaseDirectParser

93ad6e79

Aug 30, 2018
- ENSCORESW-2792 : separate mouse UCSC to avoid clashes · 522c72c1
  Magali Ruffier authored 6 years ago
  
  522c72c1
Aug 13, 2018
- ENSCORESW-2725 : can run for species, taxon or division · ed1d7e9d
  Magali Ruffier authored 6 years ago
  
  ed1d7e9d
Jul 25, 2018
- ENSCORESW-2810 : pass RFAMParser even if no xrefs found · ca72a63e
  Magali Ruffier authored 6 years ago
  
  ca72a63e
Jun 28, 2018
- extend special characters to allow · c4183ecf
  Magali Ruffier authored 6 years ago
  
  c4183ecf
Jun 12, 2018
- ENSCORESW-2723 : skip dodgy descriptions · 802ec0c0
  Magali Ruffier authored 6 years ago
  
  802ec0c0
Apr 09, 2018
- updated empty string to undef to store empty description as NULL rather than empty string · e51ff4d1
  premanand17 authored 6 years ago
  
  e51ff4d1
Apr 06, 2018

ENSCORESW-2553 · 5dfe4299

Matthew Laird authored 6 years ago

- Update RefSeqCoordinateParser such that it fetches the RefSeq accession either from the stable_id or the display_xref in the otherfeatures database.

5dfe4299

Mar 26, 2018

Fix to fix in commit ().... · d84d2f15

Matthew Laird authored 6 years ago

Fix to the fix in commit f9dc4756 (ENSCORESW-462): info_text should not have been set to NULL by default, as the column is NOT NULL.

d84d2f15
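
The constraint behind this fix can be shown with a short sketch: an empty description may be stored as NULL, but info_text must never be NULL because its column is NOT NULL. The table below is a simplified, hypothetical stand-in for the xref table, built in SQLite rather than MySQL, and normalise_xref_fields is an illustrative helper rather than code from the repository.

```python
import sqlite3


def normalise_xref_fields(description, info_text):
    """Map an empty description to None (NULL) and a missing info_text to ''."""
    description = description if description else None
    info_text = info_text if info_text is not None else ""
    return description, info_text


# Hypothetical, simplified xref table: description is nullable, info_text is not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xref ("
    " accession TEXT PRIMARY KEY,"
    " description TEXT,"
    " info_text TEXT NOT NULL)"
)

# Normalised values satisfy the schema.
desc, info = normalise_xref_fields("", None)
conn.execute("INSERT INTO xref VALUES (?, ?, ?)", ("X1", desc, info))

# Passing NULL for info_text violates the NOT NULL constraint.
try:
    conn.execute("INSERT INTO xref VALUES (?, ?, ?)", ("X2", None, None))
    null_info_text_rejected = False
except sqlite3.IntegrityError:
    null_info_text_rejected = True

print("NULL info_text rejected:", null_info_text_rejected)
```

Keeping the two normalisations in one place makes it harder to apply the "empty becomes NULL" rule to a column that cannot hold NULL.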

Mar 14, 2018
- Updated to handle empty description in xref table - ENSCORESW-462 · f9dc4756
  premanand17 authored 7 years ago
  
  f9dc4756
Feb 08, 2018
- custom download for HGNC · 74c61732
  Magali Ruffier authored 7 years ago
  
  74c61732
- allow https as well as http · e2d103f6
  Magali Ruffier authored 7 years ago
  
  e2d103f6
Feb 07, 2018
- consistent description retrieval · 9738d2e3
  Magali Ruffier authored 7 years ago
  
  9738d2e3
Feb 01, 2018
- clean up Xenbase description · 2a2ffed7
  Magali Ruffier authored 7 years ago
  
  2a2ffed7
Jan 31, 2018
- store link to NCBIGene where available · cce3a8c9
  Magali Ruffier authored 7 years ago
  
  cce3a8c9
- use correct field delimiter · 75e5fd55
  Magali Ruffier authored 7 years ago
  
  75e5fd55
Jan 19, 2018
- simplify source parsing · 8296ce89
  Magali Ruffier authored 7 years ago
  
  8296ce89
- can specify species and taxon directly · e6412b23
  Magali Ruffier authored 7 years ago
  
  e6412b23
- clean up connections · bff35fcf
  Magali Ruffier authored 7 years ago
  
  bff35fcf
Jan 17, 2018
- make dbconnection persistent · 69a6a321
  Magali Ruffier authored 7 years ago
  
  69a6a321
Jan 10, 2018
- use correct variables · 3245c16b
  Magali Ruffier authored 7 years ago
  
  3245c16b