ensembl-gh-mirror issues
https://gitlab.ebi.ac.uk/groups/ensembl-gh-mirror/-/issues
2016-08-10T09:04:20Z

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/32
Ancillary items from Bio-SamTools
2016-08-10T09:04:20Z, Marek Szuba
*Created by: drjsanger*
@keiranmraine created a ticket on the previously copied repo, so I'm reopening it as it's been lost.
bam2bedgraph is missing; however, Bio-DB-HTS also doesn't include the utility Perl scripts that use it.
I have written an htslib-dependent version of bam2bedgraph **(NB the input style is slightly different)**.
Should you choose to include it you can find it here:
https://github.com/ICGC-TCGA-PanCancer/PCAP-core/blob/dev/c/bam2bedgraph.c

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/17
Memory leak: Bio::DB::HTS::Tabix - very reproducible
2016-05-20T16:01:42Z, Marek Szuba
*Created by: keiranmraine*
Use the following small bit of code to reproduce with a tabix indexed BED file:
```
#!/usr/bin/perl
use strict;
use Bio::DB::HTS::Tabix;
my ($file, $search, $iterations) = @ARGV;
my $brass_np = Bio::DB::HTS::Tabix->new(filename => $file);
for(0..$iterations) {
    my $iter = $brass_np->query($search);
    while(my $record = $iter->next){ }
}
```
Example runs (coordinate must hit records):
```
$ /usr/bin/time -f '%Mk maxresident' perl htsTabix_mem.pl test.bed.gz 1:9551-10140 1
102848k maxresident
$ /usr/bin/time -f '%Mk maxresident' perl htsTabix_mem.pl test.bed.gz 1:9551-10140 10
108128k maxresident
$ /usr/bin/time -f '%Mk maxresident' perl htsTabix_mem.pl test.bed.gz 1:9551-10140 100
160880k maxresident
$ /usr/bin/time -f '%Mk maxresident' perl htsTabix_mem.pl test.bed.gz 1:9551-10140 1000
688240k maxresident
```
This seems very similar to the problem in `Bio::DB::HTS::Faidx`.
This is pretty critical, as we've discovered it in the middle of a pre-release test cycle.
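
The four measurements above are enough to estimate how much memory is gained per extra pass through the query loop. A minimal sketch in Python, taking the figures from the runs above as given; the helper name `growth_per_iteration` is made up for this illustration:

```python
# (iterations argument, max resident set size in kB) copied from the runs above
runs = [(1, 102848), (10, 108128), (100, 160880), (1000, 688240)]


def growth_per_iteration(samples):
    """Return the kB gained per extra iteration between consecutive runs."""
    rates = []
    for (n0, kb0), (n1, kb1) in zip(samples, samples[1:]):
        rates.append((kb1 - kb0) / (n1 - n0))
    return rates


rates = growth_per_iteration(runs)
for (n0, _), (n1, _), rate in zip(runs, runs[1:], rates):
    print(f"{n0:>4} -> {n1:<4} iterations: {rate:.0f} kB per iteration")
```

The rate stays between about 586 and 587 kB per iteration across all three intervals, which is what a fixed amount of memory lost on every `query`/`next` cycle would look like.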

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/15
INSTALL.pl - use master.zip instead of git clone
2016-08-04T10:50:13Z, Marek Szuba
*Created by: keiranmraine*
Hi,
Have you considered just using the standard 'master.zip' instead of cloning the repository? It should be a much smaller download, especially for htslib, e.g.
```
curl -sSL --retry 10 -o master.zip https://github.com/Ensembl/Bio-HTS/archive/master.zip
```
Keiran
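
For comparison, the same kind of download can be sketched in Python. This only illustrates fetching the branch archive instead of cloning; the function names and the simple retry loop are assumptions for this example and not what INSTALL.pl does:

```python
import time
import urllib.request


def archive_url(owner, repo, branch="master"):
    """Build the GitHub branch-archive URL (same pattern as the curl example)."""
    return f"https://github.com/{owner}/{repo}/archive/{branch}.zip"


def download(url, dest, retries=10, opener=urllib.request.urlopen):
    """Fetch url into dest, retrying on I/O errors (rough analogue of --retry 10)."""
    for attempt in range(1, retries + 1):
        try:
            with opener(url) as response, open(dest, "wb") as out:
                out.write(response.read())
            return dest
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            if attempt == retries:
                raise
            time.sleep(1)  # fixed pause between attempts; curl backs off differently
```

A call such as `download(archive_url("Ensembl", "Bio-HTS"), "master.zip")` would then fetch the same file as the curl command above.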

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/13
Bio::DB::HTS::Tabix - docs not accurate
2016-05-03T12:40:09Z, Marek Szuba
*Created by: keiranmraine*
The docs for 'query' need rewording to make it very clear that the coordinate format is 1-based for both start and stop, unlike the legacy Tabix module (which used a 0-based start).
It is also inaccurate to say that retrieving a single coordinate requires a string such as '12:5000000-5000001':
```
$ zcat test.bed.gz
1 9 10 . stuff
1 10 11 . more
$ perl htsTabix.pl test.bed.gz 1 9 10
1 9 10 . stuff
$ perl htsTabix.pl test.bed.gz 1 9 11
1 9 10 . stuff
1 10 11 . more
$ perl htsTabix.pl test.bed.gz 1 10 11
1 9 10 . stuff
1 10 11 . more
$ perl htsTabix.pl test.bed.gz 1 11 11
1 10 11 . more
$ perl htsTabix.pl test.bed.gz 1 12 12
```
script:
```
use strict;
use warnings;
use Bio::DB::HTS::Tabix;
my $file = shift @ARGV;
my $tabix = Bio::DB::HTS::Tabix->new(filename => $file);
my $iter = $tabix->query(sprintf '%s:%d-%d', @ARGV);
while(my $l = $iter->next) {
    print $l, "\n";
}
```
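
The outputs above are consistent with the query being treated as a 1-based, inclusive range that is matched against the 0-based, half-open BED intervals. A minimal sketch of that interpretation in Python; `overlaps` and `query` are hypothetical helpers written for this illustration and say nothing about how the module is implemented internally:

```python
# Records from test.bed.gz above: (chrom, start, end, rest), BED-style 0-based half-open
RECORDS = [("1", 9, 10, ". stuff"), ("1", 10, 11, ". more")]


def overlaps(bed_start, bed_end, q_start, q_end):
    """True if the 1-based inclusive query [q_start, q_end] touches the BED interval."""
    # The query converted to 0-based half-open is [q_start - 1, q_end)
    return bed_start < q_end and bed_end > q_start - 1


def query(chrom, q_start, q_end):
    """Return the records on chrom that overlap the 1-based inclusive query."""
    return [r for r in RECORDS if r[0] == chrom and overlaps(r[1], r[2], q_start, q_end)]


# Same five queries as in the shell session above
for start, end in [(9, 10), (9, 11), (10, 11), (11, 11), (12, 12)]:
    hits = [rest for _, _, _, rest in query("1", start, end)]
    print(f"1:{start}-{end} -> {hits}")
```

Under this reading a single base is retrieved with equal start and end, e.g. `1:11-11`.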

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/8
BioPerl version
2016-03-18T10:12:14Z, Marek Szuba
*Created by: andrewyatz*
I just tried to install from CPAN via cpanm and noticed that it installed BioPerl version 1.69. Is there any particular reason why we need such a recent version of BioPerl? Not that I have any issues with it, but could/should the requirement be relaxed?

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/6
Memory leak in Faidx.pm/xs
2016-03-15T17:44:36Z, Marek Szuba
*Created by: keiranmraine*
Hi,
It looks like there is a mem leak in the Faidx.pm/xs module:
```
$ /usr/bin/time perl snpPanelGcCorrections.pl genome.fa SnpPositions.tsv
Chr Position ...
AUTO_1 1 3023783 ...
0.19user 0.02system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 129104maxresident)k
0inputs+0outputs (0major+10915minor)pagefaults 0swaps
$ /usr/bin/time perl snpPanelGcCorrections.pl genome.fa SnpPositions.tsv
Chr Position ...
AUTO_1 1 3023783 ...
AUTO_2 1 3036178 ...
0.36user 0.03system 0:00.40elapsed 97%CPU (0avgtext+0avgdata 237648maxresident)k
0inputs+0outputs (0major+18918minor)pagefaults 0swaps
$ /usr/bin/time perl snpPanelGcCorrections.pl genome.fa SnpPositions.tsv
Chr Position ...
AUTO_1 1 3023783 ...
AUTO_2 1 3036178 ...
AUTO_3 1 3050521 ...
0.52user 0.06system 0:00.60elapsed 95%CPU (0avgtext+0avgdata 308112maxresident)k
0inputs+0outputs (0major+25291minor)pagefaults 0swaps
```
Each line is the result of 16 requests against faidx around a position increasing up to 10mb

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/Bio-DB-HTS/-/issues/5
CPAN docs mangled for Faidx.pm
2016-03-14T11:14:26Z, Marek Szuba
*Created by: keiranmraine*
As title

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/10
Unattachable block
2016-09-20T14:22:47Z, Marek Szuba
*Created by: anujk14*
The problem occurs if a ref contains a magic node, and it is called in the following way:
1) First it is called from a non-magic parent like <element>
2) Then, it is called by a magic parent, i.e., a <oneOrMore>/<choice> etc.
If we reverse the order, the problem goes away.
Example to check the bug: bug_07_unattachable_block_for_oneOrMore_ref.rng

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/9
Incorrect name in for repeated magic block
2016-08-23T22:07:53Z, Marek Szuba
*Created by: anujk14*
A choice/interleave that has a one/zeroOrMore as its child gets assigned the block number as part of its notch label and childrenInfo list. If this one/zeroOrMore turns out to be a repeated child for which a block has already been created, we get an incorrect notch label as well as an incorrect childrenInfo list (since childrenInfo currently uses pretty names).
Example to test the bug: bug_05_choice_having_oneOrMore_having_wrong_name.rng

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/8
Incorrect name in ehive_pipeline_schema2.rng
2016-08-23T22:08:23Z, Marek Szuba
*Created by: anujk14*
The block that is created for "template" (block 22 in the current version) has a field called "template|" as its notch label. However, this notch accepts block_8:hash, block_9:array and block_10:10 as its children. The name "template|" for such a notch is incorrect.

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/7
Independent <text/> nodes not appearing in some cases
2016-07-15T10:40:37Z, Marek Szuba
*Created by: anujk14*
I don't know whether we will ever encounter a schema like this, but the following structure produces an empty oneOrMore block:
oneOrMore -> text/ -> /oneOrMore
Would this be allowed?

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/6
Some colours are skipped
2016-07-14T23:10:11Z, Marek Szuba
*Created by: muffato*
createBlocks() takes a "colour" argument that defines the colour of the next block that will be created. At each recursion level, this is incremented.
The problem is that not all the recursion levels will create blocks, hence some colours are skipped
Example: open addressBook.rng and check the `this.setColour()` calls: only 0, 90 and 180 are used
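
The effect can be reproduced outside the parser. The sketch below is a Python model, not the actual JavaScript: the node layout, the `creates_block` flag and the hue step of 45 are all assumptions, chosen so that the per-level variant yields the 0, 90, 180 pattern described above.

```python
HUE_STEP = 45  # assumed increment; the real value used by createBlocks() may differ


def chain(flags):
    """Build a single-branch tree, one node per flag, outermost first."""
    node = None
    for creates_block in reversed(flags):
        node = {"creates_block": creates_block, "children": [node] if node else []}
    return node


def colours_per_level(node, colour=0):
    """Behaviour described above: the colour advances at every recursion level."""
    used = [colour] if node["creates_block"] else []
    for child in node["children"]:
        used += colours_per_level(child, colour + HUE_STEP)
    return used


def colours_per_block(node, colour=0):
    """Alternative: the colour advances only when a block was actually created."""
    used = []
    if node["creates_block"]:
        used.append(colour)
        colour += HUE_STEP
    for child in node["children"]:
        used += colours_per_block(child, colour)
    return used


# Every other recursion level creates a block
tree = chain([True, False, True, False, True])
print(colours_per_level(tree))
print(colours_per_block(tree))
```

With several children per node, a real fix would also have to thread the colour counter through siblings; the model above only covers a single chain.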

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/XML-To-Blockly/-/issues/5
"#comment" nodes and number of children
2016-06-23T15:10:42Z, Marek Szuba
*Created by: muffato*
"#comment" nodes are currently kept in the document and ignored since 95966e7dde9ed1ec981ed0aef71ce3d91a55a667
The problem is that they are still included in the number of children of their parents, potentially fooling tests like parser.js#L143
They should either be removed like "#text" nodes are, or added as comments to the blocks
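
As an illustration of the counting problem, here is a small Python sketch that uses `xml.dom.minidom` as a stand-in for the DOM that parser.js walks; the sample document and the helper name `element_children` are made up for this example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# One comment and one real child under the same parent
doc = parseString("<element><!-- a note --><text/></element>")
parent = doc.documentElement


def element_children(node):
    """Children of node with comment nodes filtered out."""
    return [c for c in node.childNodes if c.nodeType != Node.COMMENT_NODE]


print(len(parent.childNodes))         # raw count, comment included
print(len(element_children(parent)))  # count once comments are ignored
```

Any test that relies on the raw child count sees one child more than the schema author intended, which is the situation described above.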

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/367
Server down?
2019-07-17T14:16:34Z, Marek Szuba
*Created by: wdmeeste1*
Link doesn't seem to work anymore:
https://grch37.rest.ensembl.org/

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/366
Database error when trying to access data
2019-07-04T13:35:54Z, Marek Szuba
*Created by: piotr-gawron*
When I try to access data with an identifier (I'm not completely sure that it is an existing identifier), I get a database error (or a 504 Gateway Time-out).
Here is an example of a URL that produces the error:
https://rest.ensembl.org/xrefs/id/ENSG00000000001?content-type=application/json
And here is the error message that I got:
{"error":"Could not connect to database bacteria_22_collection_core_44_97_1 as user ensro using [DBI:mysql:database=bacteria_22_collection_core_44_97_1;host=hh-mysql-eg-rest-web;port=4598] as a locator:DBI connect('database=bacteria_22_collection_core_44_97_1;host=hh-mysql-eg-rest-web;port=4598','ensro',...) failed: Can't connect to MySQL server on 'hh-mysql-eg-rest-web' (99) at /nfs/public/release/ensweb/live/rest/www_97/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260."}

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/353
VEP API query returns errors
2019-05-13T08:50:36Z, Marek Szuba
*Created by: sayonidas*
I have tried submitting queries to the VEP API to fetch variant consequences for multiple ids ([POST vep/:species/id](https://rest.ensembl.org/documentation/info/vep_id_post)).
I have a dataset of about ~4000 variants for which I want to get VEP annotations.
The maximum POST size is documented as 200; however, I get 504 Gateway Time-out errors if I query more than 50 at a time. Then, after two or three queries of 50 variants each, I get a 503 Service Unavailable error.
Please could you advise how I can use the Ensembl API to fetch the data from VEP?
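
One generic way to cope with intermittent 503/504 responses is to send smaller batches and retry each batch with an increasing pause. The sketch below is not an official recommendation for this service: `post_in_chunks`, the `TransientError` exception and the default sizes are assumptions for illustration, and the actual HTTP call is passed in as `post` so that any client can be plugged in.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 503 or 504."""


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def post_in_chunks(ids, post, chunk_size=50, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call post(chunk) for each chunk, backing off exponentially on TransientError."""
    results = []
    for chunk in chunked(ids, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                results.extend(post(chunk))
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return results
```

With the client from this report, `post` could be a small wrapper around `client.get_vep` that raises `TransientError` when the server answers 503 or 504.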
```
import json
import sys
import time
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class EnsemblRestClient(object):
    """
    Rest client for Ensembl API.
    """

    def __init__(self, server='http://rest.ensembl.org', reqs_per_sec=15):
        self.server = server
        self.reqs_per_sec = reqs_per_sec
        self.req_count = 0
        self.last_req = 0

    def perform_rest_action(self, endpoint, hdrs=None, params=None, data=None):
        headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
        if hdrs:
            headers.update(hdrs)

        # build the URL separately so that a retry does not append params twice
        url = self.server + endpoint
        if params:
            url += '?' + urlencode(params)

        # keep the decoded response apart from the POST body passed in as `data`
        result = None

        # check if we need to rate limit ourselves
        if self.req_count >= self.reqs_per_sec:
            delta = time.time() - self.last_req
            if delta < 1:
                time.sleep(1 - delta)
            self.last_req = time.time()
            self.req_count = 0

        try:
            request = Request(url, headers=headers, data=data)
            response = urlopen(request)
            content = response.read()
            if content:
                result = json.loads(content)
            self.req_count += 1
        except HTTPError as e:
            # check if we are being rate limited by the server
            if e.code == 429:
                if 'Retry-After' in e.headers:
                    retry = e.headers['Retry-After']
                    time.sleep(float(retry))
                    # resend the same body and return the retried result
                    return self.perform_rest_action(endpoint, hdrs, params, data)
            else:
                sys.stderr.write('Request failed for {0}: Status code: {1.code} Reason: {1.reason}\n'.format(endpoint, e))
        return result

    def get_vep(self, snp_ids):
        # the endpoint expects a JSON body of the form {"ids": [...]}
        data_dict = json.dumps({"ids": snp_ids})
        params = {'Blosum62': 1, 'Conservation': 1, 'protein': 1, 'domains': 1}
        response = self.perform_rest_action(
            endpoint='/vep/human/id',
            data=data_dict.encode(),
            params=params,
        )
        if response:
            return response
        return None


# placeholder: the list of ~4000 variant IDs is not shown in the report
input_snp_list = []

# one client for all chunks, so the request counter used for rate limiting is kept
client = EnsemblRestClient()
n = 50
snp_chunks = [input_snp_list[i:i + n] for i in range(0, len(input_snp_list), n)]
for snp_list in snp_chunks:
    snp_response_list = client.get_vep(snp_list)
```

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/331
wiki example client python3
2019-01-16T10:10:23Z, Marek Szuba
*Created by: SimonCouv*
I found the Example Python Client on the wiki page to be written for python2. I made some minor changes for python3, relating to the urllib module.
```python
#!/usr/bin/env python

import json
import sys
import time
# "import urllib" alone does not load the submodules used below
import urllib.error
import urllib.parse
import urllib.request


class EnsemblRestClient(object):
    def __init__(self, server='http://rest.ensembl.org', reqs_per_sec=15):
        self.server = server
        self.reqs_per_sec = reqs_per_sec
        self.req_count = 0
        self.last_req = 0

    def perform_rest_action(self, endpoint, hdrs=None, params=None):
        if hdrs is None:
            hdrs = {}

        if 'Content-Type' not in hdrs:
            hdrs['Content-Type'] = 'application/json'

        # build the URL separately so that a retry does not append params twice
        url = self.server + endpoint
        if params:
            url += '?' + urllib.parse.urlencode(params)

        data = None

        # check if we need to rate limit ourselves
        if self.req_count >= self.reqs_per_sec:
            delta = time.time() - self.last_req
            if delta < 1:
                time.sleep(1 - delta)
            self.last_req = time.time()
            self.req_count = 0

        try:
            request = urllib.request.Request(url, headers=hdrs)
            response = urllib.request.urlopen(request)
            content = response.read()
            if content:
                data = json.loads(content)
            self.req_count += 1
        except urllib.error.HTTPError as e:
            # check if we are being rate limited by the server
            if e.code == 429:
                if 'Retry-After' in e.headers:
                    retry = e.headers['Retry-After']
                    time.sleep(float(retry))
                    # return the retried result instead of discarding it
                    return self.perform_rest_action(endpoint, hdrs, params)
            else:
                sys.stderr.write(
                    'Request failed for {0}: Status code: {1.code} Reason: {1.reason}\n'.format(endpoint, e))
        return data

    def get_variants(self, species, symbol):
        # look up the gene by symbol, then fetch the variants overlapping it
        genes = self.perform_rest_action(
            endpoint='/xrefs/symbol/{0}/{1}'.format(species, symbol),
            params={'object_type': 'gene'}
        )
        if genes:
            stable_id = genes[0]['id']
            variants = self.perform_rest_action(
                '/overlap/id/{0}'.format(stable_id),
                params={'feature': 'variation'}
            )
            return variants
        return None


def run_variants(species, symbol):
    client = EnsemblRestClient()
    variants = client.get_variants(species, symbol)
    if variants:
        for v in variants:
            print('{seq_region_name}:{start}-{end}:{strand} ==> {id} ({consequence_type})'.format(**v))


if __name__ == '__main__':
    if len(sys.argv) == 3:
        species, symbol = sys.argv[1:]
    else:
        species, symbol = 'human', 'BRAF'

    # the function defined above is run_variants, not run
    run_variants(species, symbol)
```

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/311
Is it possible to host stand alone instances of VEP, and not the whole REST server? Only interested in the VEP REST API
2018-10-18T09:46:50Z, Marek Szuba
*Created by: nmousavi*
Asking the same question as in
https://github.com/Ensembl/ensembl-rest-deploy/issues/6

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/297
GET sequence: PB downloading
2018-09-05T15:58:15Z, Marek Szuba
*Created by: leapicard*
Hello,
I am using Ensembl-rest to download sequences using their CCDS ids.
However, for some of them, it is impossible to get the sequences, because the "Content-Type text/html is not supported."
The script (very simple & straightforward) works for the vast majority of the sequences, and I can't understand why I run into this problem.
Would you have any idea?
Best,
Lea

https://gitlab.ebi.ac.uk/ensembl-gh-mirror/ensembl-rest/-/issues/276
Potential issue with LD functions
2018-09-05T16:01:01Z, Marek Szuba
*Created by: hsinyen*
Thank you for this wonderful API, it's been very useful!
I am finding that it seems to "miss" some SNPs when I use _GET ld/:species/:id/:population_name_ in humans.
For example, rs492400 and rs600057 are 100 kbp apart and are in high LD (r2=1.0, d_prime=1.0 in CEU). However, when I run _GET ld/:species/:id/:population_name_ on rs492400, rs600057 never appears in the result, regardless of which 1000G population I use. It also doesn't work if I run the function on rs600057.
I'd really appreciate your help!