This project is mirrored from https://*****:*****@github.com/PGScatalog/pgs-harmonizer.git.
- Nov 17, 2021
smlmbrt authored
- Nov 12, 2021
smlmbrt authored
- Sort based on chromosome order when X/Y/MT are included - Prints warning re: duplicated IDs - Default function to drop duplicate variants from the file when splitting by mapped/unmapped
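The chromosome ordering mentioned above can be sketched as follows. This is a minimal illustration, not the repository's code: the `CHROM_ORDER` table, the `chrom_sort_key` helper, and the placement of X, Y, and MT after the autosomes are assumptions made for the example.

```python
# Hypothetical ordering: autosomes 1-22, then X, Y, MT (an assumption).
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def chrom_sort_key(variant):
    """Build a sort key from a (chromosome, position) pair."""
    chrom, pos = variant
    # Unknown chromosome names sort after every known one.
    rank = CHROM_ORDER.get(str(chrom), len(CHROM_ORDER) + 1)
    return (rank, int(pos))


variants = [("X", 500), ("10", 42), ("2", 900), ("MT", 7), ("2", 100)]
sorted_variants = sorted(variants, key=chrom_sort_key)
```

Sorting on a numeric rank rather than on the chromosome string avoids the lexical ordering in which "10" would come before "2".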
- Oct 07, 2021
smlmbrt authored
- Splits matched and unmapped variants into separate files (updates to README to reflect this) - Fixes double-quoting in hm_info
- Jul 01, 2021
Laurent Gil authored
- Apr 16, 2021
smlmbrt authored
- Apr 01, 2021
smlmbrt authored
First commit: refactored the code to split adding chr/pos and the VCF lookups into separate steps. Each step then only has to be run once: once for HmPOS, and once per cohort VCF. This should speed up the second step by not requiring any duplicate Ensembl lookups. The code should also be slightly faster with vectorized pandas loops/applies.
- Mar 31, 2021
smlmbrt authored
the default behaviour is to indicate that they are not found in the VCF (e.g. Ensembl/dbSNP)
- Mar 23, 2021
- Mar 05, 2021
smlmbrt authored
Other changes: - Clean up some of the method names to be shorter - Better handle chr_position as an int in the read score files method
- Mar 02, 2021
- Feb 22, 2021
- Feb 11, 2021
smlmbrt authored
- HM Code 4: This pair of alleles is found in the VCF and is not strand-ambiguous; however, there are other allele(s) at the locus that may cause it to be strand-ambiguous (usually when an rsID in Ensembl is not bi-allelic) - HM Code 3: This pair of alleles is found in the VCF and is strand-ambiguous (e.g. A/T, C/G); we assume these are on the forward strand
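As a rough illustration of the strand-ambiguity condition behind HM Code 3, a pair of single-base alleles is ambiguous when each is the complement of the other (A/T or C/G), because the pair reads the same on either strand. The `is_strand_ambiguous` helper below is a hypothetical sketch and is not taken from the repository; it also assumes that multi-base alleles are simply not classified.

```python
# Hypothetical helper for illustration; not the repository's implementation.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele_1, allele_2):
    """Return True when two single-base alleles are complements (A/T or C/G)."""
    a1, a2 = allele_1.upper(), allele_2.upper()
    if len(a1) != 1 or len(a2) != 1:
        # Assumption: indels and multi-base alleles are not classified here.
        return False
    return COMPLEMENT.get(a1) == a2
```

For example, `is_strand_ambiguous("A", "T")` is true, while `is_strand_ambiguous("A", "G")` is false because flipping A/G to the reverse strand gives T/C, a distinguishable pair.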
- Jan 28, 2021
smlmbrt authored
- Jan 21, 2021
smlmbrt authored
Refactored code to have a more fine-grained view of the variant harmonization (e.g. where the information is from, and why it didn't map). This should make it easier to try and rescue variants that have an ambiguous _reference_allele_ using a reference set of genotyped/imputed variants (if available).
- Sep 16, 2020
smlmbrt authored
- Sep 07, 2020
smlmbrt authored
- Minor formatting updates. - Clearer mapping of source build - Added option to gzip output - Option to skip rsID mapping
- Aug 11, 2020
smlmbrt authored
- Aug 10, 2020
smlmbrt authored
- Assert that the chr_name is always handled as a str - Don't check alleles based on the ENSEMBL mapping (because its allele notation differs from the VCF's) - The scoring file header's Genome Build can be manually overridden with the parser/script
- Jul 07, 2020
smlmbrt authored
- Jun 24, 2020
smlmbrt authored
Code to check whether the scoring file alleles are present in the variant information obtained from ENSEMBL Variation (API or var2location) mappings. Added new harmonization codes to reflect alleles that are not consistent with mappings by rsID (e.g. when they do not overlap, or are possibly on the opposite strand).
- Jun 23, 2020
smlmbrt authored
- Jun 18, 2020
smlmbrt authored
- Apr 08, 2020
smlmbrt authored
- Apr 07, 2020
smlmbrt authored
- unharmonizable data is provided in a dictionary at the end of the line - where rsIDs are re-mapped we provide the updated one, along with the old ones in the unmapped dictionary
- Apr 06, 2020
smlmbrt authored
- Apr 03, 2020
smlmbrt authored
- ENSEMBL ones were not working, and the GWAS Catalog also uses the UCSC chains from the pyliftover tool
- Apr 02, 2020
smlmbrt authored
- Mar 16, 2020
Sam Lambert authored