This project is mirrored from https://*****:*****@github.com/PGScatalog/pgs-harmonizer.git.
  1. Nov 17, 2021
  2. Nov 12, 2021
    • Features: · f8c25dc7
      smlmbrt authored
      - Sort based on chromosome order when X/Y/MT are included
      - Prints warning re: duplicated IDs
      - Default function to drop duplicate variants from the file when splitting by mapped/unmapped
      f8c25dc7
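
The sorting and duplicate handling described in this commit could look roughly like the sketch below. It is an illustration only, not the repository's code: the names `CHROM_ORDER`, `chrom_sort_key` and `drop_duplicate_variants` are hypothetical, and placing X, Y and MT after the autosomes in that order is an assumption.

```python
# Hypothetical ordering (assumption): autosomes 1-22 numerically, then X, Y, MT.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def chrom_sort_key(variant):
    """Sort key for a (chr_name, chr_position, variant_id) tuple.

    Chromosome names not in CHROM_ORDER sort after all known ones.
    """
    chrom, pos = variant[0], variant[1]
    return (CHROM_ORDER.get(str(chrom), len(CHROM_ORDER) + 1), int(pos))


def drop_duplicate_variants(variants):
    """Keep the first occurrence of each variant_id.

    Returns (kept_variants, duplicated_ids) so the caller can warn
    about the IDs that were dropped.
    """
    seen, kept, duplicated = set(), [], []
    for variant in variants:
        variant_id = variant[2]
        if variant_id in seen:
            duplicated.append(variant_id)
        else:
            seen.add(variant_id)
            kept.append(variant)
    return kept, duplicated
```

A caller would typically run `drop_duplicate_variants` first, print a warning if the returned list of duplicated IDs is non-empty, and then call `sorted(kept, key=chrom_sort_key)`.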
  3. Oct 07, 2021
    • Features: · 319e06d8
      smlmbrt authored
      - Splits matched and unmapped variants into separate files (updates to README to reflect this)
      - Fixes double-quoting in hm_info
      319e06d8
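
As a rough illustration of splitting variants into mapped and unmapped groups before writing them to separate files, consider the sketch below. The function name `split_mapped` and the `hm_pos` key used to decide whether a variant was mapped are assumptions made for this example, not details confirmed by the commit.

```python
def split_mapped(rows, pos_key="hm_pos"):
    """Split dict-like rows into (mapped, unmapped).

    A row counts as mapped when it has a non-empty value under pos_key.
    Both pos_key and this criterion are assumptions for the sketch.
    """
    mapped, unmapped = [], []
    for row in rows:
        if row.get(pos_key) in (None, ""):
            unmapped.append(row)
        else:
            mapped.append(row)
    return mapped, unmapped
```

Each of the two returned lists can then be written to its own output file.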
  4. Jul 01, 2021
  5. Apr 16, 2021
  6. Apr 01, 2021
    • First commit - refactored code to split adding chr/pos and VCF lookups into... · c8ef1080
      smlmbrt authored
      First commit - refactored code to split adding chr/pos and VCF lookups into separate functionality. This way the chr/pos step only has to be run once for HmPOS, and the VCF lookup only once per cohort VCF. This should speed up the second step, since it no longer requires any duplicate Ensembl lookups. The code should also be slightly faster with vectorized pandas loops/applies.
      c8ef1080
  7. Mar 31, 2021
  8. Mar 23, 2021
  9. Mar 05, 2021
  10. Mar 02, 2021
  11. Feb 22, 2021
  12. Feb 11, 2021
    • Distinguish between 2 types of palindromic variants: · 4707b1b4
      smlmbrt authored
      - HM Code 4: This pair of alleles is found in the VCF and is not strand
      ambiguous; however, there are other allele(s) at the locus that may cause
      it to be strand-ambiguous (usually when an rsID in ENSEMBL is not
      bi-allelic)
      - HM Code 3: This pair of alleles is found in the VCF and is strand
      ambiguous (e.g. A/T, C/G); we assume these are on the forward strand
      4707b1b4
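
The distinction between the two codes can be sketched as follows. This is one possible reading of the commit message, not the repository's implementation: the function names are hypothetical, only single-nucleotide alleles are handled, and the rule used for code 4 (the complement of one of the scored alleles appears among the other alleles at the locus) is an assumption.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele_1, allele_2):
    """True for strand-ambiguous pairs such as A/T or C/G."""
    return COMPLEMENT.get(allele_1) == allele_2


def palindromic_code(allele_1, allele_2, locus_alleles):
    """Return 3, 4 or None for a pair of alleles at a VCF locus.

    3    -> the pair is in the VCF and is itself strand ambiguous
    4    -> the pair is in the VCF and not ambiguous, but another allele
            at the locus is the complement of one of them (assumed rule)
    None -> neither case applies; other codes are outside this sketch
    """
    locus = set(locus_alleles)
    if not {allele_1, allele_2} <= locus:
        return None
    if is_palindromic(allele_1, allele_2):
        return 3
    others = locus - {allele_1, allele_2}
    if any(COMPLEMENT.get(a) in others for a in (allele_1, allele_2)):
        return 4
    return None
```

For example, A/G at a locus whose VCF alleles are A, G and T would get code 4 under this rule, because T is the complement of A.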
  13. Jan 28, 2021
  14. Jan 21, 2021
  15. Sep 16, 2020
  16. Sep 07, 2020
    • Updates: · 67af8260
      smlmbrt authored
      - Minor formatting updates.
      - Clearer mapping of source build
      - Added option to gzip output
      - Option to skip rsID mapping
      67af8260
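
An option to gzip the output can be implemented with Python's standard `gzip` module, as in the minimal sketch below. The helper name `open_output`, the `use_gzip` flag and the choice to append `.gz` to the path are assumptions for illustration and may differ from the actual script.

```python
import gzip


def open_output(path, use_gzip=False):
    """Open a text file for writing, gzip-compressed if requested.

    When use_gzip is True, '.gz' is appended to the path (an assumption
    of this sketch) and the file is opened in gzip text mode.
    """
    if use_gzip:
        return gzip.open(path + ".gz", "wt", encoding="utf-8")
    return open(path, "w", encoding="utf-8")
```

Because both branches return a text-mode file object, the writing code does not need to know whether compression is enabled.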
  17. Aug 11, 2020
  18. Aug 10, 2020
    • Fixes discovered during tests: · 62ba7752
      smlmbrt authored
      - Assert that the chr_name is always handled as a str
      - Don't check alleles based on ENSEMBL mapping (this is because the allele notation is different than the VCF)
      - Can manually override the scoring file header's Genome Build with the parser/script
      62ba7752
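
Chromosome names mix numbers with labels such as X, Y and MT, so treating `chr_name` as a string avoids comparing `1` with `"1"` or ending up with `"1.0"`. The helper below is a hypothetical sketch of such a normalization. It is not taken from the repository, and the handling of float values is an assumption.

```python
def normalize_chr_name(value):
    """Return a chromosome name as a clean string.

    Whole-number floats (e.g. 1.0, which can appear when a column is
    parsed as numeric) are converted to '1' rather than '1.0'.
    """
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip()
```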
  19. Jul 07, 2020
  20. Jun 24, 2020
    • Code to check whether the scoring file alleles are present in the variant... · 0d65462e
      smlmbrt authored
      Code to check whether the scoring file alleles are present in the variant information obtained from ENSEMBL Variation (API or var2location) mappings. Added new harmonization codes to reflect alleles that are not consistent with mappings by rsID (e.g. when they do not overlap, or are possibly on the opposite strand).
      0d65462e
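
A check of this kind can be sketched as below: the scoring file alleles are compared with the alleles reported for the mapped variant, first as given and then as reverse complements. The function names and the returned labels (`consistent`, `opposite_strand`, `inconsistent`) are hypothetical placeholders, not the project's actual harmonization codes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele):
    """Reverse complement of an allele made of A/C/G/T characters."""
    return allele.upper().translate(_COMPLEMENT)[::-1]


def allele_consistency(scoring_alleles, mapped_alleles):
    """Classify scoring-file alleles against the mapped variant's alleles.

    The labels returned here are placeholders for this sketch only.
    """
    scoring = {a.upper() for a in scoring_alleles}
    mapped = {a.upper() for a in mapped_alleles}
    if scoring <= mapped:
        return "consistent"
    if {reverse_complement(a) for a in scoring} <= mapped:
        return "opposite_strand"
    return "inconsistent"
```

Checking the alleles as given before checking their reverse complements means a pair that matches directly is never reported as being on the opposite strand.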
  21. Jun 23, 2020
  22. Jun 18, 2020
  23. Apr 08, 2020
  24. Apr 07, 2020
    • Re-write to have new output: · ec348d4f
      smlmbrt authored
      - Unharmonizable data is provided in a dictionary at the end of the line
      - Where rsIDs are re-mapped, we provide the updated one, along with the old ones in the unmapped dictionary
      ec348d4f
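
One way to put a dictionary at the end of an output line is to serialize it and append it as the last tab-separated field, as in the sketch below. The use of JSON, the tab delimiter and the function name `format_harmonized_line` are all assumptions for illustration. The commit does not say how the dictionary is encoded.

```python
import json


def format_harmonized_line(fields, unharmonizable):
    """Join the harmonized fields and append the leftover data as JSON.

    fields         -- harmonized column values, in output order
    unharmonizable -- dict of values that could not be harmonized,
                      e.g. the previous rsID when an rsID was re-mapped
    """
    columns = [str(f) for f in fields]
    columns.append(json.dumps(unharmonizable, sort_keys=True))
    return "\t".join(columns)
```

Serializing with `sort_keys=True` keeps the field stable across runs, which makes output files easier to diff.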
  25. Apr 06, 2020
  26. Apr 03, 2020
  27. Apr 02, 2020
  28. Mar 16, 2020