Performance regression when reading scoring files during matching

Created by: nebfield

https://github.com/PGScatalog/pygscatalog/blob/e88b41f6a6bbe876644af9cdb54a30da4eab1702/pgscatalog.match/src/pgscatalog/match/lib/_arrow.py#L6-L21

We used to parse CSV files with polars and save them as IPC files, which was super fast 🚀

Streaming pyarrow batches is terribly slow in comparison (when working on UK Biobank). I think I was worried about memory usage when I wrote this.

Going back to polars might also let us drop the pyarrow dependency in pgscatalog.core