Bugfix/delint MIM parser
- Oct 24, 2018
Marek Szuba authored 5cbf499b
-
Marek Szuba authored 9667e76b
-
- Oct 23, 2018
Marek Szuba authored ee56a2bd
-
Marek Szuba authored a8c676c8
-
- Oct 18, 2018
Marek Szuba authored
Does not reduce the complexity of run() by much according to perlcritic; then again, the record loop now fits on a single page. A single page in a 100ish-row terminal, but still.
189f7abb -
Marek Szuba authored
We abort if we cannot extract detailed information from TI so why should missing or malformed TI be any less fatal?
00f0afe9 -
Marek Szuba authored
Having confirmed with Mag the two numbers should always be the same, limit the parsing to TI - it contains everything we need.
9cff6b08 -
Marek Szuba authored
Move NO and TI extraction to a separate function. Moreover, we now skip ahead to the next record iteration if either of them does not exist, instead of introducing two more nesting levels by only proceeding if both are defined.
5a594bfe -
Marek Szuba authored
Instead of a massive if-elsif cascade in which the only things that differed were the source id passed to add_xref and the counter to increment, create a generic object and choose the right source id / counter using lookup hashes. The processing of two-record insertions takes advantage of both counter selection and the generic object as well, although it is not fully automated yet. Moved/removed entries only use generic counters for now.
a2f3d023 -
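The lookup-hash approach described in the commit above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the type labels, the numeric source ids, the shape of the xref hash and the helper name build_xref() are hypothetical and are not taken from the parser.

```perl
use strict;
use warnings;

# Hypothetical type labels and source ids; the real parser would obtain
# the ids from the xref database.
my %source_id_for = ( gene => 101, morbid => 102 );
my %counters      = ( gene => 0,   morbid => 0 );

# Build one generic xref object and pick the source id and the counter
# by hash lookup instead of a separate if-elsif branch per type.
sub build_xref {
    my ( $type, $accession, $label ) = @_;

    # An unknown type is a caller error; report it instead of guessing
    if ( !exists $source_id_for{$type} ) {
        die "Unknown record type '$type'\n";
    }

    $counters{$type}++;

    return {
        acc       => $accession,
        label     => $label,
        source_id => $source_id_for{$type},
    };
}

# Hypothetical accession and label, for demonstration only
my $xref = build_xref( 'gene', '000001', 'EXAMPLE TITLE' );
print "source id $xref->{source_id}, gene count $counters{gene}\n";
```

Adding a new record type then only requires one new entry in each hash rather than another branch.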
Marek Szuba authored f30ca640
-
Marek Szuba authored
Instead of repeating -1, use a descriptively named constant. This constant should be defined in BaseParser but for the time being let us not touch that module.
4cd79359 -
Marek Szuba authored
Previously we simply removed all newline characters throughout the TI field. This resulted in e.g. "URINARY TRACT ABNORMALITY AND\nCRYPTORCHIDISM;;" becoming "URINARY TRACT ABNORMALITY ANDCRYPTORCHIDISM;;". Now we only remove a newline if it is either immediately preceded or immediately followed by ;; (i.e. the separator); otherwise we replace \n with a space.
952f35e7 -
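A minimal sketch of the newline handling described in the commit above, assuming ;; is the only separator that matters. The helper name normalise_ti_newlines() and the second title in the sample input are hypothetical, and the parser's actual regexes may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy up newlines in the text of a TI field.
sub normalise_ti_newlines {
    my ($ti) = @_;

    # Remove a newline that immediately follows or precedes the separator
    $ti =~ s/ (?<= ;; ) \n //gmsx;
    $ti =~ s/ \n (?= ;; ) //gmsx;

    # Any newline still left sits between two words: turn it into a space
    $ti =~ s/ \n / /gmsx;

    return $ti;
}

my $ti = "URINARY TRACT ABNORMALITY AND\nCRYPTORCHIDISM;;\nSOME OTHER TITLE";
print normalise_ti_newlines($ti), "\n";
```

Doing the separator-adjacent removals first means the final substitution only ever sees newlines that really do stand between words.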
Marek Szuba authored
The first case just IMHO reads better. The second/third were wasteful because they checked exactly the same condition twice instead of just wrapping both print statements in the same block.
8f7f6bd1 -
Marek Szuba authored
Although the old ways of handling default values of $dbi and $verbose worked in principle, they both left something to be desired:
- for $verbose, using the bitwise OR was slightly confusing given that verbosity levels of the xref pipeline are NOT bitmasks;
- for $dbi, we used the frowned-upon (and somewhat noisy) postfix form of unless.
Both assignments use the logical Defined-OR (//) operator now.
18cefbda -
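The defaulting pattern described in the commit above might be sketched as follows. The function name resolve_defaults(), the hash keys and the fallback values are assumptions made for this example, not the parser's actual interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's argument handling.
sub resolve_defaults {
    my ( $args, $fallback_dbi ) = @_;

    # // yields the right-hand side only if the left-hand side is undef,
    # so an explicit verbosity of 0 is left alone
    my $verbose = $args->{verbose} // 0;
    my $dbi     = $args->{dbi}     // $fallback_dbi;

    return ( $verbose, $dbi );
}

my ( $verbose, $dbi ) = resolve_defaults( { verbose => 2 }, 'fallback handle' );
print "verbose=$verbose dbi=$dbi\n";
```

The // operator requires Perl 5.10 or later.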
Marek Szuba authored f52fc20f
-
Marek Szuba authored 17ca46d8
-
Marek Szuba authored
Without this, if the relevant metadata is not present in the xref database we are working with, the parser will make a lot of noise but only actually fail upon trying to add a new xref to the database, i.e. much, much later. Let us fail when the problem actually occurs, shall we? Use croak() rather than 'return 1' because this is a set-up error rather than a data-processing one.
6a628949 -
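A fail-early check of the kind described in the commit above could be sketched like this. The lookup table stands in for the xref database metadata, and both the source name and the helper require_source_id() are hypothetical.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for the metadata held in the xref database
my %known_source_ids = ( EXAMPLE_SOURCE => 101 );

# Return the id of a source, or abort immediately if it is missing:
# a missing source is a set-up error, not a data-processing one.
sub require_source_id {
    my ($name) = @_;

    my $id = $known_source_ids{$name};
    if ( !defined $id ) {
        croak "No source id found for '$name'";
    }

    return $id;
}

print require_source_id('EXAMPLE_SOURCE'), "\n";
```

Because croak() reports the error from the caller's perspective, the message points at the code that asked for the missing source.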
Marek Szuba authored 11633ec3
-
Marek Szuba authored a5617284
-
Marek Szuba authored
Previously, if the "MOVED TO" match failed we assumed the ^ record in question indicated removal. Let us not be so optimistic any more and actually check this, aborting in the event of removal not being the case either.
0fd02142 -
Marek Szuba authored
Named or not, capture variables have a negative impact on performance.
bc51c3a8 -
Marek Szuba authored
No need to reprocess the whole record, $long_desc already contains the part we need to look at to extract the new ID.
8e3f12c1 -
Marek Szuba authored
Could use something about the TI field as well.
f3f7a5ca -
Marek Szuba authored
1. Match $_ explicitly where it used to be matched implicitly. Note that it has NOT been confirmed at this point whether there are any other parts in the code operating implicitly on $_, which is why we still use that variable. Will try changing to a non-magic one later;
2. Add /msx to all regular expressions. Only one of them, the matching of the phrase "MOVED TO", required any modifications;
3. Take advantage of /x to unpack the regexes into multiple lines and add some comments;
4. Have the capture of $long_desc from TI already exclude the leading whitespace, thus doing away with a substitution previously needed to get rid of it.
86baf1a2 -
Marek Szuba authored
Short strings consisting entirely of punctuation marks stand out way more when in q form than when surrounded by another set of punctuation marks.
c6cc6af4 -
Marek Szuba authored
We already handle input through IO::Handle so let's be consistent; furthermore, "input_record_separator()" is way more readable than "$/".
3b013561 -
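The sketch below shows one way of setting the record separator through IO::Handle, as the commit above describes. The separator string, the sample data and the helper read_records() are assumptions made for this example. Note that input_record_separator() is a class method which changes the global $/, so the sketch restores the previous value afterwards.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::File;    # ensures handle methods such as getline() are available

# Hypothetical helper: read all records delimited by $separator.
sub read_records {
    my ( $fh, $separator ) = @_;

    # Sets the global $/ and returns its previous value
    my $previous = IO::Handle->input_record_separator($separator);

    my @records;
    while ( defined( my $record = $fh->getline() ) ) {
        chomp $record;    # strips the trailing separator, if present
        push @records, $record;
    }

    # Restore the old separator so that other readers are unaffected
    IO::Handle->input_record_separator($previous);

    return @records;
}

# In-memory sample input with an assumed separator string
my $data = 'first*RECORD*second*RECORD*third';
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @records = read_records( $fh, '*RECORD*' );
print scalar(@records), " records read\n";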
Marek Szuba authored af238eb7
-
Marek Szuba authored a06d81f1
-
Marek Szuba authored
Those three were never used in the outer scope (i.e. that of the while loop) so let's just keep the inner declarations from the "if TI field" block.
12d90c56 -
Marek Szuba authored
Regexes in this parser are complicated enough even without backslashes all over the place... Try to avoid escapes wherever possible:
- do not escape characters which do not need to be escaped;
- for all metacharacters, use single-character bracketed character classes instead;
- in the event of a caret appearing in a multi-character bracketed character class, make sure it is not the first character - it loses its special meaning then;
- finally, if the character to match is a caret itself (which would require escaping both in a regex string and in a single-character bracketed character class), use the named form (\N{CARET}) instead.
e55cb794 -
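To illustrate the first three points of the commit above, here is a small sketch comparing the escaped and the bracketed form of the same pattern. The marker text matched here is an assumed input format, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Two equivalent ways of matching literal asterisks: backslash escapes
# versus single-character bracketed character classes
my $escaped   = qr/ \A \*FIELD\* \s+ TI /msx;
my $bracketed = qr/ \A [*]FIELD[*] \s+ TI /msx;

# A caret which is not the first character of a bracketed class is an
# ordinary character, so no escape is needed here either
my $symbol = qr/ \A [+^%] /msx;

my $line = '*FIELD* TI';
if ( $line =~ $escaped && $line =~ $bracketed ) {
    print "both forms match\n";
}
```

Both forms compile to the same match; the bracketed one simply keeps backslashes for the constructs that genuinely need them, such as \A and \s.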
Marek Szuba authored
Fixes the mixing of tabs and spaces, trailing whitespace, and many others.
0594a5e5 -
Marek Szuba authored b1ac21f4
-