Overview
3.X is a new vision for the scrubber. As we approached "diminishing returns" for improving REGEX and whitelists/black lists, we have shifted towards a statistical methods approach and learning from large bodies of medical information from publications and UMLS dictionaries.
Venn Diagram!scrubber-3.0-venn.jpg|border=1!
Use Case: Tagging Noun Phrases and UMLS concepts
Precondition:
- Training Data
- Software: cTakes using features POS tagger & UMLS CUID extractor
Steps:
- Block of text is sent to cTakes
- cTakes processing
- start & end position of all POS tags
- part of speech
- Most interested in Nouns because of PHI
Post-condition:
- Input document (either medical note OR publication) POS tagged and medical concept CUIDs.
Use Case: Meta-analysis of text
Precondition:
- Tagging Noun Phrases
- Scubber configured (with or without local dictionary/regex mods)
Steps:
- Each "scrubber" implementation procudes Recorder output
- Passthrough Imp
- Regex
- Word lists
- cTakes Impl (OpenNLP)
- Noun Phrases
- UMLS cuids
- Passthrough Imp
- Performance evaluation (ROC)
- Scrubber standalone
- Scrubber word lists limited by detected noun phrases
- Scrubber word lists limited by detected noun phrases and non-UMLS concepts
Post-Condition
- Text is processed by more than one algorithm "ham vs spam"