Overview 3.X

Overview

3.X is a new vision for the scrubber. As improvements to regular expressions and whitelists/blacklists approached diminishing returns, we shifted towards a statistical approach that learns from large bodies of medical information in publications and UMLS dictionaries.

Venn Diagram: scrubber-3.0-venn.jpg

Use Case: Tagging Noun Phrases and UMLS concepts

Precondition:

Training Data
Software: cTakes with the POS tagger and UMLS CUID extractor features enabled

Steps:

A block of text is sent to cTakes
cTakes processing produces:
1. Start & end position of all POS tags
2. Part of speech for each tag
  1. Nouns are of most interest, since they are where PHI (protected health information) is most likely to appear

Post-condition:

The input document (either a medical note or a publication) is POS tagged and annotated with medical concept CUIDs.
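
The post-condition above can be pictured as one record per token. The following is a minimal sketch, not the cTakes output format: `TokenAnnotation`, `NOUN_TAGS`, and `nouns` are hypothetical names, the Penn Treebank style noun tags are an assumption about the tag set, and the CUID in the example is a placeholder rather than a real UMLS identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record for one tagged token; this is NOT the cTakes type system.
@dataclass(frozen=True)
class TokenAnnotation:
    start: int                  # character offset where the token begins
    end: int                    # character offset one past the token's last character
    text: str                   # the covered text
    pos: str                    # part-of-speech tag (Penn Treebank style assumed)
    cuid: Optional[str] = None  # UMLS concept identifier, if one was found


# Assumed noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns(tokens: List[TokenAnnotation]) -> List[TokenAnnotation]:
    """Return only the noun tokens, the ones of most interest for PHI."""
    return [t for t in tokens if t.pos in NOUN_TAGS]


# Hand-built annotations for the text below; "C0000000" is a placeholder CUID.
source_text = "Smith has diabetes"
example = [
    TokenAnnotation(0, 5, "Smith", "NNP"),
    TokenAnnotation(6, 9, "has", "VBZ"),
    TokenAnnotation(10, 18, "diabetes", "NN", cuid="C0000000"),
]
noun_tokens = nouns(example)
```

Here `noun_tokens` keeps "Smith" and "diabetes"; only the second carries a concept identifier, which is the distinction the next use case relies on.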

Use Case: Meta-analysis of text

Precondition:

Tagging Noun Phrases
Scrubber configured (with or without local dictionary/regex mods)

Steps:

Each "scrubber" implementation produces Recorder output
1. Passthrough Impl
  1. Regex
  2. Word lists
2. cTakes Impl (OpenNLP)
  1. Noun Phrases
  2. UMLS CUIDs
Performance evaluation (ROC)
1. Scrubber standalone
2. Scrubber word lists limited by detected noun phrases
3. Scrubber word lists limited by detected noun phrases and non-UMLS concepts
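
The three evaluation configurations above can be sketched as set operations on character spans. This is an illustration under stated assumptions, not the scrubber's code: all function names and the example spans are hypothetical, "limited by" is read as "overlaps", and each configuration yields a single (false positive rate, true positive rate) point rather than a full ROC curve.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def limit_to(hits: Set[Span], regions: Set[Span]) -> Set[Span]:
    """Keep only the hits that overlap at least one region."""
    return {h for h in hits if any(overlaps(h, r) for r in regions)}


def exclude(hits: Set[Span], regions: Set[Span]) -> Set[Span]:
    """Drop the hits that overlap any region."""
    return {h for h in hits if not any(overlaps(h, r) for r in regions)}


def roc_point(predicted: Set[Span], gold: Set[Span],
              candidates: Set[Span]) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) over candidate spans."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(candidates - predicted - gold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


# Made-up example: four token spans, one of which is true PHI.
candidates = {(0, 5), (6, 9), (10, 18), (19, 22)}
gold = {(0, 5)}
word_list_hits = {(0, 5), (6, 9), (10, 18)}  # the word list over-matches
noun_phrases = {(0, 5), (10, 18)}
umls_concepts = {(10, 18)}

standalone = word_list_hits                           # configuration 1
np_limited = limit_to(word_list_hits, noun_phrases)   # configuration 2
np_non_umls = exclude(np_limited, umls_concepts)      # configuration 3

points = [roc_point(p, gold, candidates)
          for p in (standalone, np_limited, np_non_umls)]
```

In this made-up example the true positive rate stays at 1.0 while the false positive rate falls from 2/3 to 1/3 to 0 as each restriction is added. Real data would not necessarily behave this way.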

Post-condition:

The text has been processed by more than one algorithm ("ham vs spam").
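
One way to picture this post-condition is a table that records, for each flagged span, which algorithms flagged it. The sketch below assumes that "ham vs spam" means a two-class decision (PHI vs. not PHI) made from several algorithms' outputs. That reading, the `merge_outputs` helper, and the algorithm names are all hypothetical, and the actual Recorder output format is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def merge_outputs(outputs: Dict[str, Iterable[Span]]) -> Dict[Span, Set[str]]:
    """Map each flagged span to the set of algorithm names that flagged it."""
    merged: Dict[Span, Set[str]] = defaultdict(set)
    for name, spans in outputs.items():
        for span in spans:
            merged[span].add(name)
    return dict(merged)


# Made-up outputs from three hypothetical algorithms over the same text.
outputs = {
    "regex": [(0, 5)],
    "word_list": [(0, 5), (10, 18)],
    "umls": [(10, 18)],
}
merged = merge_outputs(outputs)
```

Each entry of `merged` could then serve as a feature row for a classifier, for example treating a span flagged by the word list but also matched as a UMLS concept as less likely to be PHI.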

Example

Spreadsheet attachment "scrubber_classification_table.xls" (not available on this page)
