Note: Scrubber 3.X is being ported to Apache cTAKES; this is an interim BETA release.



Intended usages


Default configuration

We recommend starting with the default properties and the prebuilt train/test models.
The train and test models are anonymized feature sets generated by the Scrubber runtime; they contain no note text.

scrubber.properties: all supported configuration options and features in one place.
The Apache UIMA, Apache cTAKES, and WEKA distribution jars are loaded dynamically.



Customize NLP pipeline

Scrubber uses the Apache UIMA and Apache cTAKES packages, which together provide the NLP pipeline for lexical parsing and medical concept annotation. Generated feature sets are exported to a SQL database or to a model file (CSV or ARFF). The UIMA and cTAKES services used by Scrubber are defined and configured in scrubber.properties.
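
To make the model file format concrete, here is a minimal sketch of how a feature set could be written out as ARFF text. The class name, relation name, attribute names, and class labels are all hypothetical; this page does not list Scrubber's actual feature columns, and this is not Scrubber's exporter.

```java
public class ArffExportSketch {

    /**
     * Builds ARFF text from numeric feature rows plus one nominal class label per row.
     * All names passed in are illustrative assumptions, not Scrubber's real schema.
     */
    public static String toArff(String relation, String[] attrs, String[] classes,
                                double[][] rows, String[] labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String a : attrs) {
            sb.append("@attribute ").append(a).append(" numeric\n");
        }
        // The nominal class attribute lists every allowed label
        sb.append("@attribute class {").append(String.join(",", classes)).append("}\n\n");
        sb.append("@data\n");
        for (int i = 0; i < rows.length; i++) {
            for (double v : rows[i]) {
                sb.append(v).append(',');
            }
            sb.append(labels[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String arff = toArff("tokens",
                new String[] {"token_length", "is_capitalized"},   // hypothetical features
                new String[] {"PHI", "NON_PHI"},                   // hypothetical labels
                new double[][] {{5.0, 1.0}, {3.0, 0.0}},
                new String[] {"PHI", "NON_PHI"});
        System.out.print(arff);
    }
}
```

A file in this shape holds only feature values and labels, which is consistent with the statement above that the models contain no note text.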

Customize Classifier

Scrubber can use different classifier implementations without recompiling the software.
By default, Scrubber dynamically loads the popular WEKA C4.5 decision tree classifier, which supports multi-class classification.
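
Swapping classifiers without recompiling is typically done by naming the implementation class in configuration and instantiating it by reflection. The sketch below illustrates that general technique only. The `TokenClassifier` interface, the `classifier.class` property key, and the stand-in implementation are assumptions made for illustration; Scrubber's real plugin contract and property names may differ.

```java
import java.util.Properties;

public class ClassifierLoaderSketch {

    /** Hypothetical plugin contract; not Scrubber's actual interface. */
    public interface TokenClassifier {
        String classify(double[] features);
    }

    /** Stand-in implementation so the sketch runs without WEKA on the classpath. */
    public static class AlwaysNonPhiClassifier implements TokenClassifier {
        @Override
        public String classify(double[] features) {
            return "NON_PHI";
        }
    }

    /** Instantiates the class named by the (assumed) "classifier.class" property. */
    public static TokenClassifier load(Properties props) throws ReflectiveOperationException {
        String className = props.getProperty(
                "classifier.class", AlwaysNonPhiClassifier.class.getName());
        Class<?> cls = Class.forName(className);
        // Requires a public no-argument constructor on the plugin class
        return (TokenClassifier) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("classifier.class", AlwaysNonPhiClassifier.class.getName());
        TokenClassifier classifier = load(props);
        System.out.println(classifier.classify(new double[] {0.0}));
    }
}
```

With this pattern, changing the classifier means editing one property value and putting the implementation's jar on the classpath.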



Software Features



Annotation

  • Annotate word tokens and redact PHI from physician notes
  • cTAKES lexical parsing and medical dictionary annotation
  • WEKA multi-class decision tree classifier (plugin default)
  • Protege UI support for human expert curators (reads output) 
  • Generate feature sets containing lexical properties, medical concept codes, and human defined rules

Models

  • Prebuilt train and test models can be imported to Weka (default), Matlab, or R
  • (default) Test your local physician notes without retraining
  • (optional) Retrain the model using local physician note samples, publications, and medical dictionaries

Classification

  • Distinguish (classify) private patient data from coded medical concepts and commonly used words

Compare Text

  • Compare lexical properties and distributions of public and private text sources

How To

Install / Train / Test / Scrub



Scrubber Property KEY = VALUE



scrubber.properties



Java Object

ScrubberProperties.java statically binds scrubber.properties at startup.
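
One common way to bind a properties file statically at startup is a static initializer that loads it once when the class is first used. The sketch below shows that pattern under stated assumptions: the class name, the classpath location `/scrubber.properties`, and the inline fallback key are hypothetical, and the actual ScrubberProperties.java may work differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Properties;

public final class StaticPropertiesSketch {

    private static final Properties PROPS = new Properties();

    static {
        // Runs once, when the class is first loaded
        try (InputStream in =
                     StaticPropertiesSketch.class.getResourceAsStream("/scrubber.properties")) {
            if (in != null) {
                PROPS.load(in);
            } else {
                // Assumed fallback so the sketch runs without the real file
                PROPS.load(new StringReader("example.key = example-value\n"));
            }
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private StaticPropertiesSketch() {
    }

    /** Returns the bound value, or the fallback when the key is absent. */
    public static String get(String key, String fallback) {
        return PROPS.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        System.out.println(get("no.such.key", "fallback-value"));
    }
}
```

Loading in a static initializer means a missing or unreadable file fails fast at startup instead of at first use deep inside the pipeline.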

Java Template

TemplateFileProcessor.java handles file I/O and token replacement for the default configuration files.
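
Token replacement in a configuration template can be sketched as follows. The `@NAME@` token syntax, the class name, and the sample keys are assumptions for illustration; this page does not document the token format that TemplateFileProcessor.java actually uses.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateTokenSketch {

    // Assumed token syntax: @NAME@ (letters, digits, underscore, dot)
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.]+)@");

    /** Replaces each known token with its value; unknown tokens are left as-is. */
    public static String expand(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Collections.singletonMap("HOME_DIR", "/opt/scrubber");
        System.out.println(expand("models.dir = @HOME_DIR@/models", values));
    }
}
```

Leaving unknown tokens untouched makes a missing value easy to spot in the generated file; a stricter variant could throw instead.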

Shell scripts

setClassPath.sh sets the Java classpath and exports the shell variables.

Shell UnitTest

ScrubberPropertiesTest.java demonstrates binding scrubber.properties to shell commands.
