Text extractor that uses the PHP function strip_tags to get just the text. Good enough for indexing, but not ideal for producing readable text.
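A minimal sketch of what strip_tags-based extraction might look like, assuming the input is an HTML string. The function name extractPlainText and the entity-decoding and whitespace steps are illustrative assumptions, not the module's actual code.

```php
<?php
// Hypothetical helper, not part of the module: reduce HTML to plain text.
function extractPlainText(string $html): string
{
    // strip_tags removes the tags but keeps the text between them,
    // including the contents of <script> and <style> elements.
    $text = strip_tags($html);
    // Decode entities such as &amp; so indexed terms match search queries.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo extractPlainText('<h1>Title</h1><p>Fish &amp; chips</p>');
// prints "TitleFish & chips"
```

The heading and paragraph run together because strip_tags inserts no separator where a block element ends, which is one reason the output suits indexing better than display.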
Provides extensions to this object to integrate it with standard config API methods.
A class that can be instantiated or replaced via dependency injection (DI).
Lower priority because it is not the most clever HTML extraction. If a better extractor is available, use that instead.
Cache of extractor class names, sorted by priority.
Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, .....).
Gets the uninherited value for the given config option.
An implementation of the factory method pattern that lets you create an instance of a class.
Creates a class instance using the "singleton" design pattern.
Gets the list of prioritised extractor classes.
Get the text file extractor for the given class.
Given a File object, decide which extractor instance to use to handle it.
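One way a prioritised, cached list of this kind could be built is sketched below. The class ExtractorRegistry, the placeholder class names, and the convention that a larger number means higher priority are all assumptions made for illustration.

```php
<?php
// Hypothetical registry, not the module's actual class.
final class ExtractorRegistry
{
    /** @var string[]|null Cached class names, highest priority first. */
    private static ?array $sorted = null;

    /**
     * @param array<string,int> $priorities Priority keyed by class name.
     * @return string[]
     */
    public static function getClasses(array $priorities): array
    {
        if (self::$sorted === null) {
            // arsort keeps the keys and orders by value, largest first.
            arsort($priorities);
            self::$sorted = array_keys($priorities);
        }
        return self::$sorted;
    }
}

// Placeholder names and priorities, chosen only for the example.
$classes = ExtractorRegistry::getClasses([
    'ExtractorA' => 10,
    'ExtractorB' => 90,
    'ExtractorC' => 50,
]);
echo implode(', ', $classes);
// prints "ExtractorB, ExtractorC, ExtractorA"
```

Because the result is cached in a static property, later calls return the first sorted list even if different priorities are passed in.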
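The selection step might look roughly like the sketch below: walk the extractors in priority order and return the first one that is usable and supports the file's extension. The interface SketchExtractor, its two methods, and the function chooseExtractor are hypothetical names introduced for this example, and a file name stands in for the File object.

```php
<?php
// Hypothetical interface, introduced only for this sketch.
interface SketchExtractor
{
    public function isAvailable(): bool;
    public function supportsExtension(string $extension): bool;
}

// Hypothetical extractor that handles HTML files.
final class SketchHtmlExtractor implements SketchExtractor
{
    public function isAvailable(): bool
    {
        // strip_tags is part of core PHP, so this one is always usable.
        return true;
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array($extension, ['html', 'htm'], true);
    }
}

/**
 * @param SketchExtractor[] $extractors Already ordered, highest priority first.
 */
function chooseExtractor(array $extractors, string $filename): ?SketchExtractor
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    foreach ($extractors as $extractor) {
        if ($extractor->isAvailable() && $extractor->supportsExtension($extension)) {
            return $extractor;
        }
    }
    // No registered extractor can handle this file.
    return null;
}
```

Returning null rather than throwing lets the caller skip files that cannot be indexed.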
Some text extractors (such as pdftotext) may require a physical file to read from, so this writes the current file contents to a temporary file and returns its path.
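A minimal sketch of the temp-file step, assuming the file contents are already available as a string. The function name writeToTempFile and the extract_ prefix are illustrative; the caller is assumed to be responsible for deleting the file afterwards.

```php
<?php
// Hypothetical helper, not the module's actual method.
function writeToTempFile(string $contents): string
{
    // tempnam creates an empty, uniquely named file and returns its path.
    $path = tempnam(sys_get_temp_dir(), 'extract_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    if (file_put_contents($path, $contents) === false) {
        throw new RuntimeException('Could not write to ' . $path);
    }
    return $path;
}

$path = writeToTempFile('example file contents');
// ... hand $path to an external tool that needs a real file here ...
unlink($path); // remove the temp file once extraction is done
```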