PDFTextExtractor
class PDFTextExtractor extends FileTextExtractor
Text extractor that calls pdftotext to do the conversion.
Traits
Configurable | Provides extensions to this object to integrate it with standard config API methods.
Injectable | A class that can be instantiated or replaced via DI
Config options
priority | int | Set priority from 0-100. | from FileTextExtractor
binary_location | string | Set to the bin path containing the binary this extractor can execute. |
search_binary_locations | array | Locations to search for the binary. Used if binary_location isn't set. |
Properties
protected static | array | $sorted_extractor_classes | Cache of extractor class names, sorted by priority | from FileTextExtractor
Methods
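static Config_ForClass | config() | Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, ...).
mixed | uninherited(string $name) | Gets the uninherited value for the given config option
static Injectable | create(mixed ...$args) | An implementation of the factory method that allows you to create an instance of a class
static Injectable | singleton(string $class = null) | Creates a class instance by the "singleton" design pattern.
static protected array | get_extractor_classes() | Gets the list of prioritised extractor classes
static protected FileTextExtractor | get_extractor(string $class) | Get the text file extractor for the given class
static FileTextExtractor|null | for_file(File|string $file) | Given a File object, decide which extractor instance to use to handle it
static protected string | getPathFromFile(File $file) | Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path
bool | isAvailable() | Checks if the extractor is supported on the current environment, for example if the correct binaries or libraries are available.
bool | supportsExtension(string $extension) | Determine if this extractor supports the given extension.
bool | supportsMime(string $mime) | Determine if this extractor supports the given mime type.
string | getContent(File|string $file) | Given a File instance, extract the contents as text.
protected string | bin(string $program = '') | Accessor to get the location of the binary
protected string | getRawOutput(File|string $file) | Invoke pdftotext with the given File object
protected string | cleanupLigatures(string $input) | Removes utf-8 ligatures.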
Details
static Config_ForClass
config()
Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, ...).
mixed
uninherited(string $name)
Gets the uninherited value for the given config option
static Injectable
create(mixed ...$args)
An implementation of the factory method that allows you to create an instance of a class
This method will defer class substitution to the Injector API, which can be customised via the Config API to declare substitution classes.
This can be called in one of two ways: either calling via the class directly, or calling on Object and passing the class name as the first parameter. The following are equivalent:
    $list = DataList::create(SiteTree::class);
    $list = SiteTree::get();
static Injectable
singleton(string $class = null)
Creates a class instance by the "singleton" design pattern.
It will always return the same instance for this class, which can be used for performance reasons and as a simple way to access instance methods which don't rely on instance data (e.g. the custom SilverStripe static handling).
static protected array
get_extractor_classes()
Gets the list of prioritised extractor classes
static protected FileTextExtractor
get_extractor(string $class)
Get the text file extractor for the given class
static FileTextExtractor|null
for_file(File|string $file)
Given a File object, decide which extractor instance to use to handle it
static protected string
getPathFromFile(File $file)
Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path
bool
isAvailable()
Checks if the extractor is supported on the current environment, for example if the correct binaries or libraries are available.
bool
supportsExtension(string $extension)
Determine if this extractor supports the given extension.
If support is determined by MIME type only, then this should return false.
bool
supportsMime(string $mime)
Determine if this extractor supports the given mime type.
Will only be called if supportsExtension returns false.
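The extension-first, MIME-second order described above can be sketched as follows. This is a minimal standalone illustration, not the module's code: ExtractorSketch, PdfExtractorSketch, extractor_supports() and the MIME list are all hypothetical names and values chosen for the example.

```php
<?php
// Hypothetical stand-in for an extractor; not the real FileTextExtractor API.
interface ExtractorSketch
{
    public function supportsExtension(string $extension): bool;
    public function supportsMime(string $mime): bool;
}

final class PdfExtractorSketch implements ExtractorSketch
{
    public function supportsExtension(string $extension): bool
    {
        return strtolower($extension) === 'pdf';
    }

    public function supportsMime(string $mime): bool
    {
        // Assumed MIME list, for illustration only.
        return in_array(strtolower($mime), ['application/pdf', 'application/x-pdf'], true);
    }
}

// Check the extension first; consult the MIME type only when that fails.
function extractor_supports(ExtractorSketch $extractor, string $extension, string $mime): bool
{
    if ($extractor->supportsExtension($extension)) {
        return true;
    }
    return $extractor->supportsMime($mime);
}
```

With this ordering, a file named report.pdf matches on its extension alone, while a file with an unrecognised extension can still match through its MIME type.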
string
getContent(File|string $file)
Given a File instance, extract the contents as text.
protected string
bin(string $program = '')
Accessor to get the location of the binary
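One way the binary_location and search_binary_locations options could interact when locating a binary is sketched below. It assumes binary_location names a directory and takes precedence when set. The function resolve_binary_sketch() is hypothetical and does not reproduce the actual bin() implementation.

```php
<?php
/**
 * Hypothetical helper: resolve a program's full path from a configured
 * directory or, if none is configured, from a list of search directories.
 * Returns null when the program is not found.
 */
function resolve_binary_sketch(string $program, ?string $binaryLocation, array $searchLocations): ?string
{
    // A configured location replaces the search list entirely.
    $locations = ($binaryLocation !== null && $binaryLocation !== '')
        ? [$binaryLocation]
        : $searchLocations;

    foreach ($locations as $location) {
        $path = rtrim($location, '/') . '/' . $program;
        if (is_file($path)) {
            return $path;
        }
    }

    return null;
}
```

A call such as resolve_binary_sketch('pdftotext', null, ['/usr/bin', '/usr/local/bin']) would return the first matching path, or null if neither directory contains the program.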
protected string
getRawOutput(File|string $file)
Invoke pdftotext with the given File object
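As a rough illustration of invoking pdftotext, the sketch below builds a shell command that writes the extracted text to standard output (pdftotext treats "-" as the output file name for stdout). The helper build_pdftotext_command_sketch() is hypothetical, and the exact arguments the class passes are not stated in this reference.

```php
<?php
/**
 * Hypothetical helper: build a pdftotext command line that prints the
 * extracted text to stdout. Both arguments are shell-escaped.
 */
function build_pdftotext_command_sketch(string $binary, string $pdfPath): string
{
    return sprintf('%s %s -', escapeshellarg($binary), escapeshellarg($pdfPath));
}

// Usage (assumes pdftotext is installed at this path):
// $text = shell_exec(build_pdftotext_command_sketch('/usr/bin/pdftotext', '/tmp/example.pdf'));
```

Escaping the file path matters because uploaded file names can contain spaces or shell metacharacters.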
protected string
cleanupLigatures(string $input)
Replaces UTF-8 ligature characters with their plain-letter equivalents.
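A minimal sketch of this kind of ligature cleanup is shown below. It assumes each ligature code point is replaced with its plain-letter equivalent. The function name cleanup_ligatures_sketch() and the exact mapping are illustrative assumptions, not the method's actual body.

```php
<?php
/**
 * Hypothetical helper: replace common Latin ligature code points
 * (U+FB00 to U+FB04) with their plain-letter equivalents.
 */
function cleanup_ligatures_sketch(string $input): string
{
    $mapping = [
        "\u{FB00}" => 'ff',
        "\u{FB01}" => 'fi',
        "\u{FB02}" => 'fl',
        "\u{FB03}" => 'ffi',
        "\u{FB04}" => 'ffl',
    ];

    // strtr() with an array replaces each key wherever it occurs.
    return strtr($input, $mapping);
}
```

PDF text often contains these ligatures because fonts substitute them for letter pairs; expanding them lets a search for "file" match text that was typeset with the "ﬁ" ligature.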