SilverStripe\TextExtraction\Extractor\HTMLTextExtractor

class HTMLTextExtractor extends FileTextExtractor

Text extractor that uses the PHP function strip_tags() to get just the text. OK for indexing, but not the best for producing readable text.

Traits

Configurable

Provides extensions to this object to integrate it with standard config API methods.

Injectable

A class that can be instantiated or replaced via DI

Config options

priority

int

Lower priority because it's not the most clever HTML extraction. If there is something better, use it.

Properties

protected static

array

$sorted_extractor_classes

Cache of extractor class names, sorted by priority

from FileTextExtractor

Methods

public static

Config_ForClass

config()

Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, .....).

from Configurable

public

mixed

stat(string $name) deprecated

Get inherited config value

from Configurable

public

mixed

uninherited(string $name)

Gets the uninherited value for the given config option

from Configurable

public

$this

set_stat(string $name, mixed $value) deprecated

Update the config value for a given property

from Configurable

public static

Injectable

create(mixed ...$args)

An implementation of the factory method that allows you to create an instance of a class.

from Injectable

public static

Injectable

singleton(string $class = null)

Creates a class instance by the "singleton" design pattern.

from Injectable

protected static

array

get_extractor_classes()

Gets the list of prioritised extractor classes

from FileTextExtractor
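
The idea of a prioritised list of extractor classes can be illustrated with a minimal sketch. It is not the library's implementation: the function name, the class names, and the priority numbers are hypothetical, and it assumes that a larger priority value means the extractor is tried earlier.

```php
<?php
// Hypothetical helper: takes a map of class name => priority and
// returns the class names ordered from highest to lowest priority.
function sortExtractorsByPriority(array $priorities): array
{
    // Assumption: larger numbers are preferred. arsort() sorts by value,
    // descending, while preserving the class-name keys.
    arsort($priorities, SORT_NUMERIC);

    return array_keys($priorities);
}

// Illustrative values only; these are not the module's real settings.
$sorted = sortExtractorsByPriority([
    'HTMLTextExtractor'   => 10,
    'BetterHtmlExtractor' => 50,
    'PdfExtractor'        => 90,
]);

print_r($sorted);
```

Under that assumption, a deliberately low value on the HTML extractor puts it last, so any more capable extractor that supports the same file is considered first.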

protected static

FileTextExtractor

get_extractor(string $class)

Get the text file extractor for the given class

from FileTextExtractor

public static

FileTextExtractor|null

for_file(File|string $file)

Given a File object, decide which extractor instance to use to handle it

from FileTextExtractor
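
A rough sketch of how an extractor might be chosen for a file, built on the three predicates documented on this page (isAvailable(), supportsExtension(), supportsMime()). The interface, the class, the pickExtractor() function, and the extension and MIME values are all hypothetical stand-ins; the real selection logic in for_file() may differ.

```php
<?php
// Hypothetical interface mirroring the three predicate methods listed on this page.
interface ExtractorSketch
{
    public function isAvailable(): bool;
    public function supportsExtension(string $extension): bool;
    public function supportsMime(string $mime): bool;
}

// Hypothetical HTML extractor; the extension and MIME lists are assumptions.
final class HtmlExtractorSketch implements ExtractorSketch
{
    public function isAvailable(): bool
    {
        // strip_tags() is a PHP built-in, so nothing external is needed.
        return true;
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array(strtolower($extension), ['html', 'htm', 'xhtml'], true);
    }

    public function supportsMime(string $mime): bool
    {
        return strtolower($mime) === 'text/html';
    }
}

// Returns the first available extractor that claims the extension or the
// MIME type, or null when none matches. $extractors is assumed to be
// ordered by priority already.
function pickExtractor(array $extractors, string $extension, string $mime): ?ExtractorSketch
{
    foreach ($extractors as $extractor) {
        if (!$extractor->isAvailable()) {
            continue;
        }
        if ($extractor->supportsExtension($extension) || $extractor->supportsMime($mime)) {
            return $extractor;
        }
    }

    return null;
}

$chosen = pickExtractor([new HtmlExtractorSketch()], 'HTML', 'text/html');
var_dump($chosen instanceof HtmlExtractorSketch);
```

Returning null when nothing matches lines up with the FileTextExtractor|null return type shown for for_file().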

protected static

string

getPathFromFile(File $file)

Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path

from FileTextExtractor
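
The temp-file step described above could look roughly like the following. This is a plain-PHP sketch that takes the contents as a string instead of a File object; the function name and the 'extract_' prefix are hypothetical.

```php
<?php
// Hypothetical helper: writes $contents to a new temporary file that keeps
// the given extension, and returns the path to that file.
function writeContentsToTempFile(string $contents, string $extension): string
{
    // tempnam() creates an empty, uniquely named file and returns its path.
    $base = tempnam(sys_get_temp_dir(), 'extract_');
    if ($base === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    // Some command-line tools inspect the extension, so append it.
    $path = $base . '.' . $extension;
    if (!rename($base, $path)) {
        throw new RuntimeException('Could not rename the temporary file');
    }

    if (file_put_contents($path, $contents) === false) {
        throw new RuntimeException('Could not write to the temporary file');
    }

    return $path;
}

$path = writeContentsToTempFile('<p>Hi</p>', 'html');
echo file_get_contents($path), PHP_EOL;
unlink($path); // The caller is responsible for cleaning up in this sketch.
```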

public

bool

isAvailable()

No description

public

bool

supportsExtension(string $extension)

No description

public

bool

supportsMime(string $mime)

No description

public

string

getContent(File|string $file)

Extracts content from regex, by using strip_tags() combined with regular expressions to remove non-content tags like
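
The general approach of combining strip_tags() with regular expressions can be sketched as below. This is an illustration under stated assumptions, not the method's actual body: the function name is hypothetical, and treating script and style elements as the non-content tags is an assumption made here.

```php
<?php
// Hypothetical helper showing the general technique.
function extractTextFromHtml(string $html): string
{
    // Assumption: <script> and <style> bodies are not readable content, so
    // remove each element together with everything inside it. preg_replace()
    // returns null on error, in which case the input is used unchanged.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Remove all remaining tags, keeping only their text.
    $text = strip_tags($text);

    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}

echo extractTextFromHtml('<p>Hello <b>world</b></p><script>alert(1)</script>'), PHP_EOL;
// prints "Hello world"
```

Because strip_tags() drops tags without inserting whitespace, text from adjacent elements can run together, which is one reason this kind of output suits indexing better than display.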