class TikaTextExtractor extends FileTextExtractor

Enables text extraction of file content via the Tika CLI

http://tika.apache.org/1.7/gettingstarted.html

Traits

Provides extensions to this object to integrate it with standard config API methods.

A class that can be instantiated or replaced via DI

Config options

priority int

Set priority from 0-100.

from  FileTextExtractor
output_mode string

Text extraction mode. Defaults to -t (plain text)

Properties

protected static array $sorted_extractor_classes

Cache of extractor class names, sorted by priority

from  FileTextExtractor

Methods

public static 
config()

Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, .....).

public
mixed
stat(string $name) deprecated

Get inherited config value

public
mixed
uninherited(string $name)

Gets the uninherited value for the given config option

public
$this
set_stat(string $name, mixed $value) deprecated

Update the config value for a given property

public static 
create(mixed ...$args)

An implementation of the factory method that allows you to create an instance of a class

public static 
singleton(string $class = null)

Creates a class instance by the "singleton" design pattern.

protected static 
array
get_extractor_classes()

Gets the list of prioritised extractor classes

protected static 
get_extractor(string $class)

Get the text file extractor for the given class

public static 
for_file(File|string $file)

Given a File object, decide which extractor instance to use to handle it

protected static 
string
getPathFromFile(File $file)

Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path

public
bool
isAvailable()

No description

public
bool
supportsExtension(string $extension)

No description

public
bool
supportsMime(string $mime)

No description

public
string
getContent(File|string $file)

Given a File instance, extract the contents as text.

public
mixed
getVersion()

Get the version of Tika installed, or 0 if not installed

protected
int
runShell(string $command, string $stdout = '', string $stderr = '', string $input = '')

Runs an arbitrary and safely escaped shell command

Details

static Config_ForClass config()

Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, .....).

Return Value

Config_ForClass

mixed stat(string $name) deprecated

deprecated 5.0 Use ->config()->get() instead

Get inherited config value

Parameters

string $name

Return Value

mixed

mixed uninherited(string $name)

Gets the uninherited value for the given config option

Parameters

string $name

Return Value

mixed

$this set_stat(string $name, mixed $value) deprecated

deprecated 5.0 Use ->config()->set() instead

Update the config value for a given property

Parameters

string $name
mixed $value

Return Value

$this

static Injectable create(mixed ...$args)

An implementation of the factory method that allows you to create an instance of a class

This method will defer class substitution to the Injector API, which can be customised via the Config API to declare substitution classes.

This can be called in one of two ways: either via the class directly, or on Object with the class name passed as the first parameter. The following are equivalent:

    $list = DataList::create(SiteTree::class);
    $list = SiteTree::get();

Parameters

mixed ...$args

Return Value

Injectable

static Injectable singleton(string $class = null)

Creates a class instance by the "singleton" design pattern.

It will always return the same instance for this class, which can be used for performance reasons and as a simple way to access instance methods which don't rely on instance data (e.g. the custom SilverStripe static handling).

Parameters

string $class

Optional classname to create, if the called class should not be used

Return Value

Injectable

The singleton instance
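
The "same instance every time" behaviour described above can be illustrated with a small standalone sketch. The trait and class names here (SketchSingleton, SketchService) are hypothetical, and the real method resolves instances through the framework's Injector rather than a local static array, so treat this only as an outline of the pattern.

```php
<?php
// Hypothetical illustration of the singleton pattern; not the framework's code.
trait SketchSingleton
{
    /** @var array Instances already created, keyed by class name */
    private static $instances = [];

    public static function singleton(?string $class = null)
    {
        // Fall back to the called class when no class name is given.
        $class = $class ?? static::class;
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new $class();
        }
        return self::$instances[$class];
    }
}

class SketchService
{
    use SketchSingleton;
}
```

Calling SketchService::singleton() twice returns the identical object, which is why the pattern suits methods that do not depend on per-instance data.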

static protected array get_extractor_classes()

Gets the list of prioritised extractor classes

Return Value

array

static protected FileTextExtractor get_extractor(string $class)

Get the text file extractor for the given class

Parameters

string $class

Return Value

FileTextExtractor

static FileTextExtractor|null for_file(File|string $file)

Given a File object, decide which extractor instance to use to handle it

Parameters

File|string $file

Return Value

FileTextExtractor|null
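
A rough sketch of how priority-based selection might work is shown below. It is not the module's implementation: the class names are hypothetical stand-ins, and it assumes that a higher priority value wins and that an extractor is skipped when it is unavailable or does not support the file extension.

```php
<?php
// Hypothetical stand-ins for extractor classes; not the module's real code.
abstract class SketchExtractor
{
    /** Priority from 0-100, mirroring the `priority` config option. */
    public static $priority = 50;

    abstract public function isAvailable(): bool;
    abstract public function supportsExtension(string $extension): bool;
}

class SketchPdfExtractor extends SketchExtractor
{
    public static $priority = 60;

    public function isAvailable(): bool
    {
        return true;
    }

    public function supportsExtension(string $extension): bool
    {
        return strtolower($extension) === 'pdf';
    }
}

class SketchCatchAllExtractor extends SketchExtractor
{
    public static $priority = 80;

    public function isAvailable(): bool
    {
        return false; // e.g. the underlying binary is not installed
    }

    public function supportsExtension(string $extension): bool
    {
        return true;
    }
}

function sketch_extractor_for(string $path, array $classes): ?SketchExtractor
{
    // Sort class names by priority, highest first (assumed ordering).
    usort($classes, function (string $a, string $b): int {
        return $b::$priority <=> $a::$priority;
    });

    $extension = pathinfo($path, PATHINFO_EXTENSION);
    foreach ($classes as $class) {
        $extractor = new $class();
        if ($extractor->isAvailable() && $extractor->supportsExtension($extension)) {
            return $extractor;
        }
    }

    return null; // mirrors the FileTextExtractor|null return type
}
```

In this sketch the catch-all extractor has the higher priority but reports itself unavailable, so a PDF falls through to the PDF extractor and any other file type yields null.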

static protected string getPathFromFile(File $file)

Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path

Parameters

File $file

Return Value

string

Exceptions

Exception
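
The temp-file step described above can be sketched in plain PHP as follows. The function name and prefix are hypothetical, and the real method works with a File object rather than a raw string; the sketch only shows the write-then-return-path idea and the failure cases that would raise an Exception.

```php
<?php
/**
 * Write in-memory contents to a temporary file and return its path.
 * Hypothetical helper; the caller should unlink() the path when done.
 */
function sketch_write_temp_file(string $contents, string $prefix = 'extract_'): string
{
    $path = tempnam(sys_get_temp_dir(), $prefix);
    if ($path === false) {
        throw new Exception('Could not create a temporary file');
    }
    if (file_put_contents($path, $contents) === false) {
        throw new Exception("Could not write to {$path}");
    }
    return $path;
}
```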

bool isAvailable()

No description

Return Value

bool

bool supportsExtension(string $extension)

No description

Parameters

string $extension

Return Value

bool

bool supportsMime(string $mime)

No description

Parameters

string $mime

Return Value

bool

string getContent(File|string $file)

Given a File instance, extract the contents as text.

Parameters

File|string $file

Either the File instance, or a file path for a file to load

Return Value

string
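
As an illustration of how a Tika CLI call for extraction might be assembled, the sketch below builds a command string from the output_mode option and a file path. The function name and the `tika` binary name are assumptions, and the module's actual command may differ.

```php
<?php
/**
 * Build a Tika CLI command line for a file path.
 * Hypothetical helper: $binary is an assumed executable name, and $mode
 * corresponds to the output_mode config option (default -t, plain text).
 */
function sketch_tika_command(string $path, string $mode = '-t', string $binary = 'tika'): string
{
    // escapeshellarg() quotes the path so spaces and shell metacharacters are safe.
    return sprintf('%s %s %s', $binary, $mode, escapeshellarg($path));
}
```

Escaping the path matters because uploaded file names can contain spaces or characters that the shell would otherwise interpret.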

mixed getVersion()

Get the version of Tika installed, or 0 if not installed

Return Value

mixed

float | int The version of tika
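
One way to obtain such a value is to parse the CLI's version banner. The sketch below assumes the banner has the form "Apache Tika 1.7" and keeps only the major.minor part; both the function name and the banner format are assumptions, not the module's confirmed behaviour.

```php
<?php
/**
 * Parse a version number out of assumed Tika CLI output.
 * Returns a float such as 1.7, or int 0 when no version is found.
 */
function sketch_parse_tika_version(string $output)
{
    if (preg_match('/Apache Tika (\d+(?:\.\d+)?)/', $output, $matches)) {
        return (float) $matches[1];
    }
    return 0;
}
```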

protected int runShell(string $command, string $stdout = '', string $stderr = '', string $input = '')

Runs an arbitrary and safely escaped shell command

Parameters

string $command

Full command including arguments

string $stdout

Standard output

string $stderr

Standard error

string $input

Content to pass via standard input

Return Value

int

Exit code. 0 is success
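
A minimal sketch of a shell runner with this shape, built on PHP's proc_open(), is shown below. It is a standalone function with a hypothetical name, and it assumes the output parameters are passed by reference (the signature above does not show this); the fallback exit code of 255 is also an assumption.

```php
<?php
/**
 * Run a command, feed it $input on stdin, and capture its output.
 * Hypothetical standalone sketch; $stdout and $stderr are filled by reference.
 */
function sketch_run_shell(string $command, &$stdout = '', &$stderr = '', string $input = ''): int
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];

    $pipes = [];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return 255; // assumed failure code when the process cannot start
    }

    // Send the input, then close stdin so the child sees end-of-file.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    // Collect both output streams.
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    // proc_close() returns the exit code of the process.
    return proc_close($process);
}
```

The function does not escape anything itself, so the caller is expected to pass an already escaped command. Reading stdout to completion before stderr keeps the sketch short, but it can stall if a command writes a large amount to stderr first.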