class TikaServerTextExtractor extends FileTextExtractor

Enables text extraction of file content via the Tika REST server.

http://tika.apache.org/1.7/gettingstarted.html

Traits

Configurable: Provides extensions to this object to integrate it with standard config API methods.

Injectable: A class that can be instantiated or replaced via dependency injection (DI).

Config options

priority int

The Tika server is efficient, so this extractor is used first whenever it is available.

server_endpoint string

URL of the Tika REST server endpoint.

Properties

protected static array $sorted_extractor_classes

Cache of extractor class names, sorted by priority

from FileTextExtractor
protected TikaRestClient $client
protected array $supportedMimes

Cache of supported mime types

Methods

public static 
config()

Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, ...).

public
mixed
stat(string $name) deprecated

Get inherited config value

public
mixed
uninherited(string $name)

Gets the uninherited value for the given config option

public
$this
set_stat(string $name, mixed $value) deprecated

Update the config value for a given property

public static 
create(mixed ...$args)

An implementation of the factory method that allows you to create an instance of a class.

public static 
singleton(string $class = null)

Creates a class instance by the "singleton" design pattern.

protected static 
array
get_extractor_classes()

Gets the list of prioritised extractor classes

protected static 
get_extractor(string $class)

Get the text file extractor for the given class

public static 
for_file(File|string $file)

Given a File object, decide which extractor instance to use to handle it

protected static 
string
getPathFromFile(File $file)

Some text extractors (like pdftotext) may require a physical file to read from, so this writes the current file contents to a temporary file and returns its path.

public
bool
isAvailable()

Checks whether this extractor is available for use.

public
bool
supportsExtension(string $extension)

Checks whether this extractor supports the given file extension.

public
bool
supportsMime(string $mime)

Checks whether this extractor supports the given MIME type.

public
string
getContent(File|string $file)

Given a File instance, extract the contents as text.

public
getClient()

Gets the TikaRestClient used by this extractor.

public
string
getServerEndpoint()

Gets the Tika server endpoint.

public
float
getVersion()

Get the version of Tika installed, or 0 if not installed

Details

static Config_ForClass config()

Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, ...).

Return Value

Config_ForClass

mixed stat(string $name) deprecated

deprecated 5.0 Use ->config()->get() instead

Get inherited config value

Parameters

string $name

Return Value

mixed

mixed uninherited(string $name)

Gets the uninherited value for the given config option

Parameters

string $name

Return Value

mixed

$this set_stat(string $name, mixed $value) deprecated

deprecated 5.0 Use ->config()->set() instead

Update the config value for a given property

Parameters

string $name
mixed $value

Return Value

$this

static Injectable create(mixed ...$args)

An implementation of the factory method that allows you to create an instance of a class.

This method will defer class substitution to the Injector API, which can be customised via the Config API to declare substitution classes.

This can be called in one of two ways: either calling via the class directly, or calling on Object and passing the class name as the first parameter. The following are equivalent:

    $list = DataList::create(SiteTree::class);
    $list = SiteTree::get();

Parameters

mixed ...$args

Return Value

Injectable

static Injectable singleton(string $class = null)

Creates a class instance by the "singleton" design pattern.

It will always return the same instance for this class, which can be used for performance reasons and as a simple way to access instance methods which don't rely on instance data (e.g. the custom SilverStripe static handling).

Parameters

string $class

Optional classname to create, if the called class should not be used

Return Value

Injectable

The singleton instance

static protected array get_extractor_classes()

Gets the list of prioritised extractor classes

Return Value

array
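The idea of a priority-sorted class list can be illustrated with a minimal sketch. It assumes each extractor class carries an integer priority and that higher numbers are tried first; the function name, class names, and priority values below are hypothetical and are not taken from the module.

```php
<?php
// Hypothetical helper: takes a map of class name => priority and returns
// the class names ordered from highest to lowest priority.
function sortExtractorsByPriority(array $priorities): array
{
    // arsort() sorts by value in descending order and preserves the keys.
    arsort($priorities);

    return array_keys($priorities);
}

// Invented class names and priorities, for illustration only.
$sorted = sortExtractorsByPriority([
    'PdfExtractor'        => 50,
    'TikaServerExtractor' => 80,
    'HtmlExtractor'       => 10,
]);

print_r($sorted);
```

Caching the sorted result, as the $sorted_extractor_classes property suggests, avoids repeating the sort on every lookup.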

static protected FileTextExtractor get_extractor(string $class)

Get the text file extractor for the given class

Parameters

string $class

Return Value

FileTextExtractor

static FileTextExtractor|null for_file(File|string $file)

Given a File object, decide which extractor instance to use to handle it

Parameters

File|string $file

Return Value

FileTextExtractor|null
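One plausible way to choose an extractor is to walk the prioritised list and return the first one that is available and supports the file. The sketch below assumes exactly that; ExtractorSketch, FixedExtractor, and pickExtractor() are hypothetical stand-ins written for this example, and the real selection logic may differ.

```php
<?php
// Hypothetical interface mirroring the three checks documented on this page.
interface ExtractorSketch
{
    public function isAvailable(): bool;
    public function supportsExtension(string $extension): bool;
    public function supportsMime(string $mime): bool;
}

// Hypothetical extractor with fixed capabilities, used only for the demo.
final class FixedExtractor implements ExtractorSketch
{
    private $available;
    private $extensions;
    private $mimes;

    public function __construct(bool $available, array $extensions, array $mimes)
    {
        $this->available = $available;
        $this->extensions = $extensions;
        $this->mimes = $mimes;
    }

    public function isAvailable(): bool
    {
        return $this->available;
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array(strtolower($extension), $this->extensions, true);
    }

    public function supportsMime(string $mime): bool
    {
        return in_array(strtolower($mime), $this->mimes, true);
    }
}

// $extractors is assumed to be ordered by priority already, highest first.
// Returns the first usable extractor, or null when none matches.
function pickExtractor(array $extractors, string $extension, string $mime): ?ExtractorSketch
{
    foreach ($extractors as $extractor) {
        if (!$extractor->isAvailable()) {
            continue; // Skip extractors that cannot run right now.
        }
        if ($extractor->supportsExtension($extension) || $extractor->supportsMime($mime)) {
            return $extractor;
        }
    }

    return null;
}

$offline = new FixedExtractor(false, ['pdf'], ['application/pdf']);
$pdfOnly = new FixedExtractor(true, ['pdf'], ['application/pdf']);

$chosen = pickExtractor([$offline, $pdfOnly], 'pdf', 'application/pdf');
$none   = pickExtractor([$offline, $pdfOnly], 'docx', 'application/msword');
```

Because the return type is nullable, callers should check for null before asking the result for content.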

static protected string getPathFromFile(File $file)

Some text extractors (like pdftotext) may require a physical file to read from, so this writes the current file contents to a temporary file and returns its path.

Parameters

File $file

Return Value

string

Exceptions

Exception

bool isAvailable()

Checks whether this extractor is available for use.

Return Value

bool

bool supportsExtension(string $extension)

Checks whether this extractor supports the given file extension.

Parameters

string $extension

Return Value

bool

bool supportsMime(string $mime)

Checks whether this extractor supports the given MIME type.

Parameters

string $mime

Return Value

bool

string getContent(File|string $file)

Given a File instance, extract the contents as text.

Parameters

File|string $file

Either the File instance, or a file path for a file to load

Return Value

string

TikaRestClient getClient()

Gets the TikaRestClient used by this extractor.

Return Value

TikaRestClient

string getServerEndpoint()

Gets the Tika server endpoint.

Return Value

string

float getVersion()

Get the version of Tika installed, or 0 if not installed

Return Value

float

The version of Tika, or 0 if Tika is not installed.
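A float return value with 0 as the "not installed" marker can be produced from a version string along the following lines. This is a sketch only: parseTikaVersion() is a hypothetical helper, and the banner format "Apache Tika 1.7" is an assumption, not something this page specifies.

```php
<?php
// Hypothetical helper: turns a version banner into a float,
// or 0.0 when no version number can be found.
function parseTikaVersion(string $banner): float
{
    // Capture "major.minor" only, so a banner ending in "1.24.1" yields 1.24.
    if (preg_match('/(\d+\.\d+)/', $banner, $matches) === 1) {
        return (float) $matches[1];
    }

    return 0.0;
}

// The banner text is an assumed format, used for illustration.
$installed = parseTikaVersion('Apache Tika 1.7');
$missing   = parseTikaVersion('');
```

Note that a float cannot distinguish 1.1 from 1.10, so when checking against a minimum version it is safer to compare version strings with PHP's version_compare() than to compare floats.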