HTMLTextExtractor
class HTMLTextExtractor extends FileTextExtractor
Text extractor that uses the PHP function strip_tags() to get just the text. Adequate for indexing, but not ideal for producing readable text.
Properties
public | string | $class | | from SS_Object
protected | array | $extension_instances | | from SS_Object
protected | | $beforeExtendCallbacks | List of callbacks to call prior to extensions having extend called on them, each grouped by methodName. | from SS_Object
protected | | $afterExtendCallbacks | List of callbacks to call after extensions having extend called on them, each grouped by methodName. | from SS_Object
protected static | array | $sorted_extractor_classes | Cache of extractor class names, sorted by priority | from FileTextExtractor
Methods
Get a configuration accessor for this class. Shorthand for Config::inst()->get($this->class, .....).
Allows user code to hook into Object::extend prior to control being delegated to extensions. Each callback will be reset once called.
Allows user code to hook into Object::extend after control being delegated to extensions. Each callback will be reset once called.
An implementation of the factory method that allows you to create an instance of a class
Creates a class instance by the "singleton" design pattern.
Create an object from a string representation. It treats it as a PHP constructor without the 'new' keyword. It also manages to construct the object without the use of eval().
Parses a class-spec, such as "Versioned('Stage','Live')", as passed to create_from_string().
Similar to Object::create(), except that classes are only overloaded if you set the $strong parameter to TRUE when using Object::useCustomClass()
This class allows you to overload classes with other classes when they are constructed using the factory method Object::create()
If a class has been overloaded, get the class name it has been overloaded with - otherwise return the class name
Get the value of a static property of a class, even if that property is declared protected (but not private), without any inheritance, merging or parent lookup if it doesn't exist on the given class.
Return TRUE if a class has a specified extension.
Add an extension to a specific class.
No description
Attempts to locate and call a method dynamically added to a class at runtime if a default cannot be located
Return the names of all the methods available on this object
Adds any methods from Extension instances attached to this object.
Add all the methods from an object property (which is an Extension) to this object.
Add all the methods from an object property (which is an Extension) to this object.
Add a wrapper method - a method which points to another method with a different name. For example, Thumbnail(x) can be wrapped to generateThumbnail(x)
Add an extra method using raw PHP code passed as a string
Check if this class is an instance of a specific class, or has that class as one of its parents
Calls a method if available on both this object and all applied Extensions, and then attempts to merge all results into an array
Run the given function on all of this object's extensions. Note that this method originally returned void, so if you wanted to return results, you're hosed
Get an extension instance attached to this object by name.
Returns TRUE if this object instance has a specific extension applied in $extension_instances. Extension instances are initialized at constructor time, meaning if you use add_extension() afterwards, the added extension will just be added to new instances of the extended class. Use the static method has_extension() to check if a class (not an instance) has a specific extension.
Get all extension instances for this specific object instance.
Cache the results of an instance method in this object to a file, or if it is already cached, return the cached results
Clears the cache for the given cacheToFile call
Loads a cache from the filesystem if a valid one is present and within the specified lifetime
Save a piece of cached data to the file system
Strip a file name of special characters so it is suitable for use as a cache file name
Gets the list of prioritised extractor classes
Get the text file extractor for the given class
Attempt to detect mime type for given file
Checks if the extractor is supported on the current environment, for example if the correct binaries or libraries are available.
Determine if this extractor supports the given extension.
Extracts content by using strip_tags() combined with regular expressions to remove non-content tags
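The strip_tags() approach that this extractor is described as using can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the function name extractTextFromHtml, the choice of script and style as non-content tags, and the list of block-level tags are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper (not part of HTMLTextExtractor): reduce an HTML
 * string to plain text suitable for search indexing.
 */
function extractTextFromHtml(string $html): string
{
    // Remove elements whose bodies are not readable content.
    // The tag list here is an assumption for illustration.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Add a line break after closing block-level tags and <br>, so that
    // words in adjacent blocks do not run together once tags are stripped.
    $text = preg_replace(
        '#</(p|div|h[1-6]|li|tr|td|th)>|<br\s*/?>#i',
        '$0' . "\n",
        $text
    );

    // Strip all remaining markup, then decode entities such as &amp;.
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/u', ' ', $text);

    return trim($text);
}
```

Calling extractTextFromHtml('<h1>Title</h1><p>Hello &amp; <b>wel</b>come</p>') returns "Title Hello & welcome". Inline tags such as <b> are removed without adding whitespace, so words are not split. The result is a flat string, which matches the class description: fine for indexing, less suitable for display.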