Interface ITextExtractorProcessor
public interface ITextExtractorProcessor
Represents the text extractor processor, which extracts text from various kinds of content.
Extraction on one or more file extensions can be disabled using the configuration key "index.system.extractor.text.disable.document.types" by listing the disabled file extensions separated by commas.
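To illustrate the comma-separated format of that configuration value, here is a minimal sketch of how such a list could be read into a set of extensions. The class `DisabledExtensions` and its `parse` method are hypothetical and not part of the documented API. Trimming whitespace and lower-casing are assumptions; the actual parsing rules of the library are not described here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only: not part of the documented API.
class DisabledExtensions {

  // Splits a value such as "pdf, odt" into a set of extensions.
  // Assumption: entries are trimmed, lower-cased, and empty entries are ignored.
  static Set<String> parse(String configValue) {
    if (configValue == null || configValue.trim().isEmpty()) {
      return Collections.emptySet();
    }
    return Arrays.stream(configValue.split(","))
        .map(String::trim)
        .filter(entry -> !entry.isEmpty())
        .map(entry -> entry.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
  }
}
```

With this sketch, a value of `pdf, odt` would yield a set containing the two extensions `pdf` and `odt`.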
Method Summary
Modifier and Type    Method and Description
boolean              canHandleExtractionFor(String fileExtension)
                     Indicates whether text can be extracted from files with the given extension.
void                 extract(BufferedInputStream in, BufferedOutputStream out, String fileExtension)
                     Extracts the text from the content read from the input stream.
Method Details
extract
void extract(BufferedInputStream in, BufferedOutputStream out, String fileExtension) throws ExtractorNotFoundException, IOException
Extracts the text from the content read from the input stream. The resulting text is written to the output stream. A suitable extractor is selected according to the file extension passed as a parameter.
Parameters:
in - the input stream from which to extract the text
out - the output stream to write the extracted text to
fileExtension - the file extension indicating the type of content
Throws:
ExtractorNotFoundException - if no suitable extractor can be used for the type of content
IOException - if an error occurs during the extraction
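The following is a minimal sketch of how a caller might invoke extract with in-memory streams. To keep it compilable on its own, it declares local stand-ins for `ITextExtractorProcessor` and `ExtractorNotFoundException` that mirror the signatures documented here; in a real project these come from the library. `PlainTextProcessor` and `ExtractExample` are hypothetical names, and passing the extension without a leading dot is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Stand-in for the library's exception, declared only so the sketch compiles alone.
class ExtractorNotFoundException extends Exception {
  ExtractorNotFoundException(String message) {
    super(message);
  }
}

// Stand-in mirroring the two methods documented on this page.
interface ITextExtractorProcessor {
  void extract(BufferedInputStream in, BufferedOutputStream out, String fileExtension)
      throws ExtractorNotFoundException, IOException;

  boolean canHandleExtractionFor(String fileExtension);
}

// Hypothetical implementation: accepts only "txt" and copies the bytes through unchanged.
class PlainTextProcessor implements ITextExtractorProcessor {
  @Override
  public boolean canHandleExtractionFor(String fileExtension) {
    return "txt".equalsIgnoreCase(fileExtension);
  }

  @Override
  public void extract(BufferedInputStream in, BufferedOutputStream out, String fileExtension)
      throws ExtractorNotFoundException, IOException {
    if (!canHandleExtractionFor(fileExtension)) {
      throw new ExtractorNotFoundException("No extractor for extension: " + fileExtension);
    }
    byte[] buffer = new byte[4096];
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
    }
    out.flush();
  }
}

class ExtractExample {
  // Wraps the content in buffered streams, runs the extraction, and returns the text as UTF-8.
  static String extractToString(ITextExtractorProcessor processor, byte[] content,
      String fileExtension) throws ExtractorNotFoundException, IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(content));
         BufferedOutputStream out = new BufferedOutputStream(sink)) {
      processor.extract(in, out, fileExtension);
      out.flush(); // make sure buffered bytes reach the sink before it is read
    }
    return new String(sink.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Both streams are closed by the try-with-resources block in the caller. Whether an implementation closes the streams itself is not stated in this documentation, so the sketch keeps that responsibility with the caller.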
canHandleExtractionFor
boolean canHandleExtractionFor(String fileExtension)
Indicates whether text can be extracted from files with the given extension.
Parameters:
fileExtension - the file extension
Returns:
true if text can be extracted from this type of file, false otherwise
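A caller can use canHandleExtractionFor as a guard before attempting an extraction. The sketch below filters a list of file names down to those whose extension is supported. `ExtractionFilter` is a hypothetical helper that accepts the check as a `Predicate<String>`, so a processor instance can be plugged in with a method reference such as `processor::canHandleExtractionFor`. Deriving the extension as the text after the last dot, without the dot itself, is an assumption about the expected format.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only: not part of the documented API.
class ExtractionFilter {

  // Returns the text after the last dot, or "" when the name has no extension.
  // Assumption: the processor expects the extension without a leading dot.
  static String extensionOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return "";
    }
    return fileName.substring(dot + 1);
  }

  // Keeps only the file names whose extension passes the given check.
  static List<String> extractable(List<String> fileNames, Predicate<String> canHandle) {
    return fileNames.stream()
        .filter(name -> canHandle.test(extensionOf(name)))
        .collect(Collectors.toList());
  }
}
```

Given a processor instance, the call would look like `ExtractionFilter.extractable(names, processor::canHandleExtractionFor)`.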