Interface ITextExtractorProcessor


public interface ITextExtractorProcessor
Represents the text extractor processor used to extract text from various kinds of content.

Extraction on one or more file extensions can be disabled using the configuration key "index.system.extractor.text.disable.document.types" by listing the disabled file extensions separated by commas.
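
As an illustration only, the entry below sketches what such a setting might look like. It assumes a Java properties-style configuration file, and the extensions pdf and docx are hypothetical values; whether extensions are written with or without a leading dot is not stated here.

```properties
# Hypothetical example: disable text extraction for PDF and DOCX files
index.system.extractor.text.disable.document.types=pdf,docx
```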

  • Method Details

    • extract

      Extracts the text from the content read from the input stream and writes the resulting text to the output stream. A suitable extractor is selected according to the file extension passed as a parameter.
      Parameters:
      in - the input stream from which to extract the text
      out - the output stream to write the text result to
      fileExtension - the file extension indicating the type of content
      Throws:
      ExtractorNotFoundException - if no suitable extractor can be used for the type of content
      IOException - if an I/O error occurs while reading the content or writing the result
    • canHandleExtractionFor

      boolean canHandleExtractionFor(String fileExtension)
      Indicates whether text can be extracted from content with the given file extension.
      Parameters:
      fileExtension - the file extension
      Returns:
      true if text can be extracted from this type of file, false otherwise
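
The two methods above are typically used together: check canHandleExtractionFor first, then call extract. The following is a minimal, self-contained sketch of that pattern, not the actual implementation. The nested interface and exception are local stand-ins for the documented types. The extract signature (void return type, java.io.InputStream and java.io.OutputStream parameters) is an assumption inferred from the parameter descriptions. PlainTextProcessor and extractOrEmpty are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

public class TextExtractionSketch {

    // Local stand-in for the documented exception; the real class may differ.
    static class ExtractorNotFoundException extends Exception {
        ExtractorNotFoundException(String message) {
            super(message);
        }
    }

    // Local stand-in for the documented interface.
    // ASSUMPTION: the extract signature is inferred, not taken from the source.
    interface ITextExtractorProcessor {
        void extract(InputStream in, OutputStream out, String fileExtension)
                throws ExtractorNotFoundException, IOException;

        boolean canHandleExtractionFor(String fileExtension);
    }

    // Hypothetical toy implementation: treats "txt" content as already being text.
    static class PlainTextProcessor implements ITextExtractorProcessor {
        private static final Set<String> SUPPORTED = Set.of("txt");

        @Override
        public boolean canHandleExtractionFor(String fileExtension) {
            return fileExtension != null
                    && SUPPORTED.contains(fileExtension.toLowerCase(Locale.ROOT));
        }

        @Override
        public void extract(InputStream in, OutputStream out, String fileExtension)
                throws ExtractorNotFoundException, IOException {
            if (!canHandleExtractionFor(fileExtension)) {
                throw new ExtractorNotFoundException("No extractor for: " + fileExtension);
            }
            // Copy the bytes unchanged from the input stream to the output stream.
            in.transferTo(out);
        }
    }

    // Caller-side pattern: check support first, then extract into a buffer.
    // Returns an empty string when the extension is not supported.
    static String extractOrEmpty(ITextExtractorProcessor processor,
                                 byte[] content,
                                 String fileExtension) throws IOException {
        if (!processor.canHandleExtractionFor(fileExtension)) {
            return "";
        }
        try (InputStream in = new ByteArrayInputStream(content);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            processor.extract(in, out, fileExtension);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (ExtractorNotFoundException e) {
            // Can still occur if support changes between the check and the call.
            return "";
        }
    }

    public static void main(String[] args) throws IOException {
        ITextExtractorProcessor processor = new PlainTextProcessor();
        byte[] content = "hello world".getBytes(StandardCharsets.UTF_8);

        System.out.println(extractOrEmpty(processor, content, "txt"));
        System.out.println(extractOrEmpty(processor, content, "pdf").isEmpty());
    }
}
```

Checking canHandleExtractionFor before calling extract keeps the common unsupported-type case out of exception handling, while the catch block still covers the case where no extractor turns out to be available.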