Interface ITextExtractor


public interface ITextExtractor
Interface for text extractor implementations. Each ITextExtractor implementation is compatible with a specific list of file extensions. If multiple implementations declare compatibility with the same file extension, an exception is thrown and extraction cannot proceed.

An ITextExtractor implementation must be declared using the Service Provider Interface (SPI) mechanism.
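The discovery and duplicate-extension rule described above could be sketched roughly as follows. This is an illustration only, not the library's actual loading code: `ExtractorRegistry`, its methods, and the choice of `IllegalStateException` are hypothetical, and the interface is redeclared locally so the sketch compiles on its own. With the standard `java.util.ServiceLoader`, a provider is typically declared in a classpath resource named `META-INF/services/<fully qualified name of ITextExtractor>` that lists one implementation class name per line.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Local copy of the documented interface so that the sketch is self-contained.
interface ITextExtractor {
    Collection<String> retrieveCompatibleFileExtensions();

    void extract(BufferedInputStream in, BufferedOutputStream out) throws IOException;
}

// Hypothetical helper, not part of the documented API.
final class ExtractorRegistry {
    private ExtractorRegistry() {
    }

    // Indexes extractors by file extension and rejects an extension
    // that is claimed by two different extractors.
    static Map<String, ITextExtractor> index(Iterable<ITextExtractor> extractors) {
        Map<String, ITextExtractor> byExtension = new HashMap<>();
        for (ITextExtractor extractor : extractors) {
            for (String extension : extractor.retrieveCompatibleFileExtensions()) {
                ITextExtractor previous = byExtension.putIfAbsent(extension, extractor);
                if (previous != null && previous != extractor) {
                    // Assumption: the real exception type is not stated in the documentation.
                    throw new IllegalStateException(
                            "Several extractors declared for extension: " + extension);
                }
            }
        }
        return byExtension;
    }

    // Discovers every implementation declared through the SPI configuration files.
    static Map<String, ITextExtractor> discover() {
        return index(ServiceLoader.load(ITextExtractor.class));
    }
}
```

Because `ServiceLoader` is an `Iterable`, the same `index` method can be exercised in a unit test with a plain list of extractors, without any SPI configuration file.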

  • Method Details

    • retrieveCompatibleFileExtensions

      Collection<String> retrieveCompatibleFileExtensions()
      Returns the collection of file extensions that this extractor is compatible with.
      Returns:
      the compatible file extensions collection
    • extract

      void extract(BufferedInputStream in, BufferedOutputStream out) throws IOException
      Extracts the text from the content read from the input stream. The text is written to the output stream.

      This extract method can be called several times on the same ITextExtractor instance, each time with different content to process.

      Parameters:
      in - the input stream from which to extract the text
      out - the output stream to write the text result to
      Throws:
      IOException - if an I/O error occurs during extraction
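As an illustration of the contract above, here is a minimal sketch of an implementation for plain-text files, which simply copies the input to the output. `PlainTextExtractor` is a hypothetical class, and two details are assumptions because the documentation does not specify them: that extensions are given without a leading dot (`"txt"`), and that the caller, not the extractor, is responsible for closing both streams. The interface is redeclared locally so the sketch compiles on its own.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.util.Collection;
import java.util.List;

// Local copy of the documented interface so that the sketch is self-contained.
interface ITextExtractor {
    Collection<String> retrieveCompatibleFileExtensions();

    void extract(BufferedInputStream in, BufferedOutputStream out) throws IOException;
}

// Hypothetical implementation: plain text needs no conversion, so the bytes are copied as-is.
final class PlainTextExtractor implements ITextExtractor {

    @Override
    public Collection<String> retrieveCompatibleFileExtensions() {
        // Assumption: extensions are declared without a leading dot.
        return List.of("txt");
    }

    @Override
    public void extract(BufferedInputStream in, BufferedOutputStream out) throws IOException {
        // No state is kept between calls, so the same instance can be reused.
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        // Push buffered bytes to the underlying stream; the streams are left open for the caller.
        out.flush();
    }
}
```

Keeping all working data in local variables is one simple way to satisfy the requirement that extract can be called several times on the same instance.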