Supported File Formats

Fess has been verified to crawl and search the following file formats.

  • text (txt)

  • XML (xml, xhtml, mm, etc.)

  • HTML (html, htm)

  • MS Office (doc, xls, ppt, docx, xlsx, pptx, etc.)

  • PDF (pdf, etc.)

  • Source Code (js, c, h, java, etc.)

  • Compressed Files (gz, tar, zip, etc.)

  • Rich text (rtf)

  • ePub

  • Audio/Image/Video (metadata extraction)

  • mbox

  • ai files (PDF compatible)

Fess can extract text from many file types it does not specifically recognize, so files not listed above can also be crawled and searched. If you have files you would like verified, please submit a pull request to the Test Data Repository for Search Systems.
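As a rough illustration, the verified list above can be turned into a lookup for checking a file collection before crawling. This is only a sketch and not part of Fess: the `VERIFIED_EXTENSIONS` table and the `verified_category` helper are hypothetical names, and the table contains only the extensions spelled out in the list (the "etc." entries, ePub, mbox, and media files are left out because their extensions are not stated here).

```python
from pathlib import PurePosixPath

# Hypothetical table built from the extensions named explicitly in the list
# above. It is deliberately incomplete: the "etc." entries are not expanded.
VERIFIED_EXTENSIONS = {
    "text": {"txt"},
    "XML": {"xml", "xhtml", "mm"},
    "HTML": {"html", "htm"},
    "MS Office": {"doc", "xls", "ppt", "docx", "xlsx", "pptx"},
    "PDF": {"pdf"},
    "Source Code": {"js", "c", "h", "java"},
    "Compressed Files": {"gz", "tar", "zip"},
    "Rich text": {"rtf"},
}


def verified_category(filename):
    """Return the list category for a filename, or None if its extension is not in the table."""
    # Only the final suffix is used, so "archive.tar.gz" is looked up as "gz".
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    for category, extensions in VERIFIED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["report.DOCX", "archive.tar.gz", "drawing.dwg"]:
        print(name, "->", verified_category(name))
```

A `None` result means only that the extension is missing from this table. Fess may still be able to extract text from such a file.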

Other

The following file formats are available through commercial support:

  • Ichitaro

  • OASYS for Windows

  • DocuWorks

  • AutoCAD