A repository of test data for verifying whether search systems can crawl and index various file types. Feel free to submit a pull request if you have files you want to test.
fess-testdata/
├── files/ # Test data files
├── docker/ # Docker configurations for crawling environments
├── tools/ # Utility scripts
└── build/ # Build-related files
Add the prefix "test" and use the appropriate file extension.
Include the text "Lorem ipsum. (ロレム・イプサム) 吾輩は猫である。" in the content section of the file. Do not include this text in metadata sections (to clearly identify where content was extracted from).
Place files in the appropriate category directory under files/.
| Type | Location |
|---|---|
| text | files/text/test_utf8.txt |
| HTML | files/html/test.html |
| HTML | files/html/test_utf8.html |
| HTML | files/html/test_sjis.html |
| HTML | files/html/test_hankaku.html |
| HTML | files/html/test_nocharset.html |
| XML | files/xml/test_utf8.xml |
| XML | files/xml/test_sjis.xml |
| XML | files/xml/test_entity.xml |
| XML | files/xml/test.mm |
| files/pdf/test.pdf | |
| files/pdf/test.ps | |
| Markdown | files/markdown/test.md |
| AsciiDoc | files/markdown/test.adoc |
| reStructuredtext | files/markdown/test.rst |
| LaTeX | files/latex/test.tex |
| EPUB | files/ebook/test.epub |
| CHM | files/help/test.chm |
| Type | Location |
|---|---|
| MS Word | files/msoffice/test.doc |
| MS Word | files/msoffice/test.docx |
| MS Excel | files/msoffice/test.xls |
| MS Excel | files/msoffice/test.xlsx |
| MS PowerPoint | files/msoffice/test.ppt |
| MS PowerPoint | files/msoffice/test.pptx |
| MS Visio | files/msoffice/test.vsdx |
| MS Project | files/msoffice/test.mpp |
| MS Publisher | files/msoffice/test.pub |
| RTF | files/msoffice/test.rtf |
| OpenDocument text | files/opendocument/test.odt |
| OpenDocument Spreadsheet | files/opendocument/test.ods |
| OpenDocument Presentation | files/opendocument/test.odp |
| Apple Pages | files/iwork/test.pages |
| Apple Numbers | files/iwork/test.numbers |
| Apple Keynote | files/iwork/test.key |
| Lotus 1-2-3 | files/lotus/test.123 |
| Hancom | files/hancom/test.hwp |
| Ichitaro | files/ichitaro/ |
| DocuWorks | files/docuworks/ |
| Type | Location |
|---|---|
| MS Access | files/database/test.accdb |
| MS Access (Legacy) | files/database/test.mdb |
| FileMaker | files/database/test.fmp12 |
| dBase | files/database/test.dbf |
| Type | Location |
|---|---|
| PNG | files/images/test.png |
| JPEG | files/images/test.jpg |
| GIF | files/images/test.gif |
| BMP | files/images/test.bmp |
| TIFF | files/images/test.tiff |
| SVG | files/images/test.svg |
| MP3 | files/media/test.mp3 |
| Type | Location |
|---|---|
| C | files/source_code/test.c |
| C++ | files/source_code/test.cpp |
| Java | files/source_code/test.java |
| JavaScript | files/source_code/test.js |
| TypeScript | files/source_code/test.ts |
| Python | files/source_code/test.py |
| Ruby | files/source_code/test.rb |
| Go | files/source_code/test.go |
| Rust | files/source_code/test.rs |
| Swift | files/source_code/test.swift |
| Kotlin | files/source_code/test.kt |
| PHP | files/source_code/test.php |
| SQL | files/source_code/test.sql |
| CSS | files/source_code/test.css |
| SCSS | files/source_code/test.scss |
| Type | Location |
|---|---|
| Bash | files/scripts/test.bash |
| Perl | files/scripts/test.pl |
| Lua | files/scripts/test.lua |
| PowerShell | files/scripts/test.ps1 |
| JSON | files/config/test.json |
| YAML | files/config/test.yaml |
| TOML | files/config/test.toml |
| INI | files/config/test.ini |
| Properties | files/config/test.properties |
| Type | Location |
|---|---|
| ZIP | files/archive/test.zip |
| TAR | files/archive/test.tar |
| TAR.GZ | files/archive/test.tar.gz |
| BZ2 | files/archive/test.txt.bz2 |
| XZ | files/archive/test.txt.xz |
| Type | Location |
|---|---|
| EML | files/email/test.eml |
| MSG | files/email/test.msg |
| Type | Location |
|---|---|
| CSV | files/data/test.csv |
| TSV | files/data/test.tsv |
| GeoJSON | files/geodata/test.geojson |
| KML | files/geodata/test.kml |
| Jupyter Notebook | files/notebooks/test.ipynb |
| Log | files/logs/test.log |
| Type | Location |
|---|---|
| Adobe Illustrator | files/ai/test.ai |
| AutoCAD | files/cad/ |
| Font (TTF) | files/fonts/test.ttf |
| ISO | files/disk-images/test.iso |
| Patch | files/patches/test.patch |
| Diff | files/patches/test.diff |
| Old-style Characters | files/other/old_style.txt |
The docker/ directory contains Docker Compose configurations for setting up various data source crawling environments.
| Environment | Description |
|---|---|
| basic | Basic Authentication |
| digest | Digest Authentication |
| ldap | LDAP |
| ftp | FTP |
| samba | Samba |
| webdav | WebDAV |
| mysql | MySQL |
| postgresql | PostgreSQL |
| mariadb | MariaDB |
| oracle | Oracle |
| mssql | SQL Server |
| db2 | DB2 |
| mongodb | MongoDB |
| elasticsearch | Elasticsearch |
| solr | Solr |
| redis | Redis |
| cassandra | Cassandra |
| couchdb | CouchDB |
| minio | MinIO (S3 Compatible) |
| gitlab | GitLab |
| gitea | Gitea |
| redmine | Redmine |
| wordpress | WordPress |
| bugzilla | Bugzilla |
| mantis | MantisBT |
| taiga | Taiga |
| keycloak | Keycloak |
| authentik | Authentik |
The tools/ directory contains utility scripts for data store operations.
| Script | Description |
|---|---|
| csvdatastore.sh | CSV Data Store |
| csvlistdatastore.sh | CSV List Data Store |
| csvgeodatastore.sh | CSV Geo Data Store |
| esdatastore.sh | Elasticsearch Data Store |
| eslistdatastore.sh | Elasticsearch List Data Store |
| create_roledata.sh | Role Data Creation |
| encrypt_roles.sh | Role Encryption |
| thumbnail_check.sh | Thumbnail Check |