Part 15: File server crawls that require authentication
fess can crawl local files as well as files on file servers. Crawling is possible even when authentication is required to access the file server.
This time, I will introduce how to crawl a file server that requires authentication.
File authentication
The file authentication function in fess is a setting required when crawling to a file server. It supports authentication for SMB or FTP.
Windows shared folders use the SMB (Server Message Block) protocol for communication. SMB has versions such as SMBv1, SMBv2, and SMBv3, but the latest version at the time of this writing, fess 13.2.0, supports SMBv1 or SMBv2.
When setting file authentication, specify “Samba” as the scheme. Specify “FTP” as the scheme when setting the file authentication to the FTP server.
Creating a crawl setting
First, create a crawl setting.
Log in to the management screen of fess and set the information of the crawl destination in “Crawl”> “File system” on the left menu.
This time, set as below to crawl \\localhost\share\
. When specifying the path, specify smb: //
at the beginning, and specify \
with /
. Specify /
at the end of the path.
item | Set value |
---|---|
name | smb_crawl |
path | smb://localhost/share/ |
When crawling a file server that supports only SMBv1, specify smb1://
instead of smb://
for the protocol.
File authentication settings
After creating the crawl settings, select [Crawl]> [File Authentication] in the left menu to set the file authentication.
The main setting items are explained below.
item | Explanation |
---|---|
hostname | Host name of the target server (arbitrary host name if omitted) |
port | Port number of target site (arbitrary port number if omitted) |
scheme | Authentication method |
username | User name for logging in to the target server |
password | Password for logging in to the target server |
The parameter | Set when there is a setting value required to log in to the target server |
File crawl settings | Crawl settings that use this authentication setting |
This time, set below. When crawling a Windows shared folder, specify “Samba” as the scheme value.
item | Set value |
---|---|
hostname | localhost |
scheme | Samba |
username | hoge |
password | hoge password |
File crawl settings | smb_crawl |
Execution
After the crawl settings and file authentication settings are registered, crawl is executed. Click [System]> [Scheduler]> [Default Crawler] on the left menu, and click the [Start Now] button. Wait a while for the crawl to finish.
After crawling, go to http://localhost:8080 and search. It will be successful if the files in the shared folder can be searched.
Active Directory integration
With fess, if you crawl a Windows shared folder, you can automatically get permission information for the shared folder at the crawl destination.
By linking with Active Directory, you can easily implement different search results for each user according to their access privileges. fess 12.2 and later support nested groups.
For details on how to link Active Directory, refer to the [9th] Active Directory work together in fess article.
FTP server crawl
To crawl the FTP server, you also need an account that has access to the FTP server.
The file crawl of the FTP server is the same as that of SMB, so create the file crawl settings first. When crawling an FTP server, the path protocol should be ftp://
instead of file://
.
After creating the crawl settings, select [Crawl]> [File Authentication] in the left menu to set the file authentication. If you want to crawl the FTP server, set the scheme to “FTP”.
After the crawl settings and file authentication settings are registered, crawl is executed.
This time, I introduced how to crawl file authentication with fess.
You can easily crawl shared folders simply by setting crawl settings and credentials, so please give it a try.