Part 18: Google Drive Crawl
This time I will show you how to search files in Google Drive using Fess 13.4.2. Download the Fess ZIP file from the download page.
Install plugin
To crawl files in Google Drive, you need to add a plugin to Fess.
Add the plugin from the management screen. After starting Fess and logging in to the management screen, click [System] > [Plug-ins] > [Install]. On the remote tab, select “fess-ds-gsuite-13.4.0” and click “Install”.
Google Drive API settings
In order to crawl Google Drive content, you need to enable the Google Drive API and obtain your credentials.
First, create a project and enable the API. Visit https://console.developers.google.com/ and create a new project.
Click “Library” from the left column of the dashboard and enter “Google Drive API” in the search box to search for the API. Select “Google Drive API” from the search results and click “Enable”.
Next, create the service account key. Open “Credentials” in the left column of the dashboard, click “Create credentials” and select “Service account key”.
Select “New Service Account” from the “Service Account” pull-down. Enter the service account name (optional), specify “JSON” as the key type, and click the “Create” button.
A confirmation message will be displayed. For this walkthrough, select “Create without roles” to proceed.
After the service account key is created, a JSON file containing the authentication information is downloaded. This file is used later for both the sharing settings of the crawl target and the crawl settings in Fess.
Finally, open Google Drive and share the folder you want to search with your service account. Right-click the folder to be crawled, click “Share”, and enter the value of “client_email” from the JSON file downloaded earlier. If you do not want a sharing notification to be sent, clear the notification check box before clicking the “OK” button.
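If you would rather not open the key file by hand, the address to paste into the share dialog can be read with a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and the only thing it relies on is the “client_email” field mentioned above.

```python
import json

def service_account_email(key_path):
    """Return the client_email value from a downloaded service account key file."""
    with open(key_path, encoding="utf-8") as f:
        return json.load(f)["client_email"]
```

For example, `print(service_account_email("service-account.json"))` prints the address to share the folder with, where `service-account.json` is a placeholder for wherever you saved the downloaded file.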
Setting up and running the crawler
Next, configure Fess to crawl Google Drive.
Crawler settings
Log in to the Fess management screen, open [Crawler] > [Datastore] > [New], and create a crawl setting. The following four items need to be set.
Name
Handler
Parameter
Script
Enter any string for “Name”. Select “GoogleDriveDataStore” for “Handler”.
Enter “Parameter” as follows.
private_key=-----BEGIN PRIVATE KEY-----\nMIIEv ... =\n-----END PRIVATE KEY-----\n
private_key_id=46812 ... b33f8
client_email=****@****.iam.gserviceaccount.com
default_permissions={role}guest
The values of “private_key”, “private_key_id” and “client_email” are taken from the JSON file downloaded in “Google Drive API settings”.
Enter “Script” as follows.
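Copying these values by hand is error-prone, mostly because of the line breaks in the private key. The sketch below builds the four parameter lines from the parsed key file. It assumes, based on the example above, that the key must stay on a single line with its line breaks written as the two characters `\n`. The function name is hypothetical.

```python
import json

def fess_parameters(key, default_permissions="{role}guest"):
    """Render the four Parameter lines from a parsed service account key (a dict)."""
    # json.load() turns the "\n" escapes in private_key into real line breaks.
    # Convert them back so the key stays on one line, as in the example above.
    private_key = key["private_key"].replace("\n", "\\n")
    return "\n".join([
        f"private_key={private_key}",
        f"private_key_id={key['private_key_id']}",
        f"client_email={key['client_email']}",
        f"default_permissions={default_permissions}",
    ])
```

Load the key with `json.load()` and print the result of `fess_parameters(...)`, then paste the output into the “Parameter” field.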
title=file.name
content=file.description+"\n"+file.contents
mimetype=file.mimetype
created=file.created_time
last_modified=file.modified_time
url=file.url
thumbnail=file.thumbnail_link
content_length=file.size
filetype=file.filetype
role=file.roles
filename=file.name
The keys that can be used in the script and the values they return are as follows. “file” refers to a single file in Google Drive.
Key | Value |
---|---|
file.name | File name |
file.description | File description |
file.contents | File contents (text) |
file.mimetype | MIME type of the file |
file.created_time | File creation date and time |
file.modified_time | File last modified date and time |
file.web_view_link | Link for viewing the file on the web |
file.thumbnail_link | Link to the file's thumbnail |
file.size | File size |
file.filetype | File type |
file.roles | File permission information |
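To make the role of the script clearer: each line names an index field on the left and an expression over the file's keys on the right. The Python below is only a toy model of that idea, not how Fess evaluates the script. The sample values and the `apply_script` helper are invented for illustration, and only a few of the fields are shown.

```python
# Hypothetical stand-in for one crawled file; every value here is invented.
sample_file = {
    "name": "report.pdf",
    "description": "Q1 summary",
    "contents": "Revenue grew.",
    "mimetype": "application/pdf",
    "size": 2048,
}

# Left side: index field. Right side: expression over the file's keys.
script = {
    "title": lambda file: file["name"],
    "content": lambda file: file["description"] + "\n" + file["contents"],
    "mimetype": lambda file: file["mimetype"],
    "content_length": lambda file: file["size"],
    "filename": lambda file: file["name"],
}

def apply_script(script, file):
    """Evaluate each expression against one file to build one index document."""
    return {field: expression(file) for field, expression in script.items()}

document = apply_script(script, sample_file)
```

In this model, `document["content"]` is the description and the extracted text joined by a line break, which mirrors the `content=` line of the script.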
Crawl execution
After registering the crawl settings, open [System] > [Scheduler] > [Default Crawler] and click [Start Now]. When the crawl is complete, open the search screen and run a search. The crawl succeeded if files from the shared folder appear in the results.
This time, I showed how to search files in Google Drive using Fess. I hope it serves as a reference when you set up search for your own Google Drive files.
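The same check can be scripted instead of using the search screen. The sketch below rests on assumptions that are not part of this article: that the JSON search API is enabled in the general settings, that it is served at `/json/` with a `q` parameter, and that the response carries a `record_count` field under `response`. Verify these against the documentation for your Fess version. The function names and the base URL in the usage note are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(base_url, query):
    """Build a search URL for the (assumed) JSON endpoint at /json/."""
    return f"{base_url.rstrip('/')}/json/?{urlencode({'q': query})}"

def record_count(base_url, query):
    """Return the number of hits reported for the query (assumed response layout)."""
    with urlopen(build_search_url(base_url, query)) as resp:
        body = json.load(resp)
    return body["response"]["record_count"]
```

Calling `record_count("http://localhost:8080", "report")` with a word that appears in one of the shared files should return a number greater than zero once the crawl has finished.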