This is a cache of https://fess.codelibs.org/articles/18/document.html. It is a snapshot of the page at 2024-11-23T00:09:26.425+0000.
Part 18: Google Drive Crawl

Menu

Part 18: Google Drive Crawl

This time I will show you how to search files in Google Drive using Fess 13.4.2. Get the Fess ZIP file from the download page.

Install plugin

In order to crawl files in Google Drive, you need to add a plugin to Fess.

Add the plugin from the management screen. After starting Fess and logging in to the management screen, click [System]> [Plug-ins]> [Install]. Select “fess-ds-gsuite-13.4.0” on the remote tab and click “Install” to install the plug-in.

|images0|

Google Drive API settings

In order to crawl Google Drive content, you need to enable the Google Drive API and obtain your credentials.

First, create a project and enable the API. Visit https://console.developers.google.com/ and create a new project.

Click “Library” from the left column of the dashboard and enter “Google Drive API” in the search box to search for the API. Select “Google Drive API” from the search results and click “Enable”.

|images1|

Next, create the service account key. Open “Credentials” in the left column of the dashboard, click “Create credentials” and select “Service account key”.

|images2|

Select “New Service Account” from the “Service Account” pull-down. Enter the service account name (optional), specify “JSON” as the key type, and click the “Create” button.

|images3|

The following message will be displayed, but this time select “Create without roles” to proceed.

|images4|

After creating the service account key, JSON containing the authentication information will be downloaded. The downloaded file will be used for sharing settings and crawl settings for the crawl destination.

Finally, open Google Drive and share the folder you want to search with your service account. Right-click the folder to be crawled and click “Share”, then enter the value of “client_email” in the JSON downloaded earlier. If you do not want to be notified of sharing settings, clear the check box of the notification and click the “OK” button.

|images5|

Setting up and running the crawler

Set up to crawl Google Drive.

Crawler settings

Log in to the management screen of Fess, open [Crawler]> [Datastore]> [New] and create a crawl setting. The following four items need to be set.

  • name

  • handler

  • The parameter

  • script

Enter an arbitrary character string for “Name”. Select “GoogleDriveDataStore” for “Handler”.

Enter “Parameter” as follows.

private_key=-----BEGIN PRIVATE KEY-----\nMIIEv ... =\n-----END PRIVATE KEY-----\n
private_key_id=46812 ... b33f8
client_email=****@****.iam.gserviceaccount.com
default_permissions={role}guest

“Private_key”, “private_key_id” and “client_email” correspond to the JSON content downloaded in “Google Drive API settings”.

Enter “script” as follows.

title=file.name
content=file.description+"\n"+file.contents
mimetype=file.mimetype
created=file.created_time
last_modified=file.modified_time
url=file.url
thumbnail=file.thumbnail_link
content_length=file.size
filetype=file.filetype
role=file.roles
filename=file.name

The keys of the values ​​that can be obtained and the explanation of the values ​​are as follows. “File” refers to one file in Google Drive.

Key value
files.name File name
files.description File description
files.contents Contents of File (text)
files.mimetype MIME type of File
files.created_time File creation date and time
files.modified_time File last edit date
files.web_view_link Link to browse File on the Web
files.thumbnail_link Link to get thumbnail of File
file.size File size
file.filetype File type
file.roles File permission information

Crawl execution

After registering crawl settings, click [Start Now] from [System]> [Scheduler]> [Default Crawler]. After the crawl is complete, let’s access the search screen and search. It is successful if the specified content can be searched.

|images6|

This time, I introduced how to search files in Google Drive by using Fess. Please refer to it when searching for files in Google Drive.