
Jira connector tutorial part II: 6 optimization tips

After connecting Jira to Elasticsearch, we'll now review best practices to scale this deployment.

In part I of this series, we configured the Jira connector and indexed objects into Elasticsearch. In this second part, we'll review best practices and advanced configurations to scale the connector. These practices complement the current documentation and apply during the indexing phase.

Getting a connector running was just the first step. When you index large amounts of data from Jira, every detail counts, and there are many optimization points you can apply.

Optimization points

  1. Index only the documents you'll need by applying advanced sync rules
  2. Index only the fields you'll use
  3. Refine mappings based on your needs
  4. Automate Document Level Security
  5. Offload attachment extraction
  6. Monitor the connector's logs

1. Index only the documents you'll need by applying advanced sync rules

By default, Jira sends all projects, issues, and attachments. If you're only interested in some of them, for example only issues "In Progress", we recommend not indexing everything.

There are three stages at which we can filter documents before they reach Elasticsearch:

Remote: We can use a native Jira filter to get only what we need. This is the best option and you should use it whenever you can, since documents that don't match never even leave the source. We'll use advanced sync rules for this.

Integration: If the source does not have a native filter that provides what we need, we can still filter at the integration level before ingesting into Elasticsearch by using basic sync rules.

Ingest Pipelines: The last option to handle data before indexing it is to use Elasticsearch ingest pipelines. With Painless scripts, we get great flexibility to filter or manipulate documents. The downside is that the data has already left the source and passed through the connector, which can put a heavy load on the system and create security concerns. A minimal sketch of this approach is shown below.
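For illustration only, here's what such a pipeline could look like: a hypothetical pipeline (its name is made up, and the Issue.status.name path is assumed from the documents shown later in this article) that uses a drop processor with a Painless condition to discard any issue that is not "In Progress":

PUT _ingest/pipeline/drop-non-in-progress
{
  "description": "Illustrative sketch: drop Jira issues that are not In Progress",
  "processors": [
    {
      "drop": {
        "if": "ctx.Issue?.status?.name != 'In Progress'"
      }
    }
  ]
}

Even then, remember that these documents have already been fetched from Jira and processed by the connector, which is exactly the overhead the remote filter avoids.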

Let's do a quick review of the Jira issues:

GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}

Note: We use an "exists" query so that only documents containing the field we are filtering on are returned.

You can see there are many issues in "To Do" that we don't need:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 6,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer Mars",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer the moon",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Bank Application Frontend",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Development of API for International Transfers",
            "status": {
              "name": "To Do"
            }
          }
        }
      }
    ]
  }
}

To get only the issues "In Progress", we'll create an advanced sync rule using a JQL (Jira Query Language) query:

Go to the connector, click on the sync rules tab, and then on Draft Rules. Once inside, go to Advanced Sync Rules and add this:

  [
    {
      "query": "status IN ('In Progress')"
    }
  ]

Once the rule has been applied, run a Full Content Sync.

This rule will exclude all issues that are not "In Progress". You can check by running the query again:

GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}

Here's the new response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      }
    ]
  }
}

2. Index only the fields you'll use

Now that we have only the documents we want, we can see that we're still getting a lot of fields we don't need. We could hide them at query time using _source, but the best option is simply not to index them.

To do so, we'll use ingest pipelines. We can create a pipeline that drops all the fields we won't use. Let's say we only want this information from an issue:

  • Assignee
  • Title
  • Status

We can create a new ingest pipeline that keeps only those fields by using the ingest pipelines view in the Content UI:

Click on Copy and customize, then modify the pipeline called index-name@custom, which has just been created empty. We can do this from the Kibana DevTools console by running this command:

PUT _ingest/pipeline/bank@custom
{
  "description": "Only keep needed fields for jira issues and move them to root",
  "processors": [
    {
      "remove": {
        "keep": [
          "Issue.assignee.displayName",
          "Issue.summary",
          "Issue.status.name"
        ],
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.assignee.displayName",
        "target_field": "assignee",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.summary",
        "target_field": "summary",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.status.name",
        "target_field": "status",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "Issue"
      }
    }
  ]
}

This pipeline removes the fields we don't need and also moves the ones we keep to the root of the document.

The remove processor with the keep parameter deletes every field except the ones listed in the keep array.

We can check this is working by running a simulation. Add the content of one of the documents from the index:

POST /_ingest/pipeline/bank@custom/_simulate
{
  "docs": [
    {
      "_index": "bank",
      "_id": "Galactic Banking Project-GBP-3",
      "_score": 1,
      "_source": {
        "Type": "Epic",
        "Custom_Fields": {
          "Satisfaction": null,
          "Approvals": null,
          "Change reason": null,
          "Epic Link": null,
          "Actual end": null,
          "Design": null,
          "Campaign assets": null,
          "Story point estimate": null,
          "Approver groups": null,
          "[CHART] Date of First Response": null,
          "Request Type": null,
          "Campaign goals": null,
          "Project overview key": null,
          "Related projects": null,
          "Campaign type": null,
          "Impact": null,
          "Request participants": [],
          "Locked forms": null,
          "Time to first response": null,
          "Work category": null,
          "Audience": null,
          "Open forms": null,
          "Details": null,
          "Sprint": null,
          "Stakeholders": null,
          "Marketing asset type": null,
          "Submitted forms": null,
          "Start date": null,
          "Actual start": null,
          "Category": null,
          "Change risk": null,
          "Target start": null,
          "Issue color": "purple",
          "Parent Link": {
            "hasEpicLinkFieldDependency": false,
            "showField": false,
            "nonEditableReason": {
              "reason": "EPIC_LINK_SHOULD_BE_USED",
              "message": "To set an epic as the parent, use the epic link instead"
            }
          },
          "Format": null,
          "Target end": null,
          "Approvers": null,
          "Team": null,
          "Change type": null,
          "Satisfaction date": null,
          "Request language": null,
          "Amount": null,
          "Rank": "0|i0001b:",
          "Affected services": null,
          "Type": null,
          "Time to resolution": null,
          "Total forms": null,
          "[CHART] Time in Status": null,
          "Organizations": [],
          "Flagged": null,
          "Project overview status": null
        },
        "Issue": {
          "statuscategorychangedate": "2024-11-07T16:59:54.786-0300",
          "issuetype": {
            "avatarId": 10307,
            "hierarchyLevel": 1,
            "name": "Epic",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issuetype/10008",
            "description": "Epics track collections of related bugs, stories, and tasks.",
            "entityId": "f5637521-ec75-48b8-a1b8-de18520807ca",
            "id": "10008",
            "iconUrl": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10307?size=medium",
            "subtask": false
          },
          "components": [],
          "timespent": null,
          "timeoriginalestimate": null,
          "project": {
            "simplified": true,
            "avatarUrls": {
              "48x48": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415",
              "24x24": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=small",
              "16x16": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=xsmall",
              "32x32": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=medium"
            },
            "name": "Galactic Banking Project",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/project/10001",
            "id": "10001",
            "projectTypeKey": "software",
            "key": "GBP"
          },
          "description": null,
          "fixVersions": [],
          "aggregatetimespent": null,
          "resolution": null,
          "timetracking": {},
          "security": null,
          "aggregatetimeestimate": null,
          "attachment": [],
          "resolutiondate": null,
          "workratio": -1,
          "summary": "Intergalactic Security and Compliance",
          "watches": {
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/watchers",
            "isWatching": true,
            "watchCount": 1
          },
          "issuerestriction": {
            "issuerestrictions": {},
            "shouldDisplay": true
          },
          "lastViewed": "2024-11-08T02:04:25.247-0300",
          "creator": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "subtasks": [],
          "created": "2024-10-29T15:52:40.306-0300",
          "reporter": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "aggregateprogress": {
            "total": 0,
            "progress": 0
          },
          "priority": {
            "name": "Medium",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/priority/3",
            "iconUrl": "https://tomasmurua.atlassian.net/images/icons/priorities/medium.svg",
            "id": "3"
          },
          "labels": [],
          "environment": null,
          "timeestimate": null,
          "aggregatetimeoriginalestimate": null,
          "versions": [],
          "duedate": null,
          "progress": {
            "total": 0,
            "progress": 0
          },
          "issuelinks": [],
          "votes": {
            "hasVoted": false,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/votes",
            "votes": 0
          },
          "comment": {
            "total": 0,
            "comments": [],
            "maxResults": 0,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/10008/comment",
            "startAt": 0
          },
          "assignee": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "worklog": {
            "total": 0,
            "maxResults": 20,
            "startAt": 0,
            "worklogs": []
          },
          "updated": "2024-11-07T16:59:54.786-0300",
          "status": {
            "name": "In Progress",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/status/10004",
            "description": "",
            "iconUrl": "https://tomasmurua.atlassian.net/",
            "id": "10004",
            "statusCategory": {
              "colorName": "yellow",
              "name": "In Progress",
              "self": "https://tomasmurua.atlassian.net/rest/api/2/statuscategory/4",
              "id": 4,
              "key": "indeterminate"
            }
          }
        },
        "id": "Galactic Banking Project-GBP-3",
        "_timestamp": "2024-11-07T16:59:54.786-0300",
        "Key": "GBP-3",
        "_allow_access_control": [
          "account_id:63c04b092341bff4fff6e0cb",
          "account_id:712020:88983800-6c97-469a-9451-79c2dd3732b5",
          "name:Gustavo",
          "name:Tomas-Murua"
          ]
      }
    }
    ]
}

The response will be:

{
  "docs": [
    {
      "doc": {
        "_index": "bank",
        "_version": "-3",
        "_id": "Galactic Banking Project-GBP-3",
        "_source": {
          "summary": "Intergalactic Security and Compliance",
          "assignee": "Tomas Murua",
          "status": "In Progress"
        },
        "_ingest": {
          "timestamp": "2024-11-10T06:58:25.494057572Z"
        }
      }
    }
  ]
}

This looks much better! Now, let's run a full content sync to apply the changes.

3. Refine mappings based on your needs

The document is clean. However, we can still optimize further, although here we enter "it depends" territory: some mappings will work for your use case while others won't. The best way to find out is by experimenting.

Let's say we experimented and arrived at this mapping design:

  • assignee: full text search and filters
  • summary: full text search
  • status: filters and sorting

By default, the connector creates mappings using dynamic_templates that configure all text fields for full-text search, filtering, and sorting. This is a solid baseline, but it can be optimized if we know how we want to use our fields.

This is the rule:

{
  "all_text_fields": {
    "match_mapping_type": "string",
    "mapping": {
      "analyzer": "iq_text_base",
      "fields": {
        "delimiter": {
          "analyzer": "iq_text_delimiter",
          "type": "text",
          "index_options": "freqs"
        },
        "joined": {
          "search_analyzer": "q_text_bigram",
          "analyzer": "i_text_bigram",
          "type": "text",
          "index_options": "freqs"
        },
        "prefix": {
          "search_analyzer": "q_prefix",
          "analyzer": "i_prefix",
          "type": "text",
          "index_options": "docs"
        },
        "enum": {
          "ignore_above": 2048,
          "type": "keyword"
        },
        "stem": {
          "analyzer": "iq_text_stem",
          "type": "text"
        }
      }
    }
  }
}

This creates different subfields for different purposes on every text field. You can find additional information about the analyzers in the documentation.

To use these mappings you must:

  1. Create the index before you create the connector
  2. When you create the connector, select that index instead of creating a new one
  3. Create the ingest pipeline to get the fields you want
  4. Run a Full Content Sync*

*A Full Content Sync sends all documents to Elasticsearch. An Incremental Sync only sends the documents that changed after the last Incremental or Full Content Sync. Both methods fetch all the data from the data source.
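If you want to confirm which syncs actually ran and when, recent Elasticsearch versions expose connector APIs that list sync jobs. A minimal sketch (the connector id is a placeholder; check the Connector API documentation for your version):

GET _connector/_sync_job?connector_id=<connector_id>&size=5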

Our optimized mappings are below:

PUT bank-optimal
{
  "mappings": {
    "properties": {
      "assignee": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "summary": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}

For assignee, we kept the mappings as they are because we want this field to be optimized for both search and filters. For summary, we removed the “enum” keyword field because we don’t plan to filter on summaries. We mapped status as a keyword because we only plan to filter on that field.
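As a quick sanity check of this design, here's a hypothetical query against the bank-optimal index (the search text is arbitrary): a multi_match full-text search over summary and its subfields, combined with a keyword filter on status:

GET bank-optimal/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "intergalactic compliance",
            "fields": ["summary", "summary.stem", "summary.joined", "summary.prefix"]
          }
        }
      ],
      "filter": [
        { "term": { "status": "In Progress" } }
      ]
    }
  }
}

Because status is a keyword, the term filter (and sorting) is cheap, while relevance comes from the summary subfields.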

Note: If you're not sure how you will use your fields, the baseline analyzers should be fine.

4. Automate Document Level Security

In the first part of this series, we learned how to manually create API keys for a user and restrict access using Document Level Security (DLS). However, if you want to automatically create an API key with the right permissions every time a user visits your site, you need a script that takes the request, generates an API key based on the user ID, and then uses that key to search Elasticsearch.

Here's a reference file in Python:

import os
import requests
class ElasticsearchKeyGenerator:
   def __init__(self):
       self.es_url = "https://xxxxxxx.es.us-central1.gcp.cloud.es.io" # Your Elasticsearch URL
       self.es_user = "" # Your Elasticsearch User
       self.es_password = "" # Your Elasticsearch password

       # Basic configuration for requests
       self.auth = (self.es_user, self.es_password)
       self.headers = {'Content-Type': 'application/json'}

   def create_api_key(self, user_id, index, expiration='1d', metadata=None):
       """
       Create an Elasticsearch API key for a single index with user-specific filters.

       Args:
           user_id (str): User identifier on the source system
           index (str): Index name
           expiration (str): Key expiration time (default: '1d')
           metadata (dict): Additional metadata for the API key

       Returns:
           str: Encoded API key if successful, None if failed
       """
       try:
           # Get user-specific ACL filters
           acl_index = f'.search-acl-filter-{index}'
           response = requests.get(
               f'{self.es_url}/{acl_index}/_doc/{user_id}',
               auth=self.auth,
               headers=self.headers
           )
           response.raise_for_status()

           # Build the query
           query = {
               'bool': {
                   'must': [
                       {'term': {'_index': index}},
                       response.json()['_source']['query']
                   ]
               }
           }

           # Set default metadata if none provided
           if not metadata:
               metadata = {'created_by': 'create-api-key'}

           # Prepare API key request body
           api_key_body = {
               'name': user_id,
               'expiration': expiration,
               'role_descriptors': {
                   f'jira-role': {
                       'index': [{
                           'names': [index],
                           'privileges': ['read'],
                           'query': query
                       }]
                   }
               },
               'metadata': metadata
           }

           print(api_key_body)

           # Create API key
           api_key_response = requests.post(
               f'{self.es_url}/_security/api_key',
               json=api_key_body,
               auth=self.auth,
               headers=self.headers
           )
           api_key_response.raise_for_status()

           return api_key_response.json()['encoded']

       except requests.exceptions.RequestException as e:
           print(f"Error creating API key: {str(e)}")
           return None

# Example usage
if __name__ == "__main__":
   key_generator = ElasticsearchKeyGenerator()

   encoded_key = key_generator.create_api_key(
       user_id="63c04b092341bff4fff6e0cb", # User id on Jira
       index="bank",
       expiration="1d",
       metadata={
           "application": "my-search-app",
           "namespace": "dev",
           "foo": "bar"
       }
   )

   if encoded_key:
       print(f"Generated API key: {encoded_key}")
   else:
       print("Failed to generate API key")

You can call this create_api_key function on each API request to generate an API key the user can then use to query Elasticsearch in subsequent requests. You can set an expiration, and also arbitrary metadata in case you want to record some information about the user or the application that generated the key.

5. Offload attachment extraction

For content extraction, such as extracting text from PDF and PowerPoint files, Elastic provides an out-of-the-box service that works well but has a size limitation.

By default, the extraction service of the native connectors supports a maximum of 10MB per attachment. If you have bigger attachments, such as a PDF with large images inside, or you want to host the extraction service yourself, Elastic offers a tool that lets you deploy your own extraction service.

This option is only compatible with Connector Clients, so if you're using a Native connector you will need to convert it to a connector client and host it in your own infrastructure.

Follow these steps to do it:

a. Configure custom extraction service and run it with Docker

docker run \
  -p 8090:8090 \
  -it \
  --name extraction-service \
  docker.elastic.co/enterprise-search/data-extraction-service:$EXTRACTION_SERVICE_VERSION

For EXTRACTION_SERVICE_VERSION, use 0.3.x for Elasticsearch 8.15.

b. Configure the config.yml to use the custom extraction service

Go to the connector client and add the following to the config.yml file to use the extraction service:

extraction_service:
  host: http://localhost:8090

c. Follow the steps to run the connector client

After configuring it, you can run the connector client with the connector you want to use.

docker run \
  -v "</absolute/path/to>/connectors-config:/config" \
  --tty \
  --rm \
  docker.elastic.co/enterprise-search/elastic-connectors:{version}.0 \
  /app/bin/elastic-ingest \
  -c /config/config.yml

Note: change the absolute path in the -v flag to match where config.yml is located on your machine; the -c flag points to the configuration file inside the container.

You can refer to the full process in the docs.

6. Monitor the connector's logs

It's important to have visibility into the connector's logs in case there's an issue, and Elastic offers this out of the box.

The first step is to activate logging in the cluster. The recommendation is to send logs to a separate monitoring deployment, but in a development environment you can also send the logs to the same cluster where you're indexing documents.

By default, the connector will send its logs to the elastic-cloud-logs-8 index. If you're using Elastic Cloud, you can check the logs in the new Logs Explorer.
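To look for problems directly from Dev Tools, you can also query that index. Here's a minimal sketch (the index pattern comes from the default above; @timestamp and message are standard ECS log fields, so adjust them if your logs use different ones):

GET elastic-cloud-logs-8*/_search
{
  "size": 5,
  "sort": [{ "@timestamp": "desc" }],
  "query": {
    "match": {
      "message": "error"
    }
  }
}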

Conclusion

In this article, we covered different strategies to consider when taking a connector to a production environment. Optimizing resources, automating security, and monitoring the cluster are key to properly running a large-scale system.
