AEM Connector

The AEM Connector indexes content from Adobe Experience Manager (AEM) author and publish instances. It consists of two components: an AEM server-side bundle (OSGi event listeners installed inside AEM) and a connector plugin (Java JAR loaded into dumont-connector.jar).


How It Works

The AEM connector receives indexing requests, then accesses AEM to traverse the content tree, fetch page data, extract tags as facets, and optionally call .model.json for custom attributes.

Three Ways to Trigger Indexing

| Method | How | When to use |
| --- | --- | --- |
| AEM Event Listeners | Install the aem-server OSGi bundle inside AEM; it automatically sends indexing requests when content is published, modified, or deleted | Production: real-time content sync |
| Manual API Call | Send a POST request to /api/v2/aem/index/{source} with a JSON payload containing paths and event type | Development, testing, one-off re-indexing |
| Turing ES Admin Console | Use Enterprise Search → Integration → Indexing Manager to select paths and trigger indexing/deindexing/publishing operations | Operations: selective re-indexing via UI |

Content Discovery Strategies

The AEM connector supports two strategies for discovering content during a full Index All operation:

| Strategy | Property | How it discovers content |
| --- | --- | --- |
| Tree Traversal (default) | dumont.aem.querybuilder=false | Recursively walks the content tree from the root path using infinity.json, following parent→child relationships |
| QueryBuilder | dumont.aem.querybuilder=true | Uses AEM's native QueryBuilder API (/bin/querybuilder.json) to find all content matching the configured content type in paginated batches |

QueryBuilder Discovery

When enabled, the connector queries the QueryBuilder endpoint instead of walking the tree:

GET http://localhost:4502/bin/querybuilder.json?path=/content/wknd&type=cq:Page&p.hits=slim&p.limit=500&p.offset=0

Each page of discovered paths is immediately processed in parallel using a configurable thread pool — the full path list is never held in memory.
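The paginated request shown above can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical and not part of the connector, and the values are assumed to need no URL encoding, as in the example request.

```java
/** Hypothetical helper for illustration; not part of the connector's API. */
final class QueryBuilderUrls {

    /** Builds the QueryBuilder URL for one page of results (values are not URL-encoded). */
    static String pageUrl(String endpoint, String rootPath, String contentType,
                          int limit, int offset) {
        return endpoint + "/bin/querybuilder.json"
                + "?path=" + rootPath
                + "&type=" + contentType
                + "&p.hits=slim"
                + "&p.limit=" + limit
                + "&p.offset=" + offset;
    }

    public static void main(String[] args) {
        // Print the URLs for the first three pages of 500 results each.
        for (int page = 0; page < 3; page++) {
            System.out.println(
                    pageUrl("http://localhost:4502", "/content/wknd", "cq:Page", 500, page * 500));
        }
    }
}
```

Each successive page advances p.offset by the page size, so a caller can process one batch of paths before requesting the next.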

Enable it by adding the following properties to application.yaml or as JVM arguments:

dumont:
  aem.querybuilder: true
  aem.querybuilder.parallelism: 10 # number of parallel threads (default)

Or via command-line:

java -Ddumont.aem.querybuilder=true -Ddumont.aem.querybuilder.parallelism=10 -jar dumont-connector.jar

Content Type required

QueryBuilder discovery requires a Content Type (e.g., cq:Page) configured in the AEM source. Without it, the command logs a warning and skips processing.

When to use QueryBuilder

QueryBuilder is recommended for large content trees where recursive traversal is slow. It reduces the number of HTTP round-trips to AEM by discovering all paths in bulk, then fetching content in parallel. For small sites (fewer than 1,000 pages), the default tree traversal is typically sufficient.


The Indexing Flow (Step by Step)

When the connector receives an indexing request (from any of the three triggers), it processes each path as follows:

1. Fetch the Content Node

The connector calls AEM's infinity.json endpoint to get the full JCR node tree:

GET http://localhost:4502/content/wknd/us/en/my-page.infinity.json

This returns the complete node hierarchy as JSON — all properties, child nodes, and metadata. The connector filters out internal nodes (prefixed with jcr:, rep:, cq:).
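The prefix-based filtering described above might look like the following sketch. It assumes the infinity.json response has already been parsed into a nested Map and that filtering is a simple name-prefix check on one level of the tree; the class name is hypothetical and the connector's real implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of prefix-based filtering; not the connector's code. */
final class InternalNodeFilter {

    // Prefixes named in the documentation as marking internal nodes.
    private static final List<String> INTERNAL_PREFIXES = List.of("jcr:", "rep:", "cq:");

    static boolean isInternal(String name) {
        return INTERNAL_PREFIXES.stream().anyMatch(name::startsWith);
    }

    /** Returns a copy of the node without internal-prefixed keys (top level only). */
    static Map<String, Object> withoutInternal(Map<String, Object> node) {
        Map<String, Object> result = new LinkedHashMap<>();
        node.forEach((name, value) -> {
            if (!isInternal(name)) {
                result.put(name, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("jcr:primaryType", "cq:Page");
        node.put("about", Map.of());
        node.put("rep:policy", Map.of());
        System.out.println(withoutInternal(node).keySet()); // prints [about]
    }
}
```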

2. Extract Tags as Facets

For each page, the connector fetches tags:

GET http://localhost:4502/content/wknd/us/en/my-page/jcr:content.tags.json

Tags are automatically converted to facets in the search index — no manual configuration needed. Each tag becomes a filterable value in the Turing ES facet panel.

3. Fetch Model JSON (Optional — Requires Configuration)

The connector does not call .model.json by default. To enable it, you must configure a DumAemExtContentInterface implementation in the models section of the export JSON:

"models": [
  {
    "type": "cq:Page",
    "className": "com.viglet.dumont.connector.aem.sample.ext.DumAemExtSampleModelJson",
    "targetAttrs": [ ... ]
  }
]

When this is configured, the extension class fetches the Sling Model exporter:

GET http://localhost:4502/content/wknd/us/en/my-page.model.json

This returns structured content from AEM's Sling Models — useful for extracting custom attributes like content fragment paths, component data, or experience fragment references. See Extending the AEM Connector for how to implement DumAemExtContentInterface and the full JSON configuration reference.

4. Traverse the Content Tree

Starting from the configured root path (e.g., /content/wknd), the connector recursively traverses all child nodes that match the configured content type (e.g., cq:Page). Each matching node goes through steps 1–3 above.
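As a rough sketch of the recursive traversal, the following Java walks an already-parsed infinity.json tree and collects the paths of nodes whose jcr:primaryType equals the configured content type. The class name and the Map-based representation are assumptions made for illustration; the connector's actual traversal code is not shown in this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of recursive content discovery; not the connector's code. */
final class TreeTraversal {

    /** Collects the paths of all nodes under (and including) the root that match the type. */
    static List<String> collect(String rootPath, Map<String, Object> rootNode, String contentType) {
        List<String> found = new ArrayList<>();
        walk(rootPath, rootNode, contentType, found);
        return found;
    }

    private static void walk(String path, Map<String, Object> node,
                             String contentType, List<String> found) {
        if (contentType.equals(node.get("jcr:primaryType"))) {
            found.add(path);
        }
        for (Map.Entry<String, Object> child : node.entrySet()) {
            // Only nested objects are child nodes; scalar values are properties.
            if (child.getValue() instanceof Map<?, ?>) {
                @SuppressWarnings("unchecked")
                Map<String, Object> childNode = (Map<String, Object>) child.getValue();
                walk(path + "/" + child.getKey(), childNode, contentType, found);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> about = Map.of("jcr:primaryType", "cq:Page");
        Map<String, Object> en = Map.of("jcr:primaryType", "cq:Page", "about", about);
        System.out.println(collect("/content/wknd/us/en", en, "cq:Page"));
    }
}
```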

5. Map Attributes and Index

For each page, the connector:

  • Applies attribute mappings from the configuration (global attributes + model-specific source→target mappings)
  • Runs custom extension classes (if configured via className)
  • Creates a Job Item for both author and publish environments (if enabled)
  • Sends the Job Item through the Dumont DEP pipeline to Turing ES

AEM Server-Side Bundle (Event Listeners)

The aem-server module is an OSGi bundle installed inside AEM. It provides event listeners that automatically notify the Dumont connector when content changes.

Events Captured

| Event Listener | AEM Event | Dumont Action |
| --- | --- | --- |
| DumAemPageEventHandler | Page created / modified | INDEXING |
| DumAemPageReplicationEventHandler | Page activated (published) | PUBLISHING |
| DumAemPageReplicationEventHandler | Page deactivated (unpublished) | UNPUBLISHING |
| DumAemResourceEventHandler | DAM asset created / modified | INDEXING |

OSGi Configuration

The event listeners are configured in AEM via OSGi Configuration (AEM → Web Console → Configuration):

| Setting | Description |
| --- | --- |
| Enabled | Toggle to enable/disable automatic indexing |
| Host | Dumont connector URL (e.g., http://dumont-server:30130) |
| Config Name | Source name configured in the Dumont connector |

HTTP Payload

When an event fires, the bundle sends:

POST http://dumont-server:30130/api/v2/aem/index/{configName}
Content-Type: application/json

{
  "paths": ["/content/wknd/us/en/my-page"],
  "event": "INDEXING"
}

Event types: INDEXING, DEINDEXING, PUBLISHING, UNPUBLISHING.


Manual API Triggering

You can trigger indexing manually via HTTP (Postman, curl, etc.):

Index Specific Paths

curl -X POST http://localhost:30130/api/v2/aem/index/WKND \
  -H "Content-Type: application/json" \
  -d '{
    "paths": ["/content/wknd/us/en/about"],
    "event": "INDEXING",
    "recursive": true
  }'

Deindex Specific Paths

curl -X POST http://localhost:30130/api/v2/aem/index/WKND \
  -H "Content-Type: application/json" \
  -d '{
    "paths": ["/content/wknd/us/en/old-page"],
    "event": "DEINDEXING"
  }'

Request Body Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| paths | string[] | (required) | AEM content paths to process |
| event | string | INDEXING | INDEXING, DEINDEXING, PUBLISHING, or UNPUBLISHING |
| recursive | boolean | false | Traverse child nodes recursively |
| attribute | string | ID | ID (path-based) or URL (URL-based) |
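For callers that prefer Java over curl, the request can be assembled with the JDK's built-in java.net.http API (Java 11+). This is a minimal sketch: the IndexRequest class is hypothetical, and the hand-rolled JSON assumes paths and values that need no escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for building index requests; not part of the connector. */
final class IndexRequest {

    /** Serializes the four documented body fields. Assumes values need no JSON escaping. */
    static String toJson(List<String> paths, String event, boolean recursive, String attribute) {
        String pathArray = paths.stream()
                .map(path -> "\"" + path + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
        return "{\"paths\": " + pathArray
                + ", \"event\": \"" + event + "\""
                + ", \"recursive\": " + recursive
                + ", \"attribute\": \"" + attribute + "\"}";
    }

    /** Builds (but does not send) the POST request for the given source name. */
    static HttpRequest build(String connectorUrl, String source, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(connectorUrl + "/api/v2/aem/index/" + source))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = toJson(List.of("/content/wknd/us/en/about"), "INDEXING", true, "ID");
        HttpRequest request = build("http://localhost:30130", "WKND", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
        // To send: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

In real code a JSON library would be a safer choice than string concatenation, since it handles escaping of special characters in paths.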

Source Configuration

Each AEM source defines connection details, content scope, author/publish environments, locale mappings, and delta tracking. Sources are configured in the Turing ES Admin Console under Enterprise Search → Integration → [your AEM instance] → Sources.

For the JSON configuration file used by custom extensions (attributes, models, locale paths), see Extending the AEM Connector.

General

| Field | Description |
| --- | --- |
| Name | Source identifier |
| Endpoint | URL of the AEM instance (e.g., http://localhost:4502) |
| Username / Password | Credentials for authenticated access to the AEM instance |

Root Path

Defines the root content path within the AEM repository from which content is traversed (e.g., /content/wknd). All child nodes matching the configured content type are indexed recursively from this path.

Content Types

| Field | Description |
| --- | --- |
| Content Type | Primary content type to be indexed (e.g., cq:Page) |
| Sub Type | Optional sub-type filter within the content type |

Delta Tracking

Controls incremental indexing — how the connector detects which content has changed since the last run.

| Field | Description |
| --- | --- |
| Once Pattern | Pattern used to identify content that should only be indexed once |
| Delta Class | Fully-qualified Java class name responsible for detecting changed content since the last run (see Extending the AEM Connector for custom implementations) |

Author / Publish

Configures which AEM environments are indexed and how they map to Turing ES Semantic Navigation Sites.

| Field | Description |
| --- | --- |
| Author | Enable indexing from the AEM author environment |
| Publish | Enable indexing from the AEM publish environment |
| SN Site (Author) | Semantic Navigation Site that receives author content |
| SN Site (Publish) | Semantic Navigation Site that receives publish content |
| URL Prefix (Author) | URL prefix prepended to document paths in the author index |
| URL Prefix (Publish) | URL prefix prepended to document paths in the publish index |

Locales

Maps content language codes to repository paths.

| Field | Description |
| --- | --- |
| Default Locale | Locale used when no language-specific path is matched |
| Locale Class | Fully-qualified Java class name responsible for resolving document locale (see Extending the AEM Connector for custom implementations) |
| Locale → Path | Dynamic list mapping each locale code (e.g., en_US) to its root path in the repository |
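One plausible way to read the Locale → Path mapping is as a longest-prefix match with a fallback to the default locale. The sketch below illustrates that reading only; the matching rule, the class name, and the sample paths are assumptions, and the configured Locale Class determines the actual behavior.

```java
import java.util.Map;

/** Hypothetical longest-prefix locale lookup; the real logic lives in the Locale Class. */
final class LocaleResolver {

    /** Returns the locale whose root path is the longest prefix of the content path. */
    static String resolve(String contentPath, Map<String, String> localeToPath,
                          String defaultLocale) {
        String best = defaultLocale;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : localeToPath.entrySet()) {
            String root = entry.getValue();
            // Match the root itself or anything below it, never a sibling like /en-gb.
            boolean matches = contentPath.equals(root) || contentPath.startsWith(root + "/");
            if (matches && root.length() > bestLength) {
                best = entry.getKey();
                bestLength = root.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of(
                "en_US", "/content/wknd/us/en",
                "pt_BR", "/content/wknd/br/pt");
        System.out.println(resolve("/content/wknd/us/en/about", mapping, "en_US"));
    }
}
```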

Source Actions

Each source has two action buttons available in the Turing ES admin console:

  • Index All: triggers a full indexing run for all content in this source
  • Reindex All: forces a full reindex, replacing all previously indexed content

Indexing Rules

Indexing Rules allow you to filter content during indexing — for example, to exclude error pages or draft content before it reaches the search index. Rules are configured in the Turing ES Admin Console under Enterprise Search → Integration → [your AEM instance] → Indexing Rules.

| Field | Description |
| --- | --- |
| Name | Rule identifier (required) |
| Description | Purpose of this rule |
| Source | The source this rule applies to |
| Attribute | Document field to evaluate (e.g., template) |
| Rule Type | How the rule is applied; currently supports IGNORE (skip documents that match) |
| Values | Dynamic list of values that trigger the rule (add or remove entries) |

Example: A rule with Attribute = template, Rule Type = IGNORE, and Values = [error-page] will prevent any document with template:error-page from being indexed.
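The semantics of the example rule can be sketched in a few lines of Java. The IgnoreRule class is hypothetical and assumes single-valued attributes compared by exact string match; the connector's own rule engine may handle multi-valued fields or other comparison modes.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical model of an IGNORE rule; not the connector's rule engine. */
final class IgnoreRule {

    private final String attribute;
    private final Set<String> values;

    IgnoreRule(String attribute, Set<String> values) {
        this.attribute = attribute;
        this.values = values;
    }

    /** True when the document carries the attribute with one of the listed values. */
    boolean matches(Map<String, Object> document) {
        Object actual = document.get(attribute);
        return actual != null && values.contains(actual.toString());
    }

    public static void main(String[] args) {
        IgnoreRule rule = new IgnoreRule("template", Set.of("error-page"));
        System.out.println(rule.matches(Map.of("template", "error-page"))); // prints true
        System.out.println(rule.matches(Map.of("template", "article")));    // prints false
    }
}
```

A document for which matches returns true would be skipped before it reaches the search index.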


Indexing Manager

The Indexing Manager provides a stepper form in the Turing ES Admin Console for targeting specific documents for manual operations.

| Operation | Description | Colour |
| --- | --- | --- |
| INDEXING | Index specific content | Blue |
| DEINDEXING | Remove specific content from the index | Red |
| PUBLISHING | Publish content | Green |
| UNPUBLISHING | Unpublish content | Orange |

Each operation step allows you to:

  • Select the Source to operate on
  • Choose the attribute to identify documents: ID or URL
  • Enter one or more specific values (IDs or URLs)
  • Expand Advanced Settings to toggle Recursive mode, which traverses child content in hierarchical repositories

Concurrency

The connector supports two execution modes:

| Mode | When | Behavior |
| --- | --- | --- |
| Exclusive | Full crawl (indexAll) | Only one full crawl per source at a time |
| Standalone | Specific paths (event-driven / manual) | Multiple concurrent updates allowed |
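The "one full crawl per source" guarantee of Exclusive mode can be illustrated with a small in-memory guard. This is a sketch under assumptions: the FullCrawlGuard class is hypothetical, and this page does not describe how the connector actually enforces exclusivity.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-source guard for full crawls; not the connector's implementation. */
final class FullCrawlGuard {

    // Thread-safe set of sources that currently have a full crawl in progress.
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may start a full crawl for this source. */
    boolean tryStart(String source) {
        return running.add(source);
    }

    /** Marks the full crawl for this source as finished. */
    void finish(String source) {
        running.remove(source);
    }

    public static void main(String[] args) {
        FullCrawlGuard guard = new FullCrawlGuard();
        System.out.println(guard.tryStart("WKND")); // prints true
        System.out.println(guard.tryStart("WKND")); // prints false
        guard.finish("WKND");
        System.out.println(guard.tryStart("WKND")); // prints true
    }
}
```

Standalone updates for specific paths would bypass such a guard, which is why several of them can run concurrently.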

Reactive (parallel) processing can be enabled for large sites:

dumont.reactive.indexing=true
dumont.reactive.parallelism=10

When using QueryBuilder discovery, parallelism is controlled separately via dumont.aem.querybuilder.parallelism (default 10). See Content Discovery Strategies above.


Customizing the AEM Connector

Need custom attribute extractors, delta date logic, or content processors? See Extending the AEM Connector for the full extension system, configuration JSON reference, and step-by-step guide.


| Page | Description |
| --- | --- |
| AEM Event Listener | Install the OSGi event listener bundle inside AEM for real-time indexing |
| Extending the AEM Connector | Custom attribute extractors, content processors, and configuration JSON reference |
| Turing ES — Integration | General integration management: monitoring, indexing stats, and system information |
| Turing ES — AEM Connector | AEM integration overview from the Turing ES perspective |
| Turing ES — Semantic Navigation | Configure the SN Sites that receive indexed content |