Database Connector

The Database Connector extracts content from any JDBC-compatible database by executing SQL queries and mapping each result row to a searchable document. It supports Oracle, PostgreSQL, MariaDB, MySQL, and any database with a JDBC driver.


How It Works

  1. Execute the configured SQL query against the database
  2. Iterate through the result set
  3. Map each column to a search index field based on the attribute mapping
  4. Create a Job Item with the mapped fields
  5. Submit to the pipeline in configurable chunks
  6. Repeat until all rows are processed
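
The steps above can be sketched in Java. `submitChunk` is a hypothetical stand-in for the connector's actual pipeline call, and the loop assumes rows have already been mapped to field/value documents (a minimal sketch, not the real implementation):

```java
import java.util.*;

public class ChunkedSubmit {
    // Hypothetical stand-in for the pipeline submission call (step 5).
    static int chunksSubmitted = 0;
    static void submitChunk(List<Map<String, Object>> chunk) {
        chunksSubmitted++;
    }

    // Steps 2-6: iterate the mapped rows and submit them in chunks.
    static int runImport(Iterator<Map<String, Object>> rows, int chunkSize) {
        List<Map<String, Object>> chunk = new ArrayList<>();
        int total = 0;
        while (rows.hasNext()) {
            chunk.add(rows.next());
            total++;
            if (chunk.size() == chunkSize) { // full chunk: flush to pipeline
                submitChunk(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) submitChunk(chunk); // final partial chunk
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 7; i++) rows.add(Map.of("id", i));
        int total = runImport(rows.iterator(), 3);
        System.out.println(total + " rows in " + chunksSubmitted + " chunks");
        // prints "7 rows in 3 chunks"
    }
}
```

Because a full chunk is flushed as soon as it fills, memory usage stays proportional to the chunk size rather than to the total row count.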

Key Features

| Feature | Description |
| --- | --- |
| Any JDBC database | Oracle, PostgreSQL, MariaDB, MySQL, and any JDBC-compatible source |
| SQL flexibility | Use any SELECT query — joins, aggregations, subqueries, functions |
| External SQL files | Load queries from files using the `file://` protocol |
| Batch processing | Configurable chunk size for memory-efficient processing |
| Max content size | Limits per-document content size (default: 5 MB) to prevent oversized documents |
| Standalone CLI | Run imports from the command line without the full Dumont DEP application |
| Custom extensions | `DumDbExtCustomImpl` interface for custom row processing logic |
| Locale support | Configurable default locale for all extracted documents |
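
The max-content-size limit can be illustrated with a byte-capping helper. Whether the connector truncates or skips oversized content is not specified here, so this sketch, which truncates while avoiding splitting a multi-byte UTF-8 character, is only an assumption:

```java
import java.nio.charset.StandardCharsets;

public class MaxContentSize {
    // Cap document content at maxBytes (the connector's default limit is
    // 5 MB). If the cut point lands inside a multi-byte UTF-8 character,
    // back up past the continuation bytes so no character is split.
    static String capContent(String content, int maxBytes) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) return content;
        int end = maxBytes;
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) end--; // continuation byte
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }
}
```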

CLI Parameters

The standalone Database Connector accepts the following command-line parameters:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `--driver` | Yes | | JDBC driver class name |
| `--connect` | Yes | | JDBC connection string |
| `--query` / `-q` | Yes | | SQL query or `file://path/to/query.sql` |
| `--site` | Yes | | Target Semantic Navigation Site name |
| `--server` / `-s` | Yes | | Dumont DEP server URL |
| `--api-key` / `-a` | Yes | | API key for authentication |
| `--locale` | No | `en_US` | Default locale for documents |
| `--chunk` / `-z` | No | (varies) | Batch size for processing |
| `--max-content-size` | No | `5` | Maximum content size in MB |
| `--deindex-before-importing` | No | `false` | Remove all existing documents before import |

Example: Indexing a Product Catalog

SQL Query

```sql
SELECT
  p.id,
  p.name AS title,
  p.description AS text,
  p.price,
  c.name AS category,
  p.updated_at AS date,
  CONCAT('https://shop.example.com/products/', p.slug) AS url
FROM products p
JOIN categories c ON p.category_id = c.id
WHERE p.active = 1
```

Standalone Import

```shell
java -cp dumont-db.jar com.viglet.dumont.db.DumDbImportTool \
  --server http://localhost:30130 \
  --api-key your-api-key \
  --driver org.mariadb.jdbc.Driver \
  --connect "jdbc:mariadb://localhost:3306/shop?user=reader&password=secret" \
  --query "file:///opt/queries/products.sql" \
  --site ProductCatalog \
  --locale en_US \
  --chunk 100
```

This will:

  1. Connect to the MariaDB database
  2. Execute the SQL query from the external file
  3. Map each row to a document (columns become fields)
  4. Send documents in batches of 100 to the ProductCatalog SN Site
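
Step 3 — columns becoming fields — can be sketched as a rename pass over each row. The `attributeMapping` entries and the method name below are illustrative, not the connector's actual API; columns without a mapping entry keep their label (which is why SQL aliases like `name AS title` work):

```java
import java.util.*;

public class RowMapper {
    // Turn one result-set row (column label -> value) into a document
    // (index field -> value). Mapped columns are renamed; unmapped
    // columns pass through under their own label.
    static Map<String, Object> rowToDocument(Map<String, Object> row,
                                             Map<String, String> attributeMapping) {
        Map<String, Object> doc = new LinkedHashMap<>();
        row.forEach((column, value) ->
            doc.put(attributeMapping.getOrDefault(column, column), value));
        return doc;
    }
}
```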

Supported Databases

| Database | Driver | Connection String Example |
| --- | --- | --- |
| MariaDB | `org.mariadb.jdbc.Driver` | `jdbc:mariadb://host:3306/dbname` |
| MySQL | `com.mysql.cj.jdbc.Driver` | `jdbc:mysql://host:3306/dbname` |
| PostgreSQL | `org.postgresql.Driver` | `jdbc:postgresql://host:5432/dbname` |
| Oracle | `oracle.jdbc.OracleDriver` | `jdbc:oracle:thin:@host:1521/service` |
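
The connection strings above follow a regular pattern, with Oracle thin URLs as the odd one out (`@host:port/service` instead of `//host:port/db`). As an illustration only — this helper is not part of the connector:

```java
public class JdbcUrls {
    // Assemble a connection string matching the patterns in the table.
    static String jdbcUrl(String subprotocol, String host, int port, String database) {
        if (subprotocol.equals("oracle:thin")) {
            // Oracle thin driver syntax: jdbc:oracle:thin:@host:port/service
            return "jdbc:oracle:thin:@" + host + ":" + port + "/" + database;
        }
        // MariaDB, MySQL, PostgreSQL: jdbc:<subprotocol>://host:port/db
        return "jdbc:" + subprotocol + "://" + host + ":" + port + "/" + database;
    }
}
```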

Tips

  • Use aliases in your SQL query to match the field names expected by Dumont DEP (e.g., SELECT name AS title).
  • External SQL files (file:// protocol) keep complex queries maintainable and version-controlled.
  • De-index before importing (--deindex-before-importing) is useful for full refreshes — it removes stale documents that no longer exist in the database.
  • Chunk size affects memory usage: larger chunks are faster but use more memory. Start with 100 and adjust based on document size.
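
The file:// tip can be sketched as a small resolver for the --query value. The method name is hypothetical and the connector's real handling may differ; this sketch reads the file when the argument is a file URL and otherwise treats it as inline SQL:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

public class QueryLoader {
    // Resolve the --query argument: file:// URLs are read from disk,
    // anything else is returned unchanged as inline SQL.
    static String resolveQuery(String queryArg) throws IOException {
        if (queryArg.startsWith("file://")) {
            return Files.readString(Path.of(URI.create(queryArg)));
        }
        return queryArg;
    }
}
```

Keeping the query in a version-controlled .sql file also avoids shell-quoting headaches with multi-line SQL.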

Customizing the Database Connector

Need to transform or enrich rows during import? See Extending the Database Connector for the DumDbExtCustomImpl interface and step-by-step guide.