
SFTP Bulk

This page contains the setup guide and reference information for the SFTP Bulk source connector.

The SFTP Bulk connector offers several features that are not available in the standard SFTP source connector:

  • Bulk ingestion of files: This connector can consolidate and process multiple files as a single data stream in your destination system.
  • Incremental loading: This connector supports incremental loading, allowing you to sync files from the SFTP server to your destination based on their creation or last modification time.
  • Load most recent file: You can choose to load only the most recent file from the designated folder path. This feature is particularly useful when dealing with snapshot files that are regularly added and contain the latest data.
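The incremental and most-recent-file behaviours above can be pictured with a small sketch. This is not the connector's implementation: the `RemoteFile` type, the sample paths, and the timestamps are hypothetical, and the sketch assumes the selection is driven only by each file's last-modified time.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RemoteFile(NamedTuple):
    """Hypothetical listing entry: a path plus its last-modified time."""
    path: str
    modified_at: datetime


def files_modified_after(files, cursor):
    """Incremental idea: keep only files modified after the cursor."""
    return [f for f in files if f.modified_at > cursor]


def most_recent_file(files):
    """Most-recent-file idea: pick the file with the latest timestamp."""
    return max(files, key=lambda f: f.modified_at) if files else None


# Made-up listing for illustration only.
listing = [
    RemoteFile("/snapshots/a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    RemoteFile("/snapshots/b.csv", datetime(2024, 2, 1, tzinfo=timezone.utc)),
    RemoteFile("/snapshots/c.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
cursor = datetime(2024, 1, 15, tzinfo=timezone.utc)

new_files = files_modified_after(listing, cursor)
latest = most_recent_file(listing)
```

With this listing, `new_files` holds the two files modified after the cursor and `latest` is the March snapshot.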

Prerequisites

  • Access to a remote server that supports SFTP
  • Host address
  • Valid username and authentication credentials (password or SSH private key)

Setup guide

Set up SFTP Bulk

Step 1: Set up SFTP authentication

To set up the SFTP connector, select one of the following authentication methods:

  • Your username and password credentials associated with the server.
  • A private/public key pair.

To set up key pair authentication, follow these steps:

  1. Open your terminal or command prompt and use the ssh-keygen command to generate a new key pair.

    note

    If your operating system does not support the ssh-keygen command, you can use a third-party tool like PuTTYgen to generate the key pair instead.

  2. You will be prompted for a location to save the keys, and a passphrase to secure the private key. You can press enter to accept the default location and opt out of a passphrase if desired. Your two keys will be generated in the designated location as two separate files. The private key will usually be saved as id_rsa, while the public key will be saved with the .pub extension (id_rsa.pub).

  3. Use the ssh-copy-id command in your terminal to copy the public key to the server.

    ssh-copy-id <username>@<server_ip_address>

    Be sure to substitute your own username and the server's IP address for the placeholders.

    note

    Depending on factors such as your operating system and the specific SSH implementation your remote server uses, you may not be able to use the ssh-copy-id command. If so, please consult your server administrator for the appropriate steps to copy the public key to the server.

  4. You should now be able to connect to the server via the private key. You can test this by using the ssh command:

    ssh <username>@<server_ip_address>

For more information on SSH key pair authentication, please refer to the official documentation.

Set up the SFTP Bulk connector in Airbyte

For Airbyte Cloud

  1. Log into your Airbyte Cloud account.

  2. Click Sources and then click + New source.

  3. On the Set up the source page, select SFTP Bulk from the Source type dropdown.

  4. Enter a name for the SFTP Bulk connector.

  5. Choose a delivery method for your data.

  6. Enter the Host Address.

  7. Enter your Username.

  8. Enter your authentication credentials for the SFTP server (Password or Private Key). If using Private Key authentication, see the SSH Key Authentication Setup section below for detailed instructions.

  9. In the section titled The list of streams to sync, enter a Stream Name. This is the name of the stream that is created in your destination. Add additional streams by clicking Add.

  10. For each stream, select the File Type you wish to sync from the dropdown menu. Depending on the format chosen, you'll see a set of options specific to that file type. See the File-specific Configuration section below for details.

  11. (Optional) Provide a Start Date using the provided datepicker, or by entering the date in the format YYYY-MM-DDTHH:mm:ss.SSSSSSZ. Incremental syncs will only sync files modified/added after this date.

  12. (Optional) Specify the Port. The default port for SFTP is 22. If your remote server is using a different port, enter it here.

  13. (Optional) Specify the Folder Path. This is the directory to search for files in, and it defaults to "/". To limit the sync to a specific folder, enter that directory's path on the remote server. For example, given the file structure:

    Root
    | - logs
    | | - 2021
    | | - 2022
    |
    | - files
    | | - 2021
    | | - 2022

    An input of /logs/2022 replicates only data contained within the specified folder, ignoring the /files and /logs/2021 folders. Leaving this field blank replicates all applicable files in the remote server's designated entry point. You may choose to enter a regular expression to specify a naming pattern for the files to be replicated. Consider the following example:

    log-([0-9]{4})([0-9]{2})([0-9]{2})

    This pattern filters for files that match the format log-YYYYMMDD, where YYYY, MM, and DD are four-digit, two-digit, and two-digit numbers, respectively (for example, log-20230713). Leaving this field blank replicates all files not filtered by the previous two fields.

  14. Click Set up source to complete setup. A test runs to verify the configuration.
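To check a naming pattern before entering it, you can try it against a few sample file names locally. The sketch below uses the log-YYYYMMDD pattern from step 13 with Python's `re` module. The sample names are made up, and the use of `search` (match anywhere in the name) is an assumption; the connector may anchor or apply the pattern differently.

```python
import re

# Pattern copied from the example in step 13.
LOG_PATTERN = re.compile(r"log-([0-9]{4})([0-9]{2})([0-9]{2})")


def matching_files(names):
    """Return the names that contain a log-YYYYMMDD segment."""
    return [name for name in names if LOG_PATTERN.search(name)]


# Hypothetical file names for illustration.
candidates = [
    "log-20230713",
    "log-2023-07-13",
    "errors-20230713",
    "log-20230714.txt",
]
matched = matching_files(candidates)

# The three capture groups give year, month, and day as strings.
year, month, day = LOG_PATTERN.search("log-20230713").groups()
```

Note that the pattern only checks digit counts, so a name such as log-20231399 would also match even though it is not a valid date.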

SSH Key Authentication Setup

If your SFTP server uses SSH key-based authentication, you'll need to provide your private key file (.pem or similar format) during setup. Follow these steps to create and upload it correctly:

  1. Locate your private key text. This is the block of text that begins with -----BEGIN OPENSSH PRIVATE KEY----- or -----BEGIN RSA PRIVATE KEY----- and ends with -----END OPENSSH PRIVATE KEY----- or -----END RSA PRIVATE KEY-----.

  2. Create a PEM file:

    1. Open any text editor or IDE (for example, PyCharm, VS Code, or a terminal text editor).
    2. Create a new file named ssh.pem.
    3. Paste the entire private key text into the file, including the BEGIN and END lines.
    4. Make sure there are no quotes or extra spaces before or after the key.
    5. Save the file.
  3. (Optional but recommended) If you're on macOS or Linux, set restricted permissions so only you can read it:

    chmod 600 ssh.pem
  4. Upload the file in Airbyte Cloud:

    1. In the SFTP Bulk source setup form, find the Private key field.
    2. Click Upload file and select your saved ssh.pem file.

Once uploaded, Airbyte uses this file to authenticate securely with your SFTP server.

note

The file must be in PEM format, a plain text file containing your private key between the BEGIN and END lines. Do not paste the key directly into the field; Airbyte requires a file upload.
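Before uploading, a quick local check can catch stray quotes or whitespace around the key. The following is a minimal sketch that only checks the BEGIN and END lines described above; it does not verify that the key itself is valid. The function name and the placeholder key body are hypothetical.

```python
import re

# Accepts the two header styles named in step 1; the END label must match BEGIN.
_PEM_RE = re.compile(
    r"\A-----BEGIN (OPENSSH|RSA) PRIVATE KEY-----\n"
    r".+\n"
    r"-----END \1 PRIVATE KEY-----\n?\Z",
    re.DOTALL,
)


def looks_like_private_key_pem(text: str) -> bool:
    """Return True if text starts and ends with matching private key markers."""
    normalized = text.replace("\r\n", "\n")  # tolerate Windows line endings
    return _PEM_RE.match(normalized) is not None


# Placeholder body, not a real key.
sample = (
    "-----BEGIN OPENSSH PRIVATE KEY-----\n"
    "PLACEHOLDERKEYDATA\n"
    "-----END OPENSSH PRIVATE KEY-----\n"
)
ok = looks_like_private_key_pem(sample)
quoted = looks_like_private_key_pem('"' + sample + '"')
padded = looks_like_private_key_pem("  " + sample)
```

Here `ok` is true, while the quoted and space-padded variants are rejected.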

For Airbyte Open Source

  1. Navigate to the Airbyte Open Source dashboard.
  2. Click Sources and then click + New source.
  3. On the Set up the source page, select SFTP Bulk from the Source type dropdown.
  4. Enter a name for the SFTP Bulk connector.

Delivery Method

Choose a delivery method for your data.

Preserve Sub-Directories in File Paths

If enabled, sends subdirectory folder structure along with source file names to the destination. Otherwise, files are synced by their names only. This option is ignored when file-based replication is not enabled.
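As a rough illustration of the difference, the sketch below derives a destination name from a source path with the option on and off. The function name and the exact path handling are assumptions made for illustration; the connector's actual output paths may differ.

```python
import posixpath


def destination_name(source_path: str, preserve_subdirectories: bool) -> str:
    """Illustrative only: keep the folder structure or just the file name."""
    if preserve_subdirectories:
        # Keep the subdirectory structure, relative to the root.
        return source_path.lstrip("/")
    # Otherwise only the file name survives.
    return posixpath.basename(source_path)


with_dirs = destination_name("/logs/2022/app.log", preserve_subdirectories=True)
without_dirs = destination_name("/logs/2022/app.log", preserve_subdirectories=False)
```

One consequence of syncing by name only is that files with the same name in different folders, such as /logs/2021/app.log and /logs/2022/app.log, map to the same destination name in this sketch.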

File-specific Configuration

Depending on your File Type selection, you are presented with a few configuration options specific to that file type.

For JSONL, Parquet, and Document File Type formats, you can specify a Glob pattern that selects which files to read from the file system. If your Folder Path already ends in a slash, you need to include the resulting double slash in the glob where appropriate.

For example, assuming your folder path is not set in the connector configuration and your files are located in the root folder, use a glob pattern like //my_prefix_*.csv to specify your file. If your files are in a folder, include the folder in your glob pattern, like //my_folder/my_prefix_*.csv.

If your files are in a folder, include the folder in your glob pattern, like my_folder/my_prefix_*.csv.
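To get a feel for what a pattern like my_folder/my_prefix_*.csv selects, you can test it locally against sample paths. This sketch uses Python's `fnmatch` module as a stand-in; it is an approximation, since `fnmatch` lets `*` match across slashes and does not model the leading double-slash convention described above. The sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(paths, pattern):
    """Return the paths that match a shell-style wildcard pattern."""
    return [path for path in paths if fnmatchcase(path, pattern)]


# Hypothetical relative paths for illustration.
paths = [
    "my_folder/my_prefix_jan.csv",
    "my_folder/other.csv",
    "my_prefix_feb.csv",
]
selected = select_files(paths, "my_folder/my_prefix_*.csv")
```

Only the first path is selected: the second lacks the prefix and the third is not inside my_folder.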

Supported sync modes

The SFTP Bulk source connector supports the following sync modes:

Feature | Support | Notes
Full Refresh - Overwrite | |
Full Refresh - Append Sync | |
Incremental - Append | |
Incremental - Append + Deduped | |

Supported Streams

This source provides a single stream per file with a dynamic schema. The currently supported file types are Avro, CSV, JSONL, Parquet, and Document File Type Format.

File Size Limitations

When using the SFTP Bulk connector with the Copy Raw Files delivery method, individual files are subject to a maximum size limit of 1.5 GB (1,500,000,000 bytes) per file. This limitation applies to the raw file transfer process where files are copied without parsing their contents.

If you need to sync files larger than 1.5 GB, consider the following approach:

  • Split large files into smaller chunks before uploading them to your SFTP server
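One way to split a file is sketched below. It cuts the file into numbered parts by byte count, using the 1,500,000,000-byte limit stated above as the default. The function name and the .partNNNN naming scheme are hypothetical, and a byte-level split assumes that whatever consumes the files can reassemble the parts; it will cut through records in structured files.

```python
import tempfile
from pathlib import Path

MAX_FILE_BYTES = 1_500_000_000  # per-file limit stated above


def split_file(path, max_bytes=MAX_FILE_BYTES, buffer_bytes=1024 * 1024):
    """Split `path` into numbered parts of at most `max_bytes` bytes each."""
    source = Path(path)
    parts = []
    with source.open("rb") as src:
        index = 0
        while True:
            part_path = source.with_name(f"{source.name}.part{index:04d}")
            written = 0
            with part_path.open("wb") as dst:
                while written < max_bytes:
                    chunk = src.read(min(buffer_bytes, max_bytes - written))
                    if not chunk:
                        break  # end of the source file
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:
                part_path.unlink()  # nothing left to write; drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts


# Small demonstration with a 25-byte file and a 10-byte limit.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "export.bin"
    sample.write_bytes(b"x" * 25)
    part_sizes = [p.stat().st_size for p in split_file(sample, max_bytes=10, buffer_bytes=4)]
```

In the demonstration, the 25-byte file becomes three parts of 10, 10, and 5 bytes. Concatenating the parts in order restores the original file.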

The Replicate Records delivery method is not a workaround for large file sizes. Replicate Records only works with structured file formats (CSV, JSONL, Parquet, Avro, etc.) that the connector can parse into individual records. It does not support unstructured files or binary formats, and files processed through Replicate Records are still subject to the same size limitations.

For more information about delivery methods and their limitations, see the Delivery Methods documentation.

Reference

Config fields reference

Type | Property name
object | credentials
string | host
array<object> | streams
string | username
object | delivery_method
string | folder_path
integer | port
string | start_date
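The sketch below shows how the top-level properties above might be assembled and type-checked in Python before being passed to Airbyte. It is only an outline under stated assumptions: the nested contents of credentials, delivery_method, and streams are not documented on this page and are left empty, and the host, username, and helper names are placeholders. The start date helper follows the YYYY-MM-DDTHH:mm:ss.SSSSSSZ format given in the setup steps.

```python
from datetime import datetime, timezone

# Top-level property names and types from the reference table above.
EXPECTED_TYPES = {
    "credentials": dict,
    "host": str,
    "streams": list,
    "username": str,
    "delivery_method": dict,
    "folder_path": str,
    "port": int,
    "start_date": str,
}


def format_start_date(moment: datetime) -> str:
    """Render an aware datetime as YYYY-MM-DDTHH:mm:ss.SSSSSSZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def type_errors(config: dict) -> list:
    """Return the property names whose values have an unexpected type."""
    return [
        name
        for name, expected in EXPECTED_TYPES.items()
        if name in config and not isinstance(config[name], expected)
    ]


# Placeholder values for illustration only.
config = {
    "host": "sftp.example.com",
    "username": "example_user",
    "port": 22,
    "folder_path": "/logs/2022",
    "start_date": format_start_date(datetime(2023, 7, 13, tzinfo=timezone.utc)),
    "credentials": {},      # nested fields omitted: not documented on this page
    "delivery_method": {},  # nested fields omitted
    "streams": [],          # one object per stream; fields omitted
}
problems = type_errors(config)
```

A common mistake this catches is passing the port as a string, for example "22" instead of 22.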

Changelog

Version | Date | Pull Request | Subject
1.8.9 | 2025-11-24 | | Increase maxSecondsBetweenMessages to 3 hours
1.8.8 | 2025-11-10 | 69257 | Update error message when file exceeds size limit
1.8.6 | 2025-10-14 | 67923 | Update dependencies
1.8.5 | 2025-10-07 | 67234 | Update dependencies
1.8.4 | 2025-09-30 | 66868 | Update dependencies
1.8.3 | 2025-09-12 | 66197 | Update to CDK v7
1.8.2 | 2025-08-24 | 60498 | Update dependencies
1.8.1 | 2025-05-10 | 58962 | Update dependencies
1.8.0 | 2025-05-07 | 57514 | Adapt file-transfer records to latest protocol, requires platform >= 1.7.0, destination-s3 >= 1.8.0
1.7.8 | 2025-04-19 | 58448 | Update dependencies
1.7.7 | 2025-04-05 | 57475 | Update dependencies
1.7.6 | 2025-03-29 | 56898 | Update dependencies
1.7.5 | 2025-03-22 | 54083 | Update dependencies
1.7.4 | 2025-02-08 | 53570 | Update dependencies
1.7.3 | 2025-02-01 | 52971 | Update dependencies
1.7.2 | 2025-01-25 | 52470 | Update dependencies
1.7.1 | 2025-01-18 | 43821 | Starting with this version, the Docker image is now rootless. Please note that this and future versions will not be compatible with Airbyte versions earlier than 0.64
1.7.0 | 2025-01-17 | 51611 | Promoting release candidate 1.7.0-rc.1 to a main version.
1.7.0-rc.1 | 2025-01-16 | 50972 | Include option to not mirroring subdirectory structure.
1.6.0 | 2024-12-17 | 49826 | Increase individual file size limit.
1.5.0 | 2024-12-02 | 48434 | Add get_file method for file-transfer feature.
1.4.0 | 2024-10-31 | 46739 | make private key an airbyte secret.
1.3.0 | 2024-10-31 | 47703 | Update dependency to CDK v6 with ability to transfer files.
1.2.0 | 2024-09-03 | 46323 | Update dependency to CDK v5
1.1.0 | 2024-08-14 | 44028 | Update dependency to CDK v4
1.0.1 | 2024-05-29 | 38703 | Avoid error on empty stream when running discover
1.0.0 | 2024-03-22 | 36256 | Migrate to File-Based CDK. Manage dependencies with Poetry.
0.1.2 | 2023-04-19 | 19224 | Support custom CSV separators
0.1.1 | 2023-03-17 | 24180 | Fix field order
0.1.0 | 2021-24-05 | 17691 | Initial version