Capacity Planning
This document provides a comprehensive capacity planning reference to help users effectively allocate resources based on specific requirements in their environment. The actual system requirements may vary due to workload characteristics, network conditions, server specifications, and other factors. Therefore, we recommend conducting performance tests in the specific environment to obtain more accurate configuration data.
Terminology
- Data Pipeline: A data pipeline can replicate one or multiple tables from a source database to a target database. During the synchronization process, data can also be transformed and processed (e.g., data filtering) to ensure that the target database receives accurate and optimized data.
- RPS (Records Per Second): A metric that measures data transfer speed and system processing capability, reflecting the number of records the system processes per second.
Pipeline Resource Requirements
- Memory Requirements: Approximately 1 GB per 1 KB of row size, estimated with the following formula:

  `(read_batch * 8 + 10240 + write_batch * (2 + threads)) * (10 * row_size + 5KB) + log buffer`

  Tip: The read and write batch sizes can be adjusted in the data pipeline configuration through the basic parameters of the source and target nodes.
- CPU Requirements: Compute requirements vary with the business load. As a general reference:
  - Total Threads: Server Cores * 2
  - Average Threads Required per Data Pipeline: 1 ~ 8
  - CPU Cores Required per Data Pipeline: 0.5 ~ 4
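The memory formula and the thread budget above can be turned into a small calculator. This is a minimal Python sketch, assuming 1 KB = 1024 bytes and that `threads` refers to the number of write threads configured for the target node; all function names are hypothetical, and the log buffer size is passed in explicitly because this document does not specify it.

```python
# Sketch of the per-pipeline memory formula and thread budget from this section.
# Assumptions: 1 KB = 1024 bytes; "threads" is the pipeline's write thread count.
KB = 1024
MB = 1024 * KB


def estimate_pipeline_memory(read_batch: int, write_batch: int, threads: int,
                             row_size: int, log_buffer: int = 0) -> int:
    """Estimated memory in bytes for one pipeline.

    row_size and log_buffer are given in bytes.
    """
    # First factor of the formula (a record count)
    records = read_batch * 8 + 10240 + write_batch * (2 + threads)
    # Second factor of the formula (bytes per record)
    per_record_bytes = 10 * row_size + 5 * KB
    return records * per_record_bytes + log_buffer


def total_threads(server_cores: int) -> int:
    """Total thread budget for a server: server cores * 2."""
    return server_cores * 2


def max_pipelines_by_threads(server_cores: int, threads_per_pipeline: int) -> int:
    """How many pipelines fit in the thread budget at a given average thread count."""
    return total_threads(server_cores) // threads_per_pipeline


if __name__ == "__main__":
    # Hypothetical batch sizes; substitute the values configured on your nodes.
    estimate = estimate_pipeline_memory(read_batch=500, write_batch=500,
                                        threads=8, row_size=1 * KB)
    print(f"Estimated memory: {estimate / MB:.0f} MB (excluding log buffer)")
    # A 16-core server at 8 and at 1 average threads per pipeline
    print(max_pipelines_by_threads(16, 8), "to",
          max_pipelines_by_threads(16, 1), "pipelines")
```

The printed memory figure leaves out the log buffer, so compare it with the 1 GB per 1 KB row size rule of thumb and confirm it with performance tests in your own environment.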
Quick Reference Table
| Category | Business Load | CPU Cores Required | Memory Required | Number of Pipelines per 16-core Server |
|---|---|---|---|---|
| Full Synchronization | Large Data Volume (Table Data > 1 TB) | 4 | 1 GB per 1 KB row size | 8 |
| Full Synchronization | Medium/Small Data Volume (Table Data < 1 TB) | 2 | 1 GB per 1 KB row size | 16 |
| Incremental Replication | High Throughput (RPS > 10,000) | 2 | 1 GB per 1 KB row size | 8 |
| Incremental Replication | Medium Throughput (1,000 ~ 9,999 RPS) | 1 | 1 GB per 1 KB row size | 16 |
| Incremental Replication | Low Throughput (RPS < 1,000) | 0.5 | 1 GB per 1 KB row size | 32 |
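To pick a row of the quick reference table programmatically, a lookup like the following may help. It is a sketch only: the row keys and function names are hypothetical, the boundary cases (exactly 1 TB, exactly 10,000 RPS) are not defined by the table and are resolved here by assumption, and scaling from the 16-core baseline to other core counts assumes capacity grows linearly with cores.

```python
# Rows of the quick reference table:
# key -> (CPU cores per pipeline, pipelines per 16-core server)
QUICK_REFERENCE = {
    "full_large": (4, 8),          # Full sync, table data > 1 TB
    "full_medium_small": (2, 16),  # Full sync, table data < 1 TB
    "incr_high": (2, 8),           # Incremental, RPS > 10,000
    "incr_medium": (1, 16),        # Incremental, 1,000 ~ 9,999 RPS
    "incr_low": (0.5, 32),         # Incremental, RPS < 1,000
}

TB = 1024 ** 4  # assumption: binary terabyte


def classify_full_sync(table_bytes: int) -> str:
    # Assumption: exactly 1 TB is treated as medium/small volume
    return "full_large" if table_bytes > TB else "full_medium_small"


def classify_incremental(rps: float) -> str:
    # Assumption: exactly 10,000 RPS is treated as high throughput
    if rps >= 10_000:
        return "incr_high"
    if rps >= 1_000:
        return "incr_medium"
    return "incr_low"


def pipelines_per_server(load: str, server_cores: int) -> int:
    # Assumption: capacity scales linearly from the 16-core baseline
    _, per_16_cores = QUICK_REFERENCE[load]
    return per_16_cores * server_cores // 16


if __name__ == "__main__":
    load = classify_incremental(5_000)
    print(load, pipelines_per_server(load, 32))  # incr_medium 32
```

The table values are kept exactly as listed; the classification functions only decide which row applies.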
High Availability Configuration Recommendations
In High Availability (HA) deployment scenarios, at least two TapData instances are typically deployed to ensure failover and business continuity. During failover, all pipelines from the failed instance are automatically transferred to the remaining instance, which must then carry the additional load. To avoid overloading it, configure the number of pipelines at 50% ~ 75% of the server capacity to maintain the necessary performance buffer.
For example, if a 16-core server can run 16 pipelines, in an HA setup it is advisable to run only 8 ~ 12 pipelines per instance to ensure system stability and high availability.
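The 50% ~ 75% rule is simple arithmetic. A minimal sketch, using a hypothetical function name and assuming fractional results are rounded down (the document does not state a rounding rule):

```python
import math


def ha_pipeline_range(full_capacity: int, low: float = 0.50,
                      high: float = 0.75) -> tuple[int, int]:
    """Recommended pipeline count range per instance in an HA deployment.

    full_capacity is the number of pipelines the server can run without HA.
    Results are rounded down (assumption) so the buffer is never undercut.
    """
    return math.floor(full_capacity * low), math.floor(full_capacity * high)


if __name__ == "__main__":
    low, high = ha_pipeline_range(16)
    print(f"Run {low} ~ {high} pipelines per instance")  # Run 8 ~ 12 pipelines per instance
```

Rounding down keeps the count at or below the target percentage, which preserves the headroom the remaining instance needs after failover.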
Performance Monitoring and Adjustment
- Real-Time Task Monitoring: Use the task monitoring page to observe task runtime details, such as synchronization rate and latency during the full and incremental phases.
- Cluster Metrics Monitoring: Monitor the operating status of all components within the cluster and the number of external connections through the cluster management page. Use third-party performance monitoring tools to track CPU, memory, network, and other resource usage of the cluster.
Based on this monitoring data, dynamically adjust pipeline configuration and resource allocation to keep the system stable and efficient under high load.