Sample Data (Faker)
The Sample Data source generates realistic fake data using the Python mimesis library. It produces an e-commerce-like dataset useful for testing, demos, and development.
Prerequisites
None. This connector generates data locally and does not connect to an external API.
Supported streams
This source has three streams: users, products, and purchases. All streams support full refresh and incremental sync, with id as the primary key and updated_at as the cursor field.
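The cursor-based incremental pattern described above can be sketched as follows. This is a generic illustration of how an `updated_at` cursor is used, not the connector's actual implementation; the helper names `select_new_records` and `advance_cursor` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def select_new_records(records: list[dict], cursor: Optional[datetime]) -> list[dict]:
    """Return records whose updated_at is strictly newer than the saved cursor."""
    if cursor is None:
        # No saved state yet: behave like a full refresh.
        return list(records)
    return [r for r in records if r["updated_at"] > cursor]


def advance_cursor(records: list[dict], cursor: Optional[datetime]) -> Optional[datetime]:
    """Return the highest updated_at seen so far, or the old cursor if nothing is newer."""
    stamps = [r["updated_at"] for r in records]
    if cursor is not None:
        stamps.append(cursor)
    return max(stamps) if stamps else None


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
records = [{"id": 1, "updated_at": t0}, {"id": 2, "updated_at": t1}]

first_sync = select_new_records(records, cursor=None)   # both records
cursor = advance_cursor(first_sync, None)               # latest updated_at
second_sync = select_new_records(records, cursor)       # nothing newer than the cursor
```

If records are re-emitted with fresh `updated_at` values on a later sync, they pass the cursor check again and are picked up as updates keyed by `id`.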
Users
Each user record contains identity and demographic fields: name, title, email, telephone, age, gender, language, academic_degree, nationality, occupation, height, blood_type, weight, and an embedded address object with street_number, street_name, city, state, province, postal_code, and country_code. The number of user records is controlled by the count configuration option.
Products
Product records represent vehicles with fields: make, model, year, and price. The products stream draws from a fixed catalog of 100 products. The count option limits how many of those 100 products are emitted; setting count higher than 100 still produces at most 100 products.
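The cap on the products stream amounts to taking the smaller of count and the catalog size. A minimal sketch, where `CATALOG_SIZE` and `products_emitted` are illustrative names rather than identifiers from the connector:

```python
CATALOG_SIZE = 100  # fixed catalog size stated in the text above


def products_emitted(count: int) -> int:
    """Number of product records emitted for a given count setting (hypothetical helper)."""
    return max(0, min(count, CATALOG_SIZE))


small = products_emitted(25)     # count below the catalog size limits the stream
large = products_emitted(5000)   # count above the catalog size is capped
```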
Purchases
Purchase records link users to products through user_id and product_id fields. Each record also includes timestamps: created_at, added_to_cart_at, purchased_at, and returned_at. Not every cart addition results in a purchase (approximately 70% do), and not every purchase results in a return (approximately 15% do). The purchased_at and returned_at fields are nullable. The connector generates roughly one purchase per user, so the total number of purchases scales with count.
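The funnel described above can be sketched with the standard library. This is an illustration under stated assumptions, not the connector's code: the rates come from the text, while `make_purchase`, the time offsets, and the product ID range of 1 to 100 are invented for the example.

```python
import random
from datetime import datetime, timedelta, timezone

PURCHASE_RATE = 0.70  # approximate share of cart additions that become purchases
RETURN_RATE = 0.15    # approximate share of purchases that are returned


def make_purchase(purchase_id: int, user_id: int, rng: random.Random, now: datetime) -> dict:
    """Build one purchase record; purchased_at and returned_at may be None."""
    added_to_cart_at = now - timedelta(days=rng.randint(15, 45))  # assumed offset
    purchased_at = None
    returned_at = None
    if rng.random() < PURCHASE_RATE:
        purchased_at = added_to_cart_at + timedelta(minutes=rng.randint(1, 120))
        # A return is only possible after a purchase.
        if rng.random() < RETURN_RATE:
            returned_at = purchased_at + timedelta(days=rng.randint(1, 14))
    return {
        "id": purchase_id,
        "user_id": user_id,
        "product_id": rng.randint(1, 100),  # assumed ID range for the 100-product catalog
        "created_at": added_to_cart_at,
        "added_to_cart_at": added_to_cart_at,
        "purchased_at": purchased_at,
        "returned_at": returned_at,
        "updated_at": now,
    }


rng = random.Random(42)
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Roughly one purchase per user, so the purchase total tracks the user count.
purchases = [make_purchase(i, i, rng, now) for i in range(1, 2001)]
bought = [p for p in purchases if p["purchased_at"] is not None]
returned = [p for p in bought if p["returned_at"] is not None]
```

Consumers of the stream should treat `purchased_at` and `returned_at` as optional and expect `returned_at` to be set only when `purchased_at` is.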
Features
| Feature | Supported? |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |
| Namespaces | No |
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| Count | integer | 1000 | The number of user records to generate. The purchases stream scales with this value. The products stream is capped at its 100-product catalog, so values above 100 do not add products. |
| Seed | integer | -1 | Controls random data generation. Set a specific value to produce the same records on each sync. Leave at -1 for random data. |
| Always Updated | boolean | true | When true, every sync emits all records with fresh updated_at timestamps. When false, the connector emits no further records once the initial sync has produced count records. |
| Records Per Stream Slice | integer | 1000 | The number of records per stream slice before a state checkpoint is emitted. |
| Parallelism | integer | 4 | The number of parallel workers for data generation. Set this to the number of CPUs allocated to the connector. |
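The options in the table can be modeled as a small configuration object. The sketch below is illustrative only: the snake_case field names are assumed from the parameter labels, and `SampleDataConfig`, `make_rng`, and `checkpoint_count` are hypothetical helpers, not part of the connector.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleDataConfig:
    """Defaults mirror the table above; field names are assumed."""
    count: int = 1000
    seed: int = -1
    always_updated: bool = True
    records_per_slice: int = 1000
    parallelism: int = 4


def make_rng(config: SampleDataConfig) -> random.Random:
    """A seed of -1 means unseeded (random) output; any other value is reproducible."""
    return random.Random() if config.seed == -1 else random.Random(config.seed)


def checkpoint_count(config: SampleDataConfig) -> int:
    """Number of slices, and so state checkpoints, needed to cover count records."""
    return math.ceil(config.count / config.records_per_slice)


# Two generators built from the same seed produce the same sequence.
a = make_rng(SampleDataConfig(seed=7))
b = make_rng(SampleDataConfig(seed=7))
same_sequence = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]

slices = checkpoint_count(SampleDataConfig(count=2500, records_per_slice=1000))
```

A fixed seed is the usual choice for tests that compare output across runs, while the default of -1 suits demos where fresh data on each sync is preferable.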
Reference
Config fields reference
Changelog
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 7.0.1 | 2026-03-13 | 74818 | Patch version bump (publish test) |
| 7.0.0 | 2026-03-05 | 74318 | Test breaking change to validate breaking change infrastructure |
| 6.2.38 | 2025-11-12 | 69289 | Add externalDocumentationUrls field to metadata |
| 6.2.37 | 2025-10-21 | 68572 | Update dependencies |
| 6.2.36 | 2025-10-14 | 67806 | Update dependencies |
| 6.2.35 | 2025-10-07 | 67290 | Update dependencies |
| 6.2.34 | 2025-09-30 | 65779 | Update dependencies |
| 6.2.33 | 2025-09-03 | 65914 | Upgrade CDK to 6.28.0 and remove pendulum dependency |
| 6.2.32 | 2025-08-23 | 65273 | Update dependencies |
| 6.2.31 | 2025-08-16 | 65006 | Update dependencies |
| 6.2.30 | 2025-08-09 | 64799 | Update dependencies |
| 6.2.29 | 2025-07-26 | 63953 | Update dependencies |
| 6.2.28 | 2025-07-19 | 63534 | Update dependencies |
| 6.2.27 | 2025-07-17 | 63354 | Updated icon |
| 6.2.26 | 2025-07-16 | 63342 | Rendered name changed to Sample Data |
| 6.2.26-rc.1 | 2025-06-16 | 61645 | Update for testing |
| 6.2.25-rc.1 | 2025-04-07 | 57500 | Update for testing |
| 6.2.24 | 2025-04-05 | 57263 | Update dependencies |
| 6.2.23 | 2025-03-29 | 56502 | Update dependencies |
| 6.2.22 | 2025-03-22 | 46821 | Update dependencies |
| 6.2.21 | 2025-03-11 | 55705 | Promoting release candidate 6.2.21-rc.1 to a main version. |
| 6.2.21-rc.1 | 2024-11-13 | 48013 | Update for testing. |
| 6.2.20 | 2024-10-30 | 48013 | Promoting release candidate 6.2.20-rc.1 to a main version. |
| 6.2.20-rc.1 | 2024-10-21 | 46678 | Testing release candidate with RC suffix versioning. |
| 6.2.19-rc.1 | 2024-10-21 | 47221 | Testing release candidate with RC suffix versioning. |
| 6.2.18-rc.1 | 2024-10-09 | 46678 | Testing release candidate with RC suffix versioning. |
| 6.2.17 | 2024-10-05 | 46398 | Update dependencies |
| 6.2.16 | 2024-09-28 | 46207 | Update dependencies |
| 6.2.15 | 2024-09-21 | 45740 | Update dependencies |
| 6.2.14 | 2024-09-14 | 45567 | Update dependencies |
| 6.2.13 | 2024-09-07 | 45327 | Update dependencies |
| 6.2.12 | 2024-09-04 | 45126 | Test a release candidate release |
| 6.2.11 | 2024-08-31 | 45025 | Update dependencies |
| 6.2.10 | 2024-08-24 | 44659 | Update dependencies |
| 6.2.9 | 2024-08-17 | 44221 | Update dependencies |
| 6.2.8 | 2024-08-12 | 43753 | Update dependencies |
| 6.2.7 | 2024-08-10 | 43570 | Update dependencies |
| 6.2.6 | 2024-08-03 | 43102 | Update dependencies |
| 6.2.5 | 2024-07-27 | 42682 | Update dependencies |
| 6.2.4 | 2024-07-20 | 42367 | Update dependencies |
| 6.2.3 | 2024-07-13 | 41848 | Update dependencies |
| 6.2.2 | 2024-07-10 | 41467 | Update dependencies |
| 6.2.1 | 2024-07-09 | 41180 | Update dependencies |
| 6.2.0 | 2024-07-07 | 39935 | Update CDK to 2.0. |
| 6.1.6 | 2024-07-06 | 40956 | Update dependencies |
| 6.1.5 | 2024-06-25 | 40426 | Update dependencies |
| 6.1.4 | 2024-06-21 | 39935 | Update dependencies |
| 6.1.3 | 2024-06-04 | 39029 | [autopull] Upgrade base image to v1.2.1 |
| 6.1.2 | 2024-06-03 | 38831 | Bump CDK to allow and prefer versions 1.x |
| 6.1.1 | 2024-05-20 | 38256 | Replace AirbyteLogger with logging.Logger |
| 6.1.0 | 2024-04-08 | 36898 | Update car prices and years |
| 6.0.3 | 2024-03-15 | 36167 | Make 'count' an optional config parameter. |
| 6.0.2 | 2024-02-12 | 35174 | Manage dependencies with Poetry. |
| 6.0.1 | 2024-02-12 | 35172 | Base image migration: remove Dockerfile and use the python-connector-base image |
| 6.0.0 | 2024-01-30 | 34644 | Declare 'id' columns as primary keys. |
| 5.0.2 | 2024-01-17 | 34344 | Ensure unique state messages |
| 5.0.1 | 2024-01-08 | 34033 | Add standard entrypoints for usage with AirbyteLib |
| 5.0.0 | 2023-08-08 | 29213 | Change all *id fields and products.year to be integer |
| 4.0.0 | 2023-07-19 | 28485 | Bump to test publication |
| 3.0.2 | 2023-07-07 | 28060 | Bump to test publication |
| 3.0.1 | 2023-06-28 | 27807 | Fix bug with purchase stream updated_at |
| 3.0.0 | 2023-06-23 | 27684 | Stream cursor is now updated_at & remove records_per_sync option |
| 2.1.0 | 2023-05-08 | 25903 | Add user.address (object) |
| 2.0.3 | 2023-02-20 | 23259 | Bump to test publication |
| 2.0.2 | 2023-02-20 | 23259 | Bump to test publication |
| 2.0.1 | 2023-01-30 | 22117 | source-faker goes beta |
| 2.0.0 | 2022-12-14 | 20492 and 20741 | Decouple stream states for better parallelism |
| 1.0.0 | 2022-11-28 | 19490 | Faker uses the CDK; rename streams to be lower-case (breaking), add determinism to random purchases, and rename |
| 0.2.1 | 2022-10-14 | 19197 | Emit AirbyteEstimateTraceMessage |
| 0.2.0 | 2022-10-14 | 18021 | Move to mimesis for speed! |
| 0.1.8 | 2022-10-12 | 17889 | Bump to test publish command (2) |
| 0.1.7 | 2022-10-11 | 17848 | Bump to test publish command |
| 0.1.6 | 2022-09-07 | 16418 | Log start of each stream |
| 0.1.5 | 2022-06-10 | 13695 | Emit timestamps in the proper ISO format |
| 0.1.4 | 2022-05-27 | 13298 | Test publication flow |
| 0.1.3 | 2022-05-27 | 13248 | Add options for records_per_sync and page_size |
| 0.1.2 | 2022-05-26 | 13293 | Test publication flow |
| 0.1.1 | 2022-05-26 | 13235 | Publish for AMD and ARM (M1 Macs) & remove User.birthdate |
| 0.1.0 | 2022-04-12 | 11738 | The Faker Source is created |