Sample Data (Faker)

The Sample Data source generates realistic fake data using the Python mimesis library. It produces an e-commerce-like dataset useful for testing, demos, and development.

Prerequisites

None. This connector generates data locally and does not connect to an external API.

Supported streams

This source has three streams: users, products, and purchases. All streams support full refresh and incremental sync, with id as the primary key and updated_at as the cursor field.
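
As a rough illustration of how a downstream consumer might use these keys, the sketch below merges an incremental batch into previously synced records, keyed on id and keeping the row with the latest updated_at. The merge_by_primary_key function is hypothetical and not part of the connector. Timestamps are assumed to be ISO 8601 strings in a single format, so that string comparison matches chronological order.

```python
def merge_by_primary_key(existing, incoming):
    """Upsert incoming records into a dict keyed by the primary key `id`.

    Hypothetical consumer-side helper: when two records share an id, the one
    with the later `updated_at` (the cursor field) wins.
    """
    merged = dict(existing)  # copy so the caller's mapping is left untouched
    for record in incoming:
        current = merged.get(record["id"])
        # Assumes ISO 8601 strings in one format, so >= is chronological.
        if current is None or record["updated_at"] >= current["updated_at"]:
            merged[record["id"]] = record
    return merged
```

With always_updated left at true, every sync re-emits all records with newer updated_at values, so a merge of this kind would overwrite each existing row rather than append duplicates.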

Users

Each user record contains identity and demographic fields: name, title, email, telephone, age, gender, language, academic_degree, nationality, occupation, height, blood_type, weight, and an embedded address object with street_number, street_name, city, state, province, postal_code, and country_code. The number of user records is controlled by the count configuration option.

Products

Product records represent vehicles with fields: make, model, year, and price. The products stream draws from a fixed catalog of 100 products. The count option limits how many of those 100 products are emitted; setting count higher than 100 still produces at most 100 products.
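
The cap described above amounts to taking the smaller of count and the catalog size. A minimal sketch, in which CATALOG_SIZE and products_emitted are illustrative names rather than connector code:

```python
CATALOG_SIZE = 100  # fixed size of the product catalog, as stated above


def products_emitted(count: int) -> int:
    """Number of product records to expect for a given `count` setting."""
    # Never more than the catalog holds, and never a negative number.
    return max(0, min(count, CATALOG_SIZE))
```

For example, the default count of 1000 would still yield 100 products, while a count of 25 would yield 25.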

Purchases

Purchase records link users to products through user_id and product_id fields. Each record also includes timestamps: created_at, added_to_cart_at, purchased_at, and returned_at. Not every cart addition results in a purchase (approximately 70% do), and not every purchase results in a return (approximately 15% do). The purchased_at and returned_at fields are nullable. The connector generates roughly one purchase per user, so the total number of purchases scales with count.
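
To make the nullable fields concrete, the following sketch simulates the funnel with the approximate rates quoted above. It is an illustration only and not the connector's generation logic: simulate_purchase, the time offsets, and the fixed default cart time are assumptions, and created_at and updated_at are omitted for brevity.

```python
import random
from datetime import datetime, timedelta, timezone

PURCHASE_RATE = 0.70  # approximate share of cart additions that become purchases
RETURN_RATE = 0.15    # approximate share of purchases that are returned


def simulate_purchase(rng, user_id, product_count=100, cart_time=None):
    """Build one purchase-like record; purchased_at and returned_at may be None."""
    added = cart_time or datetime(2024, 1, 1, tzinfo=timezone.utc)
    purchased = returned = None
    if rng.random() < PURCHASE_RATE:
        # Offsets are arbitrary illustrative values.
        purchased = added + timedelta(minutes=rng.randint(1, 60))
        # A return is only possible once a purchase exists.
        if rng.random() < RETURN_RATE:
            returned = purchased + timedelta(days=rng.randint(1, 30))
    return {
        "user_id": user_id,
        "product_id": rng.randint(1, product_count),
        "added_to_cart_at": added.isoformat(),
        "purchased_at": purchased.isoformat() if purchased else None,
        "returned_at": returned.isoformat() if returned else None,
    }
```

Queries against the purchases stream should therefore treat a null purchased_at as an abandoned cart and expect returned_at to be null whenever purchased_at is null.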

Features

| Feature | Supported? |
| --- | --- |
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |
| Namespaces | No |

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Count | integer | 1000 | The total number of user records to generate. The purchases stream scales proportionally. Does not affect the products stream beyond its 100-product catalog. |
| Seed | integer | -1 | Controls random data generation. Set a specific value to produce the same records on each sync. Leave at -1 for random data. |
| Always Updated | boolean | true | When true, every sync emits all records with fresh updated_at timestamps. When false, the connector stops emitting records after the initial sync produces count records. |
| Records Per Stream Slice | integer | 1000 | The number of records per stream slice before a state checkpoint is emitted. |
| Parallelism | integer | 4 | The number of parallel workers for data generation. Set this to the number of CPUs allocated to the connector. |
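
The Seed and Records Per Stream Slice options can be pictured with the standard library. This is a sketch under stated assumptions, not the connector's code: make_rng and with_checkpoints are hypothetical helpers, the connector generates data with mimesis and may seed it differently, and the "RECORD" and "STATE" tuples stand in for real protocol messages.

```python
import random


def make_rng(seed: int = -1) -> random.Random:
    """Return a seeded generator, treating -1 as 'no fixed seed'."""
    # Any value other than -1 gives the same sequence on every run.
    return random.Random() if seed == -1 else random.Random(seed)


def with_checkpoints(records, records_per_slice: int = 1000):
    """Yield records, adding a state marker after each full slice."""
    if records_per_slice <= 0:
        raise ValueError("records_per_slice must be positive")
    emitted = 0
    for record in records:
        yield ("RECORD", record)
        emitted += 1
        if emitted % records_per_slice == 0:
            yield ("STATE", emitted)  # checkpoint at a slice boundary
    if emitted % records_per_slice != 0:
        yield ("STATE", emitted)  # final checkpoint for a partial slice
```

A smaller slice size means more frequent checkpoints, so an interrupted sync has less to redo at the cost of more state messages.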

Reference

Config fields reference

| Field (property name) | Type |
| --- | --- |
| always_updated | boolean |
| count | integer |
| parallelism | integer |
| records_per_slice | integer |
| seed | integer |
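
As an example of how these property names fit together, the sketch below assembles a configuration from the documented defaults and checks each value against its documented type. build_config is a hypothetical helper, not a connector API, and the mapping of display names to property names (for instance, Records Per Stream Slice to records_per_slice) is assumed from the two tables above.

```python
import json

# Defaults taken from the Configuration section.
DEFAULTS = {
    "count": 1000,
    "seed": -1,
    "always_updated": True,
    "records_per_slice": 1000,
    "parallelism": 4,
}
# Types taken from the config fields reference.
TYPES = {
    "count": int,
    "seed": int,
    "always_updated": bool,
    "records_per_slice": int,
    "parallelism": int,
}


def build_config(**overrides):
    """Merge overrides into the defaults and validate names and types."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config fields: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    for key, value in config.items():
        is_bool = isinstance(value, bool)
        # bool is a subclass of int in Python, so integer fields reject it.
        if TYPES[key] is bool:
            valid = is_bool
        else:
            valid = isinstance(value, int) and not is_bool
        if not valid:
            raise TypeError(f"{key} must be of type {TYPES[key].__name__}")
    return config


if __name__ == "__main__":
    # A small, reproducible dataset for local testing.
    print(json.dumps(build_config(count=50, seed=42), indent=2))
```

Fixing seed while keeping count small is a convenient combination for tests that need the same records on every run.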

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 7.0.1 | 2026-03-13 | 74818 | Patch version bump (publish test) |
| 7.0.0 | 2026-03-05 | 74318 | Test breaking change to validate breaking change infrastructure |
| 6.2.38 | 2025-11-12 | 69289 | Add externalDocumentationUrls field to metadata |
| 6.2.37 | 2025-10-21 | 68572 | Update dependencies |
| 6.2.36 | 2025-10-14 | 67806 | Update dependencies |
| 6.2.35 | 2025-10-07 | 67290 | Update dependencies |
| 6.2.34 | 2025-09-30 | 65779 | Update dependencies |
| 6.2.33 | 2025-09-03 | 65914 | Upgrade CDK to 6.28.0 and remove pendulum dependency |
| 6.2.32 | 2025-08-23 | 65273 | Update dependencies |
| 6.2.31 | 2025-08-16 | 65006 | Update dependencies |
| 6.2.30 | 2025-08-09 | 64799 | Update dependencies |
| 6.2.29 | 2025-07-26 | 63953 | Update dependencies |
| 6.2.28 | 2025-07-19 | 63534 | Update dependencies |
| 6.2.27 | 2025-07-17 | 63354 | Updated icon |
| 6.2.26 | 2025-07-16 | 63342 | Rendered name changed to Sample Data |
| 6.2.26-rc.1 | 2025-06-16 | 61645 | Update for testing |
| 6.2.25-rc.1 | 2025-04-07 | 57500 | Update for testing |
| 6.2.24 | 2025-04-05 | 57263 | Update dependencies |
| 6.2.23 | 2025-03-29 | 56502 | Update dependencies |
| 6.2.22 | 2025-03-22 | 46821 | Update dependencies |
| 6.2.21 | 2025-03-11 | 55705 | Promoting release candidate 6.2.21-rc.1 to a main version. |
| 6.2.21-rc.1 | 2024-11-13 | 48013 | Update for testing. |
| 6.2.20 | 2024-10-30 | 48013 | Promoting release candidate 6.2.20-rc.1 to a main version. |
| 6.2.20-rc.1 | 2024-10-21 | 46678 | Testing release candidate with RC suffix versioning. |
| 6.2.19-rc.1 | 2024-10-21 | 47221 | Testing release candidate with RC suffix versioning. |
| 6.2.18-rc.1 | 2024-10-09 | 46678 | Testing release candidate with RC suffix versioning. |
| 6.2.17 | 2024-10-05 | 46398 | Update dependencies |
| 6.2.16 | 2024-09-28 | 46207 | Update dependencies |
| 6.2.15 | 2024-09-21 | 45740 | Update dependencies |
| 6.2.14 | 2024-09-14 | 45567 | Update dependencies |
| 6.2.13 | 2024-09-07 | 45327 | Update dependencies |
| 6.2.12 | 2024-09-04 | 45126 | Test a release candidate release |
| 6.2.11 | 2024-08-31 | 45025 | Update dependencies |
| 6.2.10 | 2024-08-24 | 44659 | Update dependencies |
| 6.2.9 | 2024-08-17 | 44221 | Update dependencies |
| 6.2.8 | 2024-08-12 | 43753 | Update dependencies |
| 6.2.7 | 2024-08-10 | 43570 | Update dependencies |
| 6.2.6 | 2024-08-03 | 43102 | Update dependencies |
| 6.2.5 | 2024-07-27 | 42682 | Update dependencies |
| 6.2.4 | 2024-07-20 | 42367 | Update dependencies |
| 6.2.3 | 2024-07-13 | 41848 | Update dependencies |
| 6.2.2 | 2024-07-10 | 41467 | Update dependencies |
| 6.2.1 | 2024-07-09 | 41180 | Update dependencies |
| 6.2.0 | 2024-07-07 | 39935 | Update CDK to 2.0. |
| 6.1.6 | 2024-07-06 | 40956 | Update dependencies |
| 6.1.5 | 2024-06-25 | 40426 | Update dependencies |
| 6.1.4 | 2024-06-21 | 39935 | Update dependencies |
| 6.1.3 | 2024-06-04 | 39029 | [autopull] Upgrade base image to v1.2.1 |
| 6.1.2 | 2024-06-03 | 38831 | Bump CDK to allow and prefer versions 1.x |
| 6.1.1 | 2024-05-20 | 38256 | Replace AirbyteLogger with logging.Logger |
| 6.1.0 | 2024-04-08 | 36898 | Update car prices and years |
| 6.0.3 | 2024-03-15 | 36167 | Make 'count' an optional config parameter. |
| 6.0.2 | 2024-02-12 | 35174 | Manage dependencies with Poetry. |
| 6.0.1 | 2024-02-12 | 35172 | Base image migration: remove Dockerfile and use the python-connector-base image |
| 6.0.0 | 2024-01-30 | 34644 | Declare 'id' columns as primary keys. |
| 5.0.2 | 2024-01-17 | 34344 | Ensure unique state messages |
| 5.0.1 | 2023-01-08 | 34033 | Add standard entrypoints for usage with AirbyteLib |
| 5.0.0 | 2023-08-08 | 29213 | Change all *id fields and products.year to be integer |
| 4.0.0 | 2023-07-19 | 28485 | Bump to test publication |
| 3.0.2 | 2023-07-07 | 28060 | Bump to test publication |
| 3.0.1 | 2023-06-28 | 27807 | Fix bug with purchase stream updated_at |
| 3.0.0 | 2023-06-23 | 27684 | Stream cursor is now updated_at & remove records_per_sync option |
| 2.1.0 | 2023-05-08 | 25903 | Add user.address (object) |
| 2.0.3 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.2 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.1 | 2023-01-30 | 22117 | source-faker goes beta |
| 2.0.0 | 2022-12-14 | 20492 and 20741 | Decouple stream states for better parallelism |
| 1.0.0 | 2022-11-28 | 19490 | Faker uses the CDK; rename streams to be lower-case (breaking), add determinism to random purchases, and rename |
| 0.2.1 | 2022-10-14 | 19197 | Emit AirbyteEstimateTraceMessage |
| 0.2.0 | 2022-10-14 | 18021 | Move to mimesis for speed! |
| 0.1.8 | 2022-10-12 | 17889 | Bump to test publish command (2) |
| 0.1.7 | 2022-10-11 | 17848 | Bump to test publish command |
| 0.1.6 | 2022-09-07 | 16418 | Log start of each stream |
| 0.1.5 | 2022-06-10 | 13695 | Emit timestamps in the proper ISO format |
| 0.1.4 | 2022-05-27 | 13298 | Test publication flow |
| 0.1.3 | 2022-05-27 | 13248 | Add options for records_per_sync and page_size |
| 0.1.2 | 2022-05-26 | 13293 | Test publication flow |
| 0.1.1 | 2022-05-26 | 13235 | Publish for AMD and ARM (M1 Macs) & remove User.birthdate |
| 0.1.0 | 2022-04-12 | 11738 | The Faker Source is created |