Strategies
In datafuzz, strategies define ways to fuzz or add noise to data. There are currently three types of strategies, each implemented as its own class: Duplicator, NoiseMaker and Fuzzer.
Required Initialization Values
To use each strategy, you must define certain attributes, as follows.
For all strategies, you need to define:
- dataset:
- a datafuzz.DataSet object to apply the strategy to
- percentage:
- percentage of rows to fuzz, add noise to, or duplicate (0-100)
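As a rough illustration of what a row percentage means in practice, the sketch below turns a 0-100 value into a set of row indices. It is not datafuzz's implementation: the helper name pick_rows, the seed parameter and the rounding rule are all assumptions made for this example.

```python
import random


def pick_rows(num_rows, percentage, seed=None):
    """Return sorted indices for roughly `percentage` percent of the rows.

    Hypothetical helper, not part of datafuzz.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)
    # Assumption: round to the nearest whole number of rows.
    count = round(num_rows * percentage / 100)
    # Sample without replacement so no row is selected twice.
    return sorted(rng.sample(range(num_rows), count))


selected = pick_rows(10, 30, seed=1)
print(len(selected))  # 3
```

With 10 rows and a percentage of 30, three distinct row indices are selected; which three depends on the seed.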
The NoiseMaker class has some additional requirements:
- columns:
- a list of columns to apply the noise to (this will be chosen at random if not provided)
- noise:
- a list of possible noise types to apply. Options are:
- ‘add_nulls’: add null values
- ‘string_permutation’: apply string transformations
- ‘random’: generate some random values based on the column type
- ‘range’: change values into given or column range
- ‘type_transform’: apply type transformations
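The values above can be checked before a strategy is built. The following is a minimal sketch of such a check, assuming the options are collected in a plain dictionary. The function check_noisemaker_options and the dictionary layout are hypothetical and not part of datafuzz; only the noise names come from the list above. It also assumes that at least one noise option must be given, while columns may be omitted.

```python
# Noise names taken from the list of options above.
VALID_NOISE = {"add_nulls", "string_permutation", "random", "range", "type_transform"}


def check_noisemaker_options(options):
    """Validate a hypothetical options dict for a NoiseMaker-style strategy."""
    percentage = options.get("percentage")
    if percentage is None or not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    noise = options.get("noise")
    if not noise:
        raise ValueError("noise must list at least one option")
    unknown = sorted(set(noise) - VALID_NOISE)
    if unknown:
        raise ValueError("unknown noise option(s): " + ", ".join(unknown))
    # "columns" is optional: the docs say columns are chosen at random if omitted.
    return True


options = {
    "percentage": 25,
    "columns": ["name", "age"],  # example column names, purely illustrative
    "noise": ["add_nulls", "range"],
}
check_noisemaker_options(options)
```

Failing early on a misspelled noise name is cheaper than discovering it after part of the dataset has already been modified.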
The Fuzzer class has one additional requirement:
- columns:
- a list of columns to apply the fuzz to (this will be chosen at random if not provided)
The Duplicator class has one additional option:
- add_noise:
- boolean to signify if random noise should be applied to the duplicated rows
Running the strategy
For each strategy class, you can run the strategy by calling run_strategy on the strategy instance. This applies the transformation directly to the dataset records.
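To make the in-place behaviour concrete, here is a small stand-in that mimics the pattern with a plain list of dictionaries. ToyDuplicator is a hypothetical class written for this example, not datafuzz's Duplicator, and its constructor arguments are assumptions. Only the method name run_strategy and the idea of modifying the records directly come from the text above.

```python
import random


class ToyDuplicator:
    """Hypothetical stand-in for a duplicating strategy (not datafuzz.Duplicator)."""

    def __init__(self, records, percentage, seed=None):
        self.records = records  # held by reference, so changes are visible to the caller
        self.percentage = percentage
        self.rng = random.Random(seed)

    def run_strategy(self):
        # Pick the rows to duplicate first, then append copies to the same list.
        count = round(len(self.records) * self.percentage / 100)
        for row in self.rng.sample(self.records, count):
            self.records.append(dict(row))


records = [{"id": i} for i in range(10)]
ToyDuplicator(records, percentage=20, seed=0).run_strategy()
print(len(records))  # 12
```

Because the records are changed in place, keep a copy of the original data if you need to compare it with the fuzzed result afterwards.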