Generators¶
Currently, there is only one generator DatasetGenerator
which allows you to define schema for new datasets. This generator requires use of the `faker
library <http://faker.readthedocs.io/>`_.
Generator Initialization Requirements¶
To initialize a generator you need the following:
- parser:
- a
parsers.SchemaYAMLParser
orparsers.SchemaCLIParser
or a dictionary with the appropriate keys (see Required Parser Arguments below)- output:
- a string specifying output (see: I/O Options)
Required Parser Arguments¶
The SchemaYAMLParser
and SchemaCLIParser
as well as any dictionary you use in leiu of a parser object need to have certain keys in order to generate the data. The required arguments are as follows:
- schema:
- a dictionary of column names and values to use. Optional values are any of the faker providers defined like
'faker.name'
, as well as lists or iterators of options orrange
andarange
objects.- num_rows:
- an integer for the number of rows to generate
The parser objects (or a dictionary you use) may also have the following optional arguments which are used to create timeseries:
- start_time:
- isoformat start time which will be used to generate a timeseries. Currently, the timeseries column defaults to ‘timestamp’.
- end_time:
- if desired, set an end_time in isoformat. Note: this may or may not be reached given the number of rows requirement.
- increments:
- choice from: ‘days’, ‘hours’, ‘seconds’ and ‘random’ for timestamp increments. Default is random which is a mix of days, hours and seconds.
For more examples on how to utilize these generators, check the Usage documentation.