I/O Options¶

Whether you are using a generator or a strategy, you will want to define inputs and outputs for your data. These are standard across datafuzz as they are defined in the DataSet class. The current supported data types for a dataset are:

lists
numpy 2D arrays
pandas dataframes

datafuzz will utilize pandas dataframes internally to represent and modify the records if you have pandas installed. If you want to avoid this, you may pass pandas=False in your initialization of your DataSet object.

Input options¶

You can read in several additional data formats, which will be used to create a DataSet object. These are normally defined in the Parser object, or are passed in the DataSet object itself. Options are as follows:

files:

defined by specifying file://$PATH_AND_FILENAME. Currently, only CSV and JSON files are supported.

sql queries:

defined by passing 'sql' as input. You must then also pass optional arguments for your parser (db_uri and query)

Output options¶

Output can be generated from every DataSet object by calling the to_output method, which will return either the output string or the output object. It will return a string for non-Python objects (such as sql tables and files) and an object for all native objects.

For output, you can define the following options:

files:

defined by specifying file://$PATH_AND_FILENAME. Currently, only CSV and JSON files are supported.

sql table:

defined by passing 'sql' as output. You must then also pass optional arguments for your output (db_uri and table)

pandas dataframe:

defined by passing 'pandas'

numpy 2D array:

defined by passing 'numpy'

list:

defined by passing 'list'

If you are interested in an example of using datafuzz as a stream, please see the streaming example in the example directory.