I/O Options¶
Whether you are using a generator or a strategy, you will want to define inputs and outputs for your data. These are standard across datafuzz as they are defined in the DataSet class. The current supported data types for a dataset are:
- lists
- numpy 2D arrays
- pandas dataframes
datafuzz will utilize pandas dataframes internally to represent and modify the records if you have pandas installed. If you want to avoid this, you may pass pandas=False in your initialization of your DataSet object.
Input options¶
You can read in several additional data formats, which will be used to create a DataSet object. These are normally defined in the Parser object, or are passed in the DataSet object itself. Options are as follows:
- files:
- defined by specifying
file://$PATH_AND_FILENAME. Currently, only CSV and JSON files are supported.- sql queries:
- defined by passing
'sql'as input. You must then also pass optional arguments for your parser (db_uriandquery)
Output options¶
Output can be generated from every DataSet object by calling the to_output method, which will return either the output string or the output object. It will return a string for non-Python objects (such as sql tables and files) and an object for all native objects.
For output, you can define the following options:
- files:
- defined by specifying
file://$PATH_AND_FILENAME. Currently, only CSV and JSON files are supported.- sql table:
- defined by passing
'sql'as output. You must then also pass optional arguments for your output (db_uriandtable)- pandas dataframe:
- defined by passing
'pandas'- numpy 2D array:
- defined by passing
'numpy'- list:
- defined by passing
'list'
If you are interested in an example of using datafuzz as a stream, please see the streaming example in the example directory.