A simple introduction to synthetic data generation with Falso.

When we develop an application, we need data. This might sound confusing, because data is usually generated by an application after it is developed. In this post, however, we are talking about the data needed while developing an application (data generation): for example, data used in testing, data for training a model (in the case of an ML application), sample data for demos, and more.

This kind of data is highly underrated, even though it often defines the application as a whole. For instance, the quality of an application depends on testing, and testing in turn depends on the data and its quality. For an ML application, the data we use while developing the app is everything.

In this article, we will explore Falso, a free and open-source (thank god 🙏) npm library for generating large amounts of fake data. Let’s jump right in.

Difference between data collection and data generation:

Both data collection and data generation are important when building an application, but the two are often confused.

| Data collection | Data generation |
| --- | --- |
| Data collection is the process of collecting data through a medium or source. | Data generation is the process of creating new data from a data source. |
| Collected data should be validated and authentic. | Data generation follows data collection. |
| Example: through an interface (forms and usage). | Example: analytics, and generating data with tools like Falso, DATPROF, or EMS Data Generator, or manually. |

Data generation:

So, data generation is the process of creating new data, either from existing data or from a new source. In this article, we mean generating data to use in applications: for example, test data for a web app, or training data for a model in ML/AI applications.

Data generation tool(Falso):

As mentioned before, there are many open-source and paid data generation tools, and unless you are new to this you will probably end up using one. One such tool, loved by many developers, is Falso.

Falso is basically an npm library for generating fake data. (For non-JavaScript developers, if there are any: npm is a HUUUGE collection of JS libraries to save your day.)

Basic usage:

To install the library in your project,

npm i @ngneat/falso

Note: This article doesn’t cover the basics of setting up an npm project. Well, actually: just create a folder, navigate to it in the terminal, and run “npm init -y”. You are good to go.

Figure: Falso usage

After installing, you can use the library as shown in the image above. The library packs a lot of methods for generating different types of random data.

Curious about all the functions that are available?

Figure: Falso available random methods

These are all the methods you can use to generate random data. Cool, right?

In the next section, we will create a simple application to generate 100 personal information entries.

Data generation example:

Now that we have learned what data generation is and which tool we can use for it, let us put that knowledge to use.

Create a simple NPM project with only one file. Your project should look like this.

Figure: Folder structure

We will be using only one file (Index.js).

Index.js:

Figure: Index.js

Every function in Falso returns new random data each time it is called, so calling a function ten times generates ten random values. In the file above, we call all the required functions 100 times to generate our 100 personal information entries.

Result:

After running the file, you should see the 100 random personal information entries in the console.

Figure: The result

Note: All of this data is fake, so adapt it to your use case accordingly.

Conclusion:

This article is only a short introduction to data generation. In reality, this is a huge topic, and research and development are still in progress. Data generation also goes hand in hand with data management; neither can exist without the other. So the next topic I would suggest is learning how to manage data. If you come from an AI/ML background, you can search specifically for data generation and machine learning.

Thanks! Happy coding ✌️.

You can do many things with the data you have; one is to build a recommender system. Do check out my blog post on building a Recommendation System with NodeJS and MongoDB.
