Using ChatGPT as a data analyst

Image generated by Midjourney

While AI chatbots are popular with internet users, none has matched the popularity ChatGPT achieved within months of its launch. By January 2023, about two months after its release, ChatGPT had reportedly reached 100 million monthly active users, making it the fastest-growing consumer app in the history of the internet. One of the key reasons behind ChatGPT’s widespread popularity is its versatility.

Unlike most other AI chatbots available online, ChatGPT wasn’t built for a particular industry. It is a general-purpose Natural Language Processing (NLP) chatbot that converses with users and answers their requests in a simple, human dialogue-based format. According to the official website, ChatGPT has been trained to:

  • Answer follow-up questions
  • Admit its mistakes
  • Challenge incorrect premises
  • Reject inappropriate requests

This makes it capable of assisting users with a wide variety of tasks, including research, writing, and data analysis, which makes it an exciting new tool for data science professionals. It can help write code that collects data from the internet, or analyse datasets you paste into the conversation to answer direct questions. In this article, we discuss four ways in which ChatGPT can be used for data analysis.

Four ways ChatGPT can be used for data analysis

Here are some exciting ways in which ChatGPT can assist data analysts in everyday tasks.

  • It can explain complex code

Deciphering a long piece of code that you haven't seen before can be difficult, especially when it is poorly structured and doesn't follow standard formatting conventions. In such cases, you might have to spend hours carefully reading each line to connect the dots, which is tedious and inefficient when you're working under a strict deadline.

ChatGPT can be helpful in such scenarios. It uses the words and phrases in your query, along with earlier messages in the conversation, as context for its answer. Simply type a query such as "Explain this (programming language) code" and paste in the code you're trying to understand. ChatGPT will tell you what the code does and which functions it uses to produce its result. For example, if you ask ChatGPT to explain the code for a simple calculator, it might phrase its reply like this:

"This code is a simple calculator. It can be used to add, subtract, multiply, or divide two numbers. The user is prompted to enter two numbers and a mathematical operator. The program uses a switch statement to produce a result based on the user's input."

  • It can write data scraping and data collection code for you

Just as it can decipher a given piece of code, ChatGPT can write elaborate data scraping programs. All you need to do is specify the language you’d like the code written in and any restrictions you’d like it to follow. For example, you can type a query such as “Write Python code to scrape data from Facebook without my IP address being blocked.”

ChatGPT is likely to respond with a Python script that fetches Facebook data through the Facebook Graph API and rotates IP addresses between requests. The reply will usually also include a brief explanation of the technique used (IP rotation in this case) and a usage warning where necessary. If needed, you can make your query more specific and ask for a web scraper that does not use the Facebook Graph API.
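The Graph API access and IP rotation described above depend on credentials and proxy infrastructure that a self-contained example can't reproduce. As a smaller illustration of the kind of scraping code you might get back, here is a hedged sketch that uses only Python's standard library (`urllib.request` and `html.parser`) to pull table rows out of an HTML page. The class and function names are hypothetical, and the sketch deliberately leaves out retries, proxies, and rate limiting.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being built, or None outside <tr>
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return every table row in an HTML string as a list of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url, timeout=10):
    """Download a page as text (no retries, proxies, or rate limiting)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Splitting fetching from parsing keeps the parser testable offline: `scrape_table(fetch_html(url))` scrapes a live page, while `scrape_table` alone works on any saved HTML string.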

One thing to keep in mind when using ChatGPT to write code is that it might not always produce the most efficient solution. Depending on your request and how it interprets it, it may return a dynamic, reusable script or one with hardcoded values that only works for a single case. This is why it is better to use ChatGPT as a launchpad for your project and then review and adapt its code.
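To illustrate the difference, here is a hypothetical Python pair: the first function has its file name and column hardcoded, while the second takes them as parameters and can be reused on any CSV data. The file `sales_2023.csv` and the `amount` column are made-up examples.

```python
import csv
import io


def total_sales_hardcoded():
    """Hardcoded: only works for one (hypothetical) file with an 'amount' column."""
    with open("sales_2023.csv", newline="") as f:
        return sum(float(row["amount"]) for row in csv.DictReader(f))


def column_total(source, column):
    """Parameterized: sums any numeric column from any CSV file-like object."""
    return sum(float(row[column]) for row in csv.DictReader(source))


if __name__ == "__main__":
    sample = io.StringIO("region,amount\nNorth,10.5\nSouth,4.5\n")
    print(column_total(sample, "amount"))  # prints 15.0
```

When ChatGPT returns something like the first version, turning hard-coded values into parameters is often the first edit worth making.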

  • It can write comments for a given piece of code

Adding comments to an existing piece of code is arguably one of the smartest ways to use ChatGPT in data analysis. While code that ChatGPT writes from scratch needs careful checking, organizational tasks like commenting carry less risk, and it tends to handle them accurately and efficiently. If you want ChatGPT to add comments to your code, state this explicitly in your query along with the language your code is written in. For example, your query can be phrased as, “Can you write comments for this SQL code?”

ChatGPT typically replies with a fully commented version of the program that you can review and then share with new developers or save for documentation purposes.
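For illustration, here is the kind of commented SQL such a request might return, wrapped in a small Python script so it can run against an in-memory SQLite database via the standard `sqlite3` module. The `orders` table, its columns, the sample rows, and the query itself are all hypothetical.

```python
import sqlite3

# A commented query of the kind a "Can you write comments for this SQL code?"
# prompt might produce (hypothetical table and columns).
QUERY = """
-- Total value of paid orders per customer, largest first
SELECT customer_id,
       SUM(amount) AS total_spent  -- add up every paid order per customer
FROM orders
WHERE status = 'paid'              -- ignore refunded or pending orders
GROUP BY customer_id               -- one output row per customer
ORDER BY total_spent DESC          -- biggest spenders at the top
"""


def make_demo_db():
    """Create an in-memory database with a few sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 20.0, "paid"), (2, 50.0, "paid"), (1, 15.0, "paid"), (2, 99.0, "refunded")],
    )
    return conn


def top_customers(conn):
    """Run the commented query and return (customer_id, total_spent) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    print(top_customers(make_demo_db()))
```

Running the commented query against sample data is a quick way to confirm that ChatGPT's comments describe what the SQL actually does.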

  • It can create data dictionaries

Another organizational task that ChatGPT can perform efficiently is drafting a data dictionary for a given dataset. When asked to create one, it typically produces a table that lists each column in the data and names and describes it based on context. For example, if a column contains entries like “Alaska,” “Nevada,” and “Maryland,” ChatGPT will name it “States” or “State names.” It can also convert cryptic column names such as “ROI.Age<30” into plain descriptions like “Rate of interest for people below 30 years of age.”

With enough data and context, it can also be made to handle more complex tasks, such as creating Entity-Relationship (ER) models that follow specified rules. The accuracy of the data dictionary ChatGPT creates depends on the accuracy of the data and the prompt you provide. ChatGPT can sometimes spot discrepancies, such as the same primary key value appearing twice, but it will not catch them reliably.
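Because ChatGPT won't reliably catch issues like duplicate keys, it can help to compute a basic data dictionary and key check yourself and compare the results. The Python sketch below is a hypothetical, standard-library-only version. It assumes rows are read as strings (as `csv.DictReader` produces them), infers a simple type for each column, counts missing values, lists example values, and reports duplicated primary key values. The description field is left blank for a person, or ChatGPT, to fill in.

```python
from collections import Counter

MISSING = ("", None)


def infer_type(values):
    """Guess a type label from a column's non-empty string values."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "unknown"

    def all_parse(cast):
        try:
            for v in present:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


def data_dictionary(rows):
    """Build one entry per column: name, inferred type, missing count, examples."""
    columns = list(rows[0].keys()) if rows else []
    entries = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in MISSING]
        entries.append({
            "column": col,
            "type": infer_type(values),
            "missing": len(values) - len(present),
            "examples": list(dict.fromkeys(present))[:3],  # first 3 distinct values
            "description": "",  # to be written by a person or drafted by ChatGPT
        })
    return entries


def duplicate_keys(rows, key):
    """Return key values that appear more than once (e.g. a reused primary key)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)
```

A non-empty result from `duplicate_keys` is exactly the kind of discrepancy worth raising with ChatGPT explicitly rather than hoping it notices.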

Using AI tools as a data analyst

AI tools like ChatGPT are changing the way data analysts approach everyday tasks like conducting research and organizing datasets. When used well, these tools can streamline development workflows and help analysts prepare datasets and build models much faster. The key is to be aware of the tool's capabilities and limitations and to closely monitor every development and training process yourself. ChatGPT should be used as a starting point for writing and optimizing code, not as a foolproof solution.

Knowing how to develop and train data analysis models is still a crucial skill for everyone working with data intelligence. At Turing College, we empower students to learn professional data analysis skills through a holistic curriculum and hands-on training. Our focus on practical, peer-review-based learning ensures that all our students graduate as capable professionals who are ready to join the global workforce.