Pandas to PySpark conversion — how ChatGPT saved my day!

Prosenjit Chakraborty
6 min read · Mar 28, 2023

I recently assisted a team in delivering an MVP to convert a portion of an existing business process from Python/Pandas to PySpark to run on cloud managed services. Unfortunately, I only had one day to convert a complex piece of functionality, and my PySpark coding skills were rusty. Halfway through, I realized that I had only a few hours left to deliver the converted code to the team. That’s when I decided to give ChatGPT a try, as I had heard about its capability to convert programming languages.

To give some background on the process: we were receiving streaming data captured by sensors (a small sample is provided below). The functionality I was converting from Pandas/Python to PySpark reads the streaming data and aggregates it at different granularities for downstream processing.
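To make "aggregating at different granularities" concrete, here is a minimal pandas sketch of that kind of operation. It is not the original project code: the column names (`site_id`, `timestamp`, `value`), the sample readings, and the helper `median_by_granularity` are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sensor sample; column names and values are made up for illustration.
readings = pd.DataFrame(
    {
        "site_id": ["A", "A", "A", "B", "B"],
        "timestamp": pd.to_datetime(
            [
                "2023-03-01 00:05",
                "2023-03-01 00:35",
                "2023-03-01 01:10",
                "2023-03-01 00:20",
                "2023-03-01 00:50",
            ]
        ),
        "value": [10.0, 14.0, 20.0, 5.0, 7.0],
    }
)


def median_by_granularity(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Median of `value` per site and per time bucket of width `freq`."""
    return (
        # pd.Grouper buckets the timestamp column into fixed-width windows.
        df.groupby(["site_id", pd.Grouper(key="timestamp", freq=freq)])["value"]
        .median()
        .reset_index()
    )


hourly = median_by_granularity(readings, "1h")  # one row per site per hour
daily = median_by_granularity(readings, "1D")   # one row per site per day
print(hourly)
print(daily)
```

In PySpark, a comparable result could likely be built by grouping on the site column together with `pyspark.sql.functions.window("timestamp", "1 hour")` and aggregating with `percentile_approx("value", 0.5)`. Note that this function returns an approximate median, so results may differ slightly from pandas on large groups.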

Code block 1: Median value calculation based on hourly data frequency for each site

Me: “Can you please convert the following pandas code into pyspark?

ChatGPT: “Sure, I can help you convert the Pandas code into PySpark code. Please share the Pandas code so
