Python Availability In Excel – A New Era Of Data Analysis

Published by Sohhom on

Microsoft announced on August 22, 2023 that Excel will support running Python code directly in spreadsheet cells. What does this mean for you, a data wrangler? Let’s find out what the official docs say and how other people have reacted.

Microsoft Excel needs no introduction to anyone who has worked with tabular data. Since its introduction, the spreadsheet has been (and continues to be) the workhorse of data storage, manipulation and analysis behind businesses at all levels.

Python, along with the scientific computing packages NumPy, SciPy and Pandas, is the go-to toolkit for data analysis in more complex and non-routine workflows. It has reportedly been used for everything from optimizing the click-through rates of websites to plotting data from black holes. Although these tools can do everything Excel can, the lack of a visual layout and the need to write code can be off-putting for many users.

In this article, we discuss some of the relevant aspects of this (probably) historic announcement and what it means for the data community.

Python and Excel – Two Cultures Of Data Analysis

Excel and Python have been long cherished within their own user communities, but they represent two distinct cultures of data analysis. They are like the yin and yang of number crunching, each with its own believers, skeptics, myths, and rituals.

On one side, the Excel crowd consists of business professionals, accountants and financial analysts. Excel is their go-to tool for everything from budgeting and financial modeling to scheduling their kid’s soccer games. Ask them why, and they will simply say, “That’s how we have always done it. If it ain’t broke, don’t fix it.” The Excel people love the ready-to-go, user-friendly interface, which can be reassuringly tangible. No code, no debugging, just formulas and cells.

On the other side, the Python gang consists mainly of data scientists, programmers, and every cat meme lover on the internet. They see Python as their Swiss Army knife for data manipulation, slicing through vast datasets with ease. They scoff at gigabyte files and laugh in the face of data cleaning challenges. For them, saying Python is slow is like telling a cat person that dogs make better pets. Embracing Python is akin to joining a secret society, complete with its own language, symbols, and rites of passage. Unlike Excel fans, Python enthusiasts revel in their complex and abstract interfaces. They find the grid lines of spreadsheets mind-numbingly boring and would sooner slice and dice a multidimensional array.

So why would Python users learn to use Excel? Because that’s where a lot of money is, duh. And why would Excel users need Python if they are so happy with their tool? We shall let Anaconda’s blog post about the same event explain:

Python in Excel. This marks a transformation in how Excel users and Python practitioners approach their work. 

For Excel users, this opens a new world of data analysis potential previously limited to data scientists and developers. Within your familiar spreadsheet environment, you can now harness Python’s power to perform complex statistical analyses with popular packages such as pandas and statsmodels and create sophisticated visualizations using Matplotlib and Seaborn.

Python practitioners can now marry scripts and rich visualizations with the widespread accessibility of Excel, enabling an uninterrupted workflow and making your work easier to share with colleagues who primarily use Excel.

(emphasis not in original)
https://www.anaconda.com/blog/announcing-python-in-excel-next-level-data-analysis-for-all

Nitty gritty – what exactly can be done with Python?

The announcement and other articles related to it contain quite a bit of flowery marketing talk, like “making Python better and more accessible for everyone”.

We distill some of the essential details below:

  • How to use Python in Excel: type “=PY(” in your Excel cell, followed by your Python code.
  • Who can use it: “Python in Excel is rolling out to Public Preview for those in the Microsoft 365 Insiders program, using the Beta Channel in Excel for Windows.”
  • Can you do lightweight machine learning, such as scikit-learn model fitting? Yes.
  • Which Python packages are available? Probably most of the Anaconda distribution, but definitely the popular ones like Pandas, Matplotlib, Seaborn etc.

Where does the Python code run? Is there a time or memory limit?

“Python in Excel runs securely on the Microsoft Cloud, with no setup required” says a heading in the Microsoft announcement.

Then they elaborate:

The Python code runs in its own hypervisor isolated container using Azure Container Instances and secure, source-built packages from Anaconda through a secure software supply chain.

Python in Excel keeps your data private by preventing the Python code from knowing who you are, and opening workbooks from the internet in further isolation within their own separate containers. Data from your workbooks can only be sent via the built-in xl() Python function, and the output of the Python code can only be returned as the result of the =PY() Excel function.

(emphasis not in original)
https://techcommunity.microsoft.com/t5/excel-blog/announcing-python-in-excel-combining-the-power-of-python-and-the/ba-p/3893439

Safety is of paramount importance, and Microsoft seems to be taking every precaution to ensure that Excel users do not get hacked too easily while running Python from their sheets. However, from a tinkerer’s point of view, we think this means users cannot install their own packages or get up to other shenanigans like calling shell scripts.
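If installing extra packages is indeed off the table, the practical question becomes which packages are already there. A minimal sketch that asks the interpreter directly, using importlib.util.find_spec, which locates a top-level module without importing it. check_packages is a hypothetical helper, the package list is only an example, and we have not verified that this runs unchanged inside a =PY() cell.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to whether it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Example list: note that scikit-learn is imported under the name "sklearn".
print(check_packages(["pandas", "matplotlib", "seaborn", "statsmodels", "sklearn"]))
```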

The Python code will get its own isolated runtime, presumably provisioned just in time in the Microsoft Azure cloud. Users will likely have little to no control over other aspects of the environment, such as CPU speed or memory. We believe that the containers will not have Internet access, so things like calling GPT-4 through the OpenAI API via a Python function from Excel … will not be possible.
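The Internet-access question should be easy to settle for anyone in the preview. A minimal sketch, assuming outbound access can be probed with a plain TCP connection attempt; has_internet is a hypothetical helper, and api.openai.com on port 443 is just an example target.

```python
import socket


def has_internet(host: str = "api.openai.com", port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses.
        return False
```

Calling has_internet() from a =PY() cell should return False if the containers are cut off from the Internet, as we expect.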

We also expect a memory limit, but how large it will be is unclear.
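One way to discover the limit from inside the runtime, assuming (unconfirmed) that it is a Linux container whose memory is capped through cgroups, is to read the kernel’s cgroup files. A sketch; parse_limit and container_memory_limit are hypothetical helpers, and the two paths are the standard cgroup v2 and cgroup v1 locations.

```python
from pathlib import Path
from typing import Optional

# Standard Linux locations of a container's memory limit: cgroup v2, then v1.
CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Parse a cgroup memory limit; None means 'no limit' or unparseable."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 writes "max" when no limit is set
        return None
    try:
        # Note: cgroup v1 reports a huge number (not "max") when unlimited.
        return int(raw)
    except ValueError:
        return None


def container_memory_limit(files=CGROUP_FILES) -> Optional[int]:
    """Return the memory limit in bytes from the first readable file, else None."""
    for path in files:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue  # file missing or unreadable: try the next location
    return None
```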

Reactions and Comments

Positive, excited

Negative, worried

Conclusion

Guido van Rossum, the creator of the Python programming language, said the following:

I expect that both communities will find interesting new uses in this collaboration, amplifying each partner’s abilities. When I joined Microsoft three years ago, I would not have dreamed this would be possible.

https://www.theverge.com/2023/8/22/23841167/microsoft-excel-python-integration-support

Despite the proliferation of headache-inducing workflows involving Python scripts, spreadsheets, APIs of various kinds and so on, we think this will be an interesting development. Who knows, maybe this will mark a new era of data work where the two crowds of spreadsheet fans and programming aficionados learn to get along better!