Financial Analytics Using Python and AI Tools
Overview
This course has been thoroughly revised and updated for Spring 2025.
A revolution in data analysis is underway. Analysts who once downloaded data manually and analyzed it in Excel must now learn to use Python APIs and data scrapers to retrieve exploding volumes of data, and to analyze that data with NumPy, Pandas data frames, and associated packages. This course teaches how to analyze financial data using Python APIs, scrapers, Pandas data frames, and related packages while using cutting-edge AI tools such as ChatGPT as assistants.
MBA ACCT-GB-3328 specializations
- Accounting
- Business Analytics
- Financial Systems and Analytics
Undergrad ACCT-GB-6028 concentrations
- Accounting
- New: Computing and Data Science
Takeaways
Use Python to access data needed for financial analysis
- Learn Python APIs to access financial data
- Reading and writing files in various formats
- Understanding eXtensible Business Reporting Language (XBRL)
- Standardizing financial statements with different line items
Use Python for financial statement analysis
- Compute and plot frequently used financial statement metrics such as sales growth, margins, invested capital turnover, earnings volatility, leverage, and liquidity
- Compute basic valuation metrics: price-to-earnings ratios, price-to-book ratios
- Use statistical tools to identify trends, correlations, and outliers
Use Python to build simple financial statement models
- Building a simple three-statement financial statement model
- Monte Carlo simulations to plot outcomes for a large number of scenarios
Use Python to run finance simulations
- Learn NumPy matrix operations
- Analyze properties of stock returns
- Simulate portfolio returns and plot the efficient frontier
- Simulate capital asset pricing model (CAPM)
Required Prerequisites
All ACCT courses have the core courses in Financial Accounting as a prerequisite.
- Master's students
- COR1-GB 1306: Financial Accounting and Reporting
- COR1-GB 2206: Accounting (Tech & Luxury)
- ACCT-GB-2103: Financial Statement Analysis (MS in Accounting)
- Undergraduate students
- ACCT-UB.0001: Principles of Financial Accounting
Recommended Background
The course assumes you have taken a statistics course in your graduate or undergraduate program. It will also be extremely helpful if you have taken a half-semester Python course.
Materials
I write and distribute my materials. Therefore, no textbook is required, and you need not purchase anything.
ChatGPT
The course will use ChatGPT extensively. If you use VSCode, you can sign up for GitHub Copilot for free. Otherwise, I strongly recommend that you subscribe to ChatGPT 4.0.
Attendance and penalty for missing classes
Requiring attendance is necessary for several reasons. First, you incorrectly assume you can catch up on a missed class by watching a recording (if available). Videos do not engage your brain as much as a live class. Second, less than 20% of you watch the recording (if available). You are then lost in class, which provides wrong signals to me as an instructor. Third, your absence hurts class discussions. Fourth, you miss out on feedback if you do not work through the questions I pose in class. Fifth, I lose the feedback since there are fewer questions.
The policy below will be in effect only after the add/drop period.
Without mandatory attendance, attendance is often below 50%. Therefore, though I dislike doing this, I penalize absences. If you anticipate being absent for good reasons, please email me well in advance. Please enter "Excused" on the attendance sheet described below to avoid the penalty if I approve. If you miss a class due to emergencies and cannot tell me in advance, do not panic. Take care of the emergency first, and then email me. I will permit you to change the "Absent" to "Excused." But, if you miss a class without a valid reason, there is a penalty, as stated below.
For sections meeting in 150-190 minute sessions, you will lose one grade (A to A-, A- to B+, B+ to B, B to B-, and so on) for EVERY missed session unless you were explicitly excused via email. Thus, if you miss two class sessions, you will lose two grades, and so on.
For sections meeting in 75-80 minute sessions, you will lose one grade (A to A-, A- to B+, B+ to B, B to B-, and so on) for EVERY TWO missed sessions unless you were explicitly excused via email. Thus, if you miss four class sessions, you will lose two grades, and so on.
Please sit in the same seat in every class and display your name tags. For Zoom classes, you must keep your video on AT ALL TIMES. You must also have a good working headset or mic, as it is extremely rude to be inaudible and force me to ask you to repeat yourself.
After entering the class, please mark yourself present in the first 20 minutes on the OneDrive sheet (link posted on Brightspace). You will be marked absent if you are more than 20 minutes late unless it is because of factors beyond your control (traffic, subway, interviews running late). You will also be marked absent if you leave the class early unless you have my permission or get it afterward. You will get an F in the course if you are caught cheating on the attendance sheet.
Exams and Grading
There are no in-class quizzes, midterms, or final exams.
- Please read about the penalty for missing classes above.
- Assignments: 50%
- Final project: 50%
System Requirements
- You need to be in the following systems before the start of the first class:
- Albert
- NYU Brightspace
- If you are a non-Stern student, Stern automatically creates a Stern account for you when registering for a Stern course. All class emails are sent to your Stern email, not NYU email. Please forward your Stern email to your NYU email.
- Only registered students can attend. I cannot override this NYU rule.
Help and Office
- Me: dgode@stern.nyu.edu, 212-998-0021, Office: KMC 10-86.
- Teaching assistant: Please check NYU Brightspace.
Assignments
- Online Jupyter assignments.
Analytical concepts
Organization of financial data: Row versus column orientation
- The typical format of financial data: Accounts or financial statement items are row headings, while column headings are dates
- Optimal organization of Pandas data frames: Why do we transpose financial statement data so that accounts or financial statement items are in columns and dates are in rows?
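As a minimal sketch of the transposition described above (the line items and figures are hypothetical, not taken from the course materials), the following starts from the typical statement layout and flips it so that each line item becomes a column:

```python
import pandas as pd

# Hypothetical figures in the typical "statement" layout:
# line items are rows, fiscal-year-end dates are columns.
statement = pd.DataFrame(
    {"2022-12-31": [100.0, 60.0, 20.0], "2023-12-31": [120.0, 70.0, 25.0]},
    index=["Sales", "COGS", "SG&A"],
)

# Transpose so that each line item is a column and each date is a row.
tidy = statement.T
tidy.index = pd.to_datetime(tidy.index)
tidy.index.name = "date"

# Each column now holds one homogeneous series, so a ratio is a one-liner.
tidy["GrossMargin"] = (tidy["Sales"] - tidy["COGS"]) / tidy["Sales"]
print(tidy)
```

One likely reason for this orientation is that Pandas stores and types data by column, so a column per account keeps each series numeric and makes date-based operations (shifts, rolling windows, resampling) work along the row index.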
Python skills
NumPy versus Pandas
- Named rows and columns
- Inhomogeneous data and missing data
- Input and output
- Merging and grouping
- Speed and memory
Pandas essentials
- Series versus data frames
- Rows, Columns, Size, Size in memory
- Head, tail, and random sampling
Understand data types and simple operators
- Numbers, strings, and dates
- Broadcast operators
- Simple vectorized operations
Manipulate rows and columns in Pandas
- Select rows and columns via slices: brackets, loc, and iloc
- Add and delete rows and columns
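The selection and editing operations listed above can be sketched as follows. The data frame and its values are hypothetical; the point is only to contrast brackets, `loc` (label-based), and `iloc` (position-based):

```python
import pandas as pd

# Hypothetical data: fiscal years as the row index, line items as columns.
df = pd.DataFrame(
    {"Sales": [100.0, 120.0, 150.0], "COGS": [60.0, 70.0, 85.0]},
    index=[2021, 2022, 2023],
)

sales = df["Sales"]              # brackets: select a column by name
cogs_2022 = df.loc[2022, "COGS"] # loc: select by row and column labels
first_two = df.loc[2021:2022]    # loc slices include both endpoints
last_row = df.iloc[-1]           # iloc: select by integer position

# Add a column and a row.
df["GrossProfit"] = df["Sales"] - df["COGS"]
df.loc[2024] = [180.0, 100.0, 80.0]

# Delete a column and a row (drop returns a new data frame).
df = df.drop(columns="COGS")
df = df.drop(index=2021)
print(df)
```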
Topic 2: Access external financial data and save it in files
Analytical concepts
Understanding XBRL
- What is structured data?
- What is the XBRL taxonomy?
- Current financial reporting landscape and the limits of XBRL
Python skills
Application Programming Interfaces (APIs)
- Understanding how to access financial data using Python APIs.
- How to use external parsers for XBRL
File formats
- Reading and writing Excel files, formatting the output of Excel files
- CSV files
- Binary files
Handling dates
- Parsing dates in Python, NumPy, and Pandas
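For illustration, the same quarter-end date strings can be parsed in all three layers. This is a small sketch with made-up dates, not an excerpt from the course notebooks:

```python
from datetime import datetime

import numpy as np
import pandas as pd

raw = ["2023-03-31", "2023-06-30", "2023-09-30"]

# Plain Python: one datetime object per string, with an explicit format.
py_dates = [datetime.strptime(s, "%Y-%m-%d") for s in raw]

# NumPy: a typed array of day-resolution dates.
np_dates = np.array(raw, dtype="datetime64[D]")
days_between = np.diff(np_dates).astype(int)  # days in each interval

# Pandas: a DatetimeIndex with calendar-aware attributes.
pd_dates = pd.to_datetime(raw)
quarters = list(pd_dates.quarter)

print(days_between, quarters)
```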
Data structures
Topic 3: ROIC and free cash flow drivers: Size, growth, margins, and NOA turnover
Analytical concepts
Sales growth
- Sequential growth
- Year-over-year growth
- Compounded annual growth rate
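The three growth measures above can be sketched with `pct_change`. The sales figures and labels below are hypothetical:

```python
import pandas as pd

# Hypothetical quarterly sales for two fiscal years.
quarterly_sales = pd.Series(
    [100.0, 110.0, 105.0, 120.0, 115.0, 125.0, 118.0, 138.0],
    index=["2022Q1", "2022Q2", "2022Q3", "2022Q4",
           "2023Q1", "2023Q2", "2023Q3", "2023Q4"],
)

# Sequential growth: each quarter versus the immediately preceding quarter.
sequential = quarterly_sales.pct_change()

# Year-over-year growth: each quarter versus the same quarter a year earlier.
year_over_year = quarterly_sales.pct_change(periods=4)

# Compounded annual growth rate over hypothetical annual sales.
annual_sales = pd.Series([400.0, 500.0, 625.0], index=[2021, 2022, 2023])
n_years = len(annual_sales) - 1
cagr = (annual_sales.iloc[-1] / annual_sales.iloc[0]) ** (1 / n_years) - 1

print(sequential.round(3), year_over_year.round(3), round(cagr, 4), sep="\n")
```

Year-over-year growth is usually the more informative measure for seasonal businesses because it compares like quarters.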
ROIC drivers
- Operating margin after tax and its components: Various expense ratios
- Net operating assets intensity and balance sheet subtotals such as current and non-current operating assets and liabilities, operating working capital, fixed capital, total capital, and invested capital
- Computing ROIC as net operating profit after tax divided by invested capital
Unlevered free cash flows
- Computing unlevered free cash flows
- Understanding how ROIC and growth affect unlevered free cash flows
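A small worked example may help connect ROIC, growth, and unlevered free cash flow. It assumes that ROIC is measured on beginning invested capital and that unlevered free cash flow equals NOPAT minus the increase in invested capital; both conventions and all figures are assumptions for illustration, and the course may define these terms differently.

```python
# Hypothetical inputs for one year.
ebit = 200.0
tax_rate = 0.25
invested_capital_begin = 1000.0
invested_capital_end = 1100.0

# Net operating profit after tax.
nopat = ebit * (1 - tax_rate)

# ROIC on beginning invested capital (an assumed convention).
roic = nopat / invested_capital_begin

# Growth in invested capital during the year.
growth = invested_capital_end / invested_capital_begin - 1

# Unlevered free cash flow: NOPAT less reinvestment in invested capital.
fcf = nopat - (invested_capital_end - invested_capital_begin)

# Under these conventions the same number follows from ROIC and growth:
# FCF = NOPAT * (1 - growth / ROIC).
fcf_from_drivers = nopat * (1 - growth / roic)

print(roic, growth, fcf, fcf_from_drivers)
```

The identity shows why growth consumes cash: for a given NOPAT, faster growth or a lower ROIC raises the reinvestment rate `growth / roic` and lowers free cash flow.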
Python skills
Loops versus vectorized and broadcast operations
- Simple row and column operations
- Python loops versus vectorized and broadcast operations in Pandas
- Why you should avoid writing loops in Pandas
- How to avoid loops using lead-lag differences
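To make the contrast concrete, here is a sketch that computes sales growth twice, once with an explicit Python loop and once with the lead-lag helpers `shift` and `diff`. The sales series is hypothetical:

```python
import pandas as pd

sales = pd.Series([100.0, 120.0, 150.0, 165.0], index=[2020, 2021, 2022, 2023])

# Loop version: index arithmetic by hand, one element at a time.
growth_values = [float("nan")]
for i in range(1, len(sales)):
    growth_values.append(sales.iloc[i] / sales.iloc[i - 1] - 1)
growth_loop = pd.Series(growth_values, index=sales.index)

# Vectorized version: shift(1) lines up each year with the prior year.
growth_vectorized = sales / sales.shift(1) - 1

# diff() gives the year-to-year change directly.
change = sales.diff()

print(growth_vectorized.round(4))
```

Both versions give the same numbers, but the vectorized one is shorter, has no off-by-one risk, and runs in compiled code rather than the Python interpreter.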
Topic 4: Plotting ROIC and free cash flow drivers
Analytical concepts
Cognitive factors
- What are the design principles for displaying quantitative information? We will use the guidelines in Edward Tufte's book “The Visual Display of Quantitative Information.”
Peer company analysis
- Comparing and plotting sales, sales growth, expense ratios, and net operating asset ratios for a selected company and its peers
Python skills
Types of charts
- Lines, bars, scatter charts, histograms, area charts
- Dual axis charts
Pandas plotting
- Concise Pandas plotting commands
- State-machine approach
Full power of Matplotlib plotting
- Object-oriented approach for Matplotlib plots
- Customizing charts
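The difference between the concise Pandas call and the object-oriented Matplotlib approach can be sketched as follows. The data are hypothetical, and the `Agg` backend is selected only so that the example runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Sales": [100.0, 120.0, 150.0], "OperatingMargin": [0.10, 0.12, 0.11]},
    index=[2021, 2022, 2023],
)

# Concise Pandas plotting: one call, sensible defaults, returns an Axes.
ax_quick = df["Sales"].plot(kind="bar", title="Sales")

# Object-oriented Matplotlib: explicit Figure and Axes objects, which
# allows a dual-axis chart and fine-grained customization.
fig, ax_left = plt.subplots(figsize=(6, 4))
ax_left.bar(df.index, df["Sales"], color="lightgray")
ax_left.set_ylabel("Sales")
ax_left.set_xticks(df.index)

ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
ax_right.plot(df.index, df["OperatingMargin"], color="black", marker="o")
ax_right.set_ylabel("Operating margin")

ax_left.set_title("Sales and operating margin")
fig.tight_layout()
fig.savefig("sales_and_margin.png")
```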
Topic 5: Discount rates, time value of money, loans, and bonds
Analytical concepts
Time value functions
- Compute present value and future value
- Infer internal rate of return
- Compute installment payments
Simple financial instruments
- Bonds
- Loan amortization tables
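The time value functions and a loan amortization table can be written from first principles in a few lines. The function names and loan terms below are illustrative, and the sketch assumes level payments made at the end of each period:

```python
import pandas as pd


def present_value(future_value, rate, periods):
    """Discount a single future cash flow back to today."""
    return future_value / (1 + rate) ** periods


def future_value(present, rate, periods):
    """Compound a single cash flow forward."""
    return present * (1 + rate) ** periods


def level_payment(principal, rate, periods):
    """Installment that fully repays a loan with equal end-of-period payments."""
    return principal * rate / (1 - (1 + rate) ** -periods)


def amortization_table(principal, rate, periods):
    """Split each installment into interest and principal repayment."""
    payment = level_payment(principal, rate, periods)
    balance = principal
    rows = []
    for period in range(1, periods + 1):
        interest = balance * rate
        repaid = payment - interest
        balance -= repaid
        rows.append(
            {"period": period, "payment": payment, "interest": interest,
             "principal": repaid, "balance": balance}
        )
    return pd.DataFrame(rows).set_index("period")


table = amortization_table(principal=1000.0, rate=0.10, periods=2)
print(table.round(2))
```

The loop is retained here because each period's interest depends on the prior period's ending balance, which is one of the cases where an iterative calculation is the natural form.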
Python skills
NumPy
- Limitations of numpy_financial
- Bonds
- Loan amortization tables
Date manipulation in Python
- Bonds
- Loan amortization tables
XLSXWriter
- Bonds
- Loan amortization tables
scipy.optimize
- Bonds
- Loan amortization tables
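One plausible use of `scipy.optimize` for bonds is to infer the yield to maturity, which is the internal rate of return that sets the present value of the cash flows equal to the price. The sketch below uses the root finder `brentq`; the helper names, the annual-coupon assumption, and the search bracket of roughly 0% to 100% are illustrative choices.

```python
from scipy.optimize import brentq


def bond_price(yield_rate, face, coupon_rate, years):
    """Price of an annual-coupon bond at a given yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + yield_rate) ** years
    return pv_coupons + pv_face


def yield_to_maturity(price, face, coupon_rate, years):
    """Find the yield at which the model price equals the observed price."""
    def pricing_error(y):
        return bond_price(y, face, coupon_rate, years) - price

    # Assumes the yield lies between (almost) 0% and 100%.
    return brentq(pricing_error, 1e-9, 1.0)


ytm_par = yield_to_maturity(price=1000.0, face=1000.0, coupon_rate=0.05, years=10)
ytm_discount = yield_to_maturity(price=950.0, face=1000.0, coupon_rate=0.05, years=10)
print(round(ytm_par, 6), round(ytm_discount, 6))
```

A bond priced at par yields its coupon rate, and a bond priced below par yields more, which gives a quick sanity check on the solver.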
Topic 6: How business risk raises discount rate
Analytical concepts
Operating leverage and business risk
- Business cycles and sales variability
- Operating leverage and earnings variability
- Opex versus capex commodities
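The link between operating leverage and earnings variability can be illustrated with a small numerical sketch. The cost structure below is hypothetical: 60% of sales is variable cost and fixed costs are 30.

```python
import numpy as np

# Hypothetical sales in a downturn, a base case, and an upturn.
sales = np.array([80.0, 100.0, 120.0])
variable_cost_ratio = 0.60
fixed_costs = 30.0

contribution = sales * (1 - variable_cost_ratio)
operating_income = contribution - fixed_costs

# Percentage changes relative to the base case (index 1).
sales_change = sales / sales[1] - 1
income_change = operating_income / operating_income[1] - 1

# Degree of operating leverage at the base case:
# contribution margin divided by operating income.
degree_of_operating_leverage = contribution[1] / operating_income[1]

print(sales_change, income_change, degree_of_operating_leverage)
```

In this example a 20% swing in sales produces an 80% swing in operating income, so the fixed cost base multiplies sales variability by the degree of operating leverage.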
Identifying time series patterns
Identifying discrete events
- Restructurings
- Acquisitions and dispositions
Python skills
Matplotlib
- Visualizing trends and outliers
Statsmodels
- Using statistical functions in Statsmodels
scikit-learn
- Machine learning with scikit-learn
Topic 7: Business risk drivers: Cyclicality and seasonality
Analytical concepts
Statistical techniques
- A simple and brief introduction to time series analysis of financial statement data
Python skills
Introduction to statistical packages for time series analysis
- A brief overview of the following packages:
Package | Primary Purpose
Pandas | Data manipulation, resampling, rolling averages
Matplotlib | Visualization of time series trends
Seaborn | Enhanced plotting for time series analysis
Statsmodels | Time series decomposition, ARIMA, SARIMA
SciPy | Fourier analysis for detecting cycles
Prophet | Forecasting with strong seasonal patterns
TensorFlow | Deep learning for complex cyclical and seasonal patterns
PyCaret | Automated machine learning for time series forecasting
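As a minimal sketch of the Pandas row in the table, a trailing four-quarter rolling average can separate trend from seasonality. The series below is synthetic: a linear trend plus a quarterly pattern that sums to zero. The dedicated decomposition tools in Statsmodels are more general than this hand-rolled version.

```python
import numpy as np
import pandas as pd

# Synthetic quarterly sales: linear trend plus a repeating seasonal pattern.
t = np.arange(12)
quarter = t % 4 + 1
seasonal_pattern = np.array([-5.0, 0.0, -5.0, 10.0])  # sums to zero
sales = pd.Series(100.0 + 2.0 * t + seasonal_pattern[t % 4])

# A trailing 4-quarter mean always covers one full seasonal cycle,
# so the seasonal pattern averages out and only the trend remains.
trend = sales.rolling(window=4).mean()

# Average the detrended values by quarter to estimate the seasonal effect.
detrended = (sales - trend).dropna()
quarter_labels = quarter[detrended.index.to_numpy()]
seasonal_estimate = detrended.groupby(quarter_labels).mean()
seasonal_estimate = seasonal_estimate - seasonal_estimate.mean()

print(seasonal_estimate)
```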
Topic 8: Liquidity, leverage, and ROE
Analytical concepts
Liquidity
- Financial assets to sales
Leverage
- Debt/EBITDA, Debt/EBIT
- Debt/Equity
Return on equity
- Net income/Equity
- Sharpe ratio
Python skills
Advanced plotting with Matplotlib and Plotly
- Visualizing the higher volatility of ROE vis-a-vis ROIC due to leverage
- Making interactive plots with Plotly
Statsmodels
- Visualizing the higher volatility of ROE vis-a-vis ROIC due to leverage
- Making interactive plots with Plotly
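Before plotting, the effect of leverage on volatility can be checked numerically. The sketch below assumes the standard leverage relation ROE = ROIC + (ROIC - after-tax cost of debt) x Debt/Equity, with a constant debt-to-equity ratio and cost of debt; the ROIC series and parameters are hypothetical.

```python
import numpy as np

# Hypothetical ROIC over five years.
roic = np.array([0.06, 0.10, 0.14, 0.08, 0.12])
debt_to_equity = 1.0
after_tax_cost_of_debt = 0.04

# Leverage relation (assumed constant capital structure and cost of debt).
roe = roic + (roic - after_tax_cost_of_debt) * debt_to_equity

# With constant leverage, ROE volatility is ROIC volatility scaled by
# (1 + Debt/Equity).
volatility_ratio = roe.std(ddof=1) / roic.std(ddof=1)

print(roe, volatility_ratio)
```

Under these assumptions the average ROE is higher than the average ROIC, but its standard deviation is exactly twice as large at a debt-to-equity ratio of 1.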
Topic 9: Three-statement model of growth and ROIC
Analytical concepts
Three-statement financial model
- Income statement inputs: Size, growth, and margins
- Balance sheet operating inputs: Net operating asset intensity
- Operating working capital intensity
- Fixed capital intensity
- Balance sheet financial inputs: Liquidity and leverage
- Business risk, unlevered and levered cost of capital
Monte Carlo simulations
- Monte Carlo simulations to plot outcomes for a large number of scenarios
- Demonstrating the advantage of Python over Excel
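A Monte Carlo simulation of this kind can be sketched with NumPy arrays in which each row is a scenario and each column is a forecast year. The distributions, parameters, and the simple sales-times-margin model are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

n_scenarios, n_years = 10_000, 5
base_sales = 100.0

# Hypothetical drivers: normally distributed growth and operating margin.
growth = rng.normal(loc=0.05, scale=0.03, size=(n_scenarios, n_years))
margin = rng.normal(loc=0.10, scale=0.02, size=(n_scenarios, n_years))

# Sales compound along each row; profit is sales times margin.
sales = base_sales * np.cumprod(1 + growth, axis=1)
operating_profit = sales * margin

# Summarize the distribution of final-year profit across scenarios.
final_profit = operating_profit[:, -1]
p5, p50, p95 = np.percentile(final_profit, [5, 50, 95])

print(f"5th: {p5:.2f}, median: {p50:.2f}, 95th: {p95:.2f}")
```

All 10,000 scenarios are computed in a handful of array operations, which is difficult to replicate in a spreadsheet without one block of cells per scenario.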
Python skills
Comparing NumPy, Pandas data series, and Pandas data frames
- Implementing financial statement models using NumPy, Pandas data series, and Pandas data frames
Challenges of developing iterative models in Python
- Ease of developing iterative models in Excel
- Difficulty in developing iterative models in Python
Topic 10: Valuation multiples and stock prices
Analytical concepts
Stock returns
- Dividend yield versus capital gain
- Using adjusted stock prices to measure total return
- Arithmetic returns versus geometric returns
- Cumulative returns
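The return concepts above can be illustrated with a short hypothetical price series. Adjusted prices are assumed to already include the effect of dividends, so their percentage changes measure total return.

```python
import pandas as pd

# Hypothetical adjusted closing prices.
adjusted_close = pd.Series([100.0, 110.0, 99.0, 108.9])
returns = adjusted_close.pct_change().dropna()

# Arithmetic mean: simple average of period returns.
arithmetic_mean = returns.mean()

# Geometric mean: constant per-period return that compounds to the same total.
geometric_mean = (1 + returns).prod() ** (1 / len(returns)) - 1

# Cumulative return through each period.
cumulative = (1 + returns).cumprod() - 1

# Splitting one period's total return into its two sources
# (hypothetical unadjusted prices and dividend).
price_start, price_end, dividend = 100.0, 104.0, 2.0
capital_gain = (price_end - price_start) / price_start
dividend_yield = dividend / price_start
total_return = capital_gain + dividend_yield

print(arithmetic_mean, geometric_mean, cumulative.iloc[-1], total_return)
```

The geometric mean is below the arithmetic mean whenever returns vary, which is why the arithmetic average overstates realized compound growth.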
Macroeconomic effects: Quantifying systematic business risk
- Correlations among stock returns
- Measuring beta
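Beta can be measured either as a covariance ratio or as the slope of a regression of stock returns on market returns. The sketch below uses simulated returns with a true beta of 1.5 (an arbitrary choice) and `numpy.polyfit` as a stand-in for the regression packages; real data would come from a price download.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated daily returns: the stock moves 1.5x the market plus noise.
market = rng.normal(loc=0.0005, scale=0.01, size=250)
noise = rng.normal(loc=0.0, scale=0.005, size=250)
stock = 0.0002 + 1.5 * market + noise

# Beta as covariance(stock, market) / variance(market).
beta_cov = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Beta as the slope of a least-squares line; polyfit returns slope first.
beta_ols, alpha_ols = np.polyfit(market, stock, deg=1)

# Correlation between the two return series.
correlation = np.corrcoef(stock, market)[0, 1]

print(beta_cov, beta_ols, correlation)
```

The two beta estimates are algebraically identical; the regression route additionally provides an intercept and, in the statistical packages, standard errors.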
Identifying discrete events
- Identifying days or weeks with high volatility
Key valuation multiples
- Price-to-book and price-to-earnings ratios
Python skills
Using Python for regressions
- scipy.stats
- statsmodels.api
- statsmodels.formula
Topic 11: Simulating portfolio returns
Analytical concepts
Simulating correlated stock returns
- Variance-covariance matrix
- Portfolio returns with correlated securities
- Possible returns and the efficient frontier
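The steps above can be sketched for two assets: build the variance-covariance matrix from volatilities and correlations, draw correlated returns, and trace portfolio risk and return across weights. All parameters are hypothetical, and the returns are assumed to be multivariate normal.

```python
import numpy as np

# Hypothetical annual expected returns, volatilities, and correlation.
mean_returns = np.array([0.08, 0.12])
volatilities = np.array([0.15, 0.25])
correlation = np.array([[1.0, 0.3], [0.3, 1.0]])

# Variance-covariance matrix: sigma_i * sigma_j * rho_ij.
covariance = np.outer(volatilities, volatilities) * correlation

# Simulate correlated returns.
rng = np.random.default_rng(seed=0)
simulated = rng.multivariate_normal(mean_returns, covariance, size=50_000)

# Portfolios on a grid of weights in the first asset.
weights_first = np.linspace(0.0, 1.0, 11)
weights = np.column_stack([weights_first, 1.0 - weights_first])

portfolio_mean = weights @ mean_returns
# Portfolio variance w' * Sigma * w, computed for every row of weights.
portfolio_vol = np.sqrt(np.einsum("ij,jk,ik->i", weights, covariance, weights))

min_vol_index = portfolio_vol.argmin()
print(weights_first[min_vol_index], portfolio_vol[min_vol_index])
```

Plotting `portfolio_vol` against `portfolio_mean` gives the risk-return curve; the efficient frontier is the portion of that curve at or above the minimum-volatility portfolio.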
Optimal portfolios and capital asset pricing model (CAPM)
- Quadratic utility functions
- Covariance
- Beta
- Verifying the CAPM relationships
Python skills
NumPy matrices
- NumPy matrix manipulations