Reinforcement Learning
2025

Reinforcement Learning for Portfolio Management

Dynamic asset allocation research using DDPG, PPO, and embedding-driven RL on Indian equities.

723.5% total return
Sharpe 1.78
NSE 2012 - 2023

Overview

I benchmarked multiple reinforcement learning strategies for portfolio management under transaction costs and market friction, then compared them with passive baselines on long-horizon NSE data.

Language DNA

Python

The project is experimentation-heavy, so Python drives the training loop, environment logic, config-driven workflows, and result visualization. That makes the work feel closer to a research lab than a trading dashboard.

PyTorch, TensorFlow

DRL portfolio-management loop
1. The project is structured as an autonomous trading-agent system trained on historical market data.

2. It supports DDPG, PPO, and DERL across a simulated market environment with configurable experiments.

3. Agents, data preparation, config, training, testing, and results are separated cleanly so experiments can be rerun and extended.

Algorithms and setup

Implements DDPG for continuous-action optimization, PPO for policy stability, and DERL (the dynamic embedding RL approach) for embedding-driven state representation.

Uses configurable experiments through a central config file instead of hardcoded runs.

Includes data handling, training, testing, and result visualization workflows.
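A central config of this kind could be sketched as a small dataclass. This is a minimal illustration, not the project's actual schema: the class name `ExperimentConfig`, its fields, the split dates, and the default values are all assumptions. Only the agent names DDPG, PPO, and DERL come from the description above.

```python
from dataclasses import dataclass, replace

# Hypothetical config schema: field names and defaults are illustrative
# assumptions, not the project's actual settings.
SUPPORTED_AGENTS = ("DDPG", "PPO", "DERL")


@dataclass(frozen=True)
class ExperimentConfig:
    agent: str = "DDPG"
    transaction_cost: float = 0.001   # proportional cost per unit traded (assumed)
    train_start: str = "2012-01-01"   # assumed split dates, ISO format
    train_end: str = "2019-12-31"
    test_end: str = "2023-12-31"
    seed: int = 0

    def __post_init__(self):
        # Fail fast on typos instead of discovering them mid-training.
        if self.agent not in SUPPORTED_AGENTS:
            raise ValueError(f"unknown agent: {self.agent!r}")
        if not 0.0 <= self.transaction_cost < 1.0:
            raise ValueError("transaction_cost must be in [0, 1)")
        # ISO date strings compare correctly as plain strings.
        if not self.train_start < self.train_end < self.test_end:
            raise ValueError("dates must satisfy train_start < train_end < test_end")


base = ExperimentConfig()
# One run per agent, all sharing the same market settings.
runs = [replace(base, agent=name) for name in SUPPORTED_AGENTS]
```

Deriving each run from one frozen base keeps every setting except the agent identical, which is what makes a comparison between agents fair.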

Research framing

Focuses on maximizing returns while managing financial risk and adapting to changing market conditions.

Treats the work as applied AI research rather than a deploy-now trading bot.

Keeps results and plots as first-class artifacts for analysis.

System design

The architecture is extensible enough to support additional agents without reworking the full project structure.

Results are treated as first-class outputs through both tabular metrics and visual performance plots.
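One common way to get this kind of extensibility is a name-based agent registry, so a new agent is added by defining a class rather than editing the training script. The sketch below is an assumption about how such a design could look, not the project's code: `AGENT_REGISTRY`, `register_agent`, `BaseAgent`, `make_agent`, and the placeholder `EqualWeightAgent` are all hypothetical names.

```python
# Hypothetical registry pattern; none of these names come from the project.
AGENT_REGISTRY = {}


def register_agent(name):
    """Class decorator that makes an agent selectable by name."""
    def decorator(cls):
        if name in AGENT_REGISTRY:
            raise ValueError(f"agent already registered: {name}")
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


class BaseAgent:
    """Minimal interface every agent is assumed to share."""

    def act(self, state):
        raise NotImplementedError


@register_agent("equal_weight")
class EqualWeightAgent(BaseAgent):
    """Placeholder agent: always allocates equally across assets."""

    def act(self, state):
        n_assets = len(state)
        return [1.0 / n_assets] * n_assets


def make_agent(name, **kwargs):
    """Build an agent from its registered name, e.g. a config value."""
    return AGENT_REGISTRY[name](**kwargs)


agent = make_agent("equal_weight")
weights = agent.act([101.2, 55.0, 240.5, 18.3])  # toy prices for four assets
```

With this layout, a learned agent would register under its own name and the rest of the pipeline would only ever call `make_agent` and `act`.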

Product capabilities

Benchmarked DDPG, PPO, and dynamic embedding RL approaches on Indian equity allocation.

Modeled proportional transaction costs and realistic market friction.

Outperformed the passive benchmark, with a headline total return of 723.5% and a Sharpe ratio of 1.78 on NSE data from 2012 to 2023.
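For reference, the two headline metrics can be computed from a series of per-period returns roughly as follows. This is a sketch under stated assumptions: simple (not log) returns, a per-period risk-free rate that defaults to zero, and 252 trading days per year for annualization. The project's exact conventions are not given in the write-up, and the sample returns are made up.

```python
import math
from statistics import mean, stdev


def total_return(returns):
    """Compound simple per-period returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free_rate is per period; 252 periods per year is an assumption.
    """
    excess = [r - risk_free_rate for r in returns]
    volatility = stdev(excess)  # sample standard deviation
    if volatility == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess) / volatility * math.sqrt(periods_per_year)


# Toy daily returns, made up for illustration only (not project data).
daily = [0.01, -0.005, 0.007, 0.002, -0.003]
print(f"total return: {total_return(daily):.4%}")
print(f"sharpe: {sharpe_ratio(daily):.2f}")
```

Under this definition, a total return of 723.5% means the portfolio ended at 8.235 times its starting value.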

Workflow

Experiment path

1. Download and clean historical market data.

2. Train an RL agent variant under the chosen configuration.

3. Evaluate the strategy on held-out periods and compare metrics against baselines.
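The held-out evaluation step depends on splitting the data by time rather than at random, so the agent never trains on dates it is later tested on. The sketch below shows one way to do that, together with an equal-weight buy-and-hold baseline. The function names, the split date, and the price rows are illustrative assumptions, and the write-up does not say which passive baseline the project actually used.

```python
from datetime import date


def chronological_split(rows, split_date):
    """Split (date, prices) rows into train/test without shuffling.

    Rows dated before split_date go to train; the rest are held out.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    train = [row for row in ordered if row[0] < split_date]
    test = [row for row in ordered if row[0] >= split_date]
    return train, test


def buy_and_hold_return(rows):
    """Total return of an equal-weight portfolio bought on the first row
    and held, without rebalancing, until the last row."""
    first, last = rows[0][1], rows[-1][1]
    growth = sum(end / start for start, end in zip(first, last)) / len(first)
    return growth - 1.0


# Made-up prices for two hypothetical assets (not project data).
rows = [
    (date(2022, 12, 29), [100.0, 50.0]),
    (date(2022, 12, 30), [102.0, 49.0]),
    (date(2023, 1, 2), [104.0, 50.0]),
    (date(2023, 1, 3), [110.0, 55.0]),
]
train, test = chronological_split(rows, date(2023, 1, 1))
baseline = buy_and_hold_return(test)
```

An agent's total return over the same `test` rows can then be compared directly against `baseline`.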

Execution model

The system is built around config-driven execution, agent comparison, results storage, and repeatable experimentation.

It is framed as applied research, which keeps the focus on evaluation quality rather than pretending to be a live trading product.

Case study

Environment design

The portfolio-management environment was designed to stay anchored in market friction instead of assuming idealized trading conditions. Transaction costs, changing asset weights, and longer-horizon evaluation make the setup more realistic than a clean-room benchmark. That framing matters because it puts the emphasis on robust strategy behavior rather than headline returns alone.
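To make the friction concrete, a single environment step with proportional transaction costs and drifting weights might look like the sketch below. It is a simplified first-order model, not the project's environment: the function name `portfolio_step`, the 0.1% cost rate, and the choice to charge the cost on turnover before the period's price moves are all assumptions.

```python
def portfolio_step(weights, target_weights, asset_returns, cost_rate=0.001):
    """One rebalance-then-hold step of a fully invested portfolio.

    Returns (net_return, drifted_weights). cost_rate is a proportional
    cost per unit of turnover; the 0.1% default is an assumption.
    """
    if abs(sum(target_weights) - 1.0) > 1e-9:
        raise ValueError("target weights must sum to 1")
    # Turnover: total fraction of portfolio value traded when rebalancing.
    turnover = sum(abs(t - w) for t, w in zip(target_weights, weights))
    cost = cost_rate * turnover
    # Value of each holding after the period's price moves.
    grown = [t * (1.0 + r) for t, r in zip(target_weights, asset_returns)]
    gross_growth = sum(grown)
    net_return = (1.0 - cost) * gross_growth - 1.0
    # Weights drift with prices, so the next step starts from these.
    drifted = [g / gross_growth for g in grown]
    return net_return, drifted


# Move from 50/50 to 60/40, then the assets return +10% and -5%.
net, drifted = portfolio_step([0.5, 0.5], [0.6, 0.4], [0.10, -0.05])
print(f"net return: {net:.4%}, next weights: {drifted}")
```

In this example the gross return is 4% and the cost trims it to about 3.98%. Because the cost scales with turnover, an agent that reshuffles weights aggressively every step pays for it in the reward.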

What it shows

The project brings together sequential decision-making, state representation, and disciplined evaluation in a finance context that is concrete enough to be judged on outcomes. Comparing multiple RL approaches under the same market setting makes the tradeoffs easier to read, especially around stability, return quality, and adaptability. It works as both an applied AI study and a well-structured experimental system.