fractal-rl

v0.0.2

Published

7 months ago

Fractal-RL is a lightweight and easy-to-use reinforcement learning library for TypeScript and JavaScript. It provides implementations of popular deep reinforcement learning algorithms, including Deep Q-Network (DQN) and Proximal Policy Optimization (PPO).

Downloads

0High
0Medium
0Low

$fractal-solutions$ fractal-solutions

Fractal-RL

Installation

To install the library, use npm or bun:

npm install fractal-rl

bun add fractal-rl

Deep Q-Network (DQN) Agent

The DQN agent is a value-based reinforcement learning algorithm that is well-suited for environments with discrete action spaces. The implementation in Fractal-RL is designed to be easy to use and configure.

DQN Usage Example

Here is a basic example of how to use the DQNAgent to solve the CartPole environment:

import { DQNAgent } from 'fractal-rl';

// Define the environment
class CartPoleEnvironment {
    // ... (environment implementation)
}

// Configure the DQN agent
const config = {
    stateDim: 4,
    actionDim: 2,
    learningRate: 0.01,
    gamma: 0.99,
    epsilonStart: 1.0,
    epsilonEnd: 0.01,
    epsilonDecay: 0.9995,
    batchSize: 64,
    targetUpdateFrequency: 50,
    replayBufferSize: 10000,
    dropoutRate: 0.0,
    l2RegularizationRate: 0.0001,
    clipNorm: 1.0,
    hiddenActivation: 'leakyRelu',
};

const agent = new DQNAgent(config);
const env = new CartPoleEnvironment();

// Training loop
for (let episode = 0; episode < 5000; episode++) {
    let state = env.reset();
    let done = false;
    while (!done) {
        const action = agent.selectAction(state);
        const { state: nextState, reward, done: envDone } = env.step(action);
        agent.remember(state, action, reward, nextState, envDone);
        if (agent.replayBuffer.size() > config.batchSize) {
            agent.learn();
        }
        state = nextState;
        done = envDone;
    }
}

DQN Hyperparameters

stateDim: The dimension of the state space.
actionDim: The dimension of the action space.
learningRate: The learning rate for the optimizer.
gamma: The discount factor for future rewards.
epsilonStart: The initial value of epsilon for the epsilon-greedy exploration strategy.
epsilonEnd: The final value of epsilon.
epsilonDecay: The decay rate of epsilon.
batchSize: The size of the mini-batch for training.
targetUpdateFrequency: The frequency (in steps) at which the target network is updated.
replayBufferSize: The maximum size of the replay buffer.
dropoutRate: The dropout rate for regularization.
l2RegularizationRate: The L2 regularization rate.
clipNorm: The maximum norm for gradient clipping.
hiddenActivation: The activation function for the hidden layers.

Proximal Policy Optimization (PPO) Agent

The PPO agent is a policy-based reinforcement learning algorithm that is well-suited for environments with continuous or discrete action spaces. The implementation in Fractal-RL is designed to be easy to use and configure.

PPO Usage Example

Here is a basic example of how to use the PPOAgent to solve the CartPole environment:

import { PPOAgent } from 'fractal-rl';

// Define the environment
class CartPoleEnvironment {
    // ... (environment implementation)
}

// Configure the PPO agent
const config = {
    stateDim: 4,
    actionDim: 2,
    learningRateActor: 0.0003,
    learningRateCritic: 0.0001,
    gamma: 0.95,
    lambda: 0.9,
    epsilon: 0.02,
    ppoEpochs: 5,
    miniBatchSize: 64,
    dropoutRate: 0.0,
    l2RegularizationRate: 0.000001,
    clipNorm: 1.0,
    hiddenActivation: 'leakyRelu',
};

const agent = new PPOAgent(config);
const env = new CartPoleEnvironment();

// Training loop
for (let episode = 0; episode < 5000; episode++) {
    let state = env.reset();
    let done = false;
    while (!done) {
        const { action, logProb, value } = agent.selectAction(state);
        const { state: nextState, reward, done: envDone } = env.step(action);
        agent.remember(state, action, reward, nextState, envDone, logProb, value);
        state = nextState;
        done = envDone;
    }
    agent.update();
}

PPO Hyperparameters

stateDim: The dimension of the state space.
actionDim: The dimension of the action space.
learningRateActor: The learning rate for the actor network.
learningRateCritic: The learning rate for the critic network.
gamma: The discount factor for future rewards.
lambda: The lambda parameter for Generalized Advantage Estimation (GAE).
epsilon: The clipping parameter for the PPO objective function.
ppoEpochs: The number of epochs to run when updating the policy.
miniBatchSize: The size of the mini-batch for training.
dropoutRate: The dropout rate for regularization.
l2RegularizationRate: The L2 regularization rate.
clipNorm: The maximum norm for gradient clipping.
hiddenActivation: The activation function for the hidden layers.

PPO CartPole GUI

To run the PPO CartPole GUI, goto:

https://fractal-solutions.github.io/fractal-RL/

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme