Hyperparameter Optimization Tool Notes: Tune with PyTorch Quick Start + Basic Concepts

Hyperparameter Optimization Tools

https://docs.ray.io/en/master/tune/index.html
Note: for readability, the code lives in the "References and More" section below (with links to the code sources and the test results).

Install

pip install ray[tune] -i https://pypi.tuna.tsinghua.edu.cn/simple  # plain `pip install ray` kept timing out
# C:\ProgramData\Anaconda3\envs\torch\lib\site-packages\ray\_private\compat.py", line 14, in patch_redis_empty_recv
#    import redis     ModuleNotFoundError: No module named 'redis'  (redis is a key-value database)
pip install redis -i https://pypi.tuna.tsinghua.edu.cn/simple

Quick Start Example

The program defines a search space for a and b, and lets Ray Tune search that space for the optimal values.

Handwritten Digits Dataset (MNIST)

Similar to the previous example, except that the values searched are drawn from distributions. (Going further, there are also early stopping, Bayesian search methods, and a way to load search results.)

Key Concepts

| Item | Explanation | More |
|------|-------------|------|
| Search Space | the space of hyperparameter values to search | |
| Trainable | similar to an objective function | # Pass in a Trainable class or function, along with a search space "config". tune.run(trainable, config={"a": 2, "b": 4}) |
| Search Algorithms | the search algorithm | |
| Schedulers | the training/trial-scheduling strategy | tune.run(trainable, config, num_samples, scheduler) |
| Analyses | result analysis | |
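
These pieces all meet in a single tune.run call. A minimal sketch of how they fit together (the objective and the values are illustrative, not from the original notes):

from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Trainable: reports a metric back to Tune on every iteration.
def trainable(config):
    for step in range(10):
        tune.report(score=config["a"] ** 2 + config["b"])

# Search space: distributions the hyperparameters are drawn from.
config = {"a": tune.uniform(0, 1), "b": tune.choice([1, 2, 3])}

# Scheduler: may stop unpromising trials early; num_samples sets the trial count.
analysis = tune.run(
    trainable,
    config=config,
    num_samples=4,
    scheduler=ASHAScheduler(metric="score", mode="min"),
)

# Analysis: query the finished experiment.
print(analysis.get_best_config(metric="score", mode="min"))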

Usage guide (omitted).

How to use Tune with PyTorch (Ray's official documentation)

pytorch tune
auto_pytorch
Skopt

References and More

# Installing ray: https://blog.csdn.net/weixin_45211921/article/details/117452963
pip3 install pytest-runner
pip3 install ray
pip install ray[default]  # extra step needed on Windows

Quick Start Example

# https://docs.ray.io/en/master/tune/index.html
from ray import tune

# 1. Define an objective function.
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

# 2. Define a search space.
search_space = {
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

# 3. Start a Tune run and print the best result.
analysis = tune.run(objective, config=search_space)  # number of runs via num_samples: tune.run(trainable, config={"a": 2, "b": 4}, num_samples=10)
print(analysis.get_best_config(metric="score", mode="min"))  # the metric to optimize and its mode ("min" or "max")
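
With grid_search over four values of a and a single tune.choice draw of b per trial, this run launches exactly four trials, which is what the status tables under Run Results below show.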

Result Analysis

# https://docs.ray.io/en/latest/tune/key-concepts.html
from ray.tune.suggest.bayesopt import BayesOptSearch

analysis = tune.run(
    trainable,
    config=config,
    metric="score",
    mode="min",
    search_alg=BayesOptSearch(random_search_steps=4),
    stop={"training_iteration": 20},
)
# 获取最佳结果
best_trial = analysis.best_trial  # Get best trial
best_config = analysis.best_config  # Get best trial's hyperparameters
best_logdir = analysis.best_logdir  # Get best trial's logdir
best_checkpoint = analysis.best_checkpoint  # Get best trial's best checkpoint
best_result = analysis.best_result  # Get best trial's last results
best_result_df = analysis.best_result_df  # Get best result as pandas dataframe

# 或者直接获取所有的结果用于分析
# Get a dataframe with the last results for each trial
df_results = analysis.results_df
# Get a dataframe of results for a specific score or mode
df = analysis.dataframe(metric="score", mode="max")

Run Results

== Status == (status details omitted)
# tune.run executes all trials to completion (unless an error occurs).
+-----------------------+----------+----------------+-------+-----+
| Trial name            | status   | loc            |     a |   b |
|-----------------------+----------+----------------+-------+-----|
| objective_47380_00000 | RUNNING  | 127.0.0.1:5320 | 0.001 |   3 |
| objective_47380_00001 | PENDING  |                | 0.01  |   3 |
| objective_47380_00002 | PENDING  |                | 0.1   |   3 |
| objective_47380_00003 | PENDING  |                | 1     |   1 |
+-----------------------+----------+----------------+-------+-----+


+-----------------------+------------+----------------+-------+-----+--------+------------------+---------+
| Trial name            | status     | loc            |     a |   b |   iter |   total time (s) |   score |
|-----------------------+------------+----------------+-------+-----+--------+------------------+---------|
| objective_47380_00001 | RUNNING    | 127.0.0.1:8292 | 0.01  |   3 |      1 |      0           |  3.0001 |
| objective_47380_00002 | RUNNING    | 127.0.0.1:8316 | 0.1   |   3 |        |                  |         |
| objective_47380_00003 | PENDING    |                | 1     |   1 |        |                  |         |
| objective_47380_00000 | TERMINATED | 127.0.0.1:5320 | 0.001 |   3 |      1 |      0.000999928 |  3      |
+-----------------------+------------+----------------+-------+-----+--------+------------------+---------+



+-----------------------+------------+----------------+-------+-----+--------+------------------+---------+
| Trial name            | status     | loc            |     a |   b |   iter |   total time (s) |   score |
|-----------------------+------------+----------------+-------+-----+--------+------------------+---------|
| objective_47380_00000 | TERMINATED | 127.0.0.1:5320 | 0.001 |   3 |      1 |      0.000999928 |  3      |
| objective_47380_00001 | TERMINATED | 127.0.0.1:8292 | 0.01  |   3 |      1 |      0           |  3.0001 |
| objective_47380_00002 | TERMINATED | 127.0.0.1:8316 | 0.1   |   3 |      1 |      0           |  3.01   |
| objective_47380_00003 | TERMINATED | 127.0.0.1:8756 | 1     |   1 |      1 |      0           |  2      |
+-----------------------+------------+----------------+-------+-----+--------+------------------+---------+
12.21 seconds (11.50 seconds for the tuning loop)  # environment: Win7 VM, 8 GB RAM, no GPU
{'a': 1.0, 'b': 1}

Process finished with exit code 0

Note that {'a': 1.0, 'b': 1} wins only because tune.choice happened to draw b=1 for that trial: since score = a**2 + b, any trial that draws b=1 beats the b=3 trials regardless of a.

Handwritten Digits Dataset (MNIST)

# https://docs.ray.io/en/latest/tune/getting-started.html#tune-tutorial


import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F

from ray import tune
from ray.tune.schedulers import ASHAScheduler


class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # In this example, we don't change the model architecture
        # due to simplicity.
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc = nn.Linear(192, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, 192)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)





# Change these values if you want the training to run quicker or slower.
EPOCH_SIZE = 1  # tutorial default: 512
TEST_SIZE = 1   # tutorial default: 256

def train(model, optimizer, train_loader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # We set this just for the example to run quickly.
        if batch_idx * len(data) > EPOCH_SIZE:
            return
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()


def test(model, data_loader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(data_loader):
            # We set this just for the example to run quickly.
            if batch_idx * len(data) > TEST_SIZE:
                break
            data, target = data.to(device), target.to(device)
            outputs = model(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

    return correct / total


def train_mnist(config):
    # Data Setup
    mnist_transforms = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307, ), (0.3081, ))])

    train_loader = DataLoader(
        datasets.MNIST("~/data", train=True, download=True, transform=mnist_transforms),
        batch_size=64,
        shuffle=True)
    test_loader = DataLoader(
        datasets.MNIST("~/data", train=False, transform=mnist_transforms),
        batch_size=64,
        shuffle=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = ConvNet()
    model.to(device)

    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)

        # Send the current training result back to Tune
        tune.report(mean_accuracy=acc)

        if i % 5 == 0:
            # This saves the model to the trial directory
            torch.save(model.state_dict(), "./model.pth")


search_space = {
    "lr": tune.sample_from(lambda spec: 10 ** (-10 * np.random.rand())),  # log-uniform over [1e-10, 1]
    "momentum": tune.uniform(0.1, 0.9),
}

# Uncomment this to enable distributed execution
# `ray.init(address="auto")`

# Download the dataset first (waits for the data from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz)
datasets.MNIST("~/data", train=True, download=True)

analysis = tune.run(train_mnist, config=search_space) #TODO


# Plot each trial's accuracy curve on one axes (assumes matplotlib is installed;
# outside a notebook, plt.show() is needed to display the figure).
import matplotlib.pyplot as plt

dfs = analysis.trial_dataframes
ax = None
for d in dfs.values():
    ax = d.mean_accuracy.plot(ax=ax, legend=False)
plt.show()
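
The intro above mentioned loading search results; following the official tutorial, the model saved as "model.pth" inside each trial directory can be restored from the best trial's logdir (a sketch, assuming the run above has finished):

import os

# Locate the directory of the best trial and load the saved weights.
best_logdir = analysis.get_best_logdir(metric="mean_accuracy", mode="max")
state_dict = torch.load(os.path.join(best_logdir, "model.pth"))

best_model = ConvNet()  # the architecture defined above
best_model.load_state_dict(state_dict)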

Bayesian Search

# First run: pip install bayesian-optimization
from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch

# Define the search space
search_space = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 20)}

algo = BayesOptSearch(random_search_steps=4)

tune.run(
    trainable,
    config=search_space,
    metric="score",
    mode="min",
    search_alg=algo,
    stop={"training_iteration": 20},
)
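
The snippet above assumes a trainable has already been defined. A minimal sketch of an iterative trainable that reports a "score" every iteration (the body is a toy objective, not from the original), so that stop={"training_iteration": 20} has iterations to count:

from ray import tune

def trainable(config):
    for step in range(100):
        score = config["a"] ** 2 + config["b"] / (step + 1)  # toy objective
        tune.report(score=score)  # report the metric back to Tune each iteration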

More search algorithms, or implementing your own

| SearchAlgorithm | Summary | Website | Code Example |
|-----------------|---------|---------|--------------|
| Random search/grid search | Random search/grid search | | tune_basic_example |
| AxSearch | Bayesian/Bandit Optimization | [Ax] | AX Example |
| BlendSearch | Blended Search | [Bs] | Blendsearch Example |
| CFO | Cost-Frugal hyperparameter Optimization | [Cfo] | CFO Example |
| DragonflySearch | Scalable Bayesian Optimization | [Dragonfly] | Dragonfly Example |
| SkoptSearch | Bayesian Optimization | [Scikit-Optimize] | SkOpt Example |
| HyperOptSearch | Tree-Parzen Estimators | [HyperOpt] | Running Tune experiments with HyperOpt |
| BayesOptSearch | Bayesian Optimization | [BayesianOptimization] | BayesOpt Example |
| TuneBOHB | Bayesian Opt/HyperBand | [BOHB] | BOHB Example |
| NevergradSearch | Gradient-free Optimization | [Nevergrad] | Nevergrad Example |
| OptunaSearch | Optuna search algorithms | [Optuna] | Optuna Example |
| ZOOptSearch | Zeroth-order Optimization | [ZOOpt] | ZOOpt Example |
| SigOptSearch | Closed source | [SigOpt] | SigOpt Example |
| HEBOSearch | Heteroscedastic Evolutionary Bayesian Optimization | [HEBO] | HEBO Example |
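
Any of these can be swapped in by constructing the searcher and passing it as search_alg. A minimal sketch using HyperOptSearch (assumes pip install hyperopt and the same iterative trainable as in the Bayesian example; the Ray 1.x import path is used, matching the snippets above):

from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

search_space = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 20)}

algo = HyperOptSearch(metric="score", mode="min")

tune.run(trainable, config=search_space, search_alg=algo, num_samples=20)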

Schedulers and Early Stopping

If no scheduler is specified, Tune defaults to a first-in, first-out (FIFO) scheduler: the trials chosen by your search algorithm are simply run in the order they were picked, with no early stopping.

from ray import tune
from ray.tune.schedulers import HyperBandScheduler

# Create HyperBand scheduler; maximize the score
hyperband = HyperBandScheduler(metric="score", mode="max")

config = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 1)}

tune.run(trainable, config=config, num_samples=20, scheduler=hyperband)
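
The MNIST example earlier imports ASHAScheduler but never uses it; a sketch of wiring it in for aggressive early stopping, following the official tutorial (num_samples=10 is an illustrative choice):

from ray.tune.schedulers import ASHAScheduler

analysis = tune.run(
    train_mnist,
    config=search_space,
    num_samples=10,
    scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
)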

Run Results

(pid=7320) 
== Status ==
Current time: 2022-**-** **:**:** (running for 00:04:28.04)
Memory usage on this node: 7.0/* GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/2 CPUs, 0/0 GPUs, 0.0/0.82 GiB heap, 0.0/0.41 GiB objects
Result logdir: C:\Users\Administrator\ray_results\train_mnist_2022-05-25_10-04-34
Number of trials: 1/1 (1 RUNNING)
+-------------------------+----------+----------------+-------------+------------+
| Trial name              | status   | loc            |          lr |   momentum |
|-------------------------+----------+----------------+-------------+------------|
| train_mnist_05ccd_00000 | RUNNING  | 127.0.0.1:7780 | 5.27504e-09 |   0.653214 |
+-------------------------+----------+----------------+-------------+------------+


Result for train_mnist_05ccd_00000:
  date: 2022-**-**_**-**-** # timestamp
  done: false
  iterations_since_restore: 1
  mean_accuracy: 0.03125
  node_ip: 127.0.0.1
  training_iteration: 1
  trial_id: 05ccd_00000

Result for train_mnist_05ccd_00000:
  date: 2022-**-**_**-**-**
  done: true
  experiment_id: 20149107d1784b8e83e2f3139e19e512
  experiment_tag: 0_lr=5.275e-09,momentum=0.65321
  hostname: YYDN-20211110NL
  iterations_since_restore: 10
  mean_accuracy: 0.109375
  node_ip: 127.0.0.1
  warmup_time: 0.006000518798828125
  
== Status ==
Current time: 2022-05-25 10:09:05 (running for 00:04:29.67)
Memory usage on this node: 7.1/8.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/0.82 GiB heap, 0.0/0.41 GiB objects
Number of trials: 1/1 (1 TERMINATED)
+-------------------------+------------+----------------+-------------+------------+----------+--------+------------------+
| Trial name              | status     | loc            |          lr |   momentum |      acc |   iter |   total time (s) |
|-------------------------+------------+----------------+-------------+------------+----------+--------+------------------|
| train_mnist_05ccd_00000 | TERMINATED | 127.0.0.1:7780 | 5.27504e-09 |   0.653214 | 0.109375 |     10 |          1.00106 |
+-------------------------+------------+----------------+-------------+------------+----------+--------+------------------+

2022-05-25 10:09:05,136	INFO tune.py:701 -- Total run time: 271.33 seconds (269.49 seconds for the tuning loop).
Process finished with exit code -1

The final mean_accuracy of 0.109375 is essentially chance level for 10 classes: the sampled lr of 5.27504e-09 (drawn log-uniformly from [1e-10, 1] by the MNIST search space above) is far too small for the model to learn anything in 10 short training iterations.

Key Concepts

Search Space

# https://docs.ray.io/en/latest/tune/api_docs/search_space.html#tune-sample-docs
config = {
    # Sample a float uniformly between -5.0 and -1.0
    "uniform": tune.uniform(-5, -1),

    # Sample a float uniformly between 3.2 and 5.4,
    # rounding to increments of 0.2
    "quniform": tune.quniform(3.2, 5.4, 0.2),

    # Sample a float uniformly between 0.0001 and 0.01, while
    # sampling in log space
    "loguniform": tune.loguniform(1e-4, 1e-2),

    # Sample a float uniformly between 0.0001 and 0.1, while
    # sampling in log space and rounding to increments of 0.00005
    "qloguniform": tune.qloguniform(1e-4, 1e-1, 5e-5),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2
    "randn": tune.randn(10, 2),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2, rounding to increments of 0.2
    "qrandn": tune.qrandn(10, 2, 0.2),

    # Sample an integer uniformly between -9 (inclusive) and 15 (exclusive)
    "randint": tune.randint(-9, 15),

    # Sample an integer uniformly between -21 (inclusive) and 12 (inclusive (!)),
    # rounding to increments of 3 (includes 12)
    "qrandint": tune.qrandint(-21, 12, 3),

    # Sample an integer uniformly between 1 (inclusive) and 10 (exclusive),
    # while sampling in log space
    "lograndint": tune.lograndint(1, 10),

    # Sample an integer uniformly between 1 (inclusive) and 10 (inclusive (!)),
    # while sampling in log space and rounding to increments of 2
    "qlograndint": tune.qlograndint(1, 10, 2),

    # Sample an option uniformly from the specified choices
    "choice": tune.choice(["a", "b", "c"]),

    # Sample from a random function, in this case one that
    # depends on another value from the search space
    "func": tune.sample_from(lambda spec: spec.config.uniform * 0.01),

    # Do a grid search over these values. Every value will be sampled
    # `num_samples` times (`num_samples` is the parameter you pass to `tune.run()`)
    "grid": tune.grid_search([32, 64, 128])
}
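
As a concrete example of the last comment: combining "grid": tune.grid_search([32, 64, 128]) with tune.run(..., num_samples=2) launches 6 trials, since each of the 3 grid values is repeated num_samples times, while the random distributions above are re-drawn independently for every trial.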

Tune: A Research Platform for Distributed Model Selection and Training

More

Hyperparameter optimization tools

pytorch tune: this article tunes the number of channels
auto_pytorch
Skopt
