Environment

  • Python version: Python 3.10+
  • PyTorch version: PyTorch 2.0+
  • IDE: PyCharm or VS Code
  • Operating system: Windows / macOS / Linux (any)
  • Additional dependencies: torch>=2.0.0, numpy>=1.24.0, optuna>=3.0.0

Learning Objectives and Summary

Learning objectives

  1. Understand the core concepts of Neural Architecture Search (NAS) and search space design
  2. Master NAS techniques based on reinforcement learning, evolutionary algorithms, and differentiable methods
  3. Learn to accelerate architecture search with weight sharing and supernet techniques
  4. Understand practical applications of AutoML tools
  5. Be able to implement a simplified NAS algorithm in PyTorch

Summary: Neural Architecture Search (NAS) is the core technology of automated machine learning (AutoML); its goal is to discover optimal neural network architectures automatically. This chapter systematically covers NAS search space design, the main search strategies (reinforcement learning, evolutionary algorithms, DARTS), weight-sharing techniques (Once-for-All, BigNAS), and more recent advances such as AutoFormer, with complete PyTorch implementations.


1. NAS Overview and Search Spaces

1.1 What Is Neural Architecture Search

Neural Architecture Search (NAS) is a technique for designing neural network architectures automatically. Traditionally, architecture design has relied on the experience and intuition of human experts and required extensive trial and error. The goal of NAS is to let an algorithm find the best network structure within a predefined search space.

The core idea can be put as an analogy: if deep learning teaches computers to "learn", then NAS teaches computers to "learn how to learn", that is, to automatically discover the network structure best suited to a given task.

1.2 The Three Components of NAS

A complete NAS system consists of three core components:

| Component | Description | Examples |
|---|---|---|
| Search space | Defines the set of all candidate architectures | Chain structures, cell structures |
| Search strategy | How the search space is explored | Reinforcement learning, evolutionary algorithms, gradient descent |
| Performance estimation | How candidate architectures are evaluated | Validation accuracy, parameter count, FLOPs |
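
Putting the three components together, the overall search procedure is a single loop. The sketch below is illustrative pseudocode (the `strategy` object with `propose`/`update` methods is an assumption made for this example, not a fixed API); concrete strategies are implemented in the following sections.

# The generic NAS loop: the strategy proposes, the evaluator scores,
# and the strategy learns from the feedback
def nas_loop(search_space, strategy, evaluate, num_rounds=100):
    best_arch, best_score = None, float('-inf')
    for _ in range(num_rounds):
        arch = strategy.propose(search_space)   # search strategy explores the space
        score = evaluate(arch)                  # performance estimation (e.g., val accuracy)
        strategy.update(arch, score)            # feedback to the search strategy
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch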

1.3 Search Space Design

1.3.1 Chain-Structured Search Space

The chain structure is the simplest search space: the network is a sequence of layers connected one after another. The choices for each layer include:

  • Convolution: kernel size (3x3, 5x5), channel count (32, 64, 128)
  • Pooling: max pooling, average pooling
  • Skip connections: whether to add a residual connection
  • Activation function: ReLU, Sigmoid, Tanh
# Example: chain-structured search space
import torch
import torch.nn as nn

class ChainSearchSpace:
    """Definition of a chain-structured search space"""
    
    def __init__(self):
        # Candidate operations
        self.operations = [
            'conv_3x3',      # 3x3 convolution
            'conv_5x5',      # 5x5 convolution
            'dconv_3x3',     # 3x3 dilated convolution
            'max_pool',      # max pooling
            'avg_pool',      # average pooling
            'skip_connect',  # skip connection
            'none'           # no connection
        ]
        
        # Candidate channel counts
        self.channels = [16, 32, 64, 128]
        
        # Network depth range
        self.min_depth = 3
        self.max_depth = 8
    
    def sample_architecture(self):
        """Randomly sample an architecture"""
        import random
        depth = random.randint(self.min_depth, self.max_depth)
        architecture = []
        
        for i in range(depth):
            op = random.choice(self.operations)
            ch = random.choice(self.channels)
            architecture.append({'layer': i, 'op': op, 'channels': ch})
        
        return architecture
1.3.2 Cell-Based Search Space

The cell-based search space is a more advanced design: the network is decomposed into repeated "cells" (or "blocks"), each with a complex internal connection structure.

Advantages of the cell-based design:

  • Modularity: the same cell can be reused throughout the network
  • Transferability: a cell searched on CIFAR-10 can be transferred to ImageNet
  • Search efficiency: only the cell's internal structure needs to be searched
# Cell-based search space
class CellSearchSpace:
    """Cell-based search space (similar to DARTS)"""
    
    def __init__(self, num_nodes=4):
        self.num_nodes = num_nodes  # number of intermediate nodes per cell
        
        # Candidate operations
        self.primitives = [
            'none',
            'max_pool_3x3',
            'avg_pool_3x3',
            'skip_connect',
            'sep_conv_3x3',   # separable convolution
            'sep_conv_5x5',
            'dil_conv_3x3',   # dilated convolution
            'dil_conv_5x5'
        ]
    
    def get_num_edges(self):
        """Count the edges inside a cell"""
        # Node i (i = 0..N-1) receives edges from the 2 cell inputs plus all
        # earlier nodes, i.e. 2 + i edges, so the total is
        # 2 + 3 + ... + (N + 1) = N(N + 3) / 2 (14 edges for N = 4, as in DARTS)
        n = self.num_nodes
        return n * (n + 3) // 2
    
    def sample_cell(self):
        """Randomly sample a cell structure"""
        import random
        num_edges = self.get_num_edges()
        cell = []
        
        for edge in range(num_edges):
            # choose one operation per edge
            op = random.choice(self.primitives)
            cell.append(op)
        
        return cell

2. NAS with Reinforcement Learning

2.1 The RNN Controller

Reinforcement-learning NAS uses an RNN as a controller to generate architecture descriptions. The controller outputs a sequence of decisions (which operation to use, how many channels, and so on), and these decisions define a network architecture.

Core idea

  • The controller generates an architecture -> the architecture is trained -> its validation accuracy serves as the reward -> the controller's policy is updated (a runnable sketch of this loop follows the trainer below)
import torch
import torch.nn as nn
import torch.nn.functional as F

class ControllerRNN(nn.Module):
    """RNN controller: generates a network architecture description"""
    
    def __init__(self, num_layers=6, num_ops=7, hidden_size=64):
        super(ControllerRNN, self).__init__()
        
        self.num_layers = num_layers  # number of layers to decide
        self.num_ops = num_ops        # number of candidate operations
        self.hidden_size = hidden_size
        
        # Embedding layer
        self.embedding = nn.Embedding(num_ops, hidden_size)
        
        # LSTM controller
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        
        # Output layer: predicts the operation at each position
        self.ops_classifier = nn.Linear(hidden_size, num_ops)
        
        # Learnable initial hidden state
        self.init_hidden = nn.Parameter(
            torch.zeros(1, hidden_size), requires_grad=True
        )
        self.init_cell = nn.Parameter(
            torch.zeros(1, hidden_size), requires_grad=True
        )
    
    def forward(self, batch_size=1):
        """Generate one architecture description per batch element"""
        # Initialize the hidden state
        hidden = self.init_hidden.expand(batch_size, -1)
        cell_state = self.init_cell.expand(batch_size, -1)
        
        # Collect the operation choices for all layers
        log_probs = []
        actions = []
        
        # Start token (index 0)
        inputs = torch.zeros(batch_size, dtype=torch.long)
        if next(self.parameters()).is_cuda:
            inputs = inputs.cuda()
        
        for layer in range(self.num_layers):
            # Embed the input token
            embed = self.embedding(inputs)
            
            # One LSTM step
            hidden, cell_state = self.lstm(embed, (hidden, cell_state))
            
            # Predict the operation
            logits = self.ops_classifier(hidden)
            probs = F.softmax(logits, dim=-1)
            
            # Sample an operation
            action = torch.multinomial(probs, 1).squeeze(1)
            
            # Log-probability of the sampled action (for the policy gradient)
            log_prob = F.log_softmax(logits, dim=-1)
            log_prob = log_prob.gather(1, action.unsqueeze(1)).squeeze(1)
            
            log_probs.append(log_prob)
            actions.append(action)
            
            # The next input is the operation just chosen
            inputs = action
        
        return torch.stack(actions, dim=1), torch.stack(log_probs, dim=1)

2.2 Policy-Gradient Training

The controller is trained with the REINFORCE algorithm (policy gradient). The reward is the validation accuracy of the generated network.

class ReinforceTrainer:
    """Trains the controller with REINFORCE"""
    
    def __init__(self, controller, baseline=0.0):
        self.controller = controller
        self.baseline = baseline  # baseline for variance reduction
        self.optimizer = torch.optim.Adam(controller.parameters(), lr=0.001)
    
    def train_step(self, rewards, log_probs):
        """
        Args:
            rewards: [batch_size], validation accuracy of each architecture
            log_probs: [batch_size, num_layers], log-probability of each decision
        """
        # Loss: -E[(R - b) * log P(a)]
        # The baseline b reduces the variance of the gradient estimate
        advantages = rewards - self.baseline
        
        # loss = -(sum of log-probs) * advantage, averaged over the batch
        loss = -(log_probs.sum(dim=1) * advantages).mean()
        
        # Update the baseline (exponential moving average)
        self.baseline = 0.9 * self.baseline + 0.1 * rewards.mean().item()
        
        # Backpropagation
        self.optimizer.zero_grad()
        loss.backward()
        
        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(self.controller.parameters(), 5.0)
        
        self.optimizer.step()
        
        return loss.item()
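
Putting the controller and trainer together, the search loop looks as follows. This is a minimal sketch: the reward here is a random stand-in, whereas a real run would train each sampled child network and use its validation accuracy.

# Minimal RL-NAS loop (stand-in reward; assumes the classes defined above)
controller = ControllerRNN(num_layers=6, num_ops=7)
trainer = ReinforceTrainer(controller)

for step in range(10):
    actions, log_probs = controller(batch_size=4)  # sample 4 architectures
    rewards = torch.rand(actions.size(0))          # stand-in for validation accuracy
    loss = trainer.train_step(rewards, log_probs)
    print(f"step {step}: loss={loss:.4f}, baseline={trainer.baseline:.4f}")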

3. Differentiable Architecture Search (DARTS)

3.1 Continuous Relaxation

The key innovation of DARTS (Differentiable Architecture Search) is relaxing the discrete architecture choice into a continuous one, so that it can be optimized with gradient descent.

Key idea

  • Traditional NAS: each edge picks one operation (a discrete choice)
  • DARTS: each edge computes a weighted sum of all operations (a continuous relaxation)
class MixedOp(nn.Module):
    """Mixed operation: a weighted sum of all candidate operations"""
    
    def __init__(self, C, stride, primitives):
        super(MixedOp, self).__init__()
        
        self.ops = nn.ModuleList()
        for primitive in primitives:
            op = self._create_op(primitive, C, stride)
            self.ops.append(op)
        
        # Architecture parameters (learnable)
        self.alphas = nn.Parameter(torch.zeros(len(primitives)))
    
    def _create_op(self, op_name, C, stride):
        """Create an operation from its name"""
        if op_name == 'none':
            return Zero(stride)
        elif op_name == 'skip_connect':
            return Identity() if stride == 1 else FactorizedReduce(C, C)
        elif op_name == 'conv_3x3':
            return ReLUConvBN(C, C, 3, stride, 1)
        elif op_name == 'conv_5x5':
            return ReLUConvBN(C, C, 5, stride, 2)
        elif op_name == 'sep_conv_3x3':
            return SepConv(C, C, 3, stride, 1)
        elif op_name == 'sep_conv_5x5':
            return SepConv(C, C, 5, stride, 2)
        elif op_name == 'max_pool_3x3':
            return nn.MaxPool2d(3, stride=stride, padding=1)
        elif op_name == 'avg_pool_3x3':
            return nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False)
        elif op_name in ('dil_conv_3x3', 'dil_conv_5x5'):
            # dilated separable convolution (dilation 2)
            k = 3 if op_name == 'dil_conv_3x3' else 5
            return nn.Sequential(
                nn.ReLU(inplace=False),
                nn.Conv2d(C, C, k, stride, padding=k - 1, dilation=2,
                          groups=C, bias=False),
                nn.Conv2d(C, C, 1, bias=False),
                nn.BatchNorm2d(C)
            )
        else:
            raise ValueError(f"Unknown operation: {op_name}")
    
    def forward(self, x, weights=None):
        """Forward pass: weighted sum of all candidate operations.
        
        If no external weights are given (standalone use in this section),
        the mixing weights come from this module's own alphas; the DARTSCell
        in section 8 passes shared architecture weights in explicitly.
        """
        if weights is None:
            weights = F.softmax(self.alphas, dim=0)
        
        # Weighted sum
        output = sum(w * op(x) for w, op in zip(weights, self.ops))
        return output


# 辅助操作定义
class ReLUConvBN(nn.Module):
    """ReLU + Conv + BN"""
    
    def __init__(self, C_in, C_out, kernel_size, stride, padding):
        super(ReLUConvBN, self).__init__()
        self.op = nn.Sequential(
            nn.ReLU(inplace=False),
            nn.Conv2d(C_in, C_out, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(C_out)
        )
    
    def forward(self, x):
        return self.op(x)


class SepConv(nn.Module):
    """可分离卷积"""
    
    def __init__(self, C_in, C_out, kernel_size, stride, padding):
        super(SepConv, self).__init__()
        self.op = nn.Sequential(
            nn.ReLU(inplace=False),
            nn.Conv2d(C_in, C_in, kernel_size, stride, padding, 
                     groups=C_in, bias=False),
            nn.Conv2d(C_in, C_in, 1, 1, 0, bias=False),
            nn.BatchNorm2d(C_in),
            nn.ReLU(inplace=False),
            nn.Conv2d(C_in, C_in, kernel_size, 1, padding, 
                     groups=C_in, bias=False),
            nn.Conv2d(C_in, C_out, 1, 1, 0, bias=False),
            nn.BatchNorm2d(C_out)
        )
    
    def forward(self, x):
        return self.op(x)


class Identity(nn.Module):
    """恒等映射"""
    
    def forward(self, x):
        return x


class Zero(nn.Module):
    """零操作"""
    
    def __init__(self, stride):
        super(Zero, self).__init__()
        self.stride = stride
    
    def forward(self, x):
        if self.stride == 1:
            return x.mul(0.0)
        return x[:, :, ::self.stride, ::self.stride].mul(0.0)


class FactorizedReduce(nn.Module):
    """降维操作"""
    
    def __init__(self, C_in, C_out):
        super(FactorizedReduce, self).__init__()
        self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, bias=False)
        self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, bias=False)
        self.bn = nn.BatchNorm2d(C_out)
    
    def forward(self, x):
        out = torch.cat([self.conv_1(x), self.conv_2(x[:, :, 1:, 1:])], dim=1)
        return self.bn(out)
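
A quick smoke test of the mixed operation, assuming the classes above are in scope: the output keeps the input shape because every candidate op is shape-preserving at stride 1.

# MixedOp returns a softmax-weighted mixture of all candidate ops
x = torch.randn(2, 16, 32, 32)
mixed = MixedOp(C=16, stride=1,
                primitives=['none', 'skip_connect', 'conv_3x3', 'sep_conv_3x3'])
print(mixed(x).shape)  # torch.Size([2, 16, 32, 32])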

3.2 Bilevel Optimization

DARTS uses bilevel optimization:

  • Inner level: optimize the network weights (on the training set)
  • Outer level: optimize the architecture parameters (on the validation set)
class DARTSTrainer:
    """DARTS trainer (first-order approximation: weights and architecture
    parameters are simply updated in alternation rather than unrolling the
    inner optimization)"""
    
    def __init__(self, model, args):
        self.model = model
        
        # Optimizer for the network weights
        self.w_optimizer = torch.optim.SGD(
            model.weights(),
            lr=args.learning_rate,
            momentum=args.momentum,
            weight_decay=args.weight_decay
        )
        
        # Optimizer for the architecture parameters
        self.alpha_optimizer = torch.optim.Adam(
            model.arch_parameters(),
            lr=args.arch_learning_rate,
            betas=(0.5, 0.999),
            weight_decay=args.arch_weight_decay
        )
        
        self.scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            self.w_optimizer, float(args.epochs), eta_min=args.learning_rate_min
        )
    
    def train_step(self, train_data, valid_data):
        """One training step"""
        # Step 1: update the network weights on the training set
        self.w_optimizer.zero_grad()
        logits = self.model(train_data[0])
        loss = F.cross_entropy(logits, train_data[1])
        loss.backward()
        self.w_optimizer.step()
        
        # Step 2: update the architecture parameters on the validation set
        self.alpha_optimizer.zero_grad()
        logits = self.model(valid_data[0])
        loss = F.cross_entropy(logits, valid_data[1])
        loss.backward()
        self.alpha_optimizer.step()
        
        return loss.item()
    
    def derive_architecture(self):
        """Derive a discrete architecture from the continuous relaxation"""
        return self.model.genotype()

4. Evolutionary NAS

4.1 Genetic Algorithms

Evolutionary algorithms mimic natural selection, searching for the best architecture in a population through selection, crossover, and mutation (a usage sketch follows the class below).

import random
import copy

class Individual:
    """An individual: represents one network architecture"""
    
    def __init__(self, architecture):
        self.architecture = architecture
        self.fitness = None  # validation accuracy
        self.params = None   # parameter count
        self.flops = None    # compute cost

class GeneticNAS:
    """NAS based on a genetic algorithm"""
    
    def __init__(self, population_size=50, mutation_rate=0.1, 
                 crossover_rate=0.8, generations=100):
        self.population_size = population_size
        self.mutation_rate = mutation_rate
        self.crossover_rate = crossover_rate
        self.generations = generations
        
        self.population = []
        self.best_individual = None
    
    def initialize_population(self, search_space):
        """Initialize the population"""
        self.population = []
        for _ in range(self.population_size):
            arch = search_space.sample_architecture()
            individual = Individual(arch)
            self.population.append(individual)
    
    def evaluate_population(self, train_fn):
        """Evaluate every individual in the population"""
        for individual in self.population:
            if individual.fitness is None:
                fitness, params, flops = train_fn(individual.architecture)
                individual.fitness = fitness
                individual.params = params
                individual.flops = flops
    
    def select_parent(self):
        """Tournament selection"""
        tournament_size = 3
        tournament = random.sample(self.population, tournament_size)
        return max(tournament, key=lambda x: x.fitness)
    
    def crossover(self, parent1, parent2):
        """Single-point crossover"""
        if random.random() > self.crossover_rate:
            return copy.deepcopy(parent1)
        
        arch1 = parent1.architecture
        arch2 = parent2.architecture
        
        # Pick a crossover point
        point = random.randint(1, min(len(arch1), len(arch2)) - 1)
        
        # Build the child
        child_arch = arch1[:point] + arch2[point:]
        return Individual(child_arch)
    
    def mutate(self, individual, search_space):
        """Mutation"""
        arch = copy.deepcopy(individual.architecture)
        
        for i in range(len(arch)):
            if random.random() < self.mutation_rate:
                # Randomly change this layer's operation or channel count
                if random.random() < 0.5:
                    arch[i]['op'] = random.choice(search_space.operations)
                else:
                    arch[i]['channels'] = random.choice(search_space.channels)
        
        return Individual(arch)
    
    def evolve(self, search_space, train_fn):
        """Main evolution loop"""
        # Initialize
        self.initialize_population(search_space)
        
        for generation in range(self.generations):
            print(f"Generation {generation + 1}/{self.generations}")
            
            # Evaluate
            self.evaluate_population(train_fn)
            
            # Track the best individual
            current_best = max(self.population, key=lambda x: x.fitness)
            if self.best_individual is None or \
               current_best.fitness > self.best_individual.fitness:
                self.best_individual = copy.deepcopy(current_best)
            
            print(f"  Best fitness: {self.best_individual.fitness:.4f}")
            
            # Build the next generation
            new_population = [self.best_individual]  # elitism: keep the best
            
            while len(new_population) < self.population_size:
                parent1 = self.select_parent()
                parent2 = self.select_parent()
                
                child = self.crossover(parent1, parent2)
                child = self.mutate(child, search_space)
                
                new_population.append(child)
            
            self.population = new_population
        
        return self.best_individual
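
A hypothetical end-to-end run over the chain search space from section 1.3. The evaluator here is a stand-in that returns a random fitness; in a real search it would train the architecture and return its validation accuracy.

# Stand-in evaluator: returns (fitness, params, flops)
def dummy_train_fn(architecture):
    params = sum(layer['channels'] for layer in architecture)
    return random.random(), params, 0

space = ChainSearchSpace()
nas = GeneticNAS(population_size=10, generations=3)
best = nas.evolve(space, dummy_train_fn)
print(best.fitness, best.architecture)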

4.2 NSGA-II Multi-Objective Optimization

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is used for multi-objective optimization, considering accuracy and model complexity at the same time.

class NSGA2NAS(GeneticNAS):
    """Multi-objective NAS based on NSGA-II (sorting and crowding utilities;
    see the selection sketch after this class for how they are combined)"""
    
    def __init__(self, population_size=50, generations=100):
        super().__init__(population_size, 0.1, 0.8, generations)
    
    def dominates(self, ind1, ind2):
        """Does ind1 dominate ind2?"""
        # Objective 1: maximize accuracy
        # Objective 2: minimize parameter count
        better_in_one = False
        
        if ind1.fitness > ind2.fitness:
            better_in_one = True
        elif ind1.fitness < ind2.fitness:
            return False
        
        if ind1.params < ind2.params:
            better_in_one = True
        elif ind1.params > ind2.params:
            return False
        
        return better_in_one
    
    def non_dominated_sort(self, population):
        """Non-dominated sorting"""
        fronts = [[]]
        domination_count = {}
        dominated_solutions = {}
        
        for i, p in enumerate(population):
            domination_count[i] = 0
            dominated_solutions[i] = []
            
            for j, q in enumerate(population):
                if i != j:
                    if self.dominates(p, q):
                        dominated_solutions[i].append(j)
                    elif self.dominates(q, p):
                        domination_count[i] += 1
            
            if domination_count[i] == 0:
                fronts[0].append(i)
        
        i = 0
        while len(fronts[i]) > 0:
            next_front = []
            for p in fronts[i]:
                for q in dominated_solutions[p]:
                    domination_count[q] -= 1
                    if domination_count[q] == 0:
                        next_front.append(q)
            i += 1
            fronts.append(next_front)
        
        return fronts[:-1]  # drop the trailing empty front
    
    def crowding_distance(self, front, population):
        """Compute crowding distances within one front"""
        if len(front) <= 2:
            return {i: float('inf') for i in front}
        
        distance = {i: 0 for i in front}
        
        # Sort by accuracy
        sorted_by_acc = sorted(front, 
                              key=lambda i: population[i].fitness)
        distance[sorted_by_acc[0]] = float('inf')
        distance[sorted_by_acc[-1]] = float('inf')
        
        acc_range = (population[sorted_by_acc[-1]].fitness - 
                    population[sorted_by_acc[0]].fitness)
        
        for i in range(1, len(sorted_by_acc) - 1):
            if acc_range > 0:
                distance[sorted_by_acc[i]] += (
                    population[sorted_by_acc[i+1]].fitness - 
                    population[sorted_by_acc[i-1]].fitness
                ) / acc_range
        
        # Sort by parameter count
        sorted_by_params = sorted(front, 
                                 key=lambda i: population[i].params)
        distance[sorted_by_params[0]] = float('inf')
        distance[sorted_by_params[-1]] = float('inf')
        
        params_range = (population[sorted_by_params[-1]].params - 
                       population[sorted_by_params[0]].params)
        
        for i in range(1, len(sorted_by_params) - 1):
            if params_range > 0:
                distance[sorted_by_params[i]] += (
                    population[sorted_by_params[i+1]].params - 
                    population[sorted_by_params[i-1]].params
                ) / params_range
        
        return distance
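
The class above provides the two NSGA-II building blocks but not the selection step itself. Below is a sketch of the standard environmental selection built on top of them (an addition here, following the usual NSGA-II procedure): fill the next generation front by front, and break ties within the last partially fitting front by crowding distance.

def environmental_selection(nas, population):
    """Select the next generation by front rank, then by crowding distance"""
    fronts = nas.non_dominated_sort(population)
    next_gen = []
    for front in fronts:
        if len(next_gen) + len(front) <= nas.population_size:
            next_gen.extend(population[i] for i in front)
        else:
            # partially fill from this front, preferring less crowded solutions
            dist = nas.crowding_distance(front, population)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            remaining = nas.population_size - len(next_gen)
            next_gen.extend(population[i] for i in ranked[:remaining])
            break
    return next_gen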

5. Weight Sharing and Supernets

5.1 Once-for-All Networks

Once-for-All (OFA) is a supernet approach: train once, then derive many sub-networks of different sizes. Progressive shrinking during training lets sub-networks of different sizes share the supernet's weights.

class OFASuperNet(nn.Module):
    """Once-for-All supernet (simplified sketch)"""
    
    def __init__(self, num_classes=1000, base_channels=64):
        super(OFASuperNet, self).__init__()
        
        # Maximum configuration
        self.max_depth = 20
        self.max_channels = base_channels * 4
        self.max_kernel = 7
        self.max_expand_ratio = 6
        
        # Stem layer
        self.first_conv = nn.Conv2d(3, base_channels, 3, padding=1, bias=False)
        self.first_bn = nn.BatchNorm2d(base_channels)
        
        # Dynamic layers (MBConv blocks)
        self.blocks = nn.ModuleList()
        in_ch = base_channels
        
        for i in range(self.max_depth):
            out_ch = min(in_ch * 2, self.max_channels)
            self.blocks.append(
                DynamicMBConv(in_ch, out_ch, self.max_expand_ratio)
            )
            in_ch = out_ch
        
        # Classification head
        self.classifier = nn.Linear(self.max_channels, num_classes)
    
    def forward(self, x, arch_config=None):
        """
        Args:
            x: input tensor
            arch_config: architecture config (depth, kernel_size, expand_ratio, ...)
        """
        if arch_config is None:
            arch_config = self.sample_active_subnet()
        
        # Stem
        x = F.relu(self.first_bn(self.first_conv(x)))
        
        # Dynamic blocks
        for i in range(arch_config['depth']):
            x = self.blocks[i](x, arch_config)
        
        # Global average pooling
        x = F.adaptive_avg_pool2d(x, 1)
        x = x.view(x.size(0), -1)
        
        # Classification
        x = self.classifier(x)
        return x
    
    def sample_active_subnet(self):
        """Randomly sample a sub-network configuration.
        
        width_mult is kept for completeness; elastic output width is not
        implemented in this simplified sketch (only the expansion width
        inside each block is elastic, see DynamicMBConv below).
        """
        import random
        return {
            'depth': random.randint(5, self.max_depth),
            'width_mult': random.uniform(0.5, 1.0),
            'kernel_size': random.choice([3, 5, 7]),
            'expand_ratio': random.choice([3, 4, 6])
        }


class DynamicMBConv(nn.Module):
    """Dynamic MobileNetV2-style block"""
    
    def __init__(self, in_ch, out_ch, max_expand_ratio):
        super(DynamicMBConv, self).__init__()
        
        self.max_expand_ratio = max_expand_ratio
        hidden_dim = in_ch * max_expand_ratio
        
        # Expansion conv (1x1)
        self.expand_conv = nn.Conv2d(in_ch, hidden_dim, 1, bias=False)
        self.expand_bn = nn.BatchNorm2d(hidden_dim)
        
        # Depthwise convs (one per candidate kernel size)
        self.depth_conv_3 = nn.Conv2d(hidden_dim, hidden_dim, 3, 
                                       padding=1, groups=hidden_dim, bias=False)
        self.depth_conv_5 = nn.Conv2d(hidden_dim, hidden_dim, 5, 
                                       padding=2, groups=hidden_dim, bias=False)
        self.depth_conv_7 = nn.Conv2d(hidden_dim, hidden_dim, 7, 
                                       padding=3, groups=hidden_dim, bias=False)
        self.depth_bn = nn.BatchNorm2d(hidden_dim)
        
        # Projection conv (1x1)
        self.project_conv = nn.Conv2d(hidden_dim, out_ch, 1, bias=False)
        self.project_bn = nn.BatchNorm2d(out_ch)
    
    def _bn_slice(self, x, bn, ch):
        """Apply a shared BatchNorm to the first ch channels"""
        return F.batch_norm(
            x, bn.running_mean[:ch], bn.running_var[:ch],
            bn.weight[:ch], bn.bias[:ch],
            training=self.training, momentum=bn.momentum, eps=bn.eps
        )
    
    def forward(self, x, config):
        """Dynamic forward pass. The expansion width is made elastic by
        slicing the first channels of the shared weights, a simplified
        version of OFA's weight sharing."""
        identity = x
        hidden = int(x.size(1) * config['expand_ratio'])
        
        # Expansion: slice the output filters of the shared 1x1 conv
        x = F.conv2d(x, self.expand_conv.weight[:hidden])
        x = F.relu(self._bn_slice(x, self.expand_bn, hidden))
        
        # Depthwise conv with the configured kernel size
        k = config['kernel_size']
        dw = {3: self.depth_conv_3, 5: self.depth_conv_5, 7: self.depth_conv_7}[k]
        x = F.conv2d(x, dw.weight[:hidden], padding=k // 2, groups=hidden)
        x = F.relu(self._bn_slice(x, self.depth_bn, hidden))
        
        # Projection: slice the input channels of the shared 1x1 conv
        x = F.conv2d(x, self.project_conv.weight[:, :hidden])
        x = self.project_bn(x)
        
        # Residual connection when shapes match
        if identity.size() == x.size():
            x = x + identity
        
        return x
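
A quick sanity check of the supernet with a randomly sampled sub-network; the small input size and class count are assumptions made only to keep the example light.

# One forward pass through a randomly sampled sub-network
supernet = OFASuperNet(num_classes=10, base_channels=16)
x = torch.randn(2, 3, 32, 32)
config = supernet.sample_active_subnet()
logits = supernet(x, config)
print(config, logits.shape)  # (..., torch.Size([2, 10]))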

5.2 BigNAS

BigNAS is another weight-sharing method. It trains the supernet with the sandwich rule, so that both the largest and the smallest models achieve good performance.

class BigNASTrainer:
    """BigNAS trainer"""
    
    def __init__(self, supernet, args):
        self.supernet = supernet
        self.optimizer = torch.optim.SGD(
            supernet.parameters(),
            lr=args.learning_rate,
            momentum=0.9,
            weight_decay=args.weight_decay
        )
    
    def train_step(self, inputs, targets):
        """One training step with the sandwich rule"""
        self.optimizer.zero_grad()
        
        # Sandwich rule: train the largest, the smallest, and a randomly
        # sampled sub-network in the same step, accumulating their gradients
        
        # 1. Largest sub-network
        max_config = self.get_max_config()
        outputs_max = self.supernet(inputs, max_config)
        loss_max = F.cross_entropy(outputs_max, targets)
        loss_max.backward()
        
        # 2. Smallest sub-network
        min_config = self.get_min_config()
        outputs_min = self.supernet(inputs, min_config)
        loss_min = F.cross_entropy(outputs_min, targets)
        loss_min.backward()
        
        # 3. Random sub-network
        random_config = self.sample_config()
        outputs_random = self.supernet(inputs, random_config)
        loss_random = F.cross_entropy(outputs_random, targets)
        loss_random.backward()
        
        self.optimizer.step()
        
        return {
            'loss_max': loss_max.item(),
            'loss_min': loss_min.item(),
            'loss_random': loss_random.item()
        }
    
    def get_max_config(self):
        """Largest configuration"""
        return {
            'depth': self.supernet.max_depth,
            'width_mult': 1.0,
            'kernel_size': 7,
            'expand_ratio': 6
        }
    
    def get_min_config(self):
        """Smallest configuration"""
        return {
            'depth': 5,
            'width_mult': 0.25,
            'kernel_size': 3,
            'expand_ratio': 3
        }
    
    def sample_config(self):
        """Randomly sample a configuration"""
        import random
        return {
            'depth': random.randint(5, self.supernet.max_depth),
            'width_mult': random.choice([0.25, 0.5, 0.75, 1.0]),
            'kernel_size': random.choice([3, 5, 7]),
            'expand_ratio': random.choice([3, 4, 6])
        }
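
One sandwich-rule step on random data, assuming the OFASuperNet above; `args` is a stand-in configuration object. (The full BigNAS recipe additionally distills the smaller sub-networks from the largest one, which is omitted in this sketch.)

from types import SimpleNamespace

args = SimpleNamespace(learning_rate=0.1, weight_decay=1e-4)
supernet = OFASuperNet(num_classes=10, base_channels=16)
trainer = BigNASTrainer(supernet, args)

inputs = torch.randn(4, 3, 32, 32)
targets = torch.randint(0, 10, (4,))
print(trainer.train_step(inputs, targets))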

6. AutoML Tools and Applications

6.1 Optuna

Optuna is an efficient hyperparameter optimization framework supporting several search algorithms (TPE, CMA-ES, and others).

import optuna
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Objective function
def objective(trial):
    """Optuna objective"""
    
    # Define the search space
    n_layers = trial.suggest_int('n_layers', 1, 3)
    dropout = trial.suggest_float('dropout', 0.1, 0.5)
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD'])
    
    # Build the model
    layers = []
    in_features = 784  # MNIST
    
    for i in range(n_layers):
        out_features = trial.suggest_int(f'n_units_l{i}', 64, 512, log=True)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout))
        in_features = out_features
    
    layers.append(nn.Linear(in_features, 10))
    model = nn.Sequential(*layers)
    
    # Training setup
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    
    # Optimizer
    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        momentum = trial.suggest_float('momentum', 0.5, 0.99)
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    
    # Data loading
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=True, download=True, transform=transform),
        batch_size=128, shuffle=True
    )
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=False, download=True, transform=transform),
        batch_size=128, shuffle=False
    )
    
    # Training
    criterion = nn.CrossEntropyLoss()
    model.train()
    
    for epoch in range(5):  # shortened to 5 epochs
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            if batch_idx >= 100:  # cap the number of iterations
                break
    
    # Evaluation on held-out data (the test split, so the score does not
    # merely reflect training-set memorization)
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
            
            if total >= 1000:  # cap the number of evaluation samples
                break
    
    accuracy = correct / total
    return accuracy


# Run the optimization
def run_optuna():
    """Run the Optuna study"""
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=50)
    
    print("Best trial:")
    trial = study.best_trial
    print(f"  Value: {trial.value:.4f}")
    print("  Params:")
    for key, value in trial.params.items():
        print(f"    {key}: {value}")
    
    return study
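
Optuna can also stop unpromising trials early with a pruner. A minimal sketch, assuming the training loop in `objective` is restructured to report accuracy once per epoch; the accuracy computation itself is elided here.

def objective_with_pruning(trial):
    # ... build the model and optimizer as in objective() ...
    accuracy = 0.0
    for epoch in range(5):
        # ... train one epoch, then compute validation accuracy ...
        trial.report(accuracy, epoch)   # report the intermediate value
        if trial.should_prune():        # let the pruner stop bad trials early
            raise optuna.TrialPruned()
    return accuracy

study = optuna.create_study(
    direction='maximize',
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=1)
)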

6.2 Auto-sklearn and TPOT

# Auto-sklearn example (requires: pip install auto-sklearn)
def auto_sklearn_example():
    """Auto-sklearn usage example"""
    try:
        import autosklearn.classification
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split
        
        # Load data
        X, y = load_digits(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        
        # Create the Auto-sklearn classifier
        automl = autosklearn.classification.AutoSklearnClassifier(
            time_left_for_this_task=120,  # 2 minutes
            per_run_time_limit=30,
            metric=autosklearn.metrics.accuracy
        )
        
        # Fit
        automl.fit(X_train, y_train)
        
        # Evaluate
        predictions = automl.predict(X_test)
        accuracy = (predictions == y_test).mean()
        print(f"Auto-sklearn accuracy: {accuracy:.4f}")
        
        # Show the final ensemble
        print(automl.leaderboard())
        
    except ImportError:
        print("auto-sklearn not installed. Skipping example.")


# TPOT example (requires: pip install tpot)
def tpot_example():
    """TPOT usage example"""
    try:
        from tpot import TPOTClassifier
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split
        
        # Load data
        X, y = load_digits(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        
        # Create the TPOT classifier
        tpot = TPOTClassifier(
            generations=5,
            population_size=20,
            verbosity=2,
            random_state=42
        )
        
        # Fit
        tpot.fit(X_train, y_train)
        
        # Evaluate
        accuracy = tpot.score(X_test, y_test)
        print(f"TPOT accuracy: {accuracy:.4f}")
        
        # Export the best pipeline
        tpot.export('best_pipeline.py')
        
    except ImportError:
        print("tpot not installed. Skipping example.")

7. Recent Advances

7.1 AutoFormer

AutoFormer, proposed by Microsoft Research Asia (ICCV 2021), is an architecture search method for Vision Transformers that brings weight sharing to ViT search.

Core innovations

  1. Search space: network depth, embedding dimension, number of attention heads, and more
  2. Weight sharing: a single trained supernet contains all possible sub-networks
  3. Progressive shrinking: training from large to small gives even small models a good initialization
class AutoFormerSearchSpace:
    """AutoFormer search space"""
    
    def __init__(self):
        # Searchable dimensions
        self.search_depth = [10, 12, 14]           # number of Transformer layers
        self.search_embed_dim = [384, 480, 528]    # embedding dimension
        self.search_num_heads = [6, 8, 10]         # number of attention heads
        self.search_mlp_ratio = [3.0, 3.5, 4.0]    # MLP expansion ratio
    
    def sample_config(self):
        """Sample a ViT configuration (head count is constrained to divide
        the embedding dimension so the attention reshape is valid)"""
        import random
        embed_dim = random.choice(self.search_embed_dim)
        num_heads = random.choice(
            [h for h in self.search_num_heads if embed_dim % h == 0]
        )
        return {
            'depth': random.choice(self.search_depth),
            'embed_dim': embed_dim,
            'num_heads': num_heads,
            'mlp_ratio': random.choice(self.search_mlp_ratio)
        }
    
    def get_max_config(self):
        """Largest configuration (with a head count that divides embed_dim)"""
        embed_dim = max(self.search_embed_dim)
        return {
            'depth': max(self.search_depth),
            'embed_dim': embed_dim,
            'num_heads': max(h for h in self.search_num_heads
                             if embed_dim % h == 0),
            'mlp_ratio': max(self.search_mlp_ratio)
        }


class DynamicVisionTransformer(nn.Module):
    """Dynamic Vision Transformer (AutoFormer-style, simplified)"""
    
    def __init__(self, img_size=224, patch_size=16, in_chans=3, 
                 num_classes=1000, max_embed_dim=528, max_depth=14):
        super().__init__()
        
        self.max_embed_dim = max_embed_dim
        self.max_depth = max_depth
        
        # Patch embedding
        self.patch_embed = nn.Conv2d(
            in_chans, max_embed_dim, 
            kernel_size=patch_size, stride=patch_size
        )
        
        # Positional embedding
        num_patches = (img_size // patch_size) ** 2
        self.pos_embed = nn.Parameter(
            torch.zeros(1, num_patches + 1, max_embed_dim)
        )
        self.cls_token = nn.Parameter(torch.zeros(1, 1, max_embed_dim))
        
        # Transformer blocks
        self.blocks = nn.ModuleList([
            DynamicTransformerBlock(max_embed_dim)
            for _ in range(max_depth)
        ])
        
        # Classification head
        self.norm = nn.LayerNorm(max_embed_dim)
        self.head = nn.Linear(max_embed_dim, num_classes)
    
    def forward(self, x, config=None):
        if config is None:
            # default: the largest sub-network (8 heads divides 528 evenly)
            config = {'depth': self.max_depth, 'embed_dim': self.max_embed_dim,
                      'num_heads': 8, 'mlp_ratio': 4.0}
        
        B = x.shape[0]
        embed_dim = config['embed_dim']
        
        # Patch embedding, sliced to the active embedding width
        x = self.patch_embed(x)
        x = x.flatten(2).transpose(1, 2)
        x = x[:, :, :embed_dim]
        
        # Prepend the CLS token
        cls_tokens = self.cls_token[:, :, :embed_dim].expand(B, -1, -1)
        x = torch.cat([cls_tokens, x], dim=1)
        
        # Add positional embeddings
        x = x + self.pos_embed[:, :, :embed_dim]
        
        # Transformer blocks
        for i in range(config['depth']):
            x = self.blocks[i](x, config)
        
        # Final norm with sliced affine parameters, then classify from CLS
        x = F.layer_norm(x, (embed_dim,),
                         self.norm.weight[:embed_dim], self.norm.bias[:embed_dim])
        x = x[:, 0]
        x = F.linear(x, self.head.weight[:, :embed_dim], self.head.bias)
        
        return x


class DynamicTransformerBlock(nn.Module):
    """Dynamic Transformer block"""
    
    def __init__(self, max_embed_dim):
        super().__init__()
        
        self.norm1 = nn.LayerNorm(max_embed_dim)
        self.attn = DynamicAttention(max_embed_dim)
        self.norm2 = nn.LayerNorm(max_embed_dim)
        self.mlp = DynamicMLP(max_embed_dim)
    
    def forward(self, x, config):
        d = config['embed_dim']
        
        # LayerNorms use the first d entries of the shared affine parameters
        h = F.layer_norm(x, (d,), self.norm1.weight[:d], self.norm1.bias[:d])
        x = x + self.attn(h, config)
        
        h = F.layer_norm(x, (d,), self.norm2.weight[:d], self.norm2.bias[:d])
        x = x + self.mlp(h, config)
        
        return x


class DynamicAttention(nn.Module):
    """Dynamic multi-head attention: sub-network dimensions are realized by
    slicing the shared QKV/projection weights (simplified AutoFormer-style
    weight sharing)"""
    
    def __init__(self, max_embed_dim, max_num_heads=10):
        super().__init__()
        
        self.max_embed_dim = max_embed_dim
        self.max_num_heads = max_num_heads
        
        self.qkv = nn.Linear(max_embed_dim, max_embed_dim * 3)
        self.proj = nn.Linear(max_embed_dim, max_embed_dim)
    
    def forward(self, x, config):
        embed_dim = config['embed_dim']
        num_heads = config['num_heads']
        
        B, N, _ = x.shape
        head_dim = embed_dim // num_heads
        D = self.max_embed_dim
        
        # Take the first embed_dim rows of each of the Q, K and V weight
        # blocks, and the first embed_dim input columns
        rows = torch.cat([
            torch.arange(0, embed_dim),
            torch.arange(D, D + embed_dim),
            torch.arange(2 * D, 2 * D + embed_dim)
        ]).to(x.device)
        qkv = F.linear(x, self.qkv.weight[rows][:, :embed_dim], self.qkv.bias[rows])
        qkv = qkv.reshape(B, N, 3, num_heads, head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        
        # Scaled dot-product attention
        attn = (q @ k.transpose(-2, -1)) * (head_dim ** -0.5)
        attn = F.softmax(attn, dim=-1)
        
        x = (attn @ v).transpose(1, 2).reshape(B, N, embed_dim)
        x = F.linear(x, self.proj.weight[:embed_dim, :embed_dim],
                     self.proj.bias[:embed_dim])
        
        return x


class DynamicMLP(nn.Module):
    """Dynamic MLP: the hidden width follows the active mlp_ratio by slicing
    the shared weights"""
    
    def __init__(self, max_embed_dim, max_ratio=4.0):
        super().__init__()
        
        self.max_embed_dim = max_embed_dim
        self.max_hidden_dim = int(max_embed_dim * max_ratio)
        
        self.fc1 = nn.Linear(max_embed_dim, self.max_hidden_dim)
        self.fc2 = nn.Linear(self.max_hidden_dim, max_embed_dim)
    
    def forward(self, x, config):
        embed_dim = config['embed_dim']
        hidden_dim = int(embed_dim * config['mlp_ratio'])
        
        # Slice the shared weights so input/output dims match the sub-network
        x = F.linear(x, self.fc1.weight[:hidden_dim, :embed_dim],
                     self.fc1.bias[:hidden_dim])
        x = F.gelu(x)
        x = F.linear(x, self.fc2.weight[:embed_dim, :hidden_dim],
                     self.fc2.bias[:embed_dim])
        
        return x
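
Once the supernet is trained, AutoFormer searches for good subnets without any retraining. The paper uses evolutionary search; the sketch below substitutes simple random search, with `evaluate_fn` as a stand-in for measuring a subnet's validation accuracy.

def search_subnet(supernet, space, evaluate_fn, num_samples=20):
    """Random-search stand-in for AutoFormer's post-training subnet search"""
    best_cfg, best_acc = None, float('-inf')
    for _ in range(num_samples):
        cfg = space.sample_config()
        acc = evaluate_fn(supernet, cfg)  # e.g., validation accuracy under cfg
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc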

7.2 Comparison of NAS Methods

| Method | Search strategy | Search cost | Main advantage | Main limitation |
|---|---|---|---|---|
| NASNet | Reinforcement learning | High (thousands of GPU hours) | Discovered high-performing cells | Extremely expensive |
| ENAS | Weight sharing + RL | Low | Dramatically reduced search cost | Can get stuck in local optima |
| DARTS | Differentiable | Low (feasible on a single GPU) | Efficient, end-to-end training | Prone to collapse |
| AmoebaNet | Evolutionary algorithm | High | Discovers novel structures | Expensive |
| OFA | Weight sharing | Low (train once) | Supports many deployment targets | Needs a dedicated training schedule |
| BigNAS | Weight sharing | Low (single-stage training) | Optimizes large and small models together | Supernet training is complex |
| AutoFormer | Weight sharing | Low (train the supernet once) | Tailored to ViT | Transformer-only |

8. A Simplified NAS Implementation

8.1 Complete DARTS Implementation

"""
Simplified DARTS implementation
for CIFAR-10 image classification
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


class DARTSCell(nn.Module):
    """DARTS search cell"""
    
    def __init__(self, steps, multiplier, C_prev_prev, C_prev, C, 
                 reduction, reduction_prev):
        super(DARTSCell, self).__init__()
        
        self.reduction = reduction
        self.reduction_prev = reduction_prev
        
        # Preprocessing layers for the two cell inputs
        if reduction_prev:
            self.preprocess0 = FactorizedReduce(C_prev_prev, C)
        else:
            self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0)
        self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0)
        
        self.steps = steps
        self.multiplier = multiplier
        
        # Candidate operations (one MixedOp per edge)
        self.ops = nn.ModuleList()
        
        primitives = [
            'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect',
            'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5'
        ]
        
        for i in range(self.steps):
            for j in range(2 + i):
                stride = 2 if reduction and j < 2 else 1
                op = MixedOp(C, stride, primitives)
                self.ops.append(op)
    
    def forward(self, s0, s1, weights):
        s0 = self.preprocess0(s0)
        s1 = self.preprocess1(s1)
        
        states = [s0, s1]
        offset = 0
        
        for i in range(self.steps):
            # Each node sums the mixed outputs of all its incoming edges,
            # using the shared architecture weights passed in from the network
            s = sum(self.ops[offset + j](h, weights[offset + j]) 
                   for j, h in enumerate(states))
            offset += len(states)
            states.append(s)
        
        return torch.cat(states[-self.multiplier:], dim=1)


class DARTSNetwork(nn.Module):
    """DARTS search network"""
    
    def __init__(self, C=16, num_classes=10, layers=8, steps=4, 
                 multiplier=4, stem_multiplier=3):
        super(DARTSNetwork, self).__init__()
        
        self.steps = steps
        self.multiplier = multiplier
        
        C_curr = stem_multiplier * C
        self.stem = nn.Sequential(
            nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
            nn.BatchNorm2d(C_curr)
        )
        
        C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
        self.cells = nn.ModuleList()
        reduction_prev = False
        
        for i in range(layers):
            # Reduction cells at 1/3 and 2/3 of the depth
            if i in [layers // 3, 2 * layers // 3]:
                C_curr *= 2
                reduction = True
            else:
                reduction = False
            
            cell = DARTSCell(steps, multiplier, C_prev_prev, C_prev, 
                            C_curr, reduction, reduction_prev)
            self.cells.append(cell)
            
            reduction_prev = reduction
            C_prev_prev = C_prev
            C_prev = multiplier * C_curr
        
        self.global_pooling = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(C_prev, num_classes)
        
        # Architecture parameters
        self._initialize_alphas()
    
    def _initialize_alphas(self):
        k = sum(2 + i for i in range(self.steps))  # number of edges per cell
        num_ops = 8  # number of candidate operations
        
        self.alphas_normal = nn.Parameter(torch.randn(k, num_ops))
        self.alphas_reduce = nn.Parameter(torch.randn(k, num_ops))
        
        self._arch_parameters = [
            self.alphas_normal,
            self.alphas_reduce
        ]
    
    def arch_parameters(self):
        return self._arch_parameters
    
    def weights(self):
        return [p for n, p in self.named_parameters() 
                if 'alphas' not in n]
    
    def forward(self, x):
        s0 = s1 = self.stem(x)
        
        for i, cell in enumerate(self.cells):
            if cell.reduction:
                weights = F.softmax(self.alphas_reduce, dim=-1)
            else:
                weights = F.softmax(self.alphas_normal, dim=-1)
            
            s0, s1 = s1, cell(s0, s1, weights)
        
        out = self.global_pooling(s1)
        logits = self.classifier(out.view(out.size(0), -1))
        return logits
    
    def genotype(self):
        """Export the discrete architecture"""
        def _parse(weights):
            gene = []
            n = 2
            start = 0
            
            for i in range(self.steps):
                end = start + n
                W = weights[start:end].copy()
                
                # Keep the two strongest incoming edges per node,
                # ignoring the 'none' operation at index 0
                edges = sorted(range(i + 2), 
                             key=lambda x: -max(W[x][k] for k in range(len(W[x]))
                             if k != 0))[:2]
                
                for j in edges:
                    k_best = None
                    for k in range(len(W[j])):
                        if k != 0:
                            if k_best is None or W[j][k] > W[j][k_best]:
                                k_best = k
                    gene.append((k_best, j))
                
                start = end
                n += 1
            
            return gene
        
        gene_normal = _parse(F.softmax(self.alphas_normal, dim=-1).data.cpu().numpy())
        gene_reduce = _parse(F.softmax(self.alphas_reduce, dim=-1).data.cpu().numpy())
        
        return {'normal': gene_normal, 'reduce': gene_reduce}


def train_darts():
    """Run the DARTS search"""
    
    # Data preprocessing
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), 
                           (0.2023, 0.1994, 0.2010))
    ])
    
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), 
                           (0.2023, 0.1994, 0.2010))
    ])
    
    # Load CIFAR-10
    trainset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform_train
    )
    trainloader = DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)
    
    testset = torchvision.datasets.CIFAR10(
        root='./data', train=False, download=True, transform=transform_test
    )
    testloader = DataLoader(testset, batch_size=64, shuffle=False, num_workers=2)
    
    # Create the model
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = DARTSNetwork().to(device)
    
    # Optimizers
    w_optimizer = torch.optim.SGD(
        model.weights(), lr=0.025, momentum=0.9, weight_decay=3e-4
    )
    alpha_optimizer = torch.optim.Adam(
        model.arch_parameters(), lr=3e-4, betas=(0.5, 0.999)
    )
    
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        w_optimizer, T_max=50, eta_min=0.001
    )
    
    # Architecture updates draw batches from the test loader here for
    # simplicity; a proper setup would split the training set into
    # train/validation halves, as in the original DARTS paper
    val_iter = iter(testloader)
    
    # Training loop
    for epoch in range(50):
        model.train()
        train_loss = 0
        correct = 0
        total = 0
        
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            inputs, targets = inputs.to(device), targets.to(device)
            
            # Update the network weights
            w_optimizer.zero_grad()
            outputs = model(inputs)
            loss = F.cross_entropy(outputs, targets)
            loss.backward()
            w_optimizer.step()
            
            # Update the architecture parameters (every few batches)
            if batch_idx % 5 == 0:
                try:
                    val_inputs, val_targets = next(val_iter)
                except StopIteration:
                    val_iter = iter(testloader)
                    val_inputs, val_targets = next(val_iter)
                
                val_inputs, val_targets = val_inputs.to(device), val_targets.to(device)
                
                alpha_optimizer.zero_grad()
                val_loss = F.cross_entropy(model(val_inputs), val_targets)
                val_loss.backward()
                alpha_optimizer.step()
            
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
        
        scheduler.step()
        
        # Evaluation
        model.eval()
        test_loss = 0
        correct_test = 0
        total_test = 0
        
        with torch.no_grad():
            for inputs, targets in testloader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)
                loss = F.cross_entropy(outputs, targets)
                
                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total_test += targets.size(0)
                correct_test += predicted.eq(targets).sum().item()
        
        print(f'Epoch {epoch+1}/50: '
              f'Train Acc: {100.*correct/total:.2f}%, '
              f'Test Acc: {100.*correct_test/total_test:.2f}%')
    
    # Export the final architecture
    genotype = model.genotype()
    print('Final genotype:', genotype)
    
    return model, genotype


if __name__ == '__main__':
    model, genotype = train_darts()

9. Pitfalls to Avoid

9.1 Search Space Design

  • Search space too large: search becomes extremely inefficient; start from a smaller space and expand gradually
  • Search space too small: good architectures may be unreachable; balance the space against your compute budget
  • Redundant operations: avoid near-duplicate candidates (for example, including both 3x3 and 4x4 convolutions, which behave almost identically)

9.2 Common DARTS Problems

  • Architecture collapse: DARTS tends to over-select skip connections or the 'none' operation, degrading the derived network

    • Remedies: early stopping, operation-level dropout regularization, or improved variants such as P-DARTS
  • Validation overfitting: the architecture parameters overfit the validation set

    • Remedies: a larger validation set, early stopping, less frequent architecture updates

9.3 Weight-Sharing Training

  • Sub-network interference: sub-networks sharing the same weights can interfere with each other

    • Remedies: progressive shrinking, the sandwich rule, gradient masking
  • Ranking inconsistency: supernet performance may not match the performance of sub-networks trained from scratch

    • Remedies: finer-grained weight-sharing strategies, a ranking-consistency loss

9.4 Managing Compute

  • Search cost: NAS typically needs substantial compute; consider:
    • using a proxy task (e.g., CIFAR-10 instead of ImageNet)
    • limiting the number of training epochs per candidate
    • using weight sharing to avoid repeated training

10. Chapter Summary and Key Takeaways

Core concepts

  1. The three components of NAS: the search space defines the candidate architectures, the search strategy decides how to explore it, and performance estimation measures architecture quality

  2. Search space types: chain structures are simple and direct; cell structures are modular and transferable

  3. Search strategies

    • Reinforcement learning: RNN controller + policy gradient, suited to discrete search spaces
    • Evolutionary algorithms: genetic algorithms and NSGA-II, suited to multi-objective optimization
    • Differentiable methods: DARTS enables end-to-end optimization via continuous relaxation
  4. Weight sharing: OFA and BigNAS cut search cost dramatically by training a supernet

  5. AutoML tools: Optuna, Auto-sklearn, and TPOT provide automated machine learning out of the box

Key formulas

  • DARTS continuous relaxation

    output(x) = sum_i softmax(alpha)_i * op_i(x)
    
  • REINFORCE policy gradient (with baseline b)

    gradient = E[(R - b) * gradient(log P(a))]
    
  • NSGA-II non-dominated sorting: solutions are layered by Pareto optimality
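
For completeness, subtracting the baseline does not bias the REINFORCE estimate, because the expected score function vanishes:

    E[grad log P(a)] = sum_a P(a) * grad log P(a) = grad sum_a P(a) = grad 1 = 0

so E[(R - b) * grad log P(a)] = E[R * grad log P(a)] for any constant b; the baseline only reduces variance.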

One-sentence summary

NAS lets neural networks "design themselves": by searching for optimal architectures automatically, it frees human experts from tedious manual tuning and is an important step toward artificial general intelligence.


References

  1. Zoph & Le, "Neural Architecture Search with Reinforcement Learning" (ICLR 2017)
  2. Liu et al., "DARTS: Differentiable Architecture Search" (ICLR 2019)
  3. Real et al., "Regularized Evolution for Image Classifier Architecture Search" (AAAI 2019)
  4. Cai et al., "Once-for-All: Train One Network and Specialize it for Efficient Deployment" (ICLR 2020)
  5. Chen et al., "AutoFormer: Searching Transformers for Visual Recognition" (ICCV 2021)
  6. Yu et al., "BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models" (ECCV 2020)

This tutorial is Chapter 22 of the "Mastering Deep Learning" series; please credit the source when reposting.
