RevoNAD: Reflective Evolutionary Exploration for Neural Architecture Design

1Korea University     2Samsung AI Center, DS Division
{gsjang95, dabujin98, hannie12, dlwogurgur, spk7}@korea.ac.kr
s.steve.jang@samsung.com
* Corresponding Author

Overview

Comparison of search efficiency across CIFAR10, CIFAR100, and ImageNet16-120. RevoNAD discovers significantly stronger architectures with only five generated candidates, outperforming prior NAD, NAS, and LLM-driven NAS methods that require hundreds to thousands of evaluated architectures. The results demonstrate RevoNAD's cost-efficient exploration and superior design effectiveness across diverse datasets.
Teaser
Overview of the RevoNAD Orchestrator. Starting from a base model, RevoNAD employs a Multi-Expert Ideation module that distills literature into structured architectural inspiration tokens. These inspirations guide an LLM-based Reflective Design Exploration module, which adaptively balances exploration and refinement based on reward variance. Each generated candidate is trained and evaluated, and a Pareto-guided Evolution process selects diverse, high-quality architectures, enabling stable, feedback-aligned neural architecture design.
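
To make the pipeline above concrete, the following is a minimal sketch of the orchestration loop, assuming hypothetical callables (ideate, propose, evaluate, select) in place of the actual Multi-Expert Ideation, Reflective Design Exploration, training, and Pareto-guided Evolution modules. It is an illustrative sketch under those assumptions, not the released RevoNAD implementation.

```python
# Illustrative sketch of the RevoNAD-style orchestration loop.
# All callables are hypothetical placeholders for the modules in the figure.
from statistics import pvariance
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def orchestrate(base_arch: str,
                ideate: Callable[[], List[str]],                       # Multi-Expert Ideation
                propose: Callable[[List[str], List[str], str], str],   # LLM-based design step
                evaluate: Callable[[str], Metrics],                    # train + evaluate a candidate
                select: Callable[[Dict[str, Metrics]], List[str]],     # Pareto-guided evolution
                n_candidates: int = 5,
                var_threshold: float = 0.5) -> List[str]:
    """Run the ideate -> propose -> evaluate -> select loop sketched in the figure."""
    inspirations = ideate()                                   # literature -> inspiration tokens
    scored: Dict[str, Metrics] = {base_arch: evaluate(base_arch)}
    # Validation accuracy stands in for the reward signal here (an assumption).
    population, rewards = [base_arch], [scored[base_arch]["accuracy"]]

    for _ in range(n_candidates):
        # Adaptive reflective exploration: explore while reward feedback is
        # still noisy, refine once its variance drops below a threshold.
        noisy = len(rewards) < 2 or pvariance(rewards) > var_threshold
        candidate = propose(population, inspirations, "explore" if noisy else "refine")

        metrics = evaluate(candidate)                         # accuracy, latency, params, confidence, ...
        scored[candidate] = metrics
        rewards.append(metrics["accuracy"])

        # Pareto-guided evolution keeps diverse, non-dominated designs.
        population = select(scored)

    return population
```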

Abstract

Recent progress in leveraging large language models (LLMs) has enabled Neural Architecture Design (NAD) systems to generate new architectures that are not limited to a manually predefined search space. Nevertheless, LLM-driven generation remains challenging: the token-level design loop is discrete and non-differentiable, preventing feedback from smoothly guiding architectural improvement. As a result, these methods commonly suffer from mode collapse into redundant structures or drift toward infeasible designs when constructive reasoning is not well grounded. We introduce RevoNAD, a reflective evolutionary orchestrator that bridges LLM-based reasoning with feedback-aligned architectural search. First, RevoNAD presents a Multi-round Multi-expert Consensus that translates isolated design rules into meaningful architectural cues. Then, Adaptive Reflective Exploration adjusts the degree of exploration based on reward variance: it explores when feedback is uncertain and refines once stability is reached. Finally, Pareto-guided Evolutionary Selection promotes architectures that jointly optimize accuracy, efficiency, latency, confidence, and structural diversity. Across CIFAR10, CIFAR100, ImageNet16-120, COCO-5K, and Cityscapes, RevoNAD achieves state-of-the-art performance. Ablation and transfer studies further validate the effectiveness of RevoNAD in enabling practically reliable and deployable neural architecture design.
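
As an illustration of the Pareto-guided selection the abstract describes, the following self-contained sketch filters candidates to the non-dominated set over several objectives. The objective names and toy numbers are assumptions made for illustration, not values or code from the paper.

```python
# Small, self-contained sketch of Pareto non-dominated selection over the
# kinds of objectives listed in the abstract; names and numbers are illustrative.
from typing import Dict, List

# Larger is better for every key; latency and parameter count are negated
# upstream so that "higher is better" holds uniformly.
OBJECTIVES = ("accuracy", "neg_latency", "neg_params", "confidence", "diversity")


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = all(a[k] >= b[k] for k in OBJECTIVES)
    strictly_better = any(a[k] > b[k] for k in OBJECTIVES)
    return at_least_as_good and strictly_better


def pareto_front(scored: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the architectures that no other candidate dominates."""
    return [name for name, m in scored.items()
            if not any(dominates(other, m)
                       for other_name, other in scored.items() if other_name != name)]


if __name__ == "__main__":
    toy = {
        "arch_a": dict(accuracy=94.1, neg_latency=-3.0, neg_params=-1.2, confidence=0.91, diversity=0.4),
        "arch_b": dict(accuracy=93.5, neg_latency=-1.5, neg_params=-0.8, confidence=0.88, diversity=0.7),
        "arch_c": dict(accuracy=92.0, neg_latency=-3.5, neg_params=-1.5, confidence=0.80, diversity=0.3),
    }
    # arch_c is dominated by arch_b, so the front is ['arch_a', 'arch_b'].
    print(pareto_front(toy))
```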



Quantitative Results

| Method | # Archs | CIFAR10 Val (↑) | CIFAR10 Test (↑) | CIFAR100 Val (↑) | CIFAR100 Test (↑) | ImageNet16-120 Val (↑) | ImageNet16-120 Test (↑) |
|---|---|---|---|---|---|---|---|
| ENAS [41] | - | 37.51±3.19 | 53.89±0.58 | 13.37±2.35 | 13.96±2.33 | 15.06±1.95 | 14.84±2.10 |
| DARTS [33] | - | 39.77±0.00 | 54.30±0.00 | 38.57±0.00 | 15.61±0.00 | 18.87±0.00 | 16.32±0.00 |
| SETN [14] | - | 84.04±0.28 | 87.64±0.00 | 58.86±0.06 | 59.05±0.24 | 33.06±0.02 | 32.52±0.21 |
| DSNAS [56] | - | 89.66±0.29 | 93.08±0.13 | 30.87±16.40 | 31.01±16.40 | 40.61±0.10 | 41.07±0.09 |
| PC-DARTS [56] | - | 89.96±0.15 | 93.41±0.30 | 67.12±0.39 | 67.48±0.89 | 40.83±0.08 | 41.31±0.22 |
| SNAS [55] | - | 90.10±1.04 | 92.77±0.83 | 69.69±2.39 | 69.34±1.98 | 42.84±1.79 | 43.16±2.64 |
| iDARTS [61] | - | 89.86±0.60 | 93.58±0.32 | 70.57±0.24 | 70.83±0.48 | 40.38±0.59 | 40.89±0.68 |
| GDAS [15] | - | 89.89±0.08 | 93.61±0.09 | 71.34±0.04 | 70.70±0.30 | 41.59±1.33 | 41.71±0.98 |
| DRNAS [8] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| β-DARTS [58] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| Λ-DARTS [36] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| REA [16] | 500 | 91.19±0.31 | 93.92±0.30 | 71.81±1.12 | 71.84±0.99 | 45.15±0.89 | 45.54±1.03 |
| RS [43] | 500 | 90.93±0.36 | 93.70±0.36 | 70.93±1.09 | 71.04±1.07 | 44.45±1.10 | 44.57±1.25 |
| REINFORCE [51] | 500 | 91.09±0.37 | 93.85±0.37 | 71.61±1.12 | 71.71±1.09 | 45.05±1.02 | 45.24±1.18 |
| BOHB [18] | 500 | 90.82±0.53 | 93.61±0.52 | 70.74±1.29 | 70.85±1.28 | 44.26±1.36 | 44.42±1.49 |
| Oracle | - | 91.61 | 94.37 | 73.49 | 73.51 | 46.77 | 47.31 |
| GENIUS [65] | 10 | 91.07±0.20 | 93.79±0.09 | 70.96±0.33 | 70.91±0.72 | 45.29±0.81 | 44.96±1.02 |
| LLMatic [38] | 2000 | - | 94.26±0.13 | - | 71.62±1.73 | - | 45.87±0.96 |
| LeMo-NADe-GPT4 [42] | 30 | 90.90 | 89.41 | 63.38 | 67.90 | 27.05 | 27.70 |
| NADER [57] | 5 | 91.17±0.24 | 94.52±0.22 | 73.29±1.86 | 73.12±1.09 | 47.98±0.73 | 47.99±0.38 |
| NADER [57] | 10 | 91.18±0.23 | 94.52±0.22 | 74.71±0.45 | 74.65±0.33 | 48.56±0.83 | 48.61±0.76 |
| NADER [57] | 500 | 91.55 | 94.62 | 75.72 | 76.00 | 50.20 | 50.52 |
| RevoNAD (Qwen2.5) | 5 | 91.17±0.25 | 94.77±0.18 | 76.21±0.42 | 76.25±0.37 | **51.00±0.21** | 50.55±0.17 |
| RevoNAD (GPT4o) | 5 | **92.55±0.27** | **95.22±0.20** | **76.70±0.39** | **76.38±0.32** | 50.58±0.19 | **50.72±0.20** |
Table 1. Comparison with existing state-of-the-art NAS and NAD methods. Classification accuracy (%) on CIFAR-10, CIFAR-100, and ImageNet16-120; Val./Test columns report validation and test accuracy. Bold values indicate the best performance.