Recent progress in leveraging large language models (LLMs) has enabled Neural Architecture Design (NAD) systems to generate new architectures that are not limited to a manually predefined search space. Nevertheless, LLM-driven generation remains challenging: the token-level design loop is discrete and non-differentiable, which prevents feedback from smoothly guiding architectural improvement. As a result, these methods commonly suffer from mode collapse into redundant structures or drift toward infeasible designs when constructive reasoning is not well grounded. We introduce RevoNAD, a reflective evolutionary orchestrator that bridges LLM-based reasoning with feedback-aligned architectural search. First, RevoNAD uses a Multi-round Multi-expert Consensus to transform isolated design rules into meaningful architectural clues. Then, Adaptive Reflective Exploration adjusts the degree of exploration based on reward variance: it explores when feedback is uncertain and refines when stability is reached. Finally, Pareto-guided Evolutionary Selection promotes architectures that jointly optimize accuracy, efficiency, latency, confidence, and structural diversity. Across CIFAR10, CIFAR100, ImageNet16-120, COCO-5K, and Cityscapes, RevoNAD achieves state-of-the-art performance. Ablation and transfer studies further validate the effectiveness of RevoNAD in enabling practically reliable and deployable neural architecture design.
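
To make the last two components more concrete, the following is a minimal Python sketch of the two ideas as described above: mapping reward variance to an exploration rate, and keeping only non-dominated candidates. It is an illustration under stated assumptions, not the RevoNAD implementation. The function names, the saturating variance-to-rate map, the `low`, `high`, and `scale` parameters, and the two-objective scores are all hypothetical.

```python
from statistics import pvariance


def exploration_rate(recent_rewards, low=0.1, high=0.9, scale=0.01):
    """Map the variance of recent rewards to an exploration rate in [low, high].

    Hypothetical rule: high variance (uncertain feedback) pushes toward
    exploration, low variance (stable feedback) pushes toward refinement.
    """
    if len(recent_rewards) < 2:
        return high  # too little feedback to judge stability, so explore
    var = pvariance(recent_rewards)
    # Saturating map: var == 0 gives `low`; large var approaches `high`.
    return low + (high - low) * var / (var + scale)


def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one. All objectives are assumed to be maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical scores: (accuracy, negative latency), both maximized.
candidates = {
    "arch_a": (0.95, -10.0),
    "arch_b": (0.93, -5.0),
    "arch_c": (0.92, -12.0),
}
front = pareto_front(candidates)
stable_rate = exploration_rate([0.5, 0.5, 0.5])
noisy_rate = exploration_rate([0.2, 0.8])
```

In this toy example `arch_c` is dominated by `arch_a` (lower accuracy and higher latency), so only `arch_a` and `arch_b` survive. Objectives that should be minimized, such as latency, are negated so that a single "larger is better" comparison applies to every objective.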
| Method | # Archs | CIFAR10 Val. (↑) | CIFAR10 Test (↑) | CIFAR100 Val. (↑) | CIFAR100 Test (↑) | ImageNet16-120 Val. (↑) | ImageNet16-120 Test (↑) |
|---|---|---|---|---|---|---|---|
| ENAS [41] | - | 37.51±3.19 | 53.89±0.58 | 13.37±2.35 | 13.96±2.33 | 15.06±1.95 | 14.84±2.10 |
| DARTS [33] | - | 39.77±0.00 | 54.30±0.00 | 38.57±0.00 | 15.61±0.00 | 18.87±0.00 | 16.32±0.00 |
| SETN [14] | - | 84.04±0.28 | 87.64±0.00 | 58.86±0.06 | 59.05±0.24 | 33.06±0.02 | 32.52±0.21 |
| DSNAS [56] | - | 89.66±0.29 | 93.08±0.13 | 30.87±16.40 | 31.01±16.4 | 40.61±0.10 | 41.07±0.09 |
| PC-DARTS [56] | - | 89.96±0.15 | 93.41±0.30 | 67.12±0.39 | 67.48±0.89 | 40.83±0.08 | 41.31±0.22 |
| SNAS [55] | - | 90.10±1.04 | 92.77±0.83 | 69.69±2.39 | 69.34±1.98 | 42.84±1.79 | 43.16±2.64 |
| iDARTS [61] | - | 89.86±0.60 | 93.58±0.32 | 70.57±0.24 | 70.83±0.48 | 40.38±0.59 | 40.89±0.68 |
| GDAS [15] | - | 89.89±0.08 | 93.61±0.09 | 71.34±0.04 | 70.70±0.30 | 41.59±1.33 | 41.71±0.98 |
| DRNAS [8] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| β-DARTS [58] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| Λ-DARTS [36] | - | 91.55±0.00 | 94.36±0.00 | 73.49±0.00 | 73.51±0.00 | 46.37±0.00 | 46.34±0.00 |
| REA [16] | 500 | 91.19±0.31 | 93.92±0.30 | 71.81±1.12 | 71.84±0.99 | 45.15±0.89 | 45.54±1.03 |
| RS [43] | 500 | 90.93±0.36 | 93.70±0.36 | 70.93±1.09 | 71.04±1.07 | 44.45±1.10 | 44.57±1.25 |
| REINFORCE [51] | 500 | 91.09±0.37 | 93.85±0.37 | 71.61±1.12 | 71.71±1.09 | 45.05±1.02 | 45.24±1.18 |
| BOHB [18] | 500 | 90.82±0.53 | 93.61±0.52 | 70.74±1.29 | 70.85±1.28 | 44.26±1.36 | 44.42±1.49 |
| Oracle | - | 91.61 | 94.37 | 73.49 | 73.51 | 46.77 | 47.31 |
| GENIUS [65] | 10 | 91.07±0.20 | 93.79±0.09 | 70.96±0.33 | 70.91±0.72 | 45.29±0.81 | 44.96±1.02 |
| LLMatic [38] | 2000 | - | 94.26±0.13 | - | 71.62±1.73 | - | 45.87±0.96 |
| LeMo-NADe-GPT4 [42] | 30 | 90.90 | 89.41 | 63.38 | 67.90 | 27.05 | 27.70 |
| NADER [57] | 5 | 91.17±0.24 | 94.52±0.22 | 73.29±1.86 | 73.12±1.09 | 47.98±0.73 | 47.99±0.38 |
| NADER [57] | 10 | 91.18±0.23 | 94.52±0.22 | 74.71±0.45 | 74.65±0.33 | 48.56±0.83 | 48.61±0.76 |
| NADER [57] | 500 | 91.55 | 94.62 | 75.72 | 76.00 | 50.20 | 50.52 |
| RevoNAD (Qwen2.5) | 5 | 91.17±0.25 | 94.77±0.18 | 76.21±0.42 | 76.25±0.37 | 51.00±0.21 | 50.55±0.17 |
| RevoNAD (GPT4o) | 5 | 92.55±0.27 | 95.22±0.20 | 76.70±0.39 | 76.38±0.32 | 50.58±0.19 | 50.72±0.20 |