Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Abstract
We propose ERA, a new paradigm that constrains the sampling entropy to stay above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains: 1) for large language models (LLMs), boosting the AIME 2025 score for Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning agents, improving performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, enhancing ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
Applying ERA to Large Language Models
For large language models, we apply an activation layer to the logits $z$ to obtain transformed logits $z'$. This layer adaptively modulates the logit values based on the response entropy $H_{\text{resp}}$ and the token advantage $A_t$:
$$ z' = \begin{cases} kz & H_{\text{resp}} < \omega_{\text{low}},\; A_{t}>0 \\ z & \omega_{\text{low}} \leq H_{\text{resp}} \leq \omega_{\text{high}} \\ \tfrac{1}{k}z & H_{\text{resp}} > \omega_{\text{high}},\; A_{t}>0 \end{cases} $$

To ensure the stability of the policy update, we apply an inverse scaling factor to the advantages of the modified tokens:
$$ A_t' = \begin{cases} \tfrac{1}{k} A_t & H_{\text{resp}} < \omega_{\text{low}},\; A_{t}>0 \\ A_t & \omega_{\text{low}} \leq H_{\text{resp}} \leq \omega_{\text{high}} \\ k A_t & H_{\text{resp}} > \omega_{\text{high}},\; A_{t}>0 \end{cases} $$

This allows ERA to be integrated seamlessly into on-policy algorithms, resulting in the following GRPO objective:
$$ J(\theta) = \mathbb{E}_t\left[\mathbb{E}_{a_t\sim \pi_\theta(\cdot |s_t)} \log \pi_\theta'(a_t|s_t)\, A'_t\right] $$

For more details and a proof of the entropy bound of ERA, please refer to our paper.
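As a concrete illustration, the sketch below applies the two piecewise rules to a single token position. This is a minimal sketch, not the paper's implementation: the hyperparameter names and values (`K`, `W_LOW`, `W_HIGH`) are placeholders, and we assume $0 < k < 1$ so that multiplying logits by $k$ flattens the softmax (raising entropy) while dividing by $k$ sharpens it.

```python
import torch

# Illustrative hyperparameters; names and values are ours, not the paper's code.
K = 0.5                     # logit scale k; assumed 0 < k < 1, so k*z flattens the softmax
W_LOW, W_HIGH = 0.2, 2.0    # response-entropy thresholds (omega_low, omega_high)

def era_llm_activation(logits, resp_entropy, advantage):
    """Apply the piecewise logit scaling and the matching inverse
    advantage scaling for one token position."""
    if resp_entropy < W_LOW and advantage > 0:
        return K * logits, advantage / K      # entropy too low: flatten -> raise entropy
    if resp_entropy > W_HIGH and advantage > 0:
        return logits / K, advantage * K      # entropy too high: sharpen -> lower entropy
    return logits, advantage                  # in-band (or non-positive advantage): unchanged

# Toy usage with a 4-token vocabulary.
z = torch.tensor([2.0, 0.5, -1.0, 0.1])
z_prime, a_prime = era_llm_activation(z, resp_entropy=0.1, advantage=1.3)
log_pi = torch.log_softmax(z_prime, dim=-1)
loss = -log_pi[0] * a_prime                   # per-token policy-gradient surrogate
```

The last two lines mirror the GRPO objective above: the log-probability is taken under the transformed logits $z'$ while the advantage is inversely rescaled to $A'_t$.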
Applying ERA to Continuous Control

In continuous control, we enforce a minimum entropy on the final policy by constraining the underlying Gaussian's entropy to a higher value. This is achieved by adjusting the Gaussian's standard deviation $\sigma$. Our activation function $g(\cdot)$ computes the final parameters $(\mu', \sigma')$ as:
$$ \mu' = \mu, \qquad \sigma' = \exp\left[\max\left(\log \sigma_{\max} + \left(\mathcal{H}_0' - D\log \sqrt{2\pi e} - D \log \sigma_{\max}\right)\frac{e^{\hat{\sigma}_i}}{\sum_{j=1}^{D}e^{\hat{\sigma}_j}},\; \log\sigma_{\min}\right)\right] $$

Here, $\mathcal{H}_0'$ is the target entropy plus a compensation parameter $\delta \ge 0$ to account for the bounding bias. This parameter can be a constant or automatically tuned by minimizing the following loss:
$$ L(\hat{\delta}) = \mathbb{E}_{s \sim \mathcal{D}} \left[\hat{\delta}\left(\mathcal{H}[\pi(\cdot|s)] - \mathcal{H}_0\right)\right] $$

Please refer to our paper for the detailed derivation, implementation details, and a proof of the entropy bound provided by ERA in continuous control settings.
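To see why this activation pins the entropy, note that the softmax shares sum to one, so before the $\sigma_{\min}$ clamp we have $\sum_i \log \sigma'_i = \mathcal{H}_0' - D\log\sqrt{2\pi e}$, and the diagonal Gaussian's differential entropy $D\log\sqrt{2\pi e} + \sum_i \log \sigma'_i$ equals exactly $\mathcal{H}_0'$; the clamp can only enlarge individual $\sigma_i$, so the entropy stays at or above the target. The sketch below implements this mapping; it is illustrative only, with placeholder constants (`SIGMA_MIN`, `SIGMA_MAX`, `D`, `H0_PRIME`) rather than the paper's settings.

```python
import math
import torch

# Illustrative constants; values are placeholders, not the paper's settings.
SIGMA_MIN, SIGMA_MAX = 1e-3, 1.0
D = 6                        # action dimension
H0_PRIME = -3.0              # target entropy H_0 plus compensation delta

def era_sigma(sigma_hat):
    """Map raw network outputs sigma_hat (shape [D]) to standard deviations
    whose diagonal-Gaussian entropy is held at (or above) H0_PRIME."""
    # Entropy budget left after fixing every dimension at sigma_max.
    budget = (H0_PRIME - D * math.log(math.sqrt(2.0 * math.pi * math.e))
              - D * math.log(SIGMA_MAX))
    share = torch.softmax(sigma_hat, dim=-1)   # shares sum to one
    log_sigma = math.log(SIGMA_MAX) + budget * share
    return torch.exp(torch.clamp(log_sigma, min=math.log(SIGMA_MIN)))

sigma_prime = era_sigma(torch.zeros(D))        # equal shares -> equal sigmas
```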
Policy Visualization

We visualize the behaviors of SAC-ERA agents in different tasks to showcase the effectiveness and stability of the learned policies.
Tasks shown (videos): Dog Run, Dog Walk, Humanoid Run, Humanoid Walk, H1 Run, H1 Walk, H1 Slide, H1 Stand.
Applying ERA to Image Classification
In discrete classification, regularizing predictive entropy is crucial for preventing overconfidence. For a softmax policy, we transform the pre-activation logits $z$ into $z'$ to ensure the policy's entropy is at least a target value $\mathcal{H}_0$:
$$ z' = h^{-1}\left[\max \left(\frac{\log \tau}{\tau} + \left(C_{\mathcal{H}_0} - n \frac{\log \tau}{\tau}\right)\frac{1}{D-1}\left(1 - \frac{e^{z_i}}{\sum_{j=1}^{D}e^{z_j}}\right),\; 0\right)\right] $$

Unlike label smoothing, which applies uniform regularization, ERA allows the model to learn a structured, input-dependent uncertainty distribution, tailoring the regularization to each sample for greater expressive capacity and improved performance. For a detailed demonstration, derivation, and proof of the entropy bound provided by ERA in classification settings, please refer to our paper.
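The sketch below traces the data flow of this transform numerically. It is only a sketch under strong assumptions: $h^{-1}$, $\tau$, and $C_{\mathcal{H}_0}$ are derived in the paper, so here we treat $h$ as the identity, take $n = D$, and use placeholder values for `TAU` and `C_H0` purely for illustration.

```python
import math
import torch

# Placeholder constants; tau, C_H0, and the map h are derived in the paper.
# We treat h as the identity and n = D here purely to show the data flow.
TAU = 2.0
C_H0 = 1.5

def era_cls_activation(z):
    """Input-dependent logit transform: under these placeholder constants,
    classes with more softmax mass receive smaller transformed logits,
    enforcing a minimum predictive entropy."""
    D = z.shape[-1]
    base = math.log(TAU) / TAU
    p = torch.softmax(z, dim=-1)
    z_prime = base + (C_H0 - D * base) * (1.0 - p) / (D - 1)
    return torch.clamp(z_prime, min=0.0)   # the outer max(., 0); h^{-1} omitted (identity)

print(era_cls_activation(torch.tensor([4.0, 1.0, 0.5, 0.2])))
```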
Performance on ImageNet and CIFAR-10
Comparison with Other Regularization Methods
To investigate the effectiveness of ERA against common regularization methods, we conducted comparative experiments on CIFAR-10 against various intensities of label smoothing and dropout. The results show that increasing label smoothing intensity can harm performance, and dropout offers only marginal gains. In contrast, ERA consistently and effectively enhances model performance, validating its advantage over conventional regularization methods.
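For context, the label-smoothing baseline in such a comparison can be reproduced with PyTorch's built-in option; the intensity `0.1` below is illustrative. Note that the smoothing strength is a single global constant applied identically to every sample, whereas ERA's transform depends on each input's softmax distribution.

```python
import torch
import torch.nn as nn

# Label-smoothing baseline: the target mixes the one-hot label with a uniform
# distribution at a fixed, input-independent intensity.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 10)               # batch of 8 CIFAR-10 predictions
labels = torch.randint(0, 10, (8,))
loss = criterion(logits, labels)
```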
BibTeX
@misc{kang2025entropyregularizingactivationboosting,
  title={Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints},
  author={Zilin Kang and Chonghua Liao and Tingqiang Xu and Huazhe Xu},
  year={2025},
  eprint={2510.08549},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.08549},
}