Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission

郑重声明：原文参见标题，如有侵权，请联系作者，将会撤销发布！

Neuron, no. 6 (2003): 1063-1073

Summary

　　众所周知，化学突触传递是一个不可靠的过程，但这种不可靠的函数尚不清楚。在此，我考虑了一个假设，即突触传递的随机性是由大脑用来学习的，类似于达尔文进化论利用基因突变的方式。这是可能的，如果突触是"享乐主义的"，通过增加它们的囊泡释放或失败的概率来响应一个全局奖励信号，这取决于奖励之前的动作。享乐主义突触通过计算平均奖励的梯度的随机近似来学习。它们与突触动态(如短期促进和抑制)以及树突状细胞整合和动作电位生成的复杂性相一致。一个享乐主义突触网络可以通过适当地给予奖励来训练以执行所需的计算，如这里通过IF神经元模型的数值仿真所示。

Introduction

　　许多类型的学习可以被视为优化。例如，操作性条件可以被视为动物适应其动作以最大化奖励的过程。“实践使之完美”的格言是指反复提高复杂的动作技能，例如弹钢琴或打网球。人们普遍认为，学习至少部分基于大脑突触组织的可塑性。因此，似乎存在为优化神经回路函数而量身定制的突触可塑性类型。

　　这种突触可塑性可以采取什么具体形式？为了激发想像力，从进化中汲取灵感是很有帮助的，进化是生物学优化过程的最著名例子。进化的一个令人着迷的方面是，它需要不完美的基因复制。这种不可靠性可能在其他方面似乎是不可取的，但是随机突变和重组对于产生变异实际上是必不可少的，变异允许进化以寻找改良的基因型。

Results

Training a Multilayer Network

Release-Failure Antagonism

The Matching Law

Dynamic Synapses

Postsynaptic Voltage Dependence

Postsynaptic Locus of Plasticity

Temporal Antagonism

Discussion

Hedonistic synapses are just a mechanism for stochastic gradient learning, a topic that has been studied extensively in the field of neural networks. What is new here?

I'm a synaptic physiologist. How can I look for hedonistic synapses in the brain?

There are many sources of randomness in the brain. Why do you single out stochastic vesicle release as the basis for stochastic gradient learning?

You've demonstrated learning with hedonistic synapses for some toy problems, but it will never scale up to really large networks

Even if stochastic gradient learning were applied to small neural circuits in the brain, wouldn't it still be too slow?

What if the reward signal is delayed in time? Won't that be catastrophic for the learning time of hedonistic synapses?

Do you believe that hedonistic synapses are the explanation of operant conditioning?

Could temporal antagonism alone be sufficient for stochastic gradient learning?

In your example, two out of three synapses would change in the right direction. Couldn't the circuit end up increasing its average reward anyway, in spite of the errant synapse?

Do hedonistic synapses require that reward and punishment be balanced so that the average reinforcement is zero?

What do you regard as the greatest weakness of your model?

Experimental Procedures

Hedonistic Synapse Model

REINFORCE Learning

REINFORCE Learning for Stochastic Synapses

REINFORCE as Stochastic Gradient Learning

Bias-Variance Tradeoff

Related Learning Rules

Variable-Interval Reward Schedule

Numerical Simulations