Knowledge transfer between heterogeneous reinforcement learning agent
Journal Title: Science Paper Online - Year 2010, Vol 5, Issue 2
Abstract
Existing knowledge transfer methods are suitable only for homogeneous reinforcement learning agents. To address this problem, a Q-learning algorithm is proposed that can transfer knowledge between heterogeneous agents with different state and action spaces. The main idea is as follows. Based on a task that both an old agent and a new agent have already learned, a neural network learns off-line a mapping between the Q-value functions of the two agents. The learned mapping is then used to obtain the Q values of the new agent in a new task that the old agent has already learned but the new agent has not. The proposed algorithm reduces the number of trials the new agent needs and thereby improves its learning speed. Simulation results on 10×10 mazes illustrate the validity of the proposed algorithm.
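The two-stage idea in the abstract (learn a Q-value mapping off-line on a shared task, then apply it to a new task) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the network architecture, the state correspondence, or the training procedure. The class `QMapNet`, the helper `toy_new_q`, the one-to-one pairing of states, the action counts (2 for the old agent, 3 for the new one), and all Q values are hypothetical toy choices.

```python
import math
import random


class QMapNet:
    """Hypothetical one-hidden-layer network: old-agent Q vector -> new-agent Q vector."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _hidden(self, x):
        # tanh hidden activations
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient step on the squared error for a single sample."""
        h = self._hidden(x)
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        err = [yk - tk for yk, tk in zip(y, target)]
        # Gradient w.r.t. hidden pre-activations, computed before w2 is changed.
        grad_h = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1 - h[j] ** 2)
                  for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def mse(net, pairs):
    """Mean squared error of the network over (input, target) pairs."""
    total, count = 0.0, 0
    for x, t in pairs:
        total += sum((yk - tk) ** 2 for yk, tk in zip(net.predict(x), t))
        count += len(t)
    return total / count


def toy_new_q(old_q):
    # Assumption: a made-up relation standing in for the new agent's learned Q values.
    a, b = old_q
    return [0.8 * a, 0.8 * b, 0.4 * (a + b)]


rng = random.Random(1)

# Stage 1: both agents have learned the shared task; pair up their Q vectors
# state by state (assumed correspondence) and fit the mapping off-line.
shared_old_q = [[rng.random(), rng.random()] for _ in range(30)]
pairs = [(q, toy_new_q(q)) for q in shared_old_q]

net = QMapNet(n_in=2, n_hidden=6, n_out=3)
initial_error = mse(net, pairs)
for _ in range(500):
    for x, t in pairs:
        net.train_step(x, t)
final_error = mse(net, pairs)

# Stage 2: only the old agent has learned the new task; map its Q vectors
# to initial Q values for the new agent.
old_q_task2 = [[rng.random(), rng.random()] for _ in range(5)]
q_init = [net.predict(q) for q in old_q_task2]
```

In this sketch the new agent would start ordinary Q-learning on the new task from `q_init` rather than from an all-zero table, which is one plausible way the reduction in trials described in the abstract could arise.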
Authors and Affiliations
Bo Liu, Ruhai Lei
Centrifugal cast microstructure of semisolid hypereutectic high chromium cast iron and its quantitative analysis
In this paper, semisolid slurry of hypereutectic high chromium cast iron prepared by the slope cooling body method is fabricated into a semisolid annular part by centrifugal casting. The microstructure is...
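Economic evaluation of penicillin extraction from fermentation broth by hollow fiber renewal liquid membrane technology
A hollow fiber renewal liquid membrane was used to realize a new extraction process that simultaneously separates and concentrates penicillin from fermentation filtrate. Compared with the current butyl acetate solvent extraction process, extraction and back-extraction take place in the same equipment, and the cooling step and the distillation, recovery, and purification of the solvent are eliminated, which greatly simplifies the process flow. The required equipment is small in volume, and solvent consumption...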
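Energy error control standard for computing gravity dam stresses with the h-type adaptive finite element method
Because there is currently no corresponding standard for selecting stress values in finite element calculations of gravity dams, this paper applies the adaptive finite element method. Through linear elastic adaptive calculations of an L-shaped plate and a gravity dam under a variety of conditions, it proposes a global energy error limit as a control standard based on the h-type adaptive finite element method. The examples show that, despite stress concentration at corner edges, the adaptive calculation does not...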
Reservoir-processing analysis and the development of pressure in Upper Paleozoic of Ordos Basin
This article applies dynamic analysis and comprehensive research, drawing on the theory of basin formation, hydrocarbon generation, and hydrocarbon accumulation. Through detailed analysis of the sedimentary burial history, hydrocarbon-...
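A design method for femoral head prostheses
To address complications such as aseptic loosening of the prosthesis after artificial hip replacement surgery, and to meet the need for rapid, personalized prosthesis design and manufacture, a method combining medical CT, computer-aided design, and finite element analysis is used to design the hip prosthesis according to the anatomical shape of the patient's femoral medullary cavity and the replacement requirements. The finite element software ANSYS is used to build the femoral...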