A Spatial Cognitive Model that Integrates the Effects of Endogenous and Exogenous Information on the Hippocampus and Striatum

Jing Huang He-Yuan Yang Xiao-Gang Ruan Nai-Gong Yu Guo-Yu Zuo Hao-Meng Liu

Citation: J. Huang, H. Y. Yang, X. G. Ruan, N. G. Yu, G. Y. Zuo, H. M. Liu. A spatial cognitive model that integrates the effects of endogenous and exogenous information on the hippocampus and striatum. International Journal of Automation and Computing. doi: 10.1007/s11633-021-1286-z

    Author Bio:

    Jing Huang received the Ph. D. degree in pattern recognition and intelligent system from Beijing University of Technology, China in 2016. Now she is an associate professor in Faculty of Information Technology, Beijing University of Technology, China. Her research interests include cognitive robotics, machine learning, and artificial Intelligence. E-mail: huangjing@bjut.edu.cn (Corresponding author) ORCID iD: 0000-0001-8804-7150

    He-Yuan Yang received the B. Sc. degree in automation from North China University of Water Resources and Electric Power (NCWU), China in 2019. He is currently a master student in control science and engineering at Faculty of Information Technology of Beijing University of Technology, China. His research interest is cognitive robotics. E-mail: yangheyuan@emails.bjut.edu.cn

    Xiao-Gang Ruan received the Ph. D. degree in control science and engineering from Zhejiang University, China in 1992. Now he is a professor of Beijing University of Technology, and he is also as a director of Institute of Artificial Intelligent and Robots (IAIR). His research interests include automatic control, artificial intelligence, and intelligent robot. E-mail: adrxg@bjut.edu.cn

    Nai-Gong Yu received the B. Eng. degree in information processing display and recognition from Harbin Institute of Technology, China in 1989, the M. Eng. degree in control science and engineering from Shanghai Jiao Tong University, China in 1996, and the Ph. D. degree in pattern recognition and intelligent systems from Beijing University of Technology, China in 2005. He worked as a visiting scholar in University of Alberta, Canada in 2011. He is currently a professor with Faculty of Information Technology, Beijing University of Technology, China. His research interests include computational intelligence, intelligent systems and robotics. E-mail: yunaigong@bjut.edu.cn

    Guo-Yu Zuo received the Ph. D. degree in cybernetics from Beijing University of Technology, China in 2005. He is currently an associate professor and head of Intelligent Robot Laboratory of Beijing University of Technology, China. He has published over 50 journal and conference articles and achieved over 20 Chinese patents in artificial intelligence and robotics. His research interests include computational intelligence, robot learning, robot control, and human-robot interaction. E-mail: zuoguoyu@bjut.edu.cn

    Hao-Meng Liu received the B. Sc. degree in computer science and technology from Beijing University of Technology (BJUT), China in 2019. He is currently a master student in control engineering at Faculty of Information Technology of Beijing University of Technology, China. His research interest is industrial big data. E-mail: 15570122@emails.bjut.edu.cn

Publication History
  • Received: 2020-10-08
  • Accepted: 2021-01-28
  • Published online: 2021-03-20

    • Cognitive science and neurophysiology can help us to understand the origins of cognition and intelligence. Inspired by the progress made in earlier studies, reproducing cognitive models of agents to make them behave like intelligent animals has become a promising approach in the research area of artificial intelligence[1-5].

      Research shows that the hippocampus in mammals is the core area of spatial cognition. The hippocampus and its adjacent regions contain a variety of neurons such as place cells[6], grid cells[7] and head-direction cells[8]. Place cells will fire when a rat arrives at a specific place, and the range corresponding to the firing activities is called the place field (PF)[9]. Hippocampal place cells establish the mapping between the brain area of the animal and the physical world, and this is considered the neurophysiological basis for the cognitive map[10].

      Simulating the mechanisms of how animals like rats form their spatial cognition on agents has attracted much attention. Many studies have focused on the computational models of hippocampal place cells, which can be divided into three types by the way they deal with exogenous and endogenous information. For mammals, exogenous information is the visual, olfactory and auditory information they see, smell and hear when they move freely in the environment. Endogenous information is the self-motion information of their proprioception and vestibular sensations[11-13]. Some researchers think place cells depend only on exogenous information; the most representative model of this type is the boundary vector cell (BVC) model[14-16]. The environment is described by several boundary vector cells, and the model describes the firing activity of place cells as a continuous function of the relative position of extended barriers (e.g., walls, large objects and impassable drops) in the animal′s surroundings. However, other researchers think place cells completely depend on endogenous information. In these cases, most of the models use artificial neural networks to associate endogenous information with the activities of place cells. Rolls et al.[17] established a competitive learning network to simulate the information-processing process of place cells according to lateral inhibition in the hippocampus. Yu et al.[18] used a back propagation (BP) neural network, and Zhou and Wu[19] used a radial basis function (RBF) neural network to simulate place cells. In the network, the firing rate of grid cells worked as the input, while the firing rate of all place cells was the expected output. The network selected the place cell with the maximum firing rate as the current estimate of location.

      Although hippocampal place cells are very important for spatial cognition, their activities are not enough to help an animal to find its target. Animals need to combine the activities of other brain regions, e.g., those of the striatum, to complete the task of spatial navigation. The striatum, a brain region closely connected to the hippocampus, is mainly responsible for rewarding learning and action selection, which is also believed to be involved with spatial cognition[20]. The striatum receives the location information from the hippocampus and reward information from dopamine cells in the ventral tegmental area. It subsequently integrates various neuronal signals to participate in action regulation[21-23]. The computational models based on the striatum mainly relate it to reinforcement learning or action selection[24, 25], but few studies discuss how it works together with the hippocampus in spatial cognition.

      Much evidence shows that mammals use a variety of information to encode the environment[26]. For example, blind mice can find their way home, and place cells still fire even without the activities of grid cells in the early development of rats. A few scholars have been inspired to integrate various kinds of information in the computational model. Doboli et al.[27] proposed a model of hippocampal place cells in which endogenous and exogenous information are fused in a linear weighted way. Aggarwal[28] proposed a cognitive model integrating both motor input (via grid cells and vestibular inputs) and sensory input (e.g., vision, audition and olfaction) based on continuous attractor networks, and explained the firing pattern of place cells. Madl et al.[29] hypothesized a mathematical mechanism for the integration of multiple kinds of information in hippocampal place cells. Their work mainly focused on the hippocampus and did not take other brain areas like the striatum into consideration. In addition, most of the research only discussed the cognitive mechanism or neurophysiological background of such phenomena, but few applied the models to agent navigation. Arleo and Gerstner[30] took into account both exogenous and endogenous information and combined reward learning to solve robot navigation. However, the exogenous information and the endogenous information did not work simultaneously, which means the robot used only one kind of information at a time. Finally, in the paper mentioned above, the exogenous information involved only visual information, which is not the case for real humans and animals: they always use multiple kinds of sensory information, like vision and smell, together when exploring the environment. Rodent experiments showed that rats encode environments with a combination of exogenous and endogenous information and take advantage of both cues to solve tasks[31, 32]. The results also showed that rats using exogenous information alone reached the learning standard one day later than those that used both types. Therefore, it is reasonable to combine endogenous and exogenous information in a model.

      In order to overcome the imperfections mentioned above, we propose a model that integrates the effects of endogenous and exogenous information to give a more biologically plausible and more efficient simulation of spatial cognition. This model mainly consists of the hippocampus and the striatum. The main contributions of the paper can be summarized as follows. First, the effects of both endogenous and exogenous information on the activities of place cells are considered in the model. Second, the model consists of not only the hippocampus but also the striatum, which helps analyze how different brain regions work together in spatial cognition. Lastly, although the striatum module is fully connected with the hippocampus module in our model, only the information representing the active place cells can be transmitted to the striatum module. This is different from previous models[33, 34] and more in accord with the physiological findings[35].

      The rest of this paper is arranged as follows. In Section 2, we introduce the model framework, its mathematical details and how the model works. In Section 3, we simulate the water maze environment[36] to test the spatial cognition ability of the model and compare our model with other models in terms of navigation performance. In Section 4, the discussion, we analyze the experimental results and the possible reasons for our findings. Finally, we conclude our work in Section 5.

    • We propose a bionic model that simulates the function of the hippocampus and striatum inspired by research on brain regions associated with spatial cognition in animals. The architecture is shown in Fig. 1. As shown in Fig. 1, the agent interacts with the environment constantly, perceives the environmental information and outputs actions to the environment. The model consists of the information perception module (IPM), the ventral tegmental area (VTA), hippocampus (HPC) and striatum (STR). IPM is responsible for perceiving all the information from the inner body or the outer environment, which refers to endogenous and exogenous information, respectively. For convenience, we limit the exogenous information in our model to visual and olfactory information. IPM transmits the information to the next module, namely, the HPC module. We chose place cells as the key elements to build the HPC module since place cells in the hippocampus are the neurophysiological basis of the cognitive map and play a major part in spatial cognition. Meanwhile, VTA, which contains many dopamine cells, is added to form the reward circuit with the striatum. Both HPC and VTA are connected to the STR module, which simulates the striatum and is responsible for reward learning and action selection. The neurons in STR are divided into two groups depending on their function: one group is responsible for action selection and consists of multiple neurons, each of which represents an action, while the other group is responsible for reward learning and includes only one neuron for simplicity.

      Figure 1.  Architecture of the model

      When the agent explores the environment, it perceives exogenous information (i.e., visual, olfactory information) and endogenous information (i.e., vestibular sensation, proprioception). Both types of information are used to form multidimensional perception. The information is transmitted to the place cells in the HPC module. The firing activity of the HPC corresponds to the real location and forms the cognitive map. The information is subsequently transmitted to the STR module. STR receives location information from HPC and reward information from VTA at the same time. It thus evaluates the state of the agent and selects the best action.

    • The perceptual information in the model includes exogenous and endogenous information. As mentioned earlier, exogenous information refers to visual and olfactory stimuli. Suppose the agent explores within a square whose side length is $L$. The visual information perceived by the agent at the location $(x,y)$ and time $t$, $v_t^k(x,y)\;(k = 1,2,3,4)$, represents the distances from the agent to the four sides (Fig. 2).

      Figure 2.  Representation of visual information

      However, the agent in our model is an analog of a rat and, accordingly, is assumed to have a visual angle of less than 180°, so it cannot measure the distances to all four sides at once. In such cases, we use the most recently perceived visual information as a substitute, as shown in (1):

      $${{v}}_t^j(x,y) = {{v}}_{{t_n}}^j(x,y),\;\;(j = 1,2,3,4)$$ (1)

      where ${t_n}$ is the last time when it can perceive the distance.

      Meanwhile, the noise is taken into consideration. Suppose the amplitude of the noise is $noise_{-}v$, the visual information with noise $\tilde v_t^k(x,y)$ is calculated according to (2):

      $$ \tilde v_t^k(x,y) = \frac{(v_t^k(x,y) + (L - v_t^k(x,y))\times noise_{-}v)}{L}. $$ (2)

      The value of $noise_{-}v$ is set within (0, 0.05) based on the work of Kulvicius et al.[37]
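As a minimal sketch, eqs. (1) and (2) can be implemented as follows; the function names and the left/right/bottom/top wall ordering are our own assumptions, not part of the model specification:

```python
def visual_distances(x, y, L):
    # Raw distances from (x, y) to the four walls of an L x L square
    # (the left/right/bottom/top ordering is an illustrative assumption).
    return [x, L - x, y, L - y]

def add_visual_noise(v, L, noise_v):
    # Eq. (2): normalize the reading to [0, 1] and inject noise of amplitude noise_v.
    return (v + (L - v) * noise_v) / L
```

When a wall is outside the visual angle, eq. (1) simply reuses the last stored reading for that wall.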

      Olfactory information is assumed to scatter according to a Gaussian distribution. Suppose there are ${N_o}$ kinds of odor sources in the environment. The intensity of the ${k}$-th odor at the location $(x,y)$, denoted $o_t^k(x,y)$, is calculated in (3):

      $$\begin{split} o_t^k(x,y) = &\;\exp \left( - \left(\frac{{(x - s_x^k)}^2}{2\sigma _o^2} + \frac{{(y - s_y^k)}^2}{2\sigma _o^2}\right)\right) \\ & (k = 1,2, \cdots ,{N_o}) \end{split} $$ (3)

      where $s_x^k,s_y^k$ are the coordinates of the center of the ${k}$-th odor, and $\sigma _o^{}$ is the standard deviation of the Gaussian distribution, which indicates how fast the odor scatters. The closer the agent is to the center of odors, the higher the intensity. We also take noise into consideration. The olfactory signals with noise, denoted as $\tilde o_t^k(x,y)$, are represented in (4):

      $$\tilde o_t^k(x,y) = \frac{(o_t^k(x,y) + (1 - o_t^k(x,y))\times noise_{-}o)}{o_{\max }^k}$$ (4)

      where $noise_{-}o$ is the amplitude of noise, and $o_{\max }^k$ is the maximum intensity of the ${k}$-th odor. The range of $noise_{-}o$ is also within (0, 0.05)[37]. Both the visual information and the olfactory information at time $t$ are integrated into one vector to represent the whole exogenous information shown in (5):

      $$ \tilde EX_t^{}(x,y) = (\tilde v_t^1(x,y),\tilde v_t^2(x,y),\tilde v_t^3(x,y), \tilde v_t^4(x,y),\tilde o_t^k(x,y)). $$ (5)
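Eqs. (3) and (4) can be sketched in a few lines; the function names are ours, and a single odor source is considered for brevity:

```python
import math

def odor_intensity(x, y, sx, sy, sigma_o):
    # Eq. (3): Gaussian intensity of an odor source centred at (sx, sy).
    return math.exp(-((x - sx) ** 2 / (2 * sigma_o ** 2)
                      + (y - sy) ** 2 / (2 * sigma_o ** 2)))

def add_odor_noise(o, o_max, noise_o):
    # Eq. (4): inject noise of amplitude noise_o and normalize by the peak intensity.
    return (o + (1 - o) * noise_o) / o_max
```

The intensity peaks at the source centre and decays with distance, as the text describes.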

      In addition to exogenous information, endogenous information is taken into consideration in our model. The endogenous information is about how the agent perceives its location through HPC, especially the place cells, as described by Arleo and Gerstner in earlier studies[30]. Therefore, endogenous information is calculated from the distance between the real location the agent occupies and the perceived location that activates a certain place cell, as shown in (6) and (7):

      $$EN_t^{}(x,y) =\exp \left(\frac{ - d{(l(t),{l_{pc}}(t))^2}}{2\sigma _{EN}^2}\right)$$ (6)
      $$\tilde E{N_t}(x,y)=E{N_t}(x,y)+(1 - E{N_t}(x,y))\times noise_{-}EN$$ (7)

      where $EN_t^{}(x,y)$ denotes the endogenous information perceived at the location $(x,y)$ and $\tilde E{N_t}(x,y)$ denotes $EN_t^{}(x,y)$ with noise, $d( \cdot )$ is the distance function, $l(t)$ is the coordinates of the real location where the agent stands at time $t$, ${l_{pc}}(t)$ is the location that the activating place cell corresponds to at time $t$, $\sigma _{EN}^{}$ is the standard deviation of $EN_t^{}(x,y)$ and $noise_{-}EN$ is the amplitude of the noise. The range of $\sigma _{EN}^{}$ and $noise_{-}EN$ is (0, 1) and (0, 0.05), respectively[27].

      According to (6) and (7), the endogenous information is a Gaussian function of the distance. It measures how accurately the agent perceives its location through the place cells, i.e., the more accurately the agent perceives its location, the larger the value becomes.
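Eqs. (6) and (7) reduce to a short function; `endogenous_info` is a hypothetical name, and the Euclidean distance is assumed for $d(\cdot)$:

```python
import math

def endogenous_info(l, l_pc, sigma_en, noise_en):
    # Eq. (6): Gaussian of the distance between the real location l and
    # the location l_pc coded by the active place cell.
    d = math.dist(l, l_pc)
    en = math.exp(-d ** 2 / (2 * sigma_en ** 2))
    # Eq. (7): add noise of amplitude noise_en.
    return en + (1 - en) * noise_en
```

The value approaches 1 as the perceived and real locations coincide.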

    • The hippocampus module, HPC, receives the output from the information perception module and processes it. As stated earlier, we regard the place cells as the key elements in HPC since they are the neurophysiological basis of the cognitive map. Here, we use a two-layer feed-forward neural network to simulate HPC as shown in Fig. 3.

      Figure 3.  Neural network architecture of HPC

      The first layer is responsible for transmitting information from the IPM. Many studies have shown that both the exogenous and endogenous information play an important role in spatial cognition[38-40]. However, the exact proportion of the two kinds of information in cognition remains unclear. Here, two constants ${g_{ex}}$ and ${g_{en}}$ are introduced to represent the proportions. ${g_{ex}}$ represents the proportion of the exogenous information, while ${g_{en}}$ represents the proportion of the endogenous information. Therefore ${g_{ex}} + {g_{en}} = 1$, ${g_{ex}},{g_{en}} > 0$.

      The second layer consists of N place cells and is responsible for generating the cognitive map for the environment. Each place cell in the layer receives the exogenous information and endogenous information from the former layer and becomes activated. The activation rate for each neuron is calculated according to (8).

      $$\begin{split} r_t^i =& \;\exp \Bigg( - \left(\frac{1}{m}\times{g_{ex}}\times(\tilde EX_t^{}(x,y) - {{{W}}_{i,ex}}(t)) + \right.\\[-2pt] &\left.\frac{1}{n}\times{g_{en}}\times(\tilde E{N_t}(x,y) - {{{W}}_{i,en}}(t))\right)^2\big/2\sigma _{pc}^2\Bigg) \end{split} $$ (8)

      where $r_t^i$ is the activation rate of the ${i}$-th neuron at time $t$, ${{{W}}_{i,ex}}(t)$ is the connection weight between the ${i}$-th neuron and the exogenous information input, and ${{{W}}_{i,en}}(t)$ is the connection weight between the ${i}$-th neuron and the endogenous information input. $m,n$ are the dimensions of the vectors $\tilde EX_t^{}(x,y)$ and $\tilde E{N_t}(x,y)$, respectively. In this paper, $m = 5, \,n = 1$. $\sigma _{pc}^{}$ is the standard deviation of the rate, $0 < \sigma _{pc}^{} < 1$.

      The connection weights between the layers are modified in the manner of “Winner-Take-All”. Only those place cells with the maximum firing rate are selected to update the related weights as shown in (9):

      $$win(t) = \mathop {\arg \max \;} \limits_i r_t^i.$$ (9)

      The weights of the selected connections are modified according to (10) and (11):

      $$\begin{split} &{{W}}_{win(t),ex}^{}(t + 1){\rm{ \;=\; }}{{{W}}_{win(t),ex}}(t) + \\ &\quad\quad\;\mu \times(\tilde EX_t^{}(x,y) - {{{W}}_{win(t),ex}}(t)) \end{split} $$ (10)
      $$\begin{split} &{{W}}_{win(t),en}^{}(t + 1){\rm{\; = \;}}{{{W}}_{win(t),en}}(t) + \\ &\quad\quad\;\mu \times(\tilde EN_t^{}(x,y) - {{{W}}_{win(t),en}}(t)) \end{split} $$ (11)

      where ${{win}}(t)$ is the selected cell at time $t$, and $\mu \in (0,1)$ is the learning rate.

      The HPC module works like a competitive neural network. In such a mechanism, we can find the place cells that match the external and internal information best. The firing activities of these place cells can form a cognitive map, and the primary task in spatial cognition is completed.
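One competitive step of HPC can be sketched as below, under the assumption that the squared term in eq. (8) sums the weighted differences over input dimensions (the paper leaves this implicit); all names are ours:

```python
import numpy as np

def hpc_step(ex, en, W_ex, W_en, g_ex=0.6, g_en=0.4, sigma_pc=0.07, mu=0.05):
    m, n = ex.size, 1  # dimensions of the exogenous / endogenous inputs
    # Eq. (8): firing rate of each place cell (weighted differences summed
    # over input dimensions -- an interpretation of the paper's notation).
    diff = (g_ex / m) * np.sum(ex - W_ex, axis=1) + (g_en / n) * (en - W_en)
    r = np.exp(-diff ** 2 / (2 * sigma_pc ** 2))
    win = int(np.argmax(r))  # eq. (9): winner-take-all
    # Eqs. (10)-(11): move only the winner's weights toward the current input.
    W_ex[win] += mu * (ex - W_ex[win])
    W_en[win] += mu * (en - W_en[win])
    return r, win
```

A cell whose weights already match the current input fires maximally and wins the competition.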

    • In related studies[41, 42], the function of the striatum centers on reward learning and action selection. Therefore, in this paper, the STR module consists of two groups of neurons: one group is responsible for handling rewards, while the other is responsible for action selection, as shown in Fig. 4.

      Figure 4.  Neural network connecting the hippocampus and striatum

      The reward-learning group includes only one neuron. It receives the reward signals from VTA and quantifies them according to (12), where ${R_t}$ is the reward value at time $t$.

      $${R_t} = \left\{ {\begin{array}{*{20}{l}} { - 1},&{{\rm{if}}\;{\rm{there}}\;{\rm{is}}\;{\rm{an}}\;{\rm{obstacle}}}\\ {10},&{{\rm{if}}\;{\rm{there}}\;{\rm{is}}\;{\rm{food}}}\\ 0,&{{\rm{otherwise}}}. \end{array}} \right.$$ (12)

      The action-selection group consists of eight cells, each of which represents one action, namely going East (E), South (S), West (W), North (N), Northeast (NE), Southeast (SE), Northwest (NW) or Southwest (SW). All the action-selection cells are fully connected with the place cells in HPC to obtain the cognition information as shown in Fig. 4.

      In this paper, we use a Q-learning-like method as the working mechanism of STR. The Q table is introduced as an important data structure, which is a set of $Q({s_t},{a_t})$. The action selection mechanism is based on the Q table and is described as follows.

      If $Q({s_t},{a_t}) = 0$, then the agent will randomly select an action with the probability of P, or keep the current direction with the probability of 1−P.

      If $Q({s_t},{a_t}) \ne 0$, then the agent will choose an action by the $\varepsilon$-greedy strategy, i.e., it will choose the action with the largest Q value with the probability of $1 - \varepsilon $, or move randomly with the probability of $\varepsilon $, $0 < \varepsilon < 1$.
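The two selection rules above can be sketched as one function; treating "$Q({s_t},{a_t}) = 0$" as "all Q values of the current state are zero" is our interpretation, and all names are ours:

```python
import random

ACTIONS = ["E", "S", "W", "N", "NE", "SE", "NW", "SW"]

def select_action(q_values, current, P=0.5, eps=0.3):
    if all(q == 0 for q in q_values):
        # Unvisited state: explore with probability P, else keep the heading.
        return random.choice(ACTIONS) if random.random() < P else current
    if random.random() < eps:
        # Epsilon-greedy exploration.
        return random.choice(ACTIONS)
    # Greedy choice: the action with the largest Q value.
    return ACTIONS[max(range(len(q_values)), key=q_values.__getitem__)]
```

With `P=0` and no learned Q values the agent simply keeps moving in its current direction.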

      For simplicity, we design a filter ${F_i}({r_t})$ to select the place cells that are the most active as shown in (13).

      $${F_i}({r_t}) = \left\{ {\begin{array}{*{20}{l}} 1,&{{\rm{if}}\;\;r_t^i > \theta }\\ 0,&{{\rm{otherwise}}} \end{array}} \right.$$ (13)

      where $\theta $ is the threshold, $0 < \theta < 1$.

      After an action has been selected, the related Q value and the connection weights between the active place cells and the action-selection neuron are updated according to (14) and (15):

      $$Q({s_t},{a_t}) = \frac{{\displaystyle\sum\nolimits_i {{W_{i,{a_t}}}{F_i}({r_t})} }}{{\displaystyle\sum\nolimits_i {{F_i}({r_t})} }}$$ (14)
      $$\begin{split} & {{{W}}_{i,{a_t}}}(t + 1) = {{{W}}_{i,{a_t}}}(t) + \alpha ({R_{t + 1}} + \\ & \quad\quad\;\gamma {\max _a}Q({s_{t + 1}},a) - {{{W}}_{i,{a_t}}}(t)){F_i}({r_t}) \end{split} $$ (15)

      where ${{{W}}_{i,{a_t}}}(t)$ is the connection weight between the $i$-th active place cell and the selected action ${a_t}$ at time $t$, $\alpha$ is the learning rate, and $\gamma $ is the discount factor, $\gamma \in (0,1)$.

      Here, the Q values are stored in the network in a distributed way, in the form of the connection weights. Meanwhile, each update corresponds to a few active place cells, which means multiple locations are involved in one update; this helps reduce the state space and speed up the algorithm. In addition, the introduction of the filter decreases the number of place cells involved in each update, which further improves efficiency.
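Eqs. (13)-(15) amount to a single update over a weight matrix W (rows: place cells, columns: actions); `str_update` and its argument names are our own:

```python
import numpy as np

def str_update(W, r, a, R_next, q_next_max, theta=0.5, alpha=0.2, gamma=0.9):
    # Eq. (13): binary filter selecting the most active place cells.
    F = (r > theta).astype(float)
    # Eq. (14): Q value read out as the mean weight over the active cells.
    q = (W[:, a] * F).sum() / F.sum()
    # Eq. (15): TD-style update applied only to the active cells' weights.
    W[:, a] += alpha * (R_next + gamma * q_next_max - W[:, a]) * F
    return q
```

Because `F` zeroes the inactive cells, only the place cells above the threshold contribute to, or are changed by, the update.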

    • Fig. 5 shows the flow chart for the working algorithm.

      Figure 5.  Flow chart of the working algorithm

      The working algorithm is stated as follows.

      1) Parameter initialization: Initialize the main parameters and data structures.

      2) Environment exploration: The agent moves freely in the environment to perceive both the exogenous and endogenous information according to (1)−(7).

      3) Cognitive map formation: Calculate the firing rate of place cells in HPC and update the synaptic weight between IPM and HPC according to (8)−(11).

      4) Reward learning: Receive the signal from VTA and calculate the reward value according to (12).

      5) Action selection: Choose the most suitable action.

      6) Updating: Modify the Q value and connection weights according to (13)−(15).

      7) Judge: If the agent arrives at the goal or it takes too long, the algorithm ends. Otherwise, go back to Step 2) and continue exploration.
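The seven steps above fit into one loop; `run_trial` and its callable hooks are hypothetical stand-ins for the components described in Section 2:

```python
def run_trial(perceive, hpc_update, reward, select_action, str_update,
              move, at_goal, start, max_steps=1000):
    loc = start                            # step 1): initialization done by the caller
    for step in range(max_steps):          # step 7) also bounds the trial length
        ex, en = perceive(loc)             # step 2): eqs. (1)-(7)
        win = hpc_update(ex, en)           # step 3): eqs. (8)-(11)
        R = reward(loc)                    # step 4): eq. (12)
        a = select_action(win)             # step 5)
        str_update(win, a, R)              # step 6): eqs. (13)-(15)
        if at_goal(loc):                   # step 7): goal reached
            return step
        loc = move(loc, a)                 # otherwise continue exploring
    return max_steps
```

The trial ends either when the goal is reached or when `max_steps` expires, mirroring step 7).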

    • The water maze experiment was designed and presented by Morris in 1981[36]. The experiment forces animals to swim and to learn how to find a survival platform. It is mainly used to test the learning and memory ability of experimental animals in spatial cognition and is the first choice for behavioral research, especially learning and memory research. Therefore, we simulated the water maze experiment to test the validity of our model in spatial cognition and navigation.

      The experimental environment is a 100×100 square space as shown in Fig. 6. The star in the figure represents the starting position of the agent, the yellow rectangle, which has a size of 20×20, represents the survival platform, and the red line is the path of the simulated rat. In the experiment, the agent starts from the star, moves freely and searches for the survival platform. The experiment ends when the agent finds the platform or time runs out. The agent is designed to be able to measure the distances from itself to the walls (although it cannot receive all the distance data at one time). It can also smell the odor of the food on the platform, which provides the exogenous information. The agent is also able to calculate its real position, which provides the endogenous information.

      Figure 6.  Experiment environment

      In this section, we carried out a set of experiments to test the performance of our model. First, we explain the settings for all the parameters in the model. Next, we describe the primary experiments carried out to test basic spatial cognition ability and adaptability, achieved by randomly changing the positions of the starting point or the survival platform. Obstacles were then added to the environment to make it more complicated, testing the robustness of the model. Finally, we compare our model with other similar models, including the SARSA($\lambda $) learning model[20] and another brain-inspired model[30], to test efficiency and performance in spatial cognition.

    • There are many parameters in our model, such as the learning rate $\mu $, the discount factor $\gamma $ and various kinds of standard deviations. To keep the paper concise, we omit a detailed discussion and only list the settings of all parameters in Table 1. These values were determined empirically, through repeated experiments, as the settings that produced the best results.

      Table 1.  Settings of all parameters in the model

      Parameter           Value      Parameter           Value      Parameter             Value
      $noise_{-}v$        0.03       $noise_{-}o$        0.03       $noise_{-}EN$         0.03
      ${\sigma _o}$       0.02       ${\sigma _{EN}}$    0.02       ${\sigma _{pc}}$      0.07
      $L$                 100        $N$                 400        $\mu $                0.05
      $\alpha $           0.2        $\gamma $           0.9        $P$                   0.5
      $\varepsilon $      0.3        $\theta $           0.5        ${g_{ex}},{g_{en}}$   0.6, 0.4
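For reference, the settings in Table 1 can be captured as a plain configuration dictionary; the key names are our own transliterations of the paper's symbols:

```python
PARAMS = {
    "noise_v": 0.03, "noise_o": 0.03, "noise_EN": 0.03,
    "sigma_o": 0.02, "sigma_EN": 0.02, "sigma_pc": 0.07,
    "L": 100, "N": 400, "mu": 0.05,
    "alpha": 0.2, "gamma": 0.9, "P": 0.5,
    "eps": 0.3, "theta": 0.5, "g_ex": 0.6, "g_en": 0.4,
}
# The proportions must satisfy g_ex + g_en = 1 (Section 2).
assert abs(PARAMS["g_ex"] + PARAMS["g_en"] - 1.0) < 1e-12
```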
    • In the primary experiments, we simulated the basic water maze environment. We randomly changed the position of the starting point and survival platform to verify the spatial cognition ability and adaptability. The results are shown in Figs. 7 and 8. We recorded the paths of the agent in the experiments using different positions of the starting point and the survival platform.

      Figure 7.  Paths in the experiments with different start points

      Figure 8.  Paths in the experiments with different survival platforms

      The results shown in the figures illustrate several characteristics of our model. First, the model successfully simulated the behavior of animals such as rats in the Morris water maze. The agent could find the survival platform after exploring the environment, which shows that it has spatial cognition ability like real animals. Second, the model is self-learning and unsupervised. No instructions or supervision were given to the agent throughout the exploration; the agent explored the environment autonomously, and its knowledge of the environment comes only from continuous interaction with its surroundings, which is in accordance with the way humans and animals learn skills. Third, the model is flexible and self-adaptive. The agent still found the goal after we changed the positions of the start and the survival platform, which shows that it is not task-oriented and can adapt to new situations easily. Last but not least, the model exhibits gradual learning. Performance at the beginning of an experiment was usually poor, but it improved over the course of learning until the agent finally reached the platform (Figs. 7 and 8). This is consistent with the law of exercise presented by Thorndike et al.[43, 44]

      We also recorded the number of steps taken in each trial. As stated earlier, four experiments with different positions of the start and the survival platform were carried out, and each experiment included 25 trials. The results shown in Fig. 9 indicate the gradual nature of learning: spatial cognition forms progressively during the procedure. This observation also demonstrates the memory ability of our model, because memory is the basis of gradual learning.

      Figure 9.  Number of steps in the 25 experiments

    • We added two obstacles to the environment to increase its complexity and to verify the robustness of our model. The two obstacles are represented by black rectangles in Fig. 10. When the agent encounters the obstacles, it is punished and returned to its previous position, and the action is chosen once again.
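      The obstacle-handling rule described above (punish the agent, return it to its previous position, and choose the action again) can be sketched as follows. The move set, rectangle encoding of obstacles and penalty value are illustrative assumptions:

```python
def step_with_obstacles(pos, action, obstacles, penalty=-1.0):
    """Apply an action; on collision with an obstacle, punish the agent
    and return it to its previous position so the action can be chosen
    again. Obstacles are axis-aligned rectangles (x, y, w, h)."""
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    dx, dy = moves[action]
    nx, ny = pos[0] + dx, pos[1] + dy
    for ox, oy, w, h in obstacles:
        if ox <= nx <= ox + w and oy <= ny <= oy + h:
            return pos, penalty, True   # bounced back; caller re-chooses
    return (nx, ny), 0.0, False
```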

      Figure 10.  Paths of the agent in the advanced experiments

      In addition to confirming the conclusions of the primary experiments, the agent could still find its way from the start to the goal even though the environment was more complicated, the start positions were changed, and the agent was sent back to its previous position whenever it encountered an obstacle (Fig. 10). This proves the robustness of our model: it remains stable when the agent faces various kinds of changes to the environment.

      The final path in the experiments showed that the agent learned not only how to arrive at the platform but also how to avoid the obstacles when exploring. This indicates that the agent had formed a complete cognitive simulation of the environment.

    • In this section, we compared our model with other models to test the efficiency and performance in spatial cognition.

      1) Comparison with reinforcement learning algorithm

      The introduction of reinforcement learning into the animal navigation model can reflect behavior from a machine learning perspective[45, 46]. The essence of reinforcement learning is that agents constantly interact with and receive feedback from the environment and learn from it. Here we compared our model with the SARSA($\lambda $) algorithm in the context of the water maze experiment.

      The state-action-reward-state-action (SARSA) algorithm is one of the most effective methods to solve the reinforcement learning problem[20]. The eligibility trace, which measures how eligible each state-action pair is for updating, has been introduced into SARSA to avoid the inherent disadvantages of slow value transfer and convergence. The SARSA algorithm with eligibility traces is known as SARSA($\lambda $)[20].

      The settings of the contrast environment are the same as in the primary experiment. After multiple trials, the parameters of SARSA($\lambda $) are set as follows: $\alpha = 0.02$, $\eta = 0.9$, $\lambda = 1$, where $\alpha $ is the learning rate, $\eta $ is the discount factor and $\lambda $ is the decay factor. In addition to these parameters, the state space of the algorithm includes 20×20 states, and the action space includes eight actions. This is similar to our model. The reward function in SARSA(λ) is also the same as in our model.
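      For reference, one tabular SARSA($\lambda $) update with accumulating eligibility traces, using the quoted parameter values, looks roughly like this. The paper writes the discount factor as $\eta $; it appears here as `gamma`, and the tabular state encoding is an assumption:

```python
import numpy as np

def sarsa_lambda_update(Q, E, s, a, r, s2, a2,
                        alpha=0.02, gamma=0.9, lam=1.0):
    """One tabular SARSA(lambda) step with accumulating eligibility
    traces. Q and E are (n_states, n_actions) arrays."""
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # TD error
    E[s, a] += 1.0                            # accumulate the trace
    Q += alpha * delta * E                    # update all eligible pairs
    E *= gamma * lam                          # decay the traces
    return Q, E
```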

      The results of the SARSA(λ) algorithm and our model in 200 trials are shown in Figs. 11 and 12. Both models were able to make the agent complete the task and find the survival platform. However, our model is faster than the SARSA(λ) algorithm and can find a shorter path to the target. The number of steps to find the platform using the different methods is shown in Table 2.

      Figure 11.  Results of SARSA($\lambda $) algorithm

      Figure 12.  Results of our model

      Table 2.  Number of steps with different methods

      Trials             1     5     10    20    30    50    100   200
      SARSA($\lambda $)  970   367   382   244   179   91    77    39
      Our method         160   500   54    22    21    19    27    21

      2) Comparison with brain-inspired model

      Arleo and Gerstner[30] proposed a computational model of hippocampal activity, the Arleo and Gerstner (AG) model, for spatial cognition and navigation tasks with exogenous and endogenous stimuli. In the AG model, the two kinds of information are integrated to establish and maintain the hippocampal place fields. The external stimuli are visual data, and the internal stimuli are the positions of the robot at the current moment in time. The model consists of a three-layer neural network: a visual cell layer, a place cell layer and an action cell layer. The visual cell layer receives and integrates sensory information and clusters it into multiple active points to express the current environment. The place cell layer receives information from the visual cell layer and encodes the environment as a basis for goal-oriented spatial behavior. Reward-based learning is applied to map the place cell activity onto action cell activity. Finally, the model outputs actions to guide the behavior of the animal.

      The settings of the environment and the parameters of our model (e.g., reward value, learning rate, decay rate and exploration rate) are kept identical with those published for the AG model[30] for direct comparison of the models (Fig. 13). In Figs. 13(a) and 13(b), the white rectangles represent the obstacles, the dark grey square represents the start position and the black line marks the path.

      Figure 13.  Results of different models after 50 trials

      The path of our model is straighter than the path observed for the AG model (Fig. 13). This indicates that our model produces a shorter path and thus performs better than the AG model.

    • In this section, we try to analyze the possible reasons for the advantages of our model. The analysis is mainly from two perspectives: the effect of the endogenous information and the number of active place cells.

      As mentioned in the introduction, rats that use combined information reach the learning standard a day earlier than those that use only the exogenous information. The endogenous information is therefore regarded as having significant effects on spatial cognition, which inspired us to develop our model.

      Will the phenomena observed in the physiological experiments be reproduced in our model? We carried out the following experiments to answer this question. The settings were the same as for the primary experiment. The experiments included two groups: one group took the combined information as the input (${g_{ex}} = 0.6,\,{g_{en}} = 0.4$) while the other received only the exogenous information (i.e., ${g_{ex}} = 1,\,{g_{en}} = 0$). Each group was executed 50 times, and the number of steps in every trial was recorded (Fig. 14). To rule out accidental factors, the start position was kept the same as in the primary experiment.

      Figure 14.  Number of steps in the experiments with or without endogenous information

      It can be seen from Fig. 14 that the step count of the blue line is lower on average than that of the orange line, and that the blue line is flatter, which indicates that the model with both types of information performs better in terms of both path length and stability. These results are fully in accordance with the conclusions of the physiological experiments, and the effect of the endogenous information on spatial cognition is confirmed once again.

      In our opinion, the exogenous information plays the prime role in spatial cognition, while the endogenous information is a necessary complement to it. This complement improves the accuracy of spatial perception and thus also improves spatial cognition.
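      A minimal sketch of such a weighted combination, assuming a simple linear fusion with the gains ${g_{ex}} = 0.6$ and ${g_{en}} = 0.4$ from Table 1 (the actual fusion rule in the model may differ):

```python
import numpy as np

def fuse_inputs(exo, endo, g_ex=0.6, g_en=0.4):
    """Weighted combination of the exogenous and endogenous drive to the
    place cells. The linear form is a simplifying assumption."""
    return g_ex * np.asarray(exo, dtype=float) + g_en * np.asarray(endo, dtype=float)
```

      Setting `g_ex=1, g_en=0` recovers the exogenous-only condition used in the comparison group.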

      As mentioned above, the firing activities of place cells build up the mapping from the real world to the brain and are considered the physiological basis of the cognitive map. Building up the cognitive map is the key task in spatial cognition; other problems in spatial cognition become much easier once the cognitive map has been built. Since the place cells have such a significant impact on spatial cognition, it is reasonable to measure the ability for spatial cognition by observing the activities of the place cells, or to explain performance in spatial cognition in terms of these activities.

      We recorded the number of active place cells to explain the performance of our model. Here, the word active means that the firing rate of the place cell is above the given threshold in accordance with (13).

      The result is shown in Fig. 15. As the threshold increases, the number of active place cells decreases. Regardless of the threshold, however, the number of active place cells for the model with combined information is always greater than that of the model with only exogenous information. This indicates that the overall firing rate of place cells with combined information is always higher than with exogenous-only information. A higher firing rate activates more place cells, and more place cells provide more details about the location because of their location specificity. Thus, a more accurate cognitive map can be built, and the performance in spatial cognition is improved.

      Figure 15.  Number of active place cells with different thresholds
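      Counting active place cells as defined above (firing rate above a given threshold) reduces to a one-line check; the sample rates used here are illustrative, not data from the experiments:

```python
import numpy as np

def count_active(firing_rates, threshold):
    """Number of place cells whose firing rate exceeds the threshold,
    i.e., the quantity plotted against the threshold in Fig. 15."""
    return int(np.sum(np.asarray(firing_rates) > threshold))
```

      For example, `count_active([0.9, 0.7, 0.4, 0.2], 0.5)` returns 2, and raising the threshold can only lower the count, matching the downward trend in Fig. 15.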

      Why could the integration of both exogenous and endogenous information shorten the path length and achieve better results in the experiments? The reason may be twofold. On the one hand, from a neurophysiological and cognitive perspective, the fusion of both kinds of information increases the firing rate of place cells so that more place cells are activated; such an increase indicates that more environmental information is obtained, forming a better spatial cognition for the animal. In fact, the design in our model whereby multiple place cells are activated when the agent is in a certain position is fully in accordance with O'Keefe's findings[35], which supports the biological plausibility of our model. On the other hand, from a computational perspective, more activated place cells lead to more items in the Q-table being updated during learning; Fig. 16 records the number of updated Q values in 200 trials with and without combined information. Compared with the single-step updating of traditional reinforcement learning, updating a group of Q values in each trial not only speeds up learning but also extends the exploration range, which helps find the optimal route.

      Figure 16.  Number of updated Q values in 200 trials
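      The group update can be sketched as follows, assuming the distributed Q values are connection weights indexed by place cell and action; the exact update rule in the paper may differ:

```python
import numpy as np

def group_q_update(weights, active_cells, activities, action, td_error, mu=0.05):
    """Update the connection weights (distributed Q values) of every
    currently active place cell at once, weighted by each cell's
    activity, so one trial step touches several locations."""
    for cell in active_cells:
        weights[cell, action] += mu * td_error * activities[cell]
    return weights
```

      Because every active cell's weight is adjusted in one step, a single experience propagates to several locations at once, which is the computational advantage argued above.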

    • This paper presents a spatial cognitive navigation model that integrates the effects of endogenous and exogenous information on the hippocampus and striatum. The model helps analyze how different information and brain regions work together in spatial cognition. The new model differs in significant ways from previous models. The connection between HPC and STR is improved so that only the information representing the active place cells can be transmitted to the striatum module, which is more in agreement with the physiological findings. Meanwhile, both exogenous and endogenous information is considered in the model, which makes the model biologically more plausible and more efficient. An improved Q-learning-like method is presented to simulate the function of the striatum in the STR module. The Q values are stored in a dispersed way in the neural network in the form of connection weights. Each update corresponds to a few active place cells, which means that multiple locations are involved; this helps reduce the state space and speed up the algorithm.

      The classic psychological experiment for spatial cognition, the Morris water maze, was reproduced to test the validity of our model. We carried out a series of experiments based on the water maze, including primary and advanced experiments, and compared our model with a reinforcement learning algorithm and another brain-inspired model. Our model demonstrates improvements in self-adaptability, robustness and efficiency. Moreover, we analyzed the possible reasons for the advantages of our model in terms of the effect of the endogenous information and the number of active place cells. We argue that the endogenous information is a necessary complement to exogenous information, which improves the accuracy of spatial perception and thus also improves spatial cognition. In addition, our model increases the firing rate of place cells, which helps activate more place cells, builds up a more accurate cognitive map and improves spatial cognition performance.

      We recognize that the mathematical model of endogenous information described in this paper is relatively simple, and that it reduces the influence of other cells in the hippocampus. The way in which exogenous and endogenous information combines also needs to be studied further. For example, how is the contribution ratio decided dynamically and autonomously for the two types of information? All of these topics are targets for further research.

    • This work was supported by National Natural Science Foundation of China (Nos. 61773027 and 62076014), National Key Research and Development Program Project (No. 2020YFB1005903), and Industrial Internet Innovation and Development Project (No. 135060009002).
