There are a few exciting machine learning concepts that do not receive nearly enough attention. Let’s look at four of them: constructing skill trees, prefrontal cortex basal ganglia working memory, state–action–reward–state–action, and Sammon mapping.
Constructing Skill Trees
Constructing skill trees (CST) is a hierarchical reinforcement learning algorithm that builds skill trees from a set of example solution trajectories gathered through demonstration. CST uses an incremental MAP (maximum a posteriori) changepoint detection algorithm to segment each demonstration trajectory into a chain of skills. A changepoint is placed where the most suitable abstraction changes or where a segment is too complex to represent as a single skill. The skill chains from all trajectories are then merged to build a skill tree.
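The segmentation step can be illustrated with a much simpler stand-in. The sketch below is not CST's incremental MAP detector, which fits per-skill models online. It swaps in an offline penalised least-squares segmentation of a one-dimensional signal, where each segment is modelled by its mean and every extra segment costs a fixed penalty. The function name `find_changepoints` and the `penalty` parameter are assumptions made for illustration.

```python
import math


def find_changepoints(signal, penalty):
    """Split `signal` into piecewise-constant segments (illustrative stand-in).

    Minimises the sum of squared errors around each segment mean plus
    `penalty` per segment, using dynamic programming over all split points.
    Returns the sorted indices at which a new segment starts.
    """
    n = len(signal)
    # Prefix sums let us compute any segment's squared error in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def segment_cost(i, j):
        # Squared error of signal[i:j] around its own mean.
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: minimal cost of signal[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    j = n
    while j > 0:
        i = prev[j]
        if i > 0:
            changepoints.append(i)
        j = i
    return sorted(changepoints)


if __name__ == "__main__":
    trajectory_feature = [0, 0, 0, 0, 5, 5, 5, 5]
    print(find_changepoints(trajectory_feature, penalty=1.0))
```

In this toy setting each segment between changepoints plays the role of one skill. A larger `penalty` yields fewer, longer segments.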
CST learns considerably faster than skill chaining. Even unsuccessful episodes can be used to improve skills. Skills learned with agent-centric features can be reused on other problems. CST has been used to acquire skills from human demonstration both in the PinBall domain and on a mobile manipulator.
Prefrontal Cortex Basal Ganglia Working Memory
Prefrontal cortex basal ganglia working memory (PBWM) is an algorithm that models working memory in the prefrontal cortex and the basal ganglia. Functionally, it is comparable to long short-term memory (LSTM) but is more biologically explainable. PBWM was inspired by LSTM and provides flexible memory control, but it was built with a heavy focus on biological plausibility. Sensory stimuli are admitted into PBWM’s working memory (WM) store in an all-or-none fashion.
However, researchers have pointed out that the exact functionality of PBWM is obscured by its complexity: the model has a highly interwoven architecture of several neural subsystems and several parallel learning algorithms, both supervised and unsupervised. A few years back, a group of researchers proposed a simpler PBWM model that concentrates on just one key component of the technique: the use of internal gating actions to control memory content. This model replaces all biologically based neural subcomponents with a more abstract tabular representation of all possible input and memory states. The simplified PBWM model thus gives up most of PBWM’s biological realism, but it does highlight its core capability: control over memory content through internal gating actions, which can be learned through reinforcement learning alone.
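The all-or-none gating idea can be sketched in a few lines. This is a minimal illustration, not the published simplified PBWM model: the names `gate_step` and `run_gating` are hypothetical, and the gate decisions are supplied by hand here, whereas in the model they would be actions chosen by a reinforcement learner over the tabular (input, memory) states.

```python
def gate_step(memory, observation, open_gate):
    """All-or-none update: overwrite memory with the input, or keep it as is."""
    return observation if open_gate else memory


def run_gating(observations, gate_actions, initial_memory=None):
    """Apply a sequence of gate decisions to a sequence of inputs.

    Returns the (input, memory) pair after each gate decision. These pairs
    are the kind of discrete states a tabular learner could index.
    """
    memory = initial_memory
    history = []
    for observation, open_gate in zip(observations, gate_actions):
        memory = gate_step(memory, observation, open_gate)
        history.append((observation, memory))
    return history


if __name__ == "__main__":
    # Store the cue "A", ignore the distractor "x", then replace it with "B".
    for row in run_gating(["A", "x", "B", "y"], [True, False, True, False]):
        print(row)
```

Because the gate is a discrete choice, it can be treated like any other action and rewarded only by task outcome, which is what makes learning it with reinforcement learning alone plausible.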
State–Action–Reward–State–Action
The state–action–reward–state–action (SARSA) method is a reinforcement learning algorithm for learning a Markov decision process policy. SARSA is a slight variation of the well-known Q-learning algorithm. Reinforcement learning algorithms fall into one of two types: on-policy and off-policy.
Q-learning is an off-policy technique: it learns Q-values using the greedy policy, regardless of the policy the agent actually follows. SARSA, on the other hand, is an on-policy technique that learns Q-values from the actions the current policy actually takes. The most significant distinction between SARSA and Q-learning is that the maximum Q-value of the next state is not necessarily used to update the Q-values. Instead, the same policy that chose the original action is used to select the next action, and the Q-value of that action enters the update. SARSA gets its name from the quintuple (s, a, r, s’, a’) used in each update, where s and a are the current state and action, r is the reward observed after taking a in s, and s’ and a’ are the next state–action pair.
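The difference between the two updates is easiest to see side by side. The sketch below assumes a tabular Q-function stored in a dictionary keyed by (state, action), a learning rate `alpha`, and a discount factor `gamma`. The function names, the example states, and the epsilon-greedy helper are illustrative choices, and terminal states are not handled.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the policy actually chose next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best next action, whatever is taken."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


if __name__ == "__main__":
    actions = ["left", "right"]
    start = {("s1", "left"): 5.0, ("s1", "right"): 1.0}

    Q_sarsa = defaultdict(float, start)
    Q_qlearn = defaultdict(float, start)

    # Same transition (s0, right, r=2, s1); the policy then picks "right" in s1.
    sarsa_update(Q_sarsa, "s0", "right", 2.0, "s1", "right", alpha=0.5)
    q_learning_update(Q_qlearn, "s0", "right", 2.0, "s1", actions, alpha=0.5)

    print(Q_sarsa[("s0", "right")], Q_qlearn[("s0", "right")])
```

With identical experience, SARSA uses the value of the chosen next action ("right", valued 1.0), while Q-learning uses the larger value of "left" (5.0), so the two tables diverge whenever the policy explores.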
Sammon Mapping
Sammon mapping, also known as Sammon projection, is an algorithm that maps a high-dimensional space to a lower-dimensional space while attempting to preserve the structure of inter-point distances from the high-dimensional space. It is especially well suited to exploratory data analysis. Unlike principal component analysis, the mapping cannot be expressed as a linear combination of the original variables, which makes it more difficult to use for classification.
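The quantity that Sammon mapping tries to minimise is its stress: the squared mismatch between each original distance and the corresponding projected distance, weighted by the inverse of the original distance and normalised by the sum of all original distances. The sketch below only evaluates that stress for a given projection and does not perform the optimisation. The function name `sammon_stress` and the example points are illustrative, and pairs of identical points are skipped to avoid division by zero.

```python
import math
from itertools import combinations


def sammon_stress(high_points, low_points):
    """Sammon stress between a high-dimensional set and its projection.

    Both arguments are sequences of equal length; point i in `low_points`
    is the projection of point i in `high_points`.
    """
    if len(high_points) != len(low_points):
        raise ValueError("both point sets must have the same length")

    weighted_error = 0.0
    distance_sum = 0.0
    for i, j in combinations(range(len(high_points)), 2):
        d_high = math.dist(high_points[i], high_points[j])
        d_low = math.dist(low_points[i], low_points[j])
        if d_high == 0.0:
            continue  # coincident points carry no distance information
        distance_sum += d_high
        weighted_error += (d_high - d_low) ** 2 / d_high

    if distance_sum == 0.0:
        raise ValueError("all high-dimensional points coincide")
    return weighted_error / distance_sum


if __name__ == "__main__":
    high = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    print(sammon_stress(high, [(0,), (1,), (2,)]))  # distances preserved
    print(sammon_stress(high, [(0,), (0,), (0,)]))  # everything collapsed
```

Dividing each term by the original distance gives small distances more weight, which is why the method tends to preserve local structure. A full implementation would adjust the low-dimensional points iteratively to reduce this value.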
Since its introduction in 1969, the Sammon mapping has been one of the most successful nonlinear metric multidimensional scaling methods. However, most follow-up work has focused on improving the algorithm rather than on the form of the stress function. The Sammon mapping’s performance has also been improved by extending its stress function with left and right Bregman divergences.