
OpenAI Secretly Works on Q*, Inches Closer Towards AGI

This new development comes in the background of Andrej Karpathy thinking of centralisation and decentralisation lately.  


Illustration by Diksha Mishra


OpenAI is reportedly working on a project called Q* (pronounced Q-Star), a model capable of solving unfamiliar math problems.

A few people at OpenAI believe that Q* could be a big step towards achieving artificial general intelligence (AGI). At the same time, the model is raising concerns among some AI safety researchers over the pace of these advancements, particularly after a demo of the model circulated within OpenAI in recent weeks, as per The Information.

The model was created by OpenAI’s chief scientist Ilya Sutskever along with top researchers Jakub Pachocki and Szymon Sidor.

Interestingly, this new development comes in the background of Andrej Karpathy – who also happens to be building JARVIS at OpenAI – recently posting on X that he has been thinking about centralisation and decentralisation lately.

Karpathy is essentially talking about building AI systems that involve a trade-off between centralisation and decentralisation of decision-making and information. Achieving optimal results requires balancing these two aspects, and Q-learning seems to fit perfectly into the equation to enable this.

What is Q-Learning? 

Experts believe that Q* is built on the principles of Q-learning, a foundational concept in AI, specifically in the area of reinforcement learning. Q-learning is categorised as a model-free reinforcement learning algorithm, designed to learn the value of an action within a specific state.

The ultimate goal of Q-learning is to find an optimal policy that defines the best action to take in each state, maximising the cumulative reward over time.

Q-learning is based on the notion of a Q-function, aka the state-action value function. This function takes two inputs: a state and an action. It returns an estimate of the total reward expected when starting from that state, taking that action, and thereafter following the optimal policy.

In simple instances, Q-learning maintains a table (known as the Q-table) where each row represents a state and each column represents an action. The entries in this table are the Q-values, which are updated as the agent learns through exploration and exploitation.
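To make the idea concrete, here is a minimal sketch of a Q-table together with the standard Q-learning update rule. The environment sizes, learning rate, and discount factor below are illustrative assumptions, not details from the project itself:

```python
import numpy as np

n_states, n_actions = 5, 2           # toy environment sizes (assumed)
alpha, gamma = 0.1, 0.9              # learning rate and discount factor (assumed)

# Rows represent states, columns represent actions; entries are Q-values.
Q = np.zeros((n_states, n_actions))

def update(state, action, reward, next_state):
    """Standard Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))"""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# One hypothetical transition: taking action 1 in state 0 yields reward 1.0.
update(0, 1, reward=1.0, next_state=2)
print(Q[0, 1])  # 0.1 after a single update from a zero-initialised table
```

Each update nudges the stored Q-value towards the observed reward plus the discounted best value of the next state, which is how the table gradually converges towards the optimal policy.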

A key aspect of Q-learning is balancing exploration (trying new actions) and exploitation (using known information). This is often managed by strategies like ε-greedy, where the agent explores randomly with probability ε and exploits the best-known action with probability 1−ε.
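The ε-greedy strategy described above can be sketched in a few lines. The Q-values and ε below are made-up illustrative numbers:

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """Pick a random action with probability epsilon (exploration),
    otherwise return the best-known action (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

# With epsilon=0 the agent always exploits: action 1 has the highest Q-value.
print(epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.0))  # 1
```

In practice, ε is often decayed over training so the agent explores widely at first and exploits more as its Q-values become reliable.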


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.