Believing that the missing piece on the road to AGI is giving machines the ability to think, or rather to have ‘common sense’, Tejas Kulkarni, a former senior research scientist at Google DeepMind with a PhD from MIT, started Common Sense Machines (CSM). He co-founded the company with Max Kleiman-Weiner, a PhD researcher from MIT and scout investor at Sequoia Capital. CSM is an end-to-end generative AI platform that creates game-engine-ready content. “By 2019 it was clear that LLMs were beginning to work, and image generation was also pretty far along. It was evident that 3D would be next, and the core problem that we had to solve was image to 3D,” said Kulkarni, who believes that 3D has been an unsolved problem in AI for a long time.
With everyone on the way to becoming a gamer, Jensen Huang, chief of NVIDIA, recently told AIM how AI has revolutionised computer graphics, and even remarked that companies are now training AI agents on games to “build something crazy.” This is where companies like CSM come in, building 3D assets that can be integrated into games and the metaverse. “CSM is building content creation layers where our products can go into Omniverse, and a myriad of other engines,” said Kulkarni. The company’s core audience is UGC (user-generated content) creators: gamers, hobbyists and professionals who want to ideate, create assets using CSM, and use a 3D printer to build the model. “For UGC, you need to build easier intuitive forms of user experiences,” he said. 3D artists, studios, animators, robotics companies and people building visualisation architecture are also part of its target audience.
GPT of The 3D World
Recently, the company released CSM Cube, which accepts several input modalities, such as image, video and text. Its 3D foundation models are built on an inference stack that combines techniques derived from geometric deep learning, diffusion models, neural radiance fields, computer graphics and 3D computer vision.
“We were the first to build an image to 3D model at our scale; nobody had created it. There isn’t a GPT of this world yet,” said Kulkarni, acknowledging that CSM is still building such an architecture, with no precedent to follow. “In terms of algorithms, I think it’s a wild wild west, so nobody knows.”
Building ‘Common Sense’
By creating a 3D environment, one can give machines an understanding of the physical world. “Right now LLMs predict text but they don’t really have an understanding of objects, people, agents, beliefs, goals and spaces. Whether it’s in a virtual environment or physical environment, that capability is still quite missing in current AI systems,” said Kulkarni. “Here, we are first creating assets, and then we’re gonna give them movement and then they’re gonna have common sense.”
As a more distant and ambitious goal, CSM would build assets that can operate much as a real character would in such an environment. A similar idea was tested with AI agents in a virtual world, where a group of researchers from Stanford and Google created 25 AI agents/avatars with different identities that interacted and simulated believable human behaviour.
Kulkarni also spoke about the current gaps in transformer models that can prevent machines from thinking on their own: the tokens in a transformer model are not grounded (grounding is the process of linking the words and concepts used in language to their real-world referents or meanings). Because the tokens are misaligned, models hallucinate, and the fixes people have tried so far have not resolved the problem. “I think if you really want to solve the problem, there’s an existence proof which is humans. Unless we ground each word to everything that that object refers to, the missing bits will remain.”
Evading Competition Until Now
Kulkarni’s Indian roots trace back to Pune. After completing his schooling in the city, he moved to pursue his BS at Purdue University in Indiana. His interest in AI and 3D generation spans more than 10 years and was evident long before CSM was founded.
Founded in 2020, Common Sense Machines has raised a total of $10M in funding and is backed by venture investors including Intel Capital and Toyota Ventures, among others. With a team of 15 people, CSM is working on new developments, and an API option will soon be offered to users. However, competition is shaping up in the field.
“In the last two years, we were playing in no man’s land, but I think now we have escaped that. Now the question is, how do we navigate?” said Kulkarni. Given CSM’s head start in the space, he is less worried about big tech companies, which he sees as aligned, than about other content companies that may be more threatening: “I think all the usual suspects like media companies such as Shutterstock, Getty, older content companies, and game engines such as Unity and Unreal.”
Talking about Midjourney, which Kulkarni counts as a new-generation startup, he said that it will probably experiment with 3D as well, but how deep it will go, or whether it will become Discord-focused, remains to be seen. Interestingly, Nick St. Pierre, creative director and community developer in AI and art, tweeted that Midjourney is working on 3D and that it might be released soon. However, it will not generate mesh (a 3D object’s surface geometry) but will focus more on the general quality of reflections, transparency, and light-field-like outputs such as NeRFs, which places CSM ahead at the moment.