The Largest CAD Dataset Released With 15M Designs

In an effort to automate industrial design, researchers from Princeton University and Columbia University have introduced SketchGraphs, a large dataset of 15 million two-dimensional, real-world computer-aided design (CAD) sketches. To facilitate research in ML-aided design, they have also released an open-source data processing pipeline.

Introduced during the International Conference on Machine Learning, SketchGraphs is intended for training machine learning models that can assist humans in creating CAD models. In a recent paper, the researchers explained that each CAD sketch is represented as a geometric constraint graph, together with the sequence in which its lines and shapes were originally created. This makes it possible to train models that predict what will be designed next.
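
To make the representation concrete, here is a minimal Python sketch of a geometric constraint graph in which primitives are nodes, constraints are edges, and both are stored in creation order. The class names, field names, and constraint labels are illustrative assumptions and are not taken from the SketchGraphs pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    kind: str      # e.g. "Line", "Circle" (illustrative labels)
    params: dict   # numeric parameters such as endpoints or a radius


@dataclass
class Constraint:
    kind: str       # e.g. "Coincident", "Tangent" (illustrative labels)
    members: tuple  # indices of the primitives the constraint relates


@dataclass
class Sketch:
    # Both lists keep creation order, mirroring the construction sequence.
    primitives: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def add_primitive(self, kind, **params):
        """Append a node and return its index."""
        self.primitives.append(Primitive(kind, params))
        return len(self.primitives) - 1

    def add_constraint(self, kind, *members):
        """Append an edge between primitives that already exist."""
        for m in members:
            if not 0 <= m < len(self.primitives):
                raise IndexError(f"unknown primitive index {m}")
        self.constraints.append(Constraint(kind, members))

    def neighbors(self, index):
        """Indices of primitives sharing at least one constraint with `index`."""
        found = set()
        for c in self.constraints:
            if index in c.members:
                found.update(m for m in c.members if m != index)
        return found


# A toy sketch: two lines meeting at a right angle and a circle resting on one.
sketch = Sketch()
base = sketch.add_primitive("Line", start=(0.0, 0.0), end=(4.0, 0.0))
side = sketch.add_primitive("Line", start=(4.0, 0.0), end=(4.0, 3.0))
hole = sketch.add_primitive("Circle", center=(2.0, 1.0), radius=1.0)
sketch.add_constraint("Coincident", base, side)
sketch.add_constraint("Perpendicular", base, side)
sketch.add_constraint("Tangent", base, hole)

print(sorted(sketch.neighbors(base)))  # [1, 2]
```

Keeping the lists ordered is what lets a model learn from the construction sequence: the first k entries describe the sketch as it looked after k design steps.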

Many existing CAD datasets represent shapes as voxels or meshes, which has allowed users to work on sampling realistic 3D shapes for creating CAD models. However, such models usually cannot be modified in parametric design settings and are therefore not preferred in engineering workflows. SketchGraphs, on the other hand, targets parametric modelling rather than 3D shape modelling.

Left: Example of a sketch; Right: A portion of its geometric constraint graph.

The dataset can be used to train AI models directly for the targeted applications, which could make design workflows easier for engineers. Further, by providing a set of rendering functions for sketches, the researchers aim to enable work on CAD inference from images.

The SketchGraphs Dataset For Creating CAD Models

CAD software such as AutoCAD, SolidWorks, and OnShape can be used to design anything from a simple machine part to an entire machine. The SketchGraphs dataset was obtained through the public API of the product development platform OnShape and draws on public documents from 2015 to 2020, resulting in over 15 million sketches.

The researchers' main motivation for introducing SketchGraphs is to understand how the geometry of a design is constructed. For each CAD sketch, they therefore aimed to extract the ground-truth construction operations for both the geometric primitives and the constraints attached to them.

First, the researchers used OnShape’s API to gather the metadata of all public documents from 2015 to 2020, which yielded two million unique document IDs. These documents contain multiple PartStudios, each of which describes the design of an individual component of a CAD model. After extracting all the 2D sketches from each PartStudio and omitting the non-sketch features, the researchers obtained 15 million sketches.

Left: Histogram of sketch sizes. Middle: Number of constraints with respect to the numbers of primitives in the sketch. Right: Average node degree with respect to the number of primitives.

To be included in the dataset, a sketch had to contain at least one geometric primitive and one geometric constraint. The dataset therefore ranges from sketches with large constraint graphs to simple ones built on a single shape.
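
The inclusion rule and the average node degree statistic from the figure can be sketched in a few lines of Python. The dictionary layout, the helper names, and the degree definition (the number of constraints that reference a primitive) are assumptions made for illustration and may differ from the paper's exact conventions.

```python
def is_eligible(sketch):
    """Keep sketches with at least one primitive and at least one constraint."""
    return len(sketch["primitives"]) >= 1 and len(sketch["constraints"]) >= 1


def average_node_degree(sketch):
    """Mean number of constraints referencing each primitive (assumed definition)."""
    n = len(sketch["primitives"])
    if n == 0:
        return 0.0
    degree = [0] * n
    for _kind, members in sketch["constraints"]:
        for m in set(members):  # count each constraint once per primitive
            degree[m] += 1
    return sum(degree) / n


# Toy candidates; each constraint is (label, indices of the primitives involved).
candidates = [
    {"primitives": ["Line", "Line", "Circle"],
     "constraints": [("Coincident", (0, 1)), ("Tangent", (0, 2))]},
    {"primitives": ["Line"], "constraints": [("Horizontal", (0,))]},
    {"primitives": ["Circle"], "constraints": []},  # no constraint: excluded
    {"primitives": [], "constraints": []},          # empty sketch: excluded
]

dataset = [s for s in candidates if is_eligible(s)]
for s in dataset:
    print(len(s["primitives"]), round(average_node_degree(s), 2))
```

Only the first two candidates pass the filter, which matches the range described above: one sketch with a small constraint graph and one built on a single shape.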

Applications of SketchGraphs Dataset

The researchers also outlined some targeted applications for which they believe models trained on the SketchGraphs dataset can be beneficial. They also highlighted the largely unexplored field of applications focused on machine-aided design, for which SketchGraphs can act as a testbed for future research.

The paper further demonstrated two use cases for the SketchGraphs dataset: Autoconstrain and generative modelling. For both tasks, conditionally inferring constraints and unconditional generative modelling, the researchers provided initial benchmarks.

In the Autoconstrain case, the researchers treat the primitives of the dataset sketches as input, so that the ground-truth constraints become the prediction target. The task is then to predict a set of constraints given a set of primitives as input. For this, the researchers proposed an auto-regressive model based on message-passing networks.
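
The framing above can be illustrated with a small data-preparation sketch in Python. It only shows how a sketch might be split into an input and a target and unrolled into auto-regressive training steps; it does not implement the message-passing model. The function names, the dictionary layout, and the `STOP` marker are hypothetical.

```python
STOP = ("Stop", ())  # hypothetical end-of-sequence marker


def autoconstrain_example(sketch):
    """Split a sketch into (input primitives, target constraints)."""
    return list(sketch["primitives"]), list(sketch["constraints"])


def autoregressive_steps(primitives, constraints):
    """Yield (primitives, constraints so far, next constraint) triples.

    At step t the model would see all primitives plus the first t constraints
    and be asked to predict constraint t. A final step asks it to stop.
    """
    for t in range(len(constraints)):
        yield primitives, constraints[:t], constraints[t]
    yield primitives, list(constraints), STOP


toy_sketch = {
    "primitives": ["Line", "Line", "Circle"],
    "constraints": [("Coincident", (0, 1)), ("Tangent", (0, 2))],
}

inputs, targets = autoconstrain_example(toy_sketch)
steps = list(autoregressive_steps(inputs, targets))
for _prims, context, target in steps:
    print(len(context), "known constraints ->", target[0])
```

Unrolling the target this way is what makes the prediction auto-regressive: each constraint is predicted conditioned on the primitives and on the constraints already emitted.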

Autoconstraining a sketch. Left: Original input of the sketch. Blue Arrows: User modifications. Modification A: Dragging the top circle upwards; Modification B: Both enlarging it and dragging it to the right.

To evaluate the Autoconstrain model, the researchers predicted edges on a test set and obtained an average edge precision of 0.74. They further demonstrated the inferred constraints by editing a sample sketch and inspecting the resulting solved state.
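
As a rough illustration of the metric, edge precision can be read as the fraction of predicted constraint edges that also appear in the ground truth. The Python sketch below assumes that an edge is a constraint label plus an unordered pair of primitive indices; the paper's exact matching rule may differ, and the numbers are toy values, not results from the paper.

```python
def edge_precision(predicted, truth):
    """Fraction of predicted edges that also occur in the ground truth."""
    def normalize(edges):
        # Treat (label, i, j) and (label, j, i) as the same edge.
        return {(kind, frozenset((i, j))) for kind, i, j in edges}

    pred, true = normalize(predicted), normalize(truth)
    if not pred:
        return 0.0  # convention chosen here for the no-prediction case
    return len(pred & true) / len(pred)


truth = [("Coincident", 0, 1), ("Perpendicular", 0, 1), ("Tangent", 1, 2)]
predicted = [
    ("Coincident", 1, 0),     # correct, indices given in the other order
    ("Perpendicular", 0, 1),  # correct
    ("Tangent", 1, 2),        # correct
    ("Parallel", 0, 2),       # not in the ground truth
]

print(edge_precision(predicted, truth))  # 0.75
```

Averaging this value over every sketch in a test set gives a single figure comparable in kind to the reported average.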

Wrapping Up

Along with SketchGraphs, a large-scale dataset of CAD sketches, the researchers also introduced an open-source processing pipeline for ML-aided design. They believe that effectively training machine learning models to construct object designs has immense potential to enable more efficient design workflows for engineers. And, “unsupervised learning on the SketchGraphs data will allow such possibilities for CAD designs,” stated the researchers.

Read the research paper here.


Sejuti Das

Sejuti currently works as Associate Editor at Analytics India Magazine (AIM). Reach out at sejuti.das@analyticsindiamag.com