The Largest CAD Dataset Released With 15M Designs

To help automate industrial design, researchers from Princeton University and Columbia University have introduced SketchGraphs, a dataset of 15 million real-world two-dimensional computer-aided design (CAD) sketches. To facilitate research in ML-aided design, they also released an open-source data processing pipeline.

Introduced during the International Conference on Machine Learning, SketchGraphs is intended for training machine learning models that can assist humans in creating CAD models. In the accompanying paper, the researchers explain that each CAD sketch is represented as a geometric constraint graph, together with the sequence of lines and shapes in which the design was originally created. This sequence information makes it possible to predict what will be drawn next.
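
To make the idea of a geometric constraint graph concrete, here is a minimal sketch in Python. It is not the SketchGraphs file format or library API; the class names, fields, and constraint labels are all assumptions chosen for illustration. Primitives act as nodes, constraints act as edges, and both are stored in creation order.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """A geometric primitive (node), e.g. a line or a circle."""
    kind: str
    params: dict


@dataclass
class Constraint:
    """A geometric constraint (edge) between primitives, referenced by index."""
    kind: str
    members: tuple


@dataclass
class SketchGraph:
    """Hypothetical container: primitives and constraints in creation order."""
    primitives: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def add_primitive(self, kind, **params):
        self.primitives.append(Primitive(kind, params))
        return len(self.primitives) - 1  # index later used by constraints

    def add_constraint(self, kind, *members):
        if any(m < 0 or m >= len(self.primitives) for m in members):
            raise IndexError("constraint refers to an unknown primitive")
        self.constraints.append(Constraint(kind, tuple(members)))

    def construction_sequence(self):
        """Return the primitive kinds in the order they were created."""
        return [p.kind for p in self.primitives]


sketch = SketchGraph()
a = sketch.add_primitive("line", start=(0.0, 0.0), end=(1.0, 0.0))
b = sketch.add_primitive("line", start=(1.0, 0.0), end=(1.0, 1.0))
c = sketch.add_primitive("circle", center=(0.5, 0.5), radius=0.25)
sketch.add_constraint("coincident", a, b)
sketch.add_constraint("perpendicular", a, b)
```

Keeping the lists ordered is what would let a model learn to predict the next primitive or constraint from the ones already present.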

Many existing CAD datasets represent objects as voxels or meshes, which has enabled work on sampling realistic 3D shapes. However, such representations usually cannot be modified in parametric design settings and are therefore not preferred for engineering workflows. SketchGraphs, by contrast, targets parametric modelling rather than 3D shape modelling.


Left: Example of a sketch; Right: A portion of its geometric constraint graph.

The dataset can be used to train models directly for targeted applications, making design workflows easier for engineers. By also providing a set of rendering functions for sketches, the researchers aim to enable work on CAD inference from images.

The SketchGraphs Dataset For Creating CAD Models

From a simple machine part to an entire machine, CAD software such as AutoCAD, SolidWorks, and OnShape can be used to design almost anything. The SketchGraphs dataset was obtained from the public API of the product development platform OnShape and contains over 15 million sketches.

The researchers' main motivation for SketchGraphs is to capture how the geometry of a design is constructed. For each CAD sketch, they therefore extracted the ground-truth construction operations for both the geometric primitives and the constraints attached to them.

First, the researchers used OnShape’s API to gather the metadata of all public documents from 2015 to 2020, which yielded two million unique document IDs. These documents contain multiple PartStudios, each of which holds the design of an individual component of a CAD model. After extracting all 2D sketches from each PartStudio and omitting the non-sketch features, the researchers obtained 15 million sketches.
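
The extraction step described above can be sketched as a simple traversal. This example does not call the OnShape API and does not reproduce its response format; the nested dictionary layout and the key names (`part_studios`, `features`, `type`) are hypothetical and only mirror the document, PartStudio, and feature hierarchy.

```python
def extract_sketches(documents):
    """Collect sketch features from every PartStudio of every document.

    The dict layout is a hypothetical stand-in for real document data.
    """
    sketches = []
    for doc in documents:
        for studio in doc.get("part_studios", []):
            for feature in studio.get("features", []):
                if feature.get("type") == "sketch":  # skip non-sketch features
                    sketches.append(feature)
    return sketches


documents = [
    {"id": "doc-1", "part_studios": [
        {"features": [{"type": "sketch", "name": "s1"},
                      {"type": "extrude", "name": "e1"}]},
        {"features": [{"type": "sketch", "name": "s2"}]},
    ]},
    {"id": "doc-2", "part_studios": []},
]
names = [s["name"] for s in extract_sketches(documents)]
```

Here the extrude feature is dropped and the two sketches are kept, in document order.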

Left: Histogram of sketch sizes. Middle: Number of constraints with respect to the numbers of primitives in the sketch. Right: Average node degree with respect to the number of primitives.

To be included in the dataset, a sketch had to contain at least one geometric primitive and one geometric constraint. The dataset therefore ranges from sketches with large constraint graphs to simple sketches of a single shape.
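
A minimal sketch of the inclusion rule, along with one of the statistics named in the figure caption, might look as follows. The dictionary representation of a sketch is an assumption, and the degree computation here simply counts constraint memberships per primitive; the paper's exact definition may differ.

```python
def is_valid(sketch):
    """Inclusion rule from the text: at least one primitive and one constraint."""
    return len(sketch["primitives"]) >= 1 and len(sketch["constraints"]) >= 1


def average_node_degree(sketch):
    """Mean number of constraint memberships per primitive (assumed definition)."""
    degree = [0] * len(sketch["primitives"])
    for members in sketch["constraints"]:
        for index in members:
            degree[index] += 1
    return sum(degree) / len(degree)


# Hypothetical sketches: constraints are tuples of primitive indices.
sketches = [
    {"primitives": ["line", "line", "circle"],
     "constraints": [(0, 1), (0, 1), (1, 2)]},
    {"primitives": ["circle"], "constraints": []},  # no constraint: rejected
    {"primitives": [], "constraints": []},          # empty: rejected
]
kept = [s for s in sketches if is_valid(s)]
mean_degree = average_node_degree(kept[0])
```

Only the first sketch passes the filter, so `average_node_degree` is never called on a sketch without primitives.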

Applications of SketchGraphs Dataset

The researchers also outlined several target applications for which they believe models trained on the SketchGraphs dataset can be beneficial. They further highlighted the largely unexplored field of design-focused machine learning applications, for which SketchGraphs can act as a testbed for future research.

The paper demonstrates two use cases for the SketchGraphs dataset: Autoconstrain and generative modelling. For both tasks, conditionally inferring constraints and unconditional generative modelling, the researchers provided initial benchmarks.

In the Autoconstrain case, the primitives of each dataset sketch are treated as the input, and the ground-truth constraints become the prediction target. The task is therefore to predict a set of constraints given the primitives as input. For this, the researchers proposed an auto-regressive model based on message passing networks.
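
The task framing can be illustrated with a toy example. This is not the researchers' model: the message passing network is replaced by a hypothetical `score_edge` callable, constraint types are ignored, and the sketch representation is assumed. It only shows how primitives form the input, constraints form the target, and edges are decided one at a time with earlier decisions available as context.

```python
def to_training_pair(sketch):
    """Primitives are the input; ground-truth constraints are the target."""
    return sketch["primitives"], set(sketch["constraints"])


def predict_constraints(primitives, score_edge, threshold=0.5):
    """Decide candidate edges one at a time, conditioning on earlier decisions.

    `score_edge` is a hypothetical stand-in for a learned model; it receives
    the primitives, the edges accepted so far, and a candidate edge.
    """
    accepted = []
    for j in range(len(primitives)):
        for i in range(j):
            if score_edge(primitives, accepted, (i, j)) >= threshold:
                accepted.append((i, j))
    return accepted


def toy_scorer(primitives, accepted, edge):
    """Toy rule, not a trained model: link consecutive primitives of one kind."""
    i, j = edge
    return 1.0 if j == i + 1 and primitives[i] == primitives[j] else 0.0


sketch = {"primitives": ["line", "line", "circle"], "constraints": [(0, 1)]}
inputs, target = to_training_pair(sketch)
predicted = predict_constraints(inputs, toy_scorer)
```

A learned scorer would use the `accepted` list to condition each decision on the constraints chosen so far, which is what makes the procedure auto-regressive.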

Autoconstraining a sketch. Left: Original input sketch. Blue arrows: User modifications. Modification A: Dragging the top circle upwards; Modification B: Both enlarging it and dragging it to the right.

To evaluate the Autoconstrain model, the researchers predicted edges on a test set and obtained an average edge precision of 0.74. They further demonstrated the inferred constraints by editing a sample sketch and inspecting the resulting solved state.
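
Edge precision is the fraction of predicted edges that also appear in the ground truth. The sketch below shows one way to compute it; treating edges as undirected, ignoring constraint types, and averaging per sketch are assumptions here, since the article does not say how the reported figure was aggregated.

```python
def edge_precision(predicted, ground_truth):
    """Fraction of predicted edges that also appear in the ground truth."""
    predicted = {frozenset(edge) for edge in predicted}        # undirected
    ground_truth = {frozenset(edge) for edge in ground_truth}
    if not predicted:
        return 0.0  # convention chosen here for empty predictions
    return len(predicted & ground_truth) / len(predicted)


def average_edge_precision(pairs):
    """Mean per-sketch precision over (predicted, ground_truth) pairs."""
    scores = [edge_precision(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


# Made-up edges for illustration, not data from the paper.
predicted = [(0, 1), (1, 2), (2, 3), (0, 3)]
ground_truth = [(1, 0), (1, 2), (2, 3)]
single = edge_precision(predicted, ground_truth)
average = average_edge_precision([(predicted, ground_truth),
                                  ([(0, 1)], [(0, 1)])])
```

In this made-up example, three of the four predicted edges are correct, so the single-sketch precision is 0.75.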

Wrapping Up

Along with SketchGraphs, a large-scale dataset of CAD sketches, the researchers introduced an open-source processing pipeline for ML-aided design. They believe that effectively training machine learning models to construct object designs has immense potential to make design workflows more efficient for engineers. And, “unsupervised learning on the SketchGraphs data will allow such possibilities for CAD designs,” stated the researchers.

Read the research paper here.


Sejuti Das
Sejuti currently works as Associate Editor at Analytics India Magazine (AIM).
