The Largest CAD Dataset Released With 15M Designs

In an attempt to automate industrial design, researchers from Princeton University and Columbia University have introduced SketchGraphs, a large dataset of 15 million real-world, two-dimensional computer-aided design (CAD) sketches. To facilitate research in ML-aided design, they also released an open-source data processing pipeline.

Introduced during the International Conference on Machine Learning, SketchGraphs is intended for training machine learning models that can assist humans in creating CAD models. In a recent paper, the researchers explained that each CAD sketch is represented as a geometric constraint graph, together with the sequence of lines and shapes in which the design was originally created. This sequence information makes it possible to predict what will be drawn next.
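To make the idea of a geometric constraint graph concrete, here is a minimal sketch in Python. The class names, field names, and constraint labels are illustrative assumptions and are not the data format used by the SketchGraphs pipeline: primitives are the nodes, constraints are the edges, and list order stands in for the order of construction.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    kind: str      # e.g. "Line" or "Circle" (illustrative labels)
    params: dict   # geometric parameters of the primitive


@dataclass
class Constraint:
    kind: str       # e.g. "Coincident" or "Perpendicular" (illustrative labels)
    members: tuple  # indices of the primitives this constraint refers to


@dataclass
class SketchGraph:
    """Hypothetical container: nodes are primitives, edges are constraints."""
    primitives: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def add_primitive(self, kind, **params):
        # Appending preserves the order in which the sketch was drawn.
        self.primitives.append(Primitive(kind, params))
        return len(self.primitives) - 1

    def add_constraint(self, kind, *members):
        for index in members:
            if not 0 <= index < len(self.primitives):
                raise IndexError(f"unknown primitive index {index}")
        self.constraints.append(Constraint(kind, tuple(members)))

    def construction_sequence(self):
        # The ordered primitive kinds, i.e. what was drawn first, second, ...
        return [p.kind for p in self.primitives]


sketch = SketchGraph()
a = sketch.add_primitive("Line", start=(0.0, 0.0), end=(1.0, 0.0))
b = sketch.add_primitive("Line", start=(1.0, 0.0), end=(1.0, 1.0))
c = sketch.add_primitive("Circle", center=(0.5, 0.5), radius=0.25)
sketch.add_constraint("Coincident", a, b)     # the two lines share an endpoint
sketch.add_constraint("Perpendicular", a, b)  # and meet at a right angle
sketch.add_constraint("Horizontal", a)        # a constraint on a single primitive
```

A model that predicts "what comes next" would consume a prefix of `construction_sequence()` along with the constraints added so far.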

Many existing CAD datasets represent shapes as voxels or meshes, which has allowed work on sampling realistic 3D shapes for CAD models. However, these representations are usually not modifiable in parametric design settings and are therefore not preferred for engineering workflows. SketchGraphs, by contrast, focuses on parametric modelling rather than 3D shape modelling.

Left: Example of a sketch; Right: A portion of its geometric constraint graph.

The dataset can be used to train models directly for targeted applications, which could make engineers' design workflows easier. Further, by providing a set of rendering functions for sketches, the researchers aim to enable work on CAD inference from images.

The SketchGraphs Dataset For Creating CAD Models

CAD software such as AutoCAD, SolidWorks, and OnShape can be used to design anything from a simple machine part to an entire machine. The SketchGraphs dataset was obtained through the public API of the product development platform OnShape, drawing on public documents created between 2015 and 2020 and resulting in over 15 million sketches.

The researchers' main motivation for introducing SketchGraphs is to understand the underlying structure of how geometry is constructed. For each CAD sketch, they therefore aimed to extract the ground-truth construction operations for both the geometric primitives and the constraints attached to them.

First, the researchers used OnShape’s API to gather the metadata of all public documents created between 2015 and 2020, which gave them two million unique document IDs. Each of these documents can contain multiple PartStudios, and each PartStudio describes the design of an individual component of a CAD model. After extracting all the 2D sketches from each PartStudio and omitting the non-sketch features, the researchers obtained 15 million sketches.
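The extraction step described above can be sketched as a walk over a document hierarchy. The dictionary layout below (`part_studios`, `features`, `type`) is a hypothetical in-memory structure invented for illustration; it is not the format of OnShape API responses and not the code the researchers used.

```python
def extract_sketches(documents):
    """Collect sketch features from every PartStudio, skipping other features.

    Assumes a hypothetical layout: each document has a "part_studios" list,
    and each PartStudio has a "features" list of dicts with a "type" key.
    """
    sketches = []
    for document in documents:
        for studio in document.get("part_studios", []):
            for feature in studio.get("features", []):
                if feature.get("type") == "sketch":
                    sketches.append(feature)
    return sketches


# Toy data standing in for downloaded public documents.
documents = [
    {
        "id": "doc-a",
        "part_studios": [
            {"features": [
                {"type": "sketch", "name": "base"},
                {"type": "extrude", "name": "pad"},  # non-sketch feature, omitted
            ]},
            {"features": [{"type": "sketch", "name": "hole"}]},
        ],
    },
    {"id": "doc-b", "part_studios": []},  # a document with no PartStudios
]

sketches = extract_sketches(documents)
sketch_names = [s["name"] for s in sketches]
```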

Left: Histogram of sketch sizes. Middle: Number of constraints with respect to the numbers of primitives in the sketch. Right: Average node degree with respect to the number of primitives.

To be included in the dataset, a sketch had to contain at least one geometric primitive and at least one geometric constraint. As a result, the dataset ranges from sketches with large constraint graphs to simple sketches with a single shape.
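The inclusion rule, and the node-degree statistic plotted in the figure above, can be expressed in a few lines. This is a minimal sketch under assumptions: the dictionary keys are hypothetical, and the degree definition (each constraint adds one to the degree of every primitive it touches) is one reasonable reading rather than the paper's stated definition.

```python
def keep_sketch(sketch):
    """Inclusion rule: at least one primitive and at least one constraint."""
    return len(sketch["primitives"]) >= 1 and len(sketch["constraints"]) >= 1


def average_node_degree(sketch):
    """Mean number of constraint endpoints per primitive (assumed definition)."""
    node_count = len(sketch["primitives"])
    if node_count == 0:
        return 0.0
    endpoint_count = sum(len(members) for members in sketch["constraints"])
    return endpoint_count / node_count


# Toy sketches: constraints are tuples of primitive indices (hypothetical layout).
candidates = [
    {"primitives": ["Line", "Line", "Arc"], "constraints": [(0, 1), (1, 2)]},
    {"primitives": ["Circle"], "constraints": []},  # no constraint: dropped
    {"primitives": [], "constraints": []},          # empty sketch: dropped
    {"primitives": ["Line"], "constraints": [(0,)]},
]

dataset = [s for s in candidates if keep_sketch(s)]
degrees = [average_node_degree(s) for s in dataset]
```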

Applications of SketchGraphs Dataset

The researchers also outlined several target applications for which they believe the SketchGraphs dataset can be useful as training data. They further highlighted the largely unexplored field of applications focused on machine-aided design, for which SketchGraphs can act as a testbed for future research.

The paper demonstrates two use cases of the SketchGraphs dataset: Autoconstrain and Generative Modeling. For both conditionally inferring constraints and unconditional generative modelling, the researchers provided initial benchmarks.

Take Autoconstrain as an example. The primitives of a sketch in the dataset are treated as the input, and the ground-truth constraints become the prediction target. The task is then to predict a set of constraints given a set of primitives as input. For this, the researchers proposed an auto-regressive model based on message passing networks.

Autoconstraining a sketch. Left: Original input sketch. Blue Arrows: User modifications. Modification A: Dragging the top circle upwards; Modification B: Both enlarging the circle and dragging it to the right.

To evaluate the Autoconstrain model, the researchers predicted edges on a test set and obtained an average edge precision of 0.74. They further demonstrated the inferred constraints by editing a sample sketch and inspecting the resulting solved state.
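As a rough illustration of the metric, edge precision can be read as the fraction of predicted constraint edges that also appear in the ground truth, averaged over sketches. The sketch below assumes that reading, treats an edge as an unordered pair of primitive indices plus a constraint label, and uses made-up toy data; the paper's exact evaluation protocol may differ.

```python
def normalize_edge(edge):
    """Treat (i, j, kind) and (j, i, kind) as the same undirected edge."""
    i, j, kind = edge
    return (min(i, j), max(i, j), kind)


def edge_precision(predicted, actual):
    """Fraction of predicted edges that are present in the ground truth."""
    predicted_set = {normalize_edge(e) for e in predicted}
    actual_set = {normalize_edge(e) for e in actual}
    if not predicted_set:
        return 0.0  # convention chosen here for sketches with no predictions
    return len(predicted_set & actual_set) / len(predicted_set)


def average_edge_precision(pairs):
    """Mean precision over (predicted, actual) pairs, one pair per sketch."""
    scores = [edge_precision(predicted, actual) for predicted, actual in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: two of the four predicted edges match the ground truth.
predicted = [(0, 1, "Coincident"), (2, 1, "Parallel"),
             (0, 2, "Tangent"), (1, 3, "Coincident")]
actual = [(1, 0, "Coincident"), (1, 2, "Parallel"), (0, 2, "Perpendicular")]

precision = edge_precision(predicted, actual)
```

Here the `Tangent` edge fails because its label differs from the ground truth, and the edge between primitives 1 and 3 fails because no such edge exists, so the toy sketch scores 2 out of 4.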

Wrapping Up

Along with SketchGraphs, a large-scale dataset of CAD sketches, the researchers released an open-source processing pipeline for ML-aided design. They believe that training machine learning models to construct object designs has significant potential to make engineers' design workflows more efficient. And, “unsupervised learning on the SketchGraphs data will allow such possibilities for CAD designs,” the researchers stated.


Copyright Analytics India Magazine Pvt Ltd
