How Deep Reinforcement Learning Can Make Factories Efficient & Dispatch Products Faster

Manufacturing and production systems have a lot of catching up to do with the world of software. The manufacturing ecosystem has seen many upgrades and innovations, but it still lags behind in adopting software. With the rise of artificial intelligence, new opportunities have opened up for the sector to leverage new technology and improve productivity.

Over the past decade, deep learning has driven most of the innovation in AI. Deep learning systems have found applications in a variety of fields such as healthcare, aviation and agriculture. The ability to experiment with different architectures, each specialised for a particular task, is also a distinctive feature of modern deep learning systems.

Deep Improvements In Dispatching 

Dispatching rules are among the most critical parts of manufacturing: deciding which job to process next helps minimise inventory costs and ensures that goods are delivered on time. With the internet allowing consumers to express their wishes in real time, the manufacturing world is shifting towards an on-demand framework in which goods are produced as orders arrive. With a large consumer base, the trend is also moving towards low-volume products that are designed and semi-customised for particular consumers. The manufacturing sector has no choice but to adopt advanced technologies to adjust to these realities. The modern marketplace also demands that new orders be verified quickly and that production settings be assembled as soon as possible.

With these demands in mind, researchers at the Industrial AI Lab at Hitachi America have tackled the dispatching problem in manufacturing using reinforcement learning. They propose representing the shop-floor state as a 2D matrix and optimising dispatching while taking into account factors that are often ignored, such as job slack time and tardiness. Rather than maintaining a separate RL task or model for every production floor, the researchers have adapted deep RL models to suit their needs. Their experiments show that the approach decreases total lateness, and that the newly proposed policy transfer reduces training time for the whole deep RL system.
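To make the 2D-matrix idea concrete, here is a minimal Python sketch of one way a shop-floor state could be encoded. The layout is an assumption for illustration, not the encoding used in the paper: each row is a queued job (processing time, slack time, waiting time), the queue is zero-padded to a fixed length, and a final row holds the machine's remaining busy time. `Job` and `build_state` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    arrival: float    # time the job entered the queue
    proc_time: float  # processing time on the machine
    due: float        # due date


def build_state(queue: List[Job], now: float, machine_busy_until: float,
                max_jobs: int = 5) -> List[List[float]]:
    """Encode the shop floor as a (max_jobs + 1) x 3 matrix (hypothetical layout).

    Rows 0..max_jobs-1: [processing time, slack time, waiting time] per queued job,
    zero-padded when fewer than max_jobs jobs are waiting (extra jobs are truncated).
    Last row: [remaining busy time of the machine, 0, 0].
    """
    rows = []
    for job in queue[:max_jobs]:
        # Negative slack: the job will be late even if it is started right now.
        slack = job.due - now - job.proc_time
        rows.append([job.proc_time, slack, now - job.arrival])
    while len(rows) < max_jobs:
        rows.append([0.0, 0.0, 0.0])
    rows.append([max(0.0, machine_busy_until - now), 0.0, 0.0])
    return rows


if __name__ == "__main__":
    queue = [Job(0, arrival=0.0, proc_time=3.0, due=10.0),
             Job(1, arrival=1.0, proc_time=5.0, due=6.0)]
    for row in build_state(queue, now=2.0, machine_busy_until=4.0, max_jobs=3):
        print(row)
```

Padding to a fixed number of rows keeps the matrix shape constant as the queue grows and shrinks, which a neural-network policy would need; a real encoding would likely carry more columns per job and per machine.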

Changing How Dispatching Works 

Earlier methods relied mostly on heavy heuristics and decision processes that did not handle complexity well. The contribution of researchers Shuai Zheng, Chetan Gupta, and Susumu Serita is to represent the shop floor as a matrix and incorporate many important details into the dispatching optimisation function. In addition, they have developed a transfer approach for dispatching policies based on manifold alignment.

The overall objective of the deep reinforcement learning method is to optimise product dispatching with respect to tardiness (the amount by which a job finishes after its due date), that is, to reduce the total tardiness across all jobs. The system consists of a job queue, which holds the components ready for processing, and a processing machine.
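The tardiness objective can be sketched in a few lines of Python. Assuming a single machine and jobs that are all available at time zero, the functions below compute total tardiness for a given dispatch order and compare first-in-first-out with the classic Earliest Due Date (EDD) heuristic, the kind of hand-written rule earlier methods relied on. The helper names are hypothetical; this shows the objective only, not the authors' implementation.

```python
from typing import List, Tuple

# (processing_time, due_date); all jobs assumed available at time 0 on one machine
Job = Tuple[float, float]


def total_tardiness(jobs: List[Job], order: List[int]) -> float:
    """Sum of max(0, completion - due) when jobs are processed in `order`."""
    t = 0.0
    total = 0.0
    for i in order:
        proc, due = jobs[i]
        t += proc                   # job i completes at time t
        total += max(0.0, t - due)  # only lateness is penalised; finishing early costs nothing
    return total


def edd_order(jobs: List[Job]) -> List[int]:
    """Earliest Due Date heuristic: process jobs in order of increasing due date."""
    return sorted(range(len(jobs)), key=lambda i: jobs[i][1])


if __name__ == "__main__":
    jobs = [(4.0, 4.0), (2.0, 3.0), (1.0, 2.0)]
    print(total_tardiness(jobs, [0, 1, 2]))        # FIFO order -> 8.0
    print(total_tardiness(jobs, edd_order(jobs)))  # EDD order  -> 3.0
```

EDD is optimal on this toy instance, but in general it minimises maximum lateness on a single machine rather than total tardiness, and it has no way to account for arrivals or changing shop-floor conditions.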

The RL agent observes a state S, made up of the job queue state and the machine state, then outputs a probability vector over possible actions and acts on it. As in any RL system, it then receives a reward. A simulation of this environment is built and run many times to explore different possibilities. The reward is observed for each run, and a favourable trajectory is chosen for dispatching the product.
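The observe, act and reward loop, together with the repeated simulation, can be sketched as below. This is an illustrative stand-in, not the DMD agent: the "policy" is a hand-written softmax over negative slack instead of a trained deep network, jobs are assumed to be available at time zero, the reward is the negative tardiness of each dispatched job, and the best of many sampled episodes is kept as the favourable trajectory. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

# (processing_time, due_date); assumes all jobs are waiting at time 0
Job = Tuple[float, float]


def policy_probs(jobs: List[Job], queue: List[int], now: float,
                 temperature: float = 1.0) -> List[float]:
    """Stand-in for the agent's network: softmax over negative slack.

    Slack = due - now - processing_time; less slack (more urgent) -> higher probability.
    """
    scores = [-(jobs[i][1] - now - jobs[i][0]) / temperature for i in queue]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(jobs: List[Job], rng: random.Random) -> Tuple[List[int], float]:
    """Simulate one dispatching episode on a single machine; return (order, total reward)."""
    queue = list(range(len(jobs)))  # indices of jobs still waiting
    now, reward, order = 0.0, 0.0, []
    while queue:
        probs = policy_probs(jobs, queue, now)                 # observe state -> probability vector
        k = rng.choices(range(len(queue)), weights=probs)[0]   # act: sample a queue position
        job = queue.pop(k)
        now += jobs[job][0]                                    # machine finishes the job
        reward -= max(0.0, now - jobs[job][1])                 # reward = -tardiness of that job
        order.append(job)
    return order, reward


def best_trajectory(jobs: List[Job], episodes: int = 500,
                    seed: int = 0) -> Tuple[List[int], float]:
    """Repeat the simulation and keep the highest-reward (least tardy) trajectory."""
    rng = random.Random(seed)
    return max((run_episode(jobs, rng) for _ in range(episodes)),
               key=lambda result: result[1])


if __name__ == "__main__":
    jobs = [(4.0, 4.0), (2.0, 3.0), (1.0, 2.0)]
    order, reward = best_trajectory(jobs)
    print(order, reward)
```

In a full deep RL setup, the rewards from these rollouts would also be used to update the network's weights (for example with a policy-gradient method); this sketch only samples trajectories and selects the best one.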

Production settings are highly complex and varied, so the results of a deep RL simulation cannot easily be transferred to other scenarios. This is unfortunate, because running deep RL experiments for every factory setting is not feasible. In their Deep Manufacturing Dispatching (DMD) framework, the researchers have therefore taken steps to make the model easy to generalise to other factory settings.

The following features are generalised:

  1. Factory setting parameters, such as job queue states and machine states.
  2. Job characteristics parameters, such as job length distribution and job arrival speed.

The researchers state in the paper, “To apply a trained policy in a new factory setting or when job characteristics changes, knowledge transfer would greatly improve the performance of learning by avoiding expensive data collection process and reducing training time.”
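The paper transfers policies with manifold alignment, which learns a shared embedding for both settings. As a much simpler stand-in that only illustrates the idea of mapping a new factory's states into the feature space a trained policy expects, the sketch below matches each feature's mean and standard deviation between a source and a target setting. This per-feature moment matching is an assumption for illustration, not manifold alignment and not the authors' method; `fit_alignment` and `align` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_alignment(source: List[List[float]],
                  target: List[List[float]]) -> List[Tuple[float, float]]:
    """Fit one affine map x -> a*x + b per feature so that target states take on
    the mean and standard deviation of the source states (moment matching)."""
    params = []
    for j in range(len(source[0])):
        s = [row[j] for row in source]
        t = [row[j] for row in target]
        t_std = pstdev(t) or 1.0  # avoid dividing by zero for a constant feature
        a = pstdev(s) / t_std
        b = mean(s) - a * mean(t)
        params.append((a, b))
    return params


def align(state: List[float], params: List[Tuple[float, float]]) -> List[float]:
    """Map one target-setting state vector into the source feature space."""
    return [a * x + b for x, (a, b) in zip(state, params)]


if __name__ == "__main__":
    source = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]  # states seen during training
    target = [[10.0, 5.0], [20.0, 10.0], [30.0, 15.0]]   # states from the new factory
    params = fit_alignment(source, target)
    print(align([30.0, 15.0], params))  # approximately [3.0, 300.0]
```

Because every feature is mapped independently, this only helps when two settings differ mainly in scale and offset; capturing structural differences between factories is what a method such as manifold alignment is for.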


The state of manufacturing, product ordering and dispatching has been a focus area for a long time. The design of factories and shop floors is critical to improving the efficiency of manufacturing in the future. Deep learning methods can offer a way through this complexity thanks to their proven track record of handling large amounts of data and features. The complex world of production systems could benefit greatly from a model family such as deep learning, which thrives on complex problems with millions of variables.

Copyright Analytics India Magazine Pvt Ltd
