Solving the 3D Bin Packing Problem with AI and MILP
The relatable anxiety of trying to fit an entire holiday wardrobe into a single 23kg suitcase mirrors one of the most financially demanding challenges in global supply chain logistics: the 3D Bin Packing Problem (3D BPP).
While a consumer might face an extra airline fee for poor packing, the stakes for commercial shipping are far higher. A single poorly allocated item on a pallet can force the use of an additional container, which can double transport costs for that consignment.
Despite these high stakes, human operators fill only 50% to 60% of available space on average, and often need multiple manual attempts to do better. To solve this, Decision Lab developed LOGOS (Logistics Optimisation System). By combining two distinct computational paradigms, mathematical optimisation and deep reinforcement learning, LOGOS consistently outperforms human packers in both speed and spatial utility.
Defining the 3D Bin Packing Challenge
To evaluate the effectiveness of different digital solutions, we established a standardised testing environment. The core objective is to pack a varied sequence of cuboid items into the minimum number of containers while maximising spatial utility.
For our baseline study, we used a 5x5x5 container environment, packed with items ranging in size from 1x1x1 to 3x3x3. Container efficiency is measured by a utility score: the share of the container's volume occupied by packed items.
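In symbols, and assuming the usual volume-ratio definition of utility (the notation is illustrative and not taken from the LOGOS implementation), with P the set of packed items, l_i, w_i, h_i the dimensions of item i, and L, W, H the container dimensions:

```latex
U = \frac{\sum_{i \in P} l_i \, w_i \, h_i}{L \cdot W \cdot H}
```

For the 5x5x5 container the denominator is 125, so a single 3x3x3 item on its own would give U = 27/125 = 21.6%.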
To find the optimal operational approach, we developed and compared three distinct computational methods:
- Mathematical Optimisation (LOGOS-OPT): A deterministic approach that searches the entire problem space to calculate an absolute mathematical optimum.
- Deep Reinforcement Learning (LOGOS-RL): An AI-driven approach where an autonomous agent learns highly adaptive packing strategies through continuous trial and error.
- Rules-Based Algorithm: A standard procedural heuristic used as a comparative baseline for traditional industry software.
Approach 1: Mathematical Optimisation (LOGOS-OPT)
Our mathematical framework formulates the 3D BPP as a Mixed Integer Linear Programming (MILP) problem. This method assumes full information is available upfront, allowing the system to map out the entire packing sequence simultaneously.
Driven by the commercial Gurobi solver, LOGOS-OPT maximises total container utility while strictly enforcing real-world physical constraints:
- Spatial Boundaries: Items must remain entirely within the physical dimensions of the container.
- Volumetric Integrity: No physical overlap between items is permitted.
- Rotational Logic: Items can be orthogonally rotated, though specific orientation restrictions can be applied to individual assets.
- Vertical Stability: To prevent damage or structural collapse, the bottom face of every item must be structurally supported by the container floor or the flat surface of items beneath it, verified across all four base vertices.
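The four constraints above can be illustrated with a plain feasibility check. This is a minimal sketch on an integer grid, not the MILP formulation and not LOGOS-OPT code: the names `Placement`, `orientations`, `inside`, `overlaps`, `supported` and `feasible` are hypothetical, and the support rule (each corner cell of the base must sit on the floor or on the top face of another item) is one assumed reading of the four-vertex check.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Placement:
    """Axis-aligned box: minimum corner (x, y, z) and size (dx, dy, dz)."""
    x: int
    y: int
    z: int
    dx: int
    dy: int
    dz: int


def orientations(dims):
    """Rotational logic: all distinct orthogonal rotations of a cuboid (at most 6)."""
    return sorted(set(permutations(dims)))


def inside(p, container):
    """Spatial boundaries: the box lies entirely within the container."""
    cx, cy, cz = container
    return (p.x >= 0 and p.y >= 0 and p.z >= 0
            and p.x + p.dx <= cx and p.y + p.dy <= cy and p.z + p.dz <= cz)


def overlaps(a, b):
    """Volumetric integrity: True if the interiors of a and b intersect."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx
            and a.y < b.y + b.dy and b.y < a.y + a.dy
            and a.z < b.z + b.dz and b.z < a.z + a.dz)


def supported(p, placed):
    """Vertical stability: every corner cell of the base rests on the floor
    or on the top face of an already placed box (assumed support rule)."""
    if p.z == 0:
        return True
    corner_cells = [(p.x, p.y), (p.x + p.dx - 1, p.y),
                    (p.x, p.y + p.dy - 1), (p.x + p.dx - 1, p.y + p.dy - 1)]

    def rests(cx, cy):
        return any(q.z + q.dz == p.z
                   and q.x <= cx < q.x + q.dx
                   and q.y <= cy < q.y + q.dy
                   for q in placed)

    return all(rests(cx, cy) for cx, cy in corner_cells)


def feasible(p, placed, container=(5, 5, 5)):
    """A placement is valid only if it satisfies all three geometric checks."""
    return (inside(p, container)
            and not any(overlaps(p, q) for q in placed)
            and supported(p, placed))
```

In an actual MILP these checks become linear constraints over binary and integer decision variables. The procedural version here only shows what each constraint rules out, for example an item that overhangs the edge of the one beneath it.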
Approach 2: Deep Reinforcement Learning (LOGOS-RL)
In real-world logistics hubs, operations are rarely static. Items arrive sequentially on a conveyor belt, demanding immediate placement decisions with little reliable information about what comes next. This dynamic, stochastic environment is where Deep Reinforcement Learning (DRL) excels.
Unlike LOGOS-OPT, LOGOS-RL has no upfront list of items; it evaluates items one at a time, seeing only one step ahead on the conveyor line.

The AI Training Architecture
To build a highly capable neural network, we paired advanced simulation with automated cloud training:
- Simulation Environment: Built using AnyLogic, the model tracks advanced states, including a dynamic height map matrix, container utility trends, and real-time feasibility maps that serve as an action mask.
- AI Training Engine: We used machine teaching hosted on Azure to scale training to millions of iterations run in parallel. The platform automatically selected a suitable deep reinforcement learning algorithm (such as SAC, Apex DQN, or PPO) based on environmental feedback.
- Reward Function: Rather than relying on a delayed end-of-game reward, the agent received immediate positive reinforcement proportional to the change in container utility, alongside a steep penalty and immediate episode termination for invalid placements.
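The state and reward ideas above can be sketched in a few lines. This is an illustrative toy environment, not the AnyLogic model: `PackingEnv`, `rest_height`, `feasibility_mask` and the `INVALID_PENALTY` value are all hypothetical, rotations are omitted, and the support rule (all four corner cells of the base at the resting height) is a simplifying assumption.

```python
L, W, H = 5, 5, 5        # container size from the baseline study
INVALID_PENALTY = -1.0   # assumed value; the source only says "steep penalty"


def rest_height(hmap, x, y, dx, dy):
    """Height at which an item's base would rest at (x, y), or None if it
    leaves the container footprint or its corner cells are unsupported."""
    if x + dx > L or y + dy > W:
        return None
    z = max(hmap[i][j] for i in range(x, x + dx) for j in range(y, y + dy))
    corners = [hmap[x][y], hmap[x + dx - 1][y],
               hmap[x][y + dy - 1], hmap[x + dx - 1][y + dy - 1]]
    return z if all(c == z for c in corners) else None


def feasibility_mask(hmap, item):
    """Action mask: True where the item can legally be placed."""
    dx, dy, dz = item
    mask = [[False] * W for _ in range(L)]
    for x in range(L):
        for y in range(W):
            z = rest_height(hmap, x, y, dx, dy)
            mask[x][y] = z is not None and z + dz <= H
    return mask


class PackingEnv:
    def __init__(self):
        # Height map: current stack height of every floor cell, seen from above.
        self.hmap = [[0] * W for _ in range(L)]
        self.packed_volume = 0

    def utility(self):
        return self.packed_volume / (L * W * H)

    def step(self, item, x, y):
        """Place item (dx, dy, dz) at (x, y). Returns (reward, done)."""
        dx, dy, dz = item
        in_grid = 0 <= x < L and 0 <= y < W
        if not in_grid or not feasibility_mask(self.hmap, item)[x][y]:
            # Invalid placement: penalty and immediate episode termination.
            return INVALID_PENALTY, True
        before = self.utility()
        z = rest_height(self.hmap, x, y, dx, dy)
        for i in range(x, x + dx):
            for j in range(y, y + dy):
                self.hmap[i][j] = z + dz
        self.packed_volume += dx * dy * dz
        # Immediate reward: the change in container utility.
        return self.utility() - before, False
```

Placing a 3x3x3 item in an empty container, for instance, yields a reward of 27/125. In practice the mask would be supplied to the agent so that invalid actions are rarely chosen, with the penalty as a backstop.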
Curriculum learning allowed the agent to master basic 1x1x1 packing within 318,000 iterations before progressing to highly variable, multi-dimensional item packing over 7 million training iterations. Once fully trained, the agent (the "brain") is exported into a lightweight Docker container for on-site deployment.
Benchmarking Performance: Human vs. Machine
To validate LOGOS in practice, we executed two series of rigorous live trials comparing manual human efforts against our automated systems.
Trial 1: Standard Pallet Allocation (17 Items)
- LOGOS-OPT: Generated an optimal packing configuration in 13 seconds, achieving a 75.8% packing density. Using the LOGOS Visual Assistant (a digital guidance system that projects exact placement instructions), the physical packing took just 40 seconds.
- Unassisted Humans: Averaged 90 seconds across multiple attempts and loaded only 12 to 15 of the 17 items, for a packing density of 55% to 70%.
Trial 2: High-Complexity Pallet Allocation (19 Items)
- LOGOS-OPT: Calculated a complex packing arrangement in 105 seconds, yielding a 73.5% packing density that allowed the physical operator to safely pack all 19 items in 50 seconds.
- Unassisted Humans: The increased complexity exposed a severe performance gap. One experienced packer gave up entirely after 5 minutes. A second achieved only 60% density, leaving 5 items unpacked. The final human packer managed to fit all items but needed a full 10 minutes, almost four times the combined computing and packing time of LOGOS (155 seconds).
Comparative Results: Finding the Operational Sweet Spot
When the three algorithmic models were evaluated across multiple randomised arrival sequences, Deep Reinforcement Learning proved a strong alternative to the full-information mathematical model.
| Metric | Mathematical Optimisation (LOGOS-OPT) | Deep Reinforcement Learning (LOGOS-RL) | Standard Rules-Based Heuristic |
| --- | --- | --- | --- |
| Exp 1: Packing Density | 93.5% | 83.2% | 75.0% |
| Exp 1: % of Theoretical Max | 100% | 89.0% | 80.0% |
| Exp 2: Packing Density | 88.35% | 80.4% | 74.8% |
| Exp 2: % of Theoretical Max | 100% | 91.0% | 84.0% |
| Exp 3: Packing Density | 93.6% | 80.5% | 71.8% |
| Exp 3: % of Theoretical Max | 100% | 86.0% | 77.0% |
Key Strategic Insights
- The DRL Advantage: Despite having no knowledge of future item arrivals, LOGOS-RL consistently achieved 86% to 91% of the mathematical gold standard, comfortably outperforming standard rules-based algorithms. In fact, during Experiment 2, the RL agent matched the absolute mathematical optimum a staggering 55% of the time.
- Scalability: The complexity of traditional mathematical optimisation grows exponentially with problem size, often requiring problem decomposition to keep computation times manageable when managing large fleets. LOGOS-RL, by contrast, makes each placement decision near-instantaneously, so its computation grows only linearly with the number of items and containers.
- Real-Time Resilience: Because the DRL framework processes decisions item-by-item, it is uniquely suited for live logistics environments where orders change mid-route, or stock availability suddenly shifts.
Transforming Fleet Economics
By bridging the gap between theoretical data science and warehouse floor execution, the LOGOS framework provides logistics enterprises with a tangible path toward maximising fleet utilisation, reducing fuel consumption, and eliminating systemic supply chain waste.
To learn more about deploying LOGOS within your logistics infrastructure, contact us.
