Multimodal AI Visual Reasoning: PCB Analysis in HTB Critical Flight


With the latest updates, we have introduced multimodal visual analysis in Xfenser: our AI agent can now interpret complex images and technical content directly and turn them into actionable insights. To evaluate its performance in a realistic setting, we used the “Critical Flight” scenario from Hack The Box.

The test involved analyzing production files for Printed Circuit Boards (PCBs), embedded within a highly realistic use case: a series of non-functional drones, likely compromised by tampering with the design before manufacturing.

From the very beginning, the agent adopted a methodical approach, focusing on the analysis of Gerber files to identify suspicious patterns and inconsistencies in the data. Rather than limiting itself to static data inspection, it quickly recognized the need to transform this information into a more suitable format for deeper analysis.

This is where one of the key capabilities of the new system becomes evident: the agent’s ability to autonomously convert technical data into visual representations. By generating images of the different PCB layers, it was able to perform a far more effective analysis, moving from abstract data interpretation to a concrete visual inspection of the board structure.
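To make the data-to-image step concrete, here is a minimal sketch of how Gerber draw commands could be turned into a bitmap. It is an illustration only: the test description does not say which tool or code the agent used, and the function names (`parse_gerber`, `rasterize`, `to_pbm`) and parameters (`decimals`, `pixels_per_unit`, `margin`) are hypothetical. It assumes a heavily simplified Gerber subset: absolute coordinates, straight draws (D01), moves (D02) and flashes (D03), with apertures, arcs, regions and polarity ignored.

```python
import re

# One operation block: optional G01, optional X/Y integers, optional D01/D02/D03.
_OP = re.compile(r"(?:G0?1)?(?:X([+-]?\d+))?(?:Y([+-]?\d+))?(?:D0?([123]))?")


def parse_gerber(text, decimals=4):
    """Extract straight segments and flash points from a simplified Gerber layer.

    Assumption: absolute coordinates with `decimals` fractional digits.
    Apertures, arcs, regions and polarity are ignored in this sketch.
    """
    scale = 10 ** decimals
    x = y = 0.0
    op = 2  # 1 = draw, 2 = move, 3 = flash
    segments, flashes = [], []
    for token in re.split(r"[*%]+", text):
        token = token.strip()
        match = _OP.fullmatch(token)
        if not token or match is None:
            continue  # extended commands, aperture selects, comments, end of file
        gx, gy, gd = match.groups()
        if gd is not None:
            op = int(gd)
        if gx is None and gy is None:
            continue
        new_x = int(gx) / scale if gx is not None else x
        new_y = int(gy) / scale if gy is not None else y
        if op == 1:
            segments.append(((x, y), (new_x, new_y)))
        elif op == 3:
            flashes.append((new_x, new_y))
        x, y = new_x, new_y
    return segments, flashes


def rasterize(segments, flashes, pixels_per_unit=10, margin=1):
    """Render segments and flashes into a 2D list of 0/1 pixels (row 0 is the top)."""
    points = [p for seg in segments for p in seg] + list(flashes)
    if not points:
        return []
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_x = max(p[0] for p in points)
    max_y = max(p[1] for p in points)
    width = int(round((max_x - min_x) * pixels_per_unit)) + 1 + 2 * margin
    height = int(round((max_y - min_y) * pixels_per_unit)) + 1 + 2 * margin
    grid = [[0] * width for _ in range(height)]

    def to_px(point):
        col = int(round((point[0] - min_x) * pixels_per_unit)) + margin
        # Flip Y so that larger board Y values appear higher in the image.
        row = height - 1 - margin - int(round((point[1] - min_y) * pixels_per_unit))
        return col, row

    for start, end in segments:
        (c0, r0), (c1, r1) = to_px(start), to_px(end)
        steps = max(abs(c1 - c0), abs(r1 - r0), 1)
        for i in range(steps + 1):  # sample the line one pixel step at a time
            col = c0 + round((c1 - c0) * i / steps)
            row = r0 + round((r1 - r0) * i / steps)
            grid[row][col] = 1
    for flash in flashes:
        col, row = to_px(flash)
        grid[row][col] = 1
    return grid


def to_pbm(grid):
    """Serialize the bitmap as plain-text PBM (P1), where 1 marks a set pixel."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    rows = [" ".join(str(value) for value in row) for row in grid]
    return "P1\n{} {}\n{}\n".format(width, height, "\n".join(rows))
```

Calling `rasterize(*parse_gerber(layer_text), pixels_per_unit=40)` on each layer file would produce one bitmap per layer. A real pipeline would more likely rely on a full Gerber renderer, since this sketch draws one-pixel-wide lines and ignores trace widths and pad shapes.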

The core objective of the test was to assess the agent’s ability to autonomously identify anomalies and hidden information within the generated visual representations. The AI successfully passed the challenge end-to-end, demonstrating its ability to analyze circuit layers, recognize non-conforming patterns, and correctly identify the presence of a hidden flag embedded in the design.

Image Analysis

The agent also showed advanced iterative behavior, progressively improving the quality of its analysis. In particular, it increased the resolution of the most critical layers, such as the bottom copper and inner layer 1, to achieve higher clarity and extract finer hidden details.
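The resolution increase can be reasoned about with simple arithmetic: a feature of width w millimetres spans w / 25.4 × DPI pixels, so it only becomes legible once the DPI is high enough. The sketch below is a hypothetical illustration of that reasoning, not the agent's actual procedure. The function names, the minimum of four pixels per feature, and the doubling schedule are all assumptions.

```python
import math

MM_PER_INCH = 25.4


def dpi_for_feature(feature_mm, min_pixels=4):
    """Smallest integer DPI at which a feature of `feature_mm` spans `min_pixels`."""
    if feature_mm <= 0 or min_pixels <= 0:
        raise ValueError("feature_mm and min_pixels must be positive")
    return math.ceil(min_pixels * MM_PER_INCH / feature_mm)


def resolution_schedule(start_dpi, target_dpi, factor=2):
    """DPI values to try in order, growing by `factor` and capped at the target."""
    if start_dpi <= 0 or factor <= 1:
        raise ValueError("start_dpi must be positive and factor greater than 1")
    schedule = [start_dpi]
    while schedule[-1] < target_dpi:
        schedule.append(min(schedule[-1] * factor, target_dpi))
    return schedule


# Example: assume hidden text drawn with 0.1 mm strokes and a first render at 300 DPI.
target = dpi_for_feature(0.1, min_pixels=4)
print(resolution_schedule(300, target))
```

Re-rendering only the layers that look suspicious at the coarse pass, rather than every layer at the highest DPI, keeps the image sizes manageable while still resolving fine details where they matter.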

This behavior highlights a combination of visual understanding, operational adaptability, and analytical optimization. It is not simply about “seeing” an image, but about understanding how to generate the most effective representation and refining it to extract meaningful information.

These results confirm the effectiveness of Xfenser’s AI in advanced multimodal analysis scenarios, opening up practical applications in areas such as quality control, hardware security, and digital forensics.

To explore the full technical details of the test and methodology, read the complete writeup here:

View the full chat export →