Human memory, while remarkable, struggles to retain and recall intricate details
from varied experiences. This paper proposes a novel “Experience Memory Network” to
address this limitation in the context of video semantic segmentation. The network is organized
as a fully connected structure of information storage units, each retaining data on an
individual video frame, and relations that capture the connections between them. To facilitate
efficient information retrieval and updates, the network incorporates specialized read and write
modules. This architecture lets the system draw on past experiences stored in the
memory network when performing semantic segmentation on new video frames. In essence, the
network learns from each frame and uses this knowledge to understand and segment
subsequent frames, leading to improved overall performance. In evaluation, the
Experience Memory Network achieves a mean region similarity of 80.2%, which measures
the overall overlap between predicted and true regions, and a mean contour accuracy
of 84.2%, which assesses how well predicted contours align with actual object
boundaries. Analysing both metrics together gives a fuller picture of segmentation
quality, covering both the extent of overlap and the precision of boundary
delineation.
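
As an illustration of the region similarity metric mentioned above, the following is a minimal sketch that assumes the metric is the usual intersection-over-union (Jaccard index) between binary masks, averaged over frames. The function names, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for this example; it does not reproduce the evaluation that produced the reported figures.

```python
def region_similarity(pred, truth):
    """Intersection-over-union of two binary masks given as nested lists.

    Assumption: region similarity is taken to be the Jaccard index.
    Assumption: two empty masks count as a perfect match (1.0).
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is foreground in both masks
            if p or t:
                union += 1  # pixel is foreground in at least one mask
    return inter / union if union else 1.0


def mean_region_similarity(preds, truths):
    """Average the per-frame region similarity over a sequence of frames."""
    scores = [region_similarity(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)
```

For example, a predicted mask covering two pixels against a ground-truth mask covering one of them shares one pixel out of two in the union, so the per-frame score is 0.5. Contour accuracy is computed differently (by matching boundary pixels rather than regions) and is not sketched here.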