Abstract [eng]
In this paper, we apply deep neural networks to generate HTML code from a webpage screenshot. We first synthesise a dataset whose web page screenshots resemble those of pix2code. Unlike pix2code, however, we use plain HTML instead of a domain-specific language, which increases the complexity of the task. We keep the encoder-decoder network architecture but replace the LSTM-based decoder with a Transformer-based decoder. In some of the experiments we also replace the CNN-based encoder with a Transformer. Our models therefore follow either an image-captioning architecture with stacked attention or a full Transformer architecture for image captioning. With these newer architectures, we achieve better accuracy than the pix2code authors reported.