Being natural to everyone, language-based inputs have proven effective for various tasks such as object detection and image generation. This paper presents, for the first time, a language-based system for interactive colorization of scene sketches, based on semantic comprehension of the sketches. Compared with prior scribble-based interfaces, which require a minimum level of professional skill, our language-based interface is more natural for novice users. The proposed system is built upon deep neural networks trained on a large-scale repository of scene sketches and cartoon-style color images with text descriptions. Given a scene sketch, our system allows users, via language-based instructions, to interactively localize and colorize specific object instances, progressively meeting various colorization requirements. We demonstrate the effectiveness of our approach via comprehensive experimental results including alternative studies, comparisons with the state of the art, and generalization user studies.