Composer: Creative and Controllable Image Synthesis with Composable Conditions

Lianghua Huang¹, Di Chen¹, Yu Liu¹, Yujun Shen², Deli Zhao¹, Jingren Zhou¹

¹Alibaba Group     ²Ant Group

Composer exponentially expands the control space through composition, leading to an enormous number of ways to generate and manipulate images, i.e., making "the infinite use of finite means".



Recent large-scale generative models trained on big data can synthesize incredible images, yet they offer limited controllability. This work proposes a new generation paradigm that allows flexible control of the output image, such as its spatial layout and palette, while maintaining synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors and then train a diffusion model with all of these factors as conditions to recompose the input. At inference, the rich intermediate representations serve as composable elements, yielding a huge design space (i.e., one that grows exponentially with the number of decomposed factors) for customizable content creation. Notably, our approach, which we call Composer, supports conditions at various levels: text descriptions as global information, depth maps and sketches as local guidance, color histograms for low-level details, and more. Beyond improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
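The "exponential" size of the design space follows from a simple count. If each of n decomposed factors can independently be supplied or omitted at inference, there are 2^n combinations of conditions. The sketch below enumerates them for the eight condition types that appear on this page (text, embedding, palette, sketch, segmentation map, depth, intensity, masked image). The string labels are illustrative placeholders, not identifiers from the Composer code.

```python
from itertools import combinations

# Condition types named on this page; these labels are illustrative
# assumptions, not names used by the released model.
CONDITIONS = [
    "text", "embedding", "palette", "sketch",
    "segmentation", "depth", "intensity", "masked_image",
]


def condition_subsets(conditions):
    """Yield every subset of `conditions`, from the empty set up to all of them."""
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            yield subset


subsets = list(condition_subsets(CONDITIONS))
print(len(subsets))  # 2 ** 8 = 256, including the empty (unconditional) subset
```

This counts only which conditions are present. Because each condition can also take many different values, the space of controllable outputs is far larger still.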

Composition Results

Composition of text and depth.

Composition of masked image and text.

Composition of sketch, depth and embedding (1).

Composition of sketch, depth and embedding (2).

Composition of text and palette.

Composition of embedding and palette.

Composition of intensity and palette.
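The palette conditions used in these compositions are described in the abstract as color histograms for low-level details. Below is a minimal sketch of turning an RGB image into a normalized color histogram. It assumes uniform quantization of each RGB channel into a hypothetical `bins_per_channel` levels; the representation Composer actually uses may differ, for example in color space and bin layout.

```python
import numpy as np


def color_histogram(image, bins_per_channel=4):
    """Return a normalized joint RGB histogram of a uint8 image.

    `image` has shape (H, W, 3) with values in [0, 255]. Each channel is
    quantized into `bins_per_channel` uniform levels, giving
    bins_per_channel ** 3 bins whose values sum to 1.
    (Hypothetical palette representation, for illustration only.)
    """
    image = np.asarray(image, dtype=np.uint8)
    # Map 0..255 to bin indices 0..bins_per_channel-1.
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel indices into one joint bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    counts = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    return counts / counts.sum()


# A 2x2 image: three pure-red pixels and one pure-blue pixel.
img = np.array([[[255, 0, 0], [255, 0, 0]],
                [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
hist = color_histogram(img)
print(hist.shape, hist.max())  # (64,) 0.75
```

A histogram like this discards spatial layout entirely, which is why it pairs naturally with conditions such as text, embedding, or intensity that supply the remaining structure.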

Manipulation Results

Image variations when fixing sketch, depth, palette and/or embedding.

Image interpolations when fixing sketch, depth, segmentation map and/or palette.

Image reconfigurations (manipulating an image by directly modifying its elements).

Color interpolations.

Region-specific image editing.
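One way to read the color interpolations is as a sweep between two palette conditions while the other conditions stay fixed. The sketch below assumes each palette is a normalized color histogram (a hypothetical representation here). A convex combination (1 - t) * p0 + t * p1 of two such histograms is again non-negative and sums to 1, so every intermediate step is itself a valid palette. This page does not state whether Composer interpolates in this space or in a learned embedding space.

```python
import numpy as np


def interpolate_palettes(p0, p1, steps):
    """Return `steps` histograms linearly interpolated from p0 to p1, inclusive."""
    p0 = np.asarray(p0, dtype=np.float64)
    p1 = np.asarray(p1, dtype=np.float64)
    ts = np.linspace(0.0, 1.0, steps)
    # Row i is (1 - t_i) * p0 + t_i * p1; rows stay non-negative and sum to 1.
    return (1.0 - ts)[:, None] * p0 + ts[:, None] * p1


warm = np.array([0.7, 0.2, 0.1, 0.0])  # toy 4-bin palette
cool = np.array([0.0, 0.1, 0.2, 0.7])  # toy 4-bin palette
path = interpolate_palettes(warm, cool, steps=5)
print(path.sum(axis=1))  # each row sums to 1, up to floating-point rounding
```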

Reformulation of Classical Tasks

Image translation.

Style transfer.

Pose transfer.

Virtual try-on.
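Since the model accepts any subset of conditions, a classical task can be posed as recombining conditions extracted from different images rather than training a new model. The sketch below merges two condition sets, taking a chosen subset from the second source. The function `recompose`, the placeholder values, and the style-transfer split shown (layout from one image, embedding and palette from another) are all hypothetical illustrations, not the recipe from the paper.

```python
from typing import Dict, Iterable

Conditions = Dict[str, object]


def recompose(source_a: Conditions, source_b: Conditions,
              take_from_b: Iterable[str]) -> Conditions:
    """Start from source_a's conditions and overwrite the named ones with source_b's."""
    take_from_b = set(take_from_b)
    missing = take_from_b - set(source_b)
    if missing:
        raise KeyError(f"source_b lacks conditions: {sorted(missing)}")
    merged = dict(source_a)
    for name in take_from_b:
        merged[name] = source_b[name]
    return merged


# Placeholder strings stand in for extracted condition tensors.
content = {"sketch": "sketch_A", "depth": "depth_A", "embedding": "emb_A", "palette": "pal_A"}
style = {"sketch": "sketch_B", "depth": "depth_B", "embedding": "emb_B", "palette": "pal_B"}

# Hypothetical style-transfer split: keep A's layout, borrow B's look.
conds = recompose(content, style, take_from_b=["embedding", "palette"])
print(conds)
```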


  title={Composer: Creative and Controllable Image Synthesis with Composable Conditions},
  author={Huang, Lianghua and Chen, Di and Liu, Yu and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
  booktitle={arXiv preprint arXiv:2302.09778},

We Are Hiring!

If you're looking for an exciting challenge and the opportunity to work with cutting-edge technologies in AIGC and large-scale pretraining, then we are the place for you. We are looking for talented, motivated and creative individuals to join our team. If you are interested, please send your CV to Yu Liu.