DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance (2024)

Linxuan Xin (Peking University, Shenzhen, China; linxuanxin@stu.pku.edu.cn), Zheng Zhang (Huawei Cloud Computing Technologies Co., Ltd., Hangzhou, China; zhangzheng119@huawei.com), Jinfu Wei (Tsinghua University, Shenzhen, China; weijf22@mails.tsinghua.edu.cn), Wei Gao (School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; gaowei262@pku.edu.cn), and Duan Gao (Huawei Cloud Computing Technologies Co., Ltd., Shenzhen, China; gaoduan0306@gmail.com)

Abstract.

Prior material creation methods had limited ability to produce diverse results, mainly because reconstruction-based methods rely on real-world measurements and generation-based methods are trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models, trained on billions of text-image pairs, with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.

Physically-based Rendering, Spatially Varying Bidirectional Reflectance Distribution Function, Multimodal Deep Generative Model, Deep Learning

CCS Concepts: Computing methodologies → Rendering; Computing methodologies → Artificial intelligence

Figure 1 (teaser): text-driven generation of high-resolution SVBRDFs with multi-modal guidance (image omitted).

1. Introduction

High-quality materials are crucial for achieving photorealistic rendering. Despite advancements in appearance modeling over the past few decades, material creation remains a challenging research area. Material generation approaches can be categorized into reconstruction-based methods and generation-based methods. Reconstruction-based methods use one or many input photographs to estimate surface reflectance properties, either through optimization-based inverse rendering (Gao et al., 2019; Guo et al., 2020; Hu et al., 2019) or deep neural network inference (Deschaintre et al., 2018a; Guo et al., 2023). However, the scope of these methods is constrained to real-world photographs, limiting their ability to create imaginative and creative materials.

Recent approaches have explored material generation (Guo et al., 2020; Zhou et al., 2022) using Generative Adversarial Networks (GANs) (Goodfellow et al., 2014). However, these methods are typically trained on hundreds to thousands of materials, which pales in comparison to the billions of images used to train large-scale language-image generative models; this limited dataset capacity restricts their generation diversity. Furthermore, GAN-based methods suffer from training challenges, including unstable training, mode collapse, and scalability issues with large datasets. On the other hand, diffusion models (Ho et al., 2020; Rombach et al., 2022) have shown significant advancements, exhibiting advantages in scalability and diversity. Recent advances (Poole et al., 2022; Wang et al., 2023) leverage 2D diffusion priors to generate 3D content. However, these methods mainly focus on implicit representations or textured meshes, lacking the capability to disentangle physically based material and illumination.

To address these challenges, we introduce DreamPBR, a novel generative framework for creating high-resolution spatially-varying bidirectional reflectance distribution functions (SVBRDFs) conditioned on text inputs and a variety of multi-modal guidance. The main advantages of our method are its generation diversity and controllability. Our method can generate semantically correct and detailed materials from a wide range of textual prompts, from highly structured materials with stationary patterns to imaginative materials with flexible content, such as a Hello Kitty carpet (as shown in Figure 1).

The key idea of our method is to integrate pretrained 2D text-to-image diffusion models (Rombach et al., 2022) with material priors to generate high-fidelity and diverse materials. While 2D text-to-image Latent Diffusion Models (LDMs) excel at generating natural images, they struggle to produce spatially-varying physically-based material maps due to the large domain gap between natural images and materials. Consequently, adapting pretrained 2D diffusion models to the material domain, while preserving both quality and diversity, is a non-trivial research task. We introduce a novel material LDM, learned with a two-stage strategy, to address this challenge. In the first stage, we observe that the albedo map is a specialized RGB image that stores spatially-varying surface reflectance in its pixel values. We therefore transfer the pretrained LDM from the text-to-image domain to the text-to-albedo domain by fine-tuning, which can be regarded as a distillation from a large source domain (natural images) to a relatively small target domain (albedo texture maps) that leverages target-domain priors. In the second stage, we train a PBR decoder to reconstruct SVBRDFs from the albedo latent space learned in the first stage. We employ a decoder-only architecture for SVBRDF generation for two reasons: (1) the generated SVBRDF parameter maps exhibit strong correlations, since they share a common latent representation as the starting point for decoding; and (2) the decoder module does not compromise generation diversity, as we keep the denoising U-Net frozen while training the PBR decoder. Additionally, we introduce a highlight-aware decoder for the albedo map to further enhance regularization.

We introduce a multi-modal guidance module designed to serve as the conditioning mechanism for our material LDM, enabling a wide variety of controls for user-friendly material creation. This guidance module includes three key components: Pixel Control allows pixel-aligned guidance from inputs such as sketches or inpainting masks; Style Control extracts style features from reference images and employs them to guide the generation process; Shape Control enables automatic material generation for a given segmented 3D object, with an optional 2D exemplar image for reference. Importantly, our framework supports the concurrent use of multiple guidances seamlessly.

We train DreamPBR on a publicly available SVBRDF dataset comprising over 700 high-resolution (2K) SVBRDFs. Thanks to the convolutional backbone of the LDM, seamless tileable material generation is supported by using circular padding in all convolutional operators.

To summarize, our main contributions are as follows:

  • We introduce a novel generative framework for high-quality material generation under text and multi-modal guidance that efficiently combines a pretrained 2D diffusion model with material domain priors;

  • We present a rendering-aware decoder module that learns the mapping from a shared latent space to SVBRDFs;

  • Our multi-modal guidance module offers rich, user-friendly controllability, enabling users to manipulate the generation process effectively;

  • We propose an image-to-image editing scheme that facilitates material editing tasks such as stylization, inpainting, and seamless texture synthesis.

2. Related Work

2.1. Material estimation

Material estimation approaches aim to acquire material data from real-world measurements under varying viewpoints and lighting conditions. We specifically focus on recent material estimation methods that utilize lightweight capture setups with consumer cameras. For a more comprehensive overview of general appearance modeling, please refer to the surveys (Dong, 2019; Weinmann and Klein, 2015; Guarnera et al., 2016).

Methods have been developed to leverage multiple images or video sequences captured by a handheld camera to estimate appearance properties. Due to the limitations of lightweight setups, most approaches still rely on regularization, such as handcrafted heuristics for diffuse/specular separation (Riviere et al., 2016; Palma et al., 2012), linear combinations of basis BRDFs (Hui et al., 2017), and sparsity assumptions for incident lighting (Dong et al., 2014). Another class of methods focuses on reducing the number of input images by leveraging material priors such as stationary materials (Aittala et al., 2015, 2016), homogeneous or piece-wise materials (Xu et al., 2016), and spatially sparse materials (Zhou et al., 2016).

In recent years, deep learning-based methods have shown significant progress in recovering SVBRDFs from a single image (Li et al., 2017; Deschaintre et al., 2018a; Li et al., 2018; Guo et al., 2021, 2023; Henzler et al., 2021). These methods employ deep convolutional neural networks to predict plausible SVBRDFs from in-the-wild input images in a feed-forward manner. Deschaintre et al. (2019) extended a single-image solution to multiple images via latent-space max-pooling. More recent work by Gao et al. (2019) introduced a deep inverse rendering pipeline that enables appearance estimation from an arbitrary number of input images. In procedural material modeling, Hu et al. (2019), Shi et al. (2020), and Hu et al. (2022a) proposed to optimize the parameters of fixed node graphs to match input images, and Hu et al. (2022b) introduced a pipeline that eliminates the need for predefined node graphs. Most recently, Sartor and Peers (2023) proposed a diffusion-based model to estimate material properties from a single photograph.

The methods mentioned above rely on captured photographs to reconstruct materials and cannot produce non-real-world materials. In contrast, our approach can generate diverse and creative SVBRDFs from natural language inputs.

2.2. Generative models

Image generation

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have demonstrated remarkable capabilities in producing high-fidelity images. Subsequent research has focused on GAN improvements such as training stability (Kodali et al., 2017; Karras et al., 2018), attribute disentanglement (Karras et al., 2019), conditional controllability (Li et al., 2021; Park et al., 2019), and generation quality (Karras et al., 2020, 2021). GANs have been used in various applications, including text-to-image synthesis (Reed et al., 2016b, a; Zhu et al., 2019), image-to-image translation (Isola et al., 2018; Zhu et al., 2020), video generation (Tulyakov et al., 2017), and even 3D shape generation (Li et al., 2019).

Recent advancements in text-to-image generation have been mainly driven by diffusion models (DMs) (Sohl-Dickstein et al., 2015; Ho et al., 2020; Ramesh et al., 2022). Later work (Song et al., 2020, 2021; Liu et al., 2022) explored efficient sampling strategies that significantly reduce the number of required sampling steps, thereby improving image generation performance. Rombach et al. (2022) proposed the Latent Diffusion Model (LDM), which performs the denoising process in a learned compact latent space, enabling high-resolution image synthesis and efficient image manipulation.

Controllable generation

Integrating multi-modal controllability into text-to-image diffusion models is crucial for content creation applications. Recent research (Zavadski and Rother, 2023; Zhang et al., 2023; Ye et al., 2023; Hu et al., 2022c; Mou et al., 2023) has focused on lightweight multi-modal controllability without requiring extensive data and high computational power. Hu et al. (2022c) introduced a fine-tuning strategy using low-rank matrices, enabling domain-specific adaptation. Zhang et al. (2023) proposed ControlNet, adding spatial conditioning to diffusion models for precise generation control. Ye et al. (2023) presented a lightweight framework that enhances diffusion models with image prompts through a decoupled cross-attention mechanism.

Material generation

Guo et al. (2020) proposed MaterialGAN, an unconditional model for synthesizing SVBRDFs from random noise; its learned latent space facilitates efficient material estimation in inverse rendering. Zhou et al. (2022) developed a StyleGAN2-based model, conditioned on spatial structure and material category, for tileable material synthesis. These GAN-based methods show advantages in generating high-resolution and visually compelling materials; however, their diversity is constrained by the training instability of GANs and the limited range of training datasets. In procedural material generation, Guerrero et al. (2022) first introduced a transformer-based autoregressive model. Later work by Hu et al. (2023) proposed a multi-modal node-graph generation architecture for creating high-quality procedural materials, guided by both text and image inputs. While procedural representations are compact and resolution-independent, they are limited to stationary patterns and cannot create arbitrary styles.

In concurrent work, Vecchio et al. (2023) introduced ControlMat, a diffusion-based material generative model capable of generating tileable materials from text and a single photograph. This model was trained on a synthetic material dataset comprising 126,000 samples derived from 8,615 raw material graphs. While quite large for the material domain, this dataset is still small compared to the billions of text-image pairs used in text-to-image diffusion model training; this scale discrepancy leads to constrained diversity. Furthermore, this work only supports guidance from text and a single photograph, limiting its range of application scenarios.

In contrast, our method significantly enhances material generation diversity through the efficient integration of pretrained diffusion models with material priors. We also provide a variety of user-friendly controls for guiding the generation process, expanding the scope and flexibility of applications.

2.3. Text-to-3D Generation

Transitioning 2D text-to-image approaches to 3D generation presents significant challenges, mainly due to the lack of large-scale labeled 3D datasets. Recent approaches (Poole et al., 2022; Wang et al., 2023; Lin et al., 2023; Tang et al., 2023) have explored text-to-3D generation without depending on 3D data. Poole et al. (2022) integrate Score Distillation Sampling (SDS) with text-to-image diffusion models. Wang et al. (2023) further improved quality and diversity by introducing Variational Score Distillation (VSD). The development of large-scale 3D datasets (Deitke et al., 2023) has enabled direct learning from 3D data (Liu et al., 2023; Shi et al., 2023). However, current 3D generation methods mainly focus on geometry modeling and fail to produce high-quality, disentangled materials.

Park et al. (2018) introduced a neural method to assign materials from a predefined set to different parts of a 3D shape. Extending this, Hu et al. (2022d) employ a translation network to establish correspondence between a 2D exemplar image and a 3D shape, allowing material cues to be extracted from 2D images and optimal materials to be selected from a candidate pool using a perceptual metric. However, these methods are constrained by the variety of their predefined material assets and lack the ability to transfer complex spatial patterns from 2D exemplars to 3D shapes. In contrast, our generative model can produce diverse materials and effectively transfer spatial structures from 2D exemplar images to 3D models, showcasing a significant advancement in material assignment.

3. Method

3.1. Overview

Figure 2: overview of the DreamPBR framework (image omitted).

Preliminaries

The goal of our method is to generate spatially-varying materials represented by the Cook-Torrance microfacet BRDF model with the GGX normal distribution function (Walter et al., 2007). Specifically, we use a metallic-based PBR workflow and represent surface reflectance properties as an albedo map $\mathcal{P}$, a normal map $\mathcal{N}$, a roughness map $\mathcal{R}$, and a metallic map $\mathcal{M}$.

DreamPBR is a Latent Diffusion Model (LDM)-based generative framework capable of producing diverse, high-quality SVBRDF maps under text and multi-modal guidance, as illustrated in Figure 2.

The core generative module of our framework is the material Latent Diffusion Model (material LDM), which takes a textual description $T$ as input and encodes high-dimensional surface reflectance properties into a compact latent representation $z$. This representation effectively compresses complex material data and guides the SVBRDF decoder in reconstructing detailed SVBRDF maps (i.e., albedo, normal, roughness, and metallic), $S=\{\mathcal{P},\mathcal{N},\mathcal{R},\mathcal{M}\}$. Our critical observation is that while pre-trained text-to-image diffusion models capture a wide range of natural images that fulfill the diversity needs of material generation, their flexibility often leads to less plausible materials due to the absence of material priors. Instead of training the material LDM from scratch with limited material data, we fine-tune a pre-trained text-to-image diffusion model on target material data. This strategy effectively tailors the model from a broad image domain to a specific material domain, ensuring both the diversity and the authenticity of the output.

Our text-to-material framework seamlessly integrates three types of control modules to enhance material generation capabilities. First, we introduce the Pixel Control module $G_P$, which takes pixel-aligned inputs (e.g., sketches, masks) and utilizes the ControlNet architecture (Zhang et al., 2023); it adds conditional controls to the diffusion model, providing spatial guidance for material generation. Second, we use the Style Control module $G_I$ to extract image features from an input image prompt, which are then used to adapt the material LDM via cross-attention. Third, we propose the Shape Control module $G_S$ to automatically generate SVBRDF maps for a given segmented 3D shape. This module can leverage large language models to generate text prompts corresponding to different parts of the input shape; it also supports taking a 2D photo exemplar as additional input, enabling the generation of material maps for each segmented part guided by the segmented 2D image. In the rest of this section, we dive into the key components of our framework. Section 3.2 introduces our core text-to-material module that enables tileable, diverse material generation. Section 3.3 describes the SVBRDF decoder, responsible for reconstructing high-resolution SVBRDF maps from a unified latent space. Finally, Section 3.4 discusses the multi-modal control module, providing image and 3D control capabilities to the diffusion model.

3.2. Physically-based material diffusion

Our material LDM transforms text features $\tau(T)$, extracted by CLIP's text encoder $\tau(\cdot)$ (Radford et al., 2021) from user prompts $T$, into a latent representation $z$ of the SVBRDF maps $S$. The latent space is characterized by a Variational Autoencoder (VAE) encoder $\mathcal{E}$ (Kingma and Welling, 2014; Rezende et al., 2014). Specifically, an albedo map $\mathcal{P}\in\mathbb{R}^{H\times W\times C}$ with $C=3$ is compressed into the latent space as $z=\mathcal{E}(\mathcal{P})\in\mathbb{R}^{h\times w\times c}$. Consistent with Rombach et al. (2022), we adopt $c=4$, $h=H/8$, $w=W/8$.
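For concreteness, the following minimal sketch shows this encoding step with an assumed diffusers-style VAE; the checkpoint name is illustrative, not the one used in our training.

```python
import torch
from diffusers import AutoencoderKL

# Illustrative: load a Stable-Diffusion-style VAE encoder E.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae")

albedo = torch.rand(1, 3, 512, 512) * 2 - 1      # albedo map P in [-1, 1]
with torch.no_grad():
    z = vae.encode(albedo).latent_dist.sample()  # z = E(P)
print(z.shape)  # torch.Size([1, 4, 64, 64]): c = 4, h = H/8, w = W/8
```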

The core component of the diffusion model is the denoising U-Net (Ronneberger et al., 2015), conditioned on the timestep $t$. Following Denoising Diffusion Probabilistic Models (DDPM) (Ho et al., 2020), our model employs a deterministic forward diffusion process $q(z_t \mid z_{t-1})$ to transform latent vectors $z$ towards an isotropic Gaussian distribution. The U-Net is trained to reverse this process, $q(z_{t-1} \mid z_t)$, iteratively denoising Gaussian noise back into latent vectors. Adopting the strategy of Rombach et al. (2022), we incorporate the text features $\tau(T)\in\mathbb{R}^{M\times d_\tau}$ into the intermediate layers of the U-Net through a cross-attention mechanism $\operatorname{Attention}(Q,K,V)=\operatorname{softmax}\left(QK^{T}/\sqrt{d}\right)\cdot V$, where $Q=W_Q^{i}\,\varphi_i(z_t)$, $K=W_K^{i}\,\tau(T)$, $V=W_V^{i}\,\tau(T)$; here $\varphi_i(z_t)\in\mathbb{R}^{N\times d_\epsilon^{i}}$ denotes an intermediate representation of the U-Net $\epsilon_\theta$, and $W_V^{i}\in\mathbb{R}^{d\times d_\epsilon^{i}}$, $W_Q^{i},W_K^{i}\in\mathbb{R}^{d\times d_\tau^{i}}$ are learnable projection matrices.
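The cross-attention step can be written compactly. The sketch below is a single-head, row-vector formulation with illustrative tensor shapes; it is an explanatory restatement of the formula above, not our exact layer implementation.

```python
import torch
import torch.nn.functional as F

def text_cross_attention(phi_zt: torch.Tensor,  # (N, d_eps): U-Net features phi_i(z_t)
                         tau_T: torch.Tensor,   # (M, d_tau): CLIP text features tau(T)
                         W_Q: torch.Tensor,     # (d, d_eps)
                         W_K: torch.Tensor,     # (d, d_tau)
                         W_V: torch.Tensor      # (d, d_tau)
                         ) -> torch.Tensor:
    """Single-head cross-attention: queries from U-Net activations,
    keys/values from text features, as in Attention(Q, K, V)."""
    Q = phi_zt @ W_Q.T                            # (N, d)
    K = tau_T @ W_K.T                             # (M, d)
    V = tau_T @ W_V.T                             # (M, d)
    attn = F.softmax(Q @ K.T / Q.shape[-1] ** 0.5, dim=-1)  # (N, M)
    return attn @ V                               # (N, d)
```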

Our material LDM is fine-tuned on text-material pairs via:

(1)   $\mathcal{L}_{ldm} := \mathbb{E}_{\mathcal{E}(\mathcal{P}),\,T,\,\epsilon\sim N(0,1),\,t}\left[\left\|\epsilon-\epsilon_\theta\left(z_t, t, \tau_\theta(T)\right)\right\|_2^2\right].$
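A minimal sketch of one fine-tuning step implementing Equation (1), assuming diffusers-style vae, unet, text_encoder, and scheduler objects (e.g., AutoencoderKL, UNet2DConditionModel, a CLIP text model, and DDPMScheduler); this is an illustration, not our exact training code.

```python
import torch
import torch.nn.functional as F

def material_ldm_step(vae, unet, text_encoder, scheduler, albedo, token_ids):
    """One optimization step of Eq. (1): add noise to the albedo latent
    and train the U-Net to predict that noise, conditioned on tau(T)."""
    with torch.no_grad():
        z0 = vae.encode(albedo).latent_dist.sample()   # E(P), frozen VAE
        cond = text_encoder(token_ids)[0]              # tau(T), frozen CLIP
    noise = torch.randn_like(z0)                       # eps ~ N(0, 1)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (z0.shape[0],), device=z0.device)
    zt = scheduler.add_noise(z0, noise, t)             # forward process q(z_t | z_0)
    eps_pred = unet(zt, t, encoder_hidden_states=cond).sample
    return F.mse_loss(eps_pred, noise)                 # ||eps - eps_theta(z_t, t, tau(T))||^2
```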

Seamless tileable texture synthesis

Creating tileable texture maps is critical in material generation and involves meeting two requirements: (a) maintaining consistent spatial patterns and visual appearance, and (b) tiling textures without visible artifacts such as seams and blocks.

While zero padding is the standard practice in CNNs, we found that circular padding is particularly effective for seamless content generation. We employ circular padding in all convolutional layers of our generative model for two main reasons:

  (1) Continuity across boundaries. Unlike classic padding methods such as zero padding, which may introduce artificial edges, circular padding ensures boundary continuity: it wraps image content around both the horizontal and vertical boundaries, providing a seamless transition when tiling.

  (2) Pattern preservation. Circular padding mainly affects the boundary area of the image, leaving the central area and overall texture patterns unchanged.

Our tileable generation algorithm serves two purposes: first, it inherently produces tileable material maps without additional post-processing; second, it can transform a non-tileable texture into a tileable version through an image-to-image generation pipeline while maintaining visual similarity with the original.
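A minimal PyTorch sketch of this padding swap follows; it assumes the U-Net and VAE are ordinary nn.Module instances (e.g., components of a diffusers-style pipeline, here named pipe for illustration).

```python
import torch.nn as nn

def make_tileable(model: nn.Module) -> None:
    """Switch every Conv2d to circular padding so feature maps (and thus
    generated textures) wrap seamlessly across both image boundaries."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            m.padding_mode = "circular"

# Hypothetical usage on a diffusion pipeline's components:
# make_tileable(pipe.unet)   # denoising U-Net
# make_tileable(pipe.vae)    # VAE encoder/decoder
```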

3.3. Render-aware SVBRDF decoder

The SVBRDF decoder, denoted $\mathcal{D}=\{\mathcal{D}_P,\mathcal{D}_S\}$, decodes the unified latent representation $z$ into the SVBRDFs $S\coloneqq\{\mathcal{P},\mathcal{N},\mathcal{R},\mathcal{M}\}=\mathcal{D}(z)$. Here, $\mathcal{P},\mathcal{N}\in\mathbb{R}^{H\times W\times 3}$ and $\mathcal{R},\mathcal{M}\in\mathbb{R}^{H\times W\times 1}$. In our implementation, we set $H=W=512$. Specifically, we utilize separate decoder networks: $\mathcal{D}_P(z)$ for the albedo map $\mathcal{P}$, and $\mathcal{D}_S(z)$ for the other property maps $\{\mathcal{N},\mathcal{R},\mathcal{M}\}$. These decoders follow the VAE decoder architecture of Kingma and Welling (2014) and Rezende et al. (2014), and are initialized with the weights of a pre-trained VAE decoder.
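A minimal sketch of the decoder split is shown below; the 5-channel packing of $\{\mathcal{N},\mathcal{R},\mathcal{M}\}$ in $\mathcal{D}_S$ is our illustrative assumption.

```python
import torch
import torch.nn as nn

class SVBRDFDecoder(nn.Module):
    """Two VAE-style decoders read the same latent z: D_P outputs albedo,
    D_S outputs the remaining property maps (channel packing assumed)."""
    def __init__(self, decoder_p: nn.Module, decoder_s: nn.Module):
        super().__init__()
        self.decoder_p = decoder_p  # init from pre-trained VAE decoder weights
        self.decoder_s = decoder_s  # same init, output widened to 5 channels

    def forward(self, z: torch.Tensor):
        albedo = self.decoder_p(z)   # (B, 3, 512, 512)
        nrm = self.decoder_s(z)      # (B, 5, 512, 512)
        normal, roughness, metallic = nrm[:, :3], nrm[:, 3:4], nrm[:, 4:5]
        return albedo, normal, roughness, metallic
```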

Training of PBR decoder

The training loss function for our PBR decoder 𝒟Ssubscript𝒟𝑆\mathcal{D}_{S}caligraphic_D start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT comprises the following terms:

(2)   $\mathcal{L}_{\text{PBR}} = \mathcal{L}_{map} + \mathcal{L}_{perp} + \mathcal{L}_{gan} + \mathcal{L}_{reg} + \mathcal{L}_{render},$

(3)   $\mathcal{L}_{\text{render}}(x,y) = \left\lVert \log(x+0.01) - \log(y+0.01) \right\rVert_1,$

where $\mathcal{L}_{map}$ is an $L_1$ loss on the material property maps, $\mathcal{L}_{perp}$ is a perceptual loss based on LPIPS (Zhang et al., 2018), $\mathcal{L}_{gan}$ is a generative adversarial loss, $\mathcal{L}_{reg}$ is a Kullback-Leibler divergence penalty, and $\mathcal{L}_{render}$ is an $L_1$ log-space rendering loss applied to the rendered images.

For the rendering loss, we adopt the sampling scheme proposed by Deschaintre et al. (2018b) to render nine images per material map: three images rendered with independent distant light and view directions, and six images using near-field mirrored view and lighting directions. The rendering loss yields desirable SVBRDF reconstructions by encouraging the training process to focus on minimizing errors in the material parameters that matter most for appearance, rather than treating all parameters with equal importance.
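The log-space rendering loss of Equation (3) is straightforward to implement; the sketch below assumes the renderings are non-negative tensors produced by a differentiable renderer.

```python
import torch

def render_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """L1 loss between log renderings (Eq. 3). The 0.01 offset avoids
    log(0) and compresses the dynamic range of specular highlights, so
    bright pixels do not dominate the gradient."""
    return torch.mean(torch.abs(torch.log(pred + 0.01) - torch.log(gt + 0.01)))
```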

Highlight-aware albedo decoder

As previously mentioned in Section 3.2, our material LDM training utilizes the standard VAE decoder to map the latent space to the albedo map. While effective at generating plausible RGB images, this decoder tends to produce images with strong highlights, especially for shiny materials such as leather and metal.

To address this, we introduce a highlight-aware albedo decoder $\mathcal{D}_P$, fine-tuned on a synthetic shaded-to-albedo dataset, which provides robust regularization that effectively minimizes highlight artifacts in albedo maps. For each material sample in our SVBRDF dataset, we simulate various lighting conditions and viewpoints by randomly positioning point lights and cameras parallel to the material plane, and then render the SVBRDFs into reference shaded images with a physically-based renderer.

During training, the default VAE image encoder maps the shaded images into the latent space, which is then decoded back to image space by our specialized albedo decoder. The training of this decoder follows the original VAE loss function (Kingma and Welling, 2014).

Material super-resolution

High-resolution material maps are essential for achieving photorealistic renderings. However, due to memory and performance constraints, current diffusion models typically generate images at a resolution of $512\times512$, which falls short of high-quality production rendering.

We introduce a material super-resolution module comprising four super-resolution networks, each following the Real-ESRGAN architecture (Wang et al., 2021). These networks, denoted $SR_P$, $SR_N$, $SR_R$, and $SR_M$, augment the resolution of the corresponding SVBRDF property maps to $2{,}048\times2{,}048$.

We fine-tune Real-ESRGAN, which was originally trained on purely synthetic data, with material data to more effectively capture the high-frequency details of materials. We incorporate a rendering loss (similar to Equation 3) into the training of the super-resolution module to ensure that the generated details contribute to high-frequency shading effects rather than visual artifacts. Note that special care must be taken for normal maps during augmentations involving flipping and rotation: the directions stored in a normal map must be adjusted consistently with the map orientation to keep the surface normals valid, as shown in the sketch below.
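For example, a horizontal flip of a tangent-space normal map must also negate the stored X component; a minimal sketch, assuming normals in $[-1,1]$ with layout (B, 3, H, W):

```python
import torch

def hflip_normal_map(normal: torch.Tensor) -> torch.Tensor:
    """Horizontally flip a tangent-space normal map. Mirroring the image
    mirrors the surface, so the X component of every normal is negated
    to keep the encoded directions consistent with the new orientation."""
    flipped = torch.flip(normal, dims=[-1])  # mirror pixels left-right
    flipped[:, 0] = -flipped[:, 0]           # negate tangent-space X
    return flipped
```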

3.4. Multi-modal control

We propose three control modules for DreamPBR: Pixel Control, Style Control, and Shape Control. These modules are designed to be decoupled, allowing for flexible combinations of multiple controls.

3.4.1. Pixel Control

Spatial property guidance is widely used by artists in material creation. Our Pixel Control module $G_P$ takes spatial control maps $I_P$ as input and utilizes the ControlNet architecture (Zhang et al., 2023) to guide the generation of spatially-consistent SVBRDFs. It supports controlled generation under sketch guidance and allows image-to-image material inpainting with a binary mask.

Our material LDM, as described in Section 3.2, is adapted to the material domain, enabling plausible material generation controlled by pre-trained ControlNet checkpoints trained with 2D supervision. However, we found that fine-tuning the pre-trained ControlNet with material data significantly improves both the controllability and the quality of the generated materials. Specifically, we initialize our ControlNet with the ControlNet 1.1 Scribble checkpoint and fine-tune it on our SVBRDF dataset. To generate the sketch guidance, we employ Pidinet (Su et al., 2021) to extract sketches from albedo maps.
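As an illustration, sketch-guided generation through an assumed diffusers-style interface might look as follows. The scribble checkpoint id mirrors the public ControlNet 1.1 release, while the material-LDM path and the sketch file are placeholders, not released artifacts.

```python
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical sketch: attach a scribble ControlNet to a material-LDM pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "path/to/material-ldm", controlnet=controlnet)  # placeholder checkpoint path

sketch_image = Image.open("sketch.png")  # placeholder guidance map
albedo = pipe("a PBR material of brick, narrow bricks, walls",
              image=sketch_image).images[0]
```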

3.4.2. Style Control

The Style Control module $G_I$ takes an image prompt $I_S$ as input and extracts its style characteristics to guide material generation. Inspired by Ye et al. (2023), image prompts $I_S$ are first encoded into image features by CLIP's image encoder and then embedded into the material LDM using a decoupled cross-attention adaptation module. Multimodal material generation can be achieved by accompanying the image prompt with a text prompt.

The Style Control module can effectively capture the appearance properties and structural information of input images to generate realistic and coherent material maps. This functionality is particularly useful in scenarios where materials must be created based on specific exemplar images, a frequent requirement in the material design industry. The interaction of the Style Control module with the Shape Control module is detailed in Section 3.4.3.

3.4.3. Shape Control

The Shape Control module $G_S$ takes a segmented 3D model $O_s=\{O,s\}$ (where $s$ denotes the geometry segmentation) and an optional photo exemplar $I_o$ as input, and automatically generates high-quality material maps for each segment. When provided with only a segmented 3D model and a basic text prompt, we leverage large language models (LLMs), such as ChatGPT (Achiam et al., 2023), to enrich the text descriptions for each segment. For instance, given a 3D chair model, the language model can generate diverse text descriptions for each part (e.g., seat, leg, and armrest), each featuring varied design styles. Furthermore, integration with the existing Pixel Control and Style Control modules supports enhanced SVBRDF generation, ensuring superior quality and detailed material characteristics.

Our model integrates the material transfer pipeline TMT (Hu et al., 2022d) to automatically assign diverse generated materials to 3D shapes based on an image exemplar. The TMT pipeline involves two stages: first, translating color from the exemplar image to the projection of the 3D shape (and vice versa for the segmentation results); second, assigning materials to the projected parts using a material classifier network operating on the translated image. Unlike Hu et al. (2022d), we do not rely on predefined material collections for material assignment. Instead, we use the material labels predicted by TMT directly as text prompts and the translated images as image prompts for the Style Control module, allowing high-quality SVBRDF generation for each part. The proposed algorithm offers two significant advantages over traditional material transfer models: it expands material diversity beyond limited predefined material collections, and it transfers not only color and category information but also comprehensive material attributes, including styles and spatial structures, from the 2D exemplar to the 3D shape, leveraging the capabilities of our Style Control module.

4. Results

4.1. Implementation Details

Figure (fig:PBR): gallery of text-to-material generation results across ten categories; each entry pairs a render with its SVBRDF maps (thumbnails omitted). Prompts per category:
Brick: snow-covered bricks, winter, outdoor, house; coastal barrier bricks, sea-salt resistant, outdoor, barrier; stenciled brick floor, paving, terracotta, scratched; narrow bricks, walls; blackened fireplace bricks, charred.
Fabric: tablecloth, delicate; denim jacket texture, clothing; hand woven carpet, artisan, carpet; floral cotton dress, clothing; backpack fabric, sturdy.
Ground: ice glazed slippery, outdoor, winter; aerial mud, road, tracks; dry rocky ground; marble floor, polished, indoor; stone ground.
Leather: perforated leather, breathable; black leather; decoration, indoor; leather white, smooth; reptile skin leather, textured.
Metal: space cruiser panels, scifi; wrought iron gate, ornate, outdoor; golden metal wall, old; anodized metal surface, industrial; nickel plated hardware, smooth.
Organic: alien slime; forest leaves, natural, autumn, dirt; dragon scales; stylized animal fur; honeycomb structure, geometric, natural, beehive.
Plastic: plastic pattern, synthetic; yoga mat; synthetic plastic, rough; reflective safety vest, clothing; childrens playground slide, colorful.
Tile: elegant, interior decoration; art deco style tiles, vintage, indoor, decorative; vintage ceiling tiles, indoor; patterned bw vinyl, floors; encaustic cement tiles, colorful, indoor, floor.
Wall: dry stone wall, natural, outdoor; street art graffiti, colorful, urban; victorian wallpaper, patterned, indoor, historic; stucco finish, mediterranean; cliff, outdoor.
Wood: blue, worn painted wood siding, walls; parquet wood flooring, geometric; charcoal; varnished walnut, glossy, indoor; bamboo wall covering, eco-friendly.

Figure (fig:pixel_control_1): pattern-guided (Pixel Control) generation results. Each row pairs an input pattern with text prompts of the form "a PBR material of <category>, <description>" and the corresponding renders and SVBRDF maps (thumbnails omitted).

[Figure (fig:pixel_control_2): pixel-guided generation; for each prompt ("a PBR material of metal, space cruiser panels", "a PBR material of wall, street art graffiti, colorful, outdoor, urban", "a PBR material of tiles, encaustic cement tiles, colorful, indoor, floor"), four render/SVBRDF pairs are shown.]

[Figure (fig:style_control): style-guided generation; each row pairs a style image with four prompted render/SVBRDF results, for prompts such as "a PBR material of fabric, carpet", "a PBR material of ground, stone, outdoor", "a PBR material of wood", and "a PBR material of tile, marble".]

[Figure (fig:seed): samples generated with the same prompt but different random seeds, for "a PBR material of wood" and "a PBR material of tile, encaustic cement tiles, indoor, floor".]

[Figure (fig:seamless): tileable generation; each output texture is shown next to its tiled expansion.]

[Figure (fig:inpainting): inpainting results; regions of the input texture are replaced with the prompts "Yellow flower", "Red flower", "Blue flower", "Cyan flower", "Purple flower", "Pink flower", "Leaf", and "Grass".]

[Figure (fig:multiModal): combined prompt, style, and pixel guidance; columns show Prompt, Style, Pixel, Render, and SVBRDF, for prompts such as "a PBR material of tiles, marble" and "a PBR material of fabric, hand woven carpet, cute bunny, artisan, indoor".]

[Figure (fig:shape): shape-guided material generation results on segmented 3D objects.]

[Figure (fig:materialgan): comparison of MaterialGAN, TileGen, and our method on Stone and Metal materials.]

[Figure (fig:tilegen_con): sketch-conditioned generation, alternating our results with TileGen's for the same binary masks.]

[Figure (fig:decoder): PBR decoder ablation; render/SVBRDF outputs for the reference, a decoder trained without L_render, and ours, with the quantitative comparison below.]

                        LPIPS     RMSE
                        Render    Albedo    Metallic    Normal    Roughness
w/o L_render            0.107     0.0361    0.0126      0.0542    0.0406
Ours (w/ L_render)      0.101     0.0357    0.0086      0.0531    0.0365

[Figure (fig:SR): super-resolution ablation; reference, low-resolution input, pretrained Real-ESRGAN, finetuned without L_render, and ours (with L_render), with the quantitative comparison below.]

                        LPIPS     RMSE
                        Render    Albedo    Metal.    Normal    Rough.
Pretrained              0.450     0.0272    0.0816    0.0598    0.0588
w/o L_render            0.342     0.0248    0.0652    0.0474    0.0451
Ours (w/ L_render)      0.321     0.0211    0.0643    0.0398    0.0445

[Figure (fig:highlight_decoder): highlight-aware (HA) decoder ablation; decoded albedo with and without HA for highlighted and highlight-free inputs, with the quantitative comparison below.]

            highlight inputs               non-highlight inputs
            L1        PSNR       LPIPS     L1        PSNR       LPIPS
w/o HA      0.0409    25.7460    0.1928    0.0201    33.2621    0.1220
w/ HA       0.0211    32.6578    0.1452    0.0202    33.2904    0.1241

[Figure (fig:ablation_pixel): pixel-control ablation; ControlNet results before finetuning (w/o ft) and after finetuning (Ours).]

4.1.1. Dataset Generation

Our dataset comprises a total of 711 PBR materials, each including four 2K texture maps (albedo, normal, metallic, and roughness) along with corresponding textual labels. The data are sourced from PolyHaven (https://polyhaven.com/) and freePBR (https://freepbr.com/). We manually categorized the data into ten types: Brick (58), Fabric (60), Ground (99), Leather (45), Metal (130), Organic (45), Plastic (40), Tile (75), Wall (69), and Wood (90).

During the finetuning of the material LDM, the input text prompt $\mathcal{T}$ follows the format "a PBR material of [type], [name], [tags]", where 'type' is the material category and the 'name' (title) and 'tags' of each material are given by the source website. The tags are randomly retained at a ratio of 30%–100% during training. To address the uneven distribution of the original data, we selected high-quality, representative samples within the larger categories and randomly duplicated existing samples in the smaller ones, balancing the sample sizes across all categories and ensuring a more uniform training distribution.
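As a concrete illustration, the following sketch shows how such a prompt could be assembled with random tag retention; the function and field names are ours, not from the paper's code.

```python
import random

def build_prompt(mat_type: str, name: str, tags: list[str]) -> str:
    """Assemble a training prompt, keeping a random 30%-100% of the tags."""
    keep_ratio = random.uniform(0.3, 1.0)
    kept = random.sample(tags, max(1, round(len(tags) * keep_ratio))) if tags else []
    return ", ".join([f"a PBR material of {mat_type}", name] + kept)

# e.g. 'a PBR material of brick, sewer brick, walls'
print(build_prompt("brick", "sewer brick", ["walls", "outdoor"]))
```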

For the 2K textures, we perform horizontal flipping, vertical flipping, random rotation, and multi-scale cropping, adjusting the direction of the normal maps accordingly, and finally resize the results to 512×512 pixels as our training data. After augmentation, we render each texture under randomly sampled viewpoints and lightings using the differentiable rasterizer of Laine et al. (2020). These rendered images are also used to train our highlight-aware albedo decoder.
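One subtlety is that flipping a texture also flips the tangent frame, so the corresponding normal-map channel must be negated. A minimal sketch of one such augmentation, assuming CHW tensors with values in [0, 1]:

```python
import torch

def hflip_pair(albedo: torch.Tensor, normal: torch.Tensor):
    """Horizontal flip of an (albedo, normal) pair. Normal maps encode
    n as (n + 1) / 2, so negating x means mapping v -> 1 - v on channel 0.
    (A vertical flip would negate the y channel instead.)"""
    albedo = torch.flip(albedo, dims=[-1])
    normal = torch.flip(normal, dims=[-1])
    normal[0] = 1.0 - normal[0]
    return albedo, normal
```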

For the paired data used to train ControlNet, we utilize Pidinet (Su et al., 2021) to extract sketches from the albedo maps, as described in Section 3.4.1.

4.1.2. Other Details

DreamPBR was trained on four Nvidia RTX 3090 GPUs. For the material LDM, we employ Adam as our optimizer with a base learning rate of 1.6×10⁻³ and learning-rate scaling disabled. Starting from the stable-diffusion-v1-5 checkpoint, we finetuned for 9000 epochs, taking approximately 10 days. For the PBR decoder, we set the base learning rate to 4.5×10⁻⁶ and enabled scale_lr; training took 4 days in total. The decoder outputs 8 channels: three each for albedo and normal, and one each for metallic and roughness. For the highlight-aware albedo decoder, we likewise set the base learning rate to 4.5×10⁻⁶ with scale_lr enabled; training took 2 days, with the decoder output set to 3 channels. We incorporate the rendering loss detailed in Section 3.3 throughout the training above.
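The 8-channel layout implies a simple split of the decoder output into the four maps; a sketch (the channel order here is our assumption):

```python
import torch

def split_pbr(out: torch.Tensor):
    """Split a (B, 8, H, W) PBR decoder output into the four SVBRDF maps."""
    albedo    = out[:, 0:3]   # RGB
    normal    = out[:, 3:6]   # tangent-space normal
    metallic  = out[:, 6:7]   # scalar map
    roughness = out[:, 7:8]   # scalar map
    return albedo, normal, metallic, roughness
```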

For the rendering-aware super-resolution module, we initialize from the pretrained weights of Real-ESRGAN (Wang et al., 2021) and finetune four super-resolution modules, one each for the albedo, normal, metallic, and roughness textures, using a learning rate of 1×10⁻⁴ for 10,000 iterations. We then combine the four modules in a single model, render the output of each module during training, and incorporate the rendering loss.

To enhance image-control performance, we train ControlNet with a learning rate of 1×10⁻⁵, which requires about 2 days. For style control, we directly use the ip-adapter_sd15 checkpoint together with our finetuned checkpoint, as we observed that this already yields satisfactory results.

4.2. Generation Results

DreamPBR is capable of generating both realistic and imaginative materials from descriptions alone. To demonstrate the breadth of materials it can synthesize, we obtained a large set of material descriptions from an LLM for each category and used them to sample materials with DreamPBR. The generated textures are enhanced by the super-resolution module and then rendered, as shown in Figure 3. Across 400 sampled textures, the results are highly consistent with the text: the mean CLIP score between the rendered images and the given prompts is 30.198. Beyond text-image consistency, diversity is equally important for a text-driven generative model. As demonstrated in Figure 7, when we sample several textures with the same prompt but different random seeds, DreamPBR produces diverse textures that all follow the specified description.
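The paper does not state which CLIP variant was used; the following sketch computes a CLIP score (100 × cosine similarity) with the commonly used openai/clip-vit-base-patch32 model:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str) -> float:
    """100 * cosine similarity between image and text embeddings."""
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img @ txt.T).item()
```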

4.2.1. Tileable Texture Generation

Regardless of which controls the user introduces, our method always generates seamlessly tileable textures, which allows the generated textures to be applied at different scales and in different scenes. In Figure 8, we present several tileable textures from direct and guided generation together with their tiled results, demonstrating the effectiveness of the circular padding in our method, as described in Section 3.2.
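A common way to obtain this behavior in a convolutional diffusion backbone, and one plausible reading of the circular padding described above, is to switch every convolution to circular padding so that features wrap around the texture borders; a minimal sketch:

```python
import torch.nn as nn

def make_tileable(model: nn.Module) -> nn.Module:
    """Switch all 2D convolutions to circular padding so that the
    generated texture wraps seamlessly at its borders."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            m.padding_mode = "circular"
    return model
```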

4.2.2. Results of Pixel Control

By finetuning an additional ControlNet, DreamPBR can generate textures that follow a given pattern. In practice, a designer may fix a pattern in advance and then try different materials, or the other way around; in both cases DreamPBR produces reasonable textures for the chosen pattern or material, as demonstrated in Figure 4 and Figure 5.

Given an additional binary mask, inpainting is another common way for users to obtain specific results, so we present several inpainting results in Figure 9, in which a region of a texture is replaced according to a user-provided description.
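The paper does not detail its inpainting procedure; a standard masked-latent blending step, as used in RePaint-style diffusion inpainting, might look as follows (mask = 1 where content is regenerated):

```python
import torch

def inpaint_blend(scheduler, z_t, t, z0_known, mask):
    """Before each denoising step, re-impose the known region by noising
    the original latent z0_known to the current timestep t."""
    noise = torch.randn_like(z0_known)
    z_known_t = scheduler.add_noise(z0_known, noise, t)
    return mask * z_t + (1 - mask) * z_known_t
```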

4.2.3. Results of Style Control

A style image often conveys a person's intent more easily than text alone. We therefore adapt IP-Adapter (Ye et al., 2023) for our style control. Specifically, we collected several style images online and present the generation results under the different image styles in Figure 6. Figure 10 illustrates the case where users combine style control with pixel control, which lets them steer the generation even more freely.
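In diffusers, attaching the mentioned ip-adapter_sd15 checkpoint to a finetuned pipeline might look like the sketch below; the material-LDM path is a hypothetical placeholder, and the adapter scale is our assumption:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/material-ldm",            # hypothetical finetuned checkpoint
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)         # style-guidance strength (assumed)

style = Image.open("style.png")
albedo = pipe(prompt="a PBR material of wood",
              ip_adapter_image=style).images[0]
```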

4.2.4. Results of Shape Control

With its ability to generate a wide variety of textures, DreamPBR can be extended to non-planar objects such as chairs. Specifically, given a segmented object, we can query a large language model in dialogue for a description of each region. For more specific objects, a more direct route is to combine cropped areas from exemplar images with pixel control and style control. Thanks to the tileable outputs, our shape-control pipeline produces the results shown in Figure 11.

4.3. Comparative Experiments

Leveraging the state-of-the-art generative model Stable Diffusion, DreamPBR is highly competitive with previous material generation methods. We compare results generated by DreamPBR for different materials against MaterialGAN (Guo et al., 2020) and TileGen (Zhou et al., 2022) in Figure 12. Since the competing methods provide only two categories, our results are generated with the prompts "a PBR material of ground, stone" and "a PBR material of metal". The comparison shows that DreamPBR can generate textures that follow the distribution of realistic datasets, as GAN-based methods do, while also producing imaginative textures drawing on its 2D image prior.

Moreover, we compare our pixel control against TileGen's sketch-guided generation. Figure 13 shows the generation results of TileGen and our method conditioned on the same binary masks. DreamPBR surpasses TileGen in sketch-driven generation, exhibiting fewer artifacts and more precise control than previous material generation work.

4.4. Ablation Study

The training of DreamPBR involves several optional modules and additional loss functions. In this section, we evaluate the effect of each of these designs. For evaluation, we randomly selected 100 textures from our collected data that were held out from all training stages.

4.4.1. PBR Decoder

When training the PBR decoder, we introduce $\mathcal{L}_{\text{render}}$, a loss on images rendered with random lights and viewpoints, which enforces that the decoded textures remain realistic after rendering and reduces the search space of output values compared to training without rendered images. We trained two PBR decoders, with and without $\mathcal{L}_{\text{render}}$, and evaluated their effectiveness by comparing the outputs with reference textures. Figure 14 presents the comparison: our rendering-aware decoder achieves more realistic rendered results and more consistent generated textures.
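Conceptually, the loss compares renderings of the predicted and reference maps under a shared random light. The paper renders with a differentiable rasterizer (Laine et al., 2020); the sketch below uses a deliberately simplified Lambertian term just to show the structure:

```python
import torch
import torch.nn.functional as F

def shade(albedo, normal, light_dir):
    """Simplified Lambertian shading of (B, 3, H, W) maps in [0, 1]."""
    n = F.normalize(normal * 2.0 - 1.0, dim=1)   # decode normals
    ndotl = (n * light_dir.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)
    return albedo * ndotl

def render_loss(pred, ref):
    """L1 between renderings of predicted and reference (albedo, normal)."""
    light = F.normalize(torch.randn(3), dim=0)   # random light direction
    return (shade(*pred, light) - shade(*ref, light)).abs().mean()
```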

4.4.2. Super-Resolution Module

Although super-resolution models already perform well on natural images, we finetune them on our material data and employ the rendering loss $\mathcal{L}_{\text{render}}$ at the perceptual level. In practice, we finetune one super-resolution module per texture component on top of the pretrained Real-ESRGAN as our baseline. We then jointly finetune the four modules (albedo, metallic, normal, and roughness) and introduce $\mathcal{L}_{\text{render}}$ by rendering the four super-resolved textures into image space. The comparison results are shown in Figure 15: as with the PBR decoder, finetuning the super-resolution modules with $\mathcal{L}_{\text{render}}$ yields better results.
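A sketch of the joint objective, with a per-map L1 term plus the rendering term; render_fn and the weight lam stand in for details the paper does not specify:

```python
def joint_sr_loss(sr_modules, low_res, high_res, render_fn, lam=0.1):
    """Joint finetuning loss over the four per-map SR modules."""
    keys = ("albedo", "normal", "metallic", "roughness")
    sr = {k: sr_modules[k](low_res[k]) for k in keys}
    per_map = sum((sr[k] - high_res[k]).abs().mean() for k in keys)
    rendered = (render_fn(**sr) -
                render_fn(**{k: high_res[k] for k in keys})).abs().mean()
    return per_map + lam * rendered
```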

4.4.3. Highlight-Aware Decoder

As mentioned in Section 3.3, we introduce a highlight-aware albedo decoder to remove potential highlights in the generated RGB images. A good de-highlighting module must satisfy two requirements: 1) effectively remove highlights, and 2) leave images without highlights unchanged. In practice, training only on rendered images can degrade the decoded albedo of highlight-free inputs, so we finetune the highlight-aware decoder on a random mix of images rendered under different lights and pure albedo maps. Figure 16 compares the outputs of the highlight-aware decoder against those of the initially pretrained decoder, showing that our decoder addresses both requirements above.
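The mixing strategy can be summarized as sampling either a lit rendering or the plain albedo as input, with the albedo always as target; the 50/50 ratio below is our assumption:

```python
import random

def sample_pair(albedo, renders):
    """Return an (input, target) pair for the highlight-aware decoder:
    either a lit rendering (de-highlighting) or the albedo itself
    (identity, so highlight-free inputs stay unchanged)."""
    x = random.choice(renders) if random.random() < 0.5 else albedo
    return x, albedo
```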

4.4.4. Pixel Control

To realize sketch-guided control, we embed a pretrained ControlNet in DreamPBR. However, unlike the IP-Adapter used for style control, which incorporates image semantics in CLIP space independently of the training data, the initial ControlNet caused a domain shift in our experiments, pulling results from the albedo domain back to the natural-image domain. To address this, we finetuned the ControlNet on our sketch-albedo pairs as described above. Figure 17 compares the ControlNet before and after finetuning.
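Plugging a finetuned sketch ControlNet into the pipeline could look like the following diffusers sketch; both checkpoint paths are hypothetical placeholders:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "path/to/sketch-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "path/to/material-ldm", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

sketch = Image.open("sketch.png")
albedo = pipe(prompt="a PBR material of tile, encaustic cement tiles",
              image=sketch).images[0]
```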

4.5. Limitations

Despite DreamPBR's promising ability to generate high-quality and diverse material textures, our method has certain limitations that merit further exploration and improvement. We employ normal maps to convey surface detail, but without accompanying displacement maps, self-occlusion is ignored at render time, which can make the rendered results look unrealistic. In addition, although a longer description yields a more detailed texture, composing such a description, e.g. "a PBR material of the wall, concrete wall, outdoor, cracked, man-made, rough, painted…", is itself laborious for users.

5. Conclusions and Future Work

In this paper, we propose DreamPBR, a novel diffusion-based generative framework for creating physically based material textures. Rather than relying on the massive datasets used for image generation, our method transfers the prior of a pretrained image model to the material domain. Given a text description and optional multi-modal conditions, such as the style of an RGB image or the pattern of a binary mask, we generate textures that are highly consistent with the text and the other conditions, allowing users to create planar textures freely from their imagination. Specifically, we first finetune a diffusion model for albedo generation and then decompose the albedo into the remaining SVBRDF maps (normal, metallic, and roughness) using our highlight-aware decoder and PBR decoder. For higher-resolution textures, introducing an additional rendering loss into our super-resolution module brings a significant visual improvement. With these properties, DreamPBR can also produce textures for simple geometries through dialogue with an LLM.

For future work, although DreamPBR currently targets planar textures, it could be extended to complex geometries as retopology techniques develop further. Additionally, thanks to our effective PBR decoder and highlight-aware decoder, DreamPBR has the potential to be applied to SVBRDF estimation. Lastly, diffusion models inevitably suffer from limited resolution and time-consuming inference, which remain challenging problems for the future.

References

  • Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
  • Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2016. Reflectance Modeling by Neural Texture Synthesis. ACM Trans. Graph. 35, 4, Article 65 (July 2016), 13 pages. https://doi.org/10.1145/2897824.2925917
  • Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2015. Two-Shot SVBRDF Capture for Stationary Materials. ACM Trans. Graph. 34, 4, Article 110 (July 2015), 13 pages. https://doi.org/10.1145/2766967
  • Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. 2023. Objaverse-XL: A Universe of 10M+ 3D Objects. arXiv:2307.05663 [cs.CV].
  • Denis Zavadski, Johann-Friedrich Feiden, and Carsten Rother. 2023. ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models.
  • Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, and Adrien Bousseau. 2018a. Single-image SVBRDF capture with a rendering-aware deep network. ACM Transactions on Graphics (ToG) 37, 4 (2018), 1–15.
  • Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, and Adrien Bousseau. 2018b. Single-image SVBRDF capture with a rendering-aware deep network. ACM Transactions on Graphics (ToG) 37, 4 (2018), 1–15.
  • Valentin Deschaintre, Miika Aittala, Frédo Durand, George Drettakis, and Adrien Bousseau. 2019. Flexible SVBRDF Capture with a Multi-Image Deep Network. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 38, 4 (July 2019). http://www-sop.inria.fr/reves/Basilic/2019/DADDB19
  • Yue Dong. 2019. Deep appearance modeling: A survey. Visual Informatics 3, 2 (2019), 59–68. https://doi.org/10.1016/j.visinf.2019.07.003
  • Yue Dong, Guojun Chen, Pieter Peers, Jiawan Zhang, and Xin Tong. 2014. Appearance-from-Motion: Recovering Spatially Varying Surface Reflectance under Unknown Lighting. ACM Trans. Graph. 33, 6, Article 193 (Nov. 2014), 12 pages. https://doi.org/10.1145/2661229.2661283
  • Duan Gao, Xiao Li, Yue Dong, Pieter Peers, Kun Xu, and Xin Tong. 2019. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Transactions on Graphics (TOG) 38, 4 (2019), 1–15.
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. arXiv:1406.2661 [stat.ML].
  • D. Guarnera, G. C. Guarnera, A. Ghosh, C. Denk, and M. Glencross. 2016. BRDF Representation and Acquisition. Computer Graphics Forum 35, 2 (2016), 625–650. https://doi.org/10.1111/cgf.12867
  • Paul Guerrero, Milos Hasan, Kalyan Sunkavalli, Radomir Mech, Tamy Boubekeur, and Niloy Mitra. 2022. MatFormer: A Generative Model for Procedural Materials. ACM Trans. Graph. 41, 4, Article 46 (2022). https://doi.org/10.1145/3528223.3530173
  • Jie Guo, Shuichang Lai, Chengzhi Tao, Yuelong Cai, Lei Wang, Yanwen Guo, and Ling-Qi Yan. 2021. Highlight-Aware Two-Stream Network for Single-Image SVBRDF Acquisition. ACM Trans. Graph. 40, 4, Article 123 (July 2021), 14 pages. https://doi.org/10.1145/3450626.3459854
  • Jie Guo, Shuichang Lai, Qinghao Tu, Chengzhi Tao, Changqing Zou, and Yanwen Guo. 2023. Ultra-High Resolution SVBRDF Recovery from a Single Image. ACM Trans. Graph. 42, 3, Article 33 (June 2023), 14 pages. https://doi.org/10.1145/3593798
  • Yu Guo, Cameron Smith, Miloš Hašan, Kalyan Sunkavalli, and Shuang Zhao. 2020. MaterialGAN: Reflectance Capture Using a Generative SVBRDF Model. ACM Trans. Graph. 39, 6, Article 254 (Nov. 2020), 13 pages. https://doi.org/10.1145/3414685.3417779
  • Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra, and Tobias Ritschel. 2021. Generative Modelling of BRDF Textures from Flash Images. ACM Trans. Graph. 40, 6, Article 284 (Dec. 2021), 13 pages. https://doi.org/10.1145/3478513.3480507
  • Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. arXiv:2006.11239 [cs.LG].
  • Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022c. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations. https://openreview.net/forum?id=nZeVKeeFYf9
  • Ruizhen Hu, Xiangyu Su, Xiangkai Chen, Oliver van Kaick, and Hui Huang. 2022d. Photo-to-Shape Material Transfer for Diverse Structures. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 39, 6 (2022), 113:1–113:14.
  • Yiwei Hu, Julie Dorsey, and Holly Rushmeier. 2019. A Novel Framework for Inverse Procedural Texture Modeling. ACM Trans. Graph. 38, 6, Article 186 (Nov. 2019), 14 pages. https://doi.org/10.1145/3355089.3356516
  • Yiwei Hu, Paul Guerrero, Milos Hasan, Holly Rushmeier, and Valentin Deschaintre. 2022a. Node Graph Optimization Using Differentiable Proxies. In ACM SIGGRAPH 2022 Conference Proceedings (SIGGRAPH '22). Association for Computing Machinery, New York, NY, USA, Article 5, 9 pages. https://doi.org/10.1145/3528233.3530733
  • Yiwei Hu, Paul Guerrero, Milos Hasan, Holly Rushmeier, and Valentin Deschaintre. 2023. Generating Procedural Materials from Text or Image Prompts. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings (SIGGRAPH '23). ACM. https://doi.org/10.1145/3588432.3591520
  • Yiwei Hu, Chengan He, Valentin Deschaintre, Julie Dorsey, and Holly Rushmeier. 2022b. An Inverse Procedural Modeling Pipeline for SVBRDF Maps. ACM Trans. Graph. 41, 2, Article 18 (Jan. 2022), 17 pages. https://doi.org/10.1145/3502431
  • Zhuo Hui, Kalyan Sunkavalli, Joon-Young Lee, Sunil Hadap, Jian Wang, and Aswin C. Sankaranarayanan. 2017. Reflectance Capture Using Univariate Sampling of BRDFs. In 2017 IEEE International Conference on Computer Vision (ICCV). 5372–5380. https://doi.org/10.1109/ICCV.2017.573
  • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2018. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004 [cs.CV].
  • Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196 [cs.NE].
  • Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-Free Generative Adversarial Networks. In Proc. NeurIPS.
  • Tero Karras, Samuli Laine, and Timo Aila. 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv:1812.04948 [cs.NE].
  • Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and Improving the Image Quality of StyleGAN. In Proc. CVPR.
  • Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings.
  • Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. 2017. On Convergence and Stability of GANs. arXiv:1705.07215 [cs.AI].
  • Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, and Timo Aila. 2020. Modular Primitives for High-Performance Differentiable Rendering. ACM Transactions on Graphics 39, 6 (2020).
  • Xiao Li, Yue Dong, Pieter Peers, and Xin Tong. 2017. Modeling surface appearance from a single photograph using self-augmented convolutional neural networks. ACM Transactions on Graphics (ToG) 36, 4 (2017), 1–11.
  • Xiao Li, Yue Dong, Pieter Peers, and Xin Tong. 2019. Synthesizing 3D Shapes from Silhouette Image Collections using Multi-projection Generative Adversarial Networks. arXiv:1906.03841 [cs.CV].
  • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, and Krishna Kumar Singh. 2021. Collaging class-specific GANs for semantic image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14418–14427.
  • Zhengqin Li, Kalyan Sunkavalli, and Manmohan Chandraker. 2018. Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image. In Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part III. Springer-Verlag, Berlin, Heidelberg, 74–90. https://doi.org/10.1007/978-3-030-01219-9_5
  • Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023. Magic3D: High-resolution text-to-3D content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 300–309.
  • Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. 2022. Pseudo Numerical Methods for Diffusion Models on Manifolds. arXiv:2202.09778 [cs.CV].
  • Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. 2023. Zero-1-to-3: Zero-shot One Image to 3D Object. arXiv:2303.11328 [cs.CV].
  • Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. 2023. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv preprint arXiv:2302.08453.
  • Gianpaolo Palma, Marco Callieri, Matteo Dellepiane, and Roberto Scopigno. 2012. A Statistical Method for SVBRDF Approximation from Video Sequences in General Lighting Conditions. Computer Graphics Forum (2012). https://doi.org/10.1111/j.1467-8659.2012.03145.x
  • Keunhong Park, Konstantinos Rematas, Ali Farhadi, and Steven M. Seitz. 2018. PhotoShape: Photorealistic Materials for Large-Scale Shape Collections. ACM Trans. Graph. 37, 6, Article 192 (Nov. 2018).
  • Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis with Spatially-Adaptive Normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv (2022).
  • Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
  • Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125 1, 2 (2022), 3.
  • Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, and Honglak Lee. 2016a. Learning What and Where to Draw. arXiv:1610.02454 [cs.CV].
  • Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016b. Generative Adversarial Text to Image Synthesis. arXiv:1605.05396 [cs.NE].
  • Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning. PMLR, 1278–1286.
  • J. Riviere, P. Peers, and A. Ghosh. 2016. Mobile Surface Reflectometry. Computer Graphics Forum 35, 1 (2016), 191–202. https://doi.org/10.1111/cgf.12719
  • Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV].
  • Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597 [cs.CV].
  • Sam Sartor and Pieter Peers. 2023. MatFusion: a generative diffusion model for SVBRDF capture. In SIGGRAPH Asia 2023 Conference Papers. 1–10.
  • Liang Shi, Beichen Li, Miloš Hašan, Kalyan Sunkavalli, Tamy Boubekeur, Radomir Mech, and Wojciech Matusik. 2020. MATch: Differentiable Material Graphs for Procedural Material Capture. ACM Trans. Graph. 39, 6 (Dec. 2020), 1–15.
  • Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. 2023. Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model. arXiv:2310.15110 [cs.CV].
  • Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2256–2265.
  • Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations. https://openreview.net/forum?id=PxTIG12RRHS
  • Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen, and Li Liu. 2021. Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5117–5127.
  • Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. 2023. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. arXiv preprint arXiv:2309.16653.
  • Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz. 2017. MoCoGAN: Decomposing Motion and Content for Video Generation. arXiv:1707.04993 [cs.CV].
  • Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, and Tamy Boubekeur. 2023. ControlMat: Controlled Generative Approach to Material Capture. arXiv preprint arXiv:2309.01700.
  • Bruce Walter, Stephen R. Marschner, Hongsong Li, and Kenneth E. Torrance. 2007. Microfacet Models for Refraction through Rough Surfaces. In Proceedings of the 18th Eurographics Conference on Rendering Techniques (EGSR '07). Eurographics Association, Goslar, DEU, 195–206.
  • Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. 2021. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1905–1914.
  • Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. 2023. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. arXiv preprint arXiv:2305.16213.
  • Michael Weinmann and Reinhard Klein. 2015. Advances in Geometry and Reflectance Acquisition (Course Notes). In SIGGRAPH Asia 2015 Courses (SA '15). Association for Computing Machinery, New York, NY, USA, Article 1, 71 pages. https://doi.org/10.1145/2818143.2818165
  • Zexiang Xu, Jannik Boll Nielsen, Jiyang Yu, Henrik Wann Jensen, and Ravi Ramamoorthi. 2016. Minimal BRDF Sampling for Two-Shot near-Field Reflectance Acquisition. ACM Trans. Graph. 35, 6, Article 188 (Dec. 2016), 12 pages. https://doi.org/10.1145/2980179.2982396
  • Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. 2023. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models.
  • Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. 2023. Adding Conditional Control to Text-to-Image Diffusion Models.
  • Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR.
  • Xilong Zhou, Miloš Hašan, Valentin Deschaintre, Paul Guerrero, Kalyan Sunkavalli, and Nima Kalantari. 2022. TileGen: Tileable, Controllable Material Generation and Capture. arXiv:2206.05649 [cs.GR].
  • Zhiming Zhou, Guojun Chen, Yue Dong, David Wipf, Yong Yu, John Snyder, and Xin Tong. 2016. Sparse-as-Possible SVBRDF Acquisition. ACM Trans. Graph. 35, 6, Article 189 (Dec. 2016), 12 pages. https://doi.org/10.1145/2980179.2980247
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2020. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 [cs.CV].
  • Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis. arXiv:1904.01310 [cs.CV].