[vc_section][vc_row full_width=”stretch_row”][vc_column][stack_text_image layout=”right” image=”2319″][vc_column_text]

Artificial Intelligence for Medical, Healthcare & Beauty Industries

Automating convergence in Generative Adversarial Networks (GANs) using positive and negative feedback loops.

Computer-vision-based A.I. for the Medical, Healthcare, and Beauty segments requires a significant time investment to produce useful models: as model capacity grows, convergence becomes harder to reach. Read how we approach this challenge.

Mark Swartz[/vc_column_text][vc_empty_space height=”2px”][/stack_text_image][/vc_column][/vc_row][/vc_section][vc_row css=”.vc_custom_1583709465471{margin-top: -50px !important;margin-bottom: -100px !important;padding-top: -20px !important;padding-bottom: -20px !important;}”][vc_column][stack_boxed_content background=”bg–secondary”][vc_column_text]


[/vc_column_text][/stack_boxed_content][/vc_column][/vc_row][vc_section css=”.vc_custom_1583709492229{margin-top: -100px !important;}”][vc_row][vc_column width=”7/12″][vc_column_text]


Generative adversarial networks (GANs) have been widely studied since 2014, yet few comprehensive studies explain the connections among different GAN variants or the solutions to common problems in using them for generative modeling. GANs differ from other machine learning methods in that they are best described as a dual-agent tug of war between a generator and a discriminator. This relationship makes the initial training process unreliable, as is the subsequent goal of achieving a balance (convergence) between the two agents. This document explains what our research team learned during extensive lab research, which led us to a solution for automating convergence with positive/negative feedback loops.[/vc_column_text][/vc_column][vc_column width=”2/12″][/vc_column][vc_column width=”3/12″][vc_empty_space height=”20px”][vc_column_text]

Research Info

Started: July 24th, 2019[/vc_column_text][vc_empty_space height=”20px”][vc_column_text]


Palo Alto, United States
Cupertino, United States[/vc_column_text][vc_empty_space height=”20px”][vc_column_text]


Deep Learning, Generative Adversarial Networks, Algorithm, Theory, Applications.

[/vc_column_text][vc_empty_space height=”20px”][stack_call_to_action layout=”button-label” intro=”Contact” middle=”Middle Text” button_text=”Data Science”][/vc_column][/vc_row][/vc_section][vc_section css=”.vc_custom_1583020882993{background-color: #fafafa !important;}”][vc_row][vc_column width=”1/2″][vc_empty_space][vc_column_text]


Generative Adversarial Networks (GANs) were introduced into deep learning by Goodfellow et al. (1). As its name suggests, GAN is a form of generative model trained through an adversarial process between deep neural networks: it learns a generative model of the data distribution through adversarial methods. GAN is the most successful generative model developed in recent years and has become one of the most exciting research directions in artificial intelligence. Because of its excellent performance, GAN has attracted notable attention since its introduction, and its usefulness extends beyond its role as a high-performing generative model.

In general, deep learning models can be divided into discriminant models and generative models (9). From the perspective of probability and statistical theory, a discriminant model is a method of modeling the relationship between unknown data y and known data x. A generative model refers to a model that can randomly generate observations, especially given some implicit parameters (10). Thanks to the invention of algorithms such as Back Propagation (BP) and Dropout, discriminant models have evolved rapidly. The development of generative models has lagged due to the difficulty of modeling, even though generative models have played a pivotal role in the history of machine learning. When processing large amounts of data, such as images, speech, text, and genomics, generative models can help us simulate the distribution of this high-dimensional data. This benefits many applications, such as super-resolution, data augmentation, image and medical-image conversion, caption generation, electronic health record generation, biomedical data generation, data imputation, and other ill-posed problems (11–15).[/vc_column_text][/vc_column][vc_column width=”1/2″][vc_empty_space height=”75px”][vc_column_text]Likelihood describes the probability of an event under different conditions when the results are known (16). Sometimes we may not know the distribution function, but we do have observed data; maximum likelihood estimation is therefore applied to estimate model parameters from the observed data. Traditional generative models such as the Restricted Boltzmann Machine (RBM) (17, 18), Gaussian Mixture Model (GMM) (19), Naive Bayes Model (NBM) (20), and Hidden Markov Model (HMM) (20) are mostly based on maximum likelihood estimation.
However, while an explicitly defined probability density function brings computational tractability, maximum likelihood estimation may not capture the complexity of the actual data distribution and cannot learn high-dimensional data distributions.
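To make the maximum likelihood idea concrete, here is a minimal sketch (assuming NumPy; `gaussian_mle` is an illustrative helper name, not from the text) that estimates the parameters of a Gaussian by maximum likelihood. For a Gaussian, the MLE solutions are simply the sample mean and the biased sample variance:

```python
# Sketch: maximum likelihood estimation of Gaussian parameters from observed
# data. Illustrative only; the helper name is an assumption for this example.
import numpy as np

def gaussian_mle(x):
    """Return the maximum likelihood estimates (mu, sigma^2) for 1-D data x."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                 # argmax of the log-likelihood w.r.t. mu
    var = ((x - mu) ** 2).mean()  # MLE variance divides by N, not N - 1
    return mu, var

mu, var = gaussian_mle([1.0, 2.0, 3.0, 4.0])
```

The closed form exists only because the Gaussian density is explicit; for complex high-dimensional distributions no such tractable estimator is available, which is the limitation the text describes.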

The majority of generative models require the use of Markov chains; GAN does not (21), and it uses latent codes to express latent dimensions and control implicit relationships in the data. Adversarial networks can represent very sharp, even degenerate distributions, while Markov chain-based approaches require the distribution to be somewhat blurry so that the chains can mix between modes. Various types of loss functions can be integrated into GAN models, allowing different loss functions to be designed for different tasks, all of which can be learned and optimized under the GAN framework. GAN is also a nonparametric modeling method and does not require an approximate distribution of the training data to be defined in advance. When the probability density is not computable, some traditional generative models that rely on a statistical interpretation of the data cannot be learned or applied, but GAN can still be used in such cases.[/vc_column_text][/vc_column][/vc_row][/vc_section]

Principles of GAN

In this section, we introduce the architecture and working principles of GAN. The basic GAN model is composed of an input vector, a generator, and a discriminator. The generator and discriminator are implicit function expressions, usually implemented by deep neural networks (22).

We use abstract mathematical language to explain the basic principles of GAN. We assume the training samples x follow a fixed but unknown data distribution Pdata(x); this distribution is difficult to determine. Traditional methods assume that Pdata(x) obeys a Gaussian mixture distribution and use maximum likelihood to solve for its parameters. However, when the data are complex, this is often intractable and the resulting performance is limited (23), owing to the limited expressive ability of the Gaussian distribution itself. Thus, neural networks were proposed to define a model distribution Pg(x). The generator is a neural network with parameters θg. It draws a random variable z from a prior distribution (usually Gaussian noise, a random variable in the latent space) and maps it through the network to a pseudo-sample: the generated data is denoted G(z) and its distribution Pg. Depending on θg, a simple input distribution can thereby generate various complex distributions. The Pg(x) produced by the generator should be as similar as possible to the real data distribution Pdata(x) (24). So, for the generator, the target is to find G* = arg min_G Div(Pg, Pdata), where Div denotes a divergence between the two distributions.
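The mapping from prior noise z to samples G(z) can be sketched as a tiny network (a minimal sketch assuming NumPy; the `Generator` class, layer sizes, and random weights are all illustrative placeholders, not a trained model):

```python
# Sketch of the generator G: a small network with parameters theta_g that
# maps Gaussian noise z to generated samples G(z). Weights are random
# placeholders; in practice theta_g is trained to minimize Div(P_g, P_data).
import numpy as np

rng = np.random.default_rng(0)

class Generator:
    def __init__(self, z_dim=8, hidden=16, out_dim=2):
        self.W1 = rng.standard_normal((z_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def __call__(self, z):
        h = np.tanh(z @ self.W1 + self.b1)  # nonlinearity lets P_g be non-Gaussian
        return h @ self.W2 + self.b2        # generated sample G(z)

G = Generator()
z = rng.standard_normal((5, 8))  # z drawn from the prior N(0, I)
samples = G(z)                   # samples from P_g
```

The nonlinearity is what allows a simple Gaussian prior to be pushed forward into a much more complex sample distribution.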

Then the next question is how to calculate the difference between the two distributions. If the forms of Pdata(x) and Pg(x) were known, the divergence could be computed directly and minimized to bring Pg(x) close to Pdata(x). Although we do not know the specific distributions, we can sample from them. So GAN proposes an ingenious device, the discriminator, to estimate the difference between the two distributions. The original GAN defines the discriminator as a binary classifier (25) with parameters θd. During training, when the input is a real sample x, the output of the discriminator should be 1; otherwise, it should be 0. To define the discriminator, Goodfellow et al. (1) used the binary cross entropy function, which is commonly used for binary classification problems.[/vc_column_text][vc_empty_space][/vc_column][vc_column width=”1/2″][vc_empty_space height=”75px”][vc_column_text]Here ŷ is the probability that the model predicts the sample to be a positive example, and y is the sample label: if the sample is a positive example, y is 1; otherwise, y is 0. A given sample may come either from the real distribution or from the generated distribution; the positive and negative cases are drawn from Pdata and Pg, respectively.
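The binary cross entropy described above, L = −[y log ŷ + (1 − y) log(1 − ŷ)], can be sketched numerically (a minimal sketch assuming NumPy; the function name is illustrative):

```python
# Sketch of the binary cross entropy loss used to define the discriminator:
# y_hat is the predicted probability a sample is real, y the label (1 real,
# 0 fake). Clipping avoids log(0).
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return float(np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))))

# A confident, correct discriminator incurs low loss; a wrong one, high loss.
low = binary_cross_entropy([0.9, 0.1], [1, 0])
high = binary_cross_entropy([0.1, 0.9], [1, 0])
```

Minimizing this loss over real (y = 1) and generated (y = 0) samples is exactly what trains the discriminator to separate Pdata from Pg.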

The whole objective function for the discriminator is: V(G, D) = E_{x∼Pdata}[log D(x)] + E_{x∼Pg}[log(1 − D(x))]. By merging Equation (1) into (3), the objective function of the basic GAN is defined by Equation (4). By optimizing this objective function, we can obtain a GAN model. GAN training can be regarded as a min–max optimization process. The generator wants to deceive the discriminator, so it tries to maximize the discriminator's output when a fake sample is presented; the discriminator, in turn, attempts to distinguish real from fake samples. Consequently, the discriminator tries to maximize V(G, D) while the generator tries to minimize it, forming the minimax relationship. During training, the parameters of G (θg) and D (θd) are continuously updated. When the generator is being trained, the parameters of the discriminator are fixed: the data generated by the generator is labeled fake and fed into the discriminator, the error between the discriminator's output D(G(z)) and the sample label is calculated, and the generator's parameters are updated by backpropagation (BP) of this error. When the discriminator is being trained, the parameters of the generator are fixed: the discriminator receives a positive sample x from the real data set and a negative sample G(z) from the generator, the error is calculated from the discriminator's outputs and the sample labels, and finally the discriminator's parameters are updated by BP.
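The value function V(G, D) can be estimated from finite samples by Monte Carlo (a minimal sketch assuming NumPy; the fixed logistic discriminator and the sample values are illustrative placeholders for trained networks):

```python
# Sketch: Monte Carlo estimate of
#   V(G, D) = E_{x~P_data}[log D(x)] + E_{x~P_g}[log(1 - D(x))]
# from samples of each distribution, with a toy fixed discriminator.
import numpy as np

def value_function(D, real_samples, fake_samples, eps=1e-12):
    d_real = np.clip(D(real_samples), eps, 1 - eps)
    d_fake = np.clip(D(fake_samples), eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

# Toy discriminator: sigmoid of the raw value (placeholder for a deep net).
D = lambda x: 1.0 / (1.0 + np.exp(-x))

real = np.array([2.0, 3.0])    # samples the toy D scores as likely real
fake = np.array([-2.0, -3.0])  # samples the toy D scores as likely fake
v = value_function(D, real, fake)
```

A discriminator that separates the two sets well drives V(G, D) toward its maximum of 0, while an uninformative D(x) = 0.5 gives V = −2 log 2; the two players push this quantity in opposite directions.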

Ideally, the generator and discriminator are in equilibrium when Pdata(x) = Pg(x). When the generator is fixed, we can take the derivative of V(G, D) to find the optimal discriminator D*(x) = Pdata(x) / (Pdata(x) + Pg(x)), as shown in Equation (5). By substituting the optimal discriminator into Equation (3), the objective function reduces to optimizing the JS divergence between Pdata(x) and Pg(x) under the optimal discriminator (26).[/vc_column_text][/vc_column][/vc_row][vc_section css=”.vc_custom_1583688271923{background-color: #fafafa !important;}”][vc_row css=”.vc_custom_1623097149563{margin-top: -25px !important;}”][vc_column][vc_empty_space height=”1px”][vc_column_text]
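The equilibrium condition above, D*(x) = Pdata(x) / (Pdata(x) + Pg(x)) with zero JS divergence when Pdata = Pg, can be checked numerically for discrete distributions (a minimal sketch assuming NumPy; function names are illustrative):

```python
# Sketch: optimal discriminator and JS divergence for discrete distributions.
# At equilibrium P_data = P_g, D*(x) = 1/2 everywhere and JS divergence is 0.
import numpy as np

def optimal_discriminator(p_data, p_g):
    p_data, p_g = np.asarray(p_data, float), np.asarray(p_g, float)
    return p_data / (p_data + p_g)

def js_divergence(p, q, eps=1e-12):
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.5, 0.5])
d_star = optimal_discriminator(p_data, p_data)  # identical distributions
js = js_divergence(p_data, p_data)
```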


GAN is an excellent generative model. However, the original GAN model has many problems, such as the vanishing gradient, difficulty in training, and poor diversity (27). Many efforts have been devoted to obtaining better GANs through different optimization methods. Since 2014, theories and articles related to GAN have therefore appeared in rapid succession, and many new GAN-based models have been proposed to improve the stability and quality of the generated results (28).

A number of review articles have summarized and classified the current GAN-related models (22, 24, 29). Creswell et al. (22) classified the evolution of GAN models in terms of architectural development and loss-function improvement. Hong et al. (29) summarized the development of GAN models in terms of theoretical analysis, supervised and unsupervised settings, and common problems. Guo et al. (24) focused on improvements to the model structure, extensions of the theory, novel applications, and so on. We introduce several common improvements to GAN here.

[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column width=”1/2″][stack_boxed_content shadow=”box-shadow”][vc_column_text]

Conditional Generative Adversarial Networks (CGAN)

CGAN is an improved GAN model proposed by Mirza et al. (30). Unlike the original GAN, CGAN uses a supervised approach that increases the controllability of the generated results. CGAN takes random noise z and a class label y as inputs to the generator, and a generated or real sample together with the class label as inputs to the discriminator, in order to learn the correlation between labels and images. By introducing the conditional variable y into the modeling and supplying the model with this additional information, the data generation process can be guided.
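The CGAN conditioning described above can be sketched as concatenating the noise with a one-hot label before it enters the generator (a minimal sketch assuming NumPy; dimensions and helper names are illustrative):

```python
# Sketch of CGAN conditioning: the noise z is concatenated with a one-hot
# label y, so the same generator can be steered toward a chosen class.
import numpy as np

def one_hot(label, num_classes):
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def conditional_input(z, label, num_classes):
    """Build the generator input [z ; y] used in conditional generation."""
    return np.concatenate([z, one_hot(label, num_classes)])

z = np.random.default_rng(0).standard_normal(100)
g_in = conditional_input(z, label=3, num_classes=10)
```

The discriminator receives the same label alongside each (real or generated) sample, which is how the label–image correlation is learned.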

Deep Convolutional Generative Adversarial Networks (DCGAN)

One year after the first GAN paper was published, researchers found that the GAN model was unstable and relied on many training tricks. In 2015, Radford et al. (31) proposed an upgraded version of the GAN architecture named DCGAN, which improved the original GAN with deep convolutional neural networks (CNNs). DCGAN's network structure is still widely used today; it remains one of the most popular GAN architectures and a milestone in the history of GAN. Compared with the original GAN, DCGAN almost completely replaces fully connected layers with convolutional layers, the discriminator is nearly symmetric with the generator, and the network contains no pooling or up-sampling layers. DCGAN also uses Batch Normalization to mitigate the vanishing-gradient problem.[/vc_column_text][/stack_boxed_content][/vc_column][vc_column width=”1/2″][stack_boxed_content shadow=”box-shadow”][vc_column_text]


The objective function of the original GAN can be seen as minimizing the JS divergence between two distributions. In fact, there are many ways to measure the distance between two distributions, and JS divergence is just one of them; defining different distance metrics yields different objective functions. Nowozin et al. (32) applied f-divergence to GAN (f-GAN) for training generative neural samplers. The f-divergence Df(P‖Q) is a function that measures the difference between two probability distributions P and Q. Under this framework, f-GAN generalizes across divergences so that the corresponding GAN objective can be derived for a specific divergence. Many common divergences (33), such as KL divergence, Hellinger distance, and total variation distance, are special cases of f-divergence, each corresponding to a particular choice of f. Many improvements in GAN training stability have been achieved by using different distance metrics between distributions, as in Energy-based GAN (EBGAN) (34) and Least Squares GAN (LSGAN) (35).
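For discrete distributions, the f-divergence and its special cases can be sketched directly (a minimal sketch assuming NumPy; function names are illustrative, and the generator function f is assumed convex with f(1) = 0):

```python
# Sketch: D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q.
# f(t) = t*log(t) recovers KL divergence; f(t) = |t - 1| / 2 recovers
# total variation distance.
import numpy as np

def f_divergence(p, q, f):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

kl_f = lambda t: t * np.log(t)        # generator for KL divergence
tv_f = lambda t: 0.5 * np.abs(t - 1)  # generator for total variation

p = np.array([0.4, 0.6])
q = np.array([0.5, 0.5])
kl = f_divergence(p, q, kl_f)
tv = f_divergence(p, q, tv_f)
```

Each choice of f yields a different divergence, and f-GAN derives the matching adversarial objective from a variational lower bound on that divergence.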

Wasserstein Generative Adversarial Networks (WGAN)

WGAN mainly improves GAN from the perspective of the loss function. WGAN theoretically explains the instability of GAN training: cross entropy (JS divergence) is not suitable for measuring the distance between distributions with disjoint supports. Therefore, WGAN proposes a new distance measure, the Earth Mover's Distance, also known as the Wasserstein distance or optimal transport distance, which is the minimum transport cost required to convert the probability distribution q into p (probability density is called probability mass in the discrete case) (26, 36). The superiority of the Wasserstein distance over KL and JS divergence is that even if two distributions do not overlap, the Wasserstein distance can still reflect how far apart they are. The theoretical derivation and interpretation of WGAN are quite involved; the authors of WGAN (26) point out that using the Wasserstein distance requires a strong continuity condition, namely Lipschitz continuity.[/vc_column_text][/stack_boxed_content][/vc_column][/vc_row][/vc_section]
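The key property claimed for the Wasserstein distance can be illustrated in one dimension, where for equal-sized empirical samples the W1 distance reduces to the mean absolute difference of the sorted samples (a minimal sketch assuming NumPy; the function name and data are illustrative):

```python
# Sketch: 1-D Wasserstein-1 (Earth Mover's) distance between two empirical
# distributions with equal sample counts, via sorted samples. Unlike JS
# divergence, it stays informative even when the supports are disjoint.
import numpy as np

def wasserstein_1d(x, y):
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(x - y)))

# Two pairs of disjoint distributions: JS divergence saturates at log(2)
# in both cases, but W1 still distinguishes "near" from "far".
near = wasserstein_1d([0.0, 1.0], [2.0, 3.0])   # supports 2 units apart
far = wasserstein_1d([0.0, 1.0], [10.0, 11.0])  # supports 10 units apart
```

This is why a WGAN critic can provide useful gradients even early in training, when the generated and real distributions barely overlap.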