blockchain, programming, tech

Upgrading Ethereum Solidity Smart Contracts

A pretty good blog on examples “Upgrading Ethereum Solidity Smart Contracts”, by Francis Odisi

Original post: https://levelup.gitconnected.com/introduction-to-ethereum-smart-contract-upgradability-with-solidity-789cc497c56f

When developing software, we frequently need to release new versions to add new functionality or bug fixes. There’s no difference when it comes to smart contract development. Although, updating a smart contract to a new version is usually not as simple as updating other types of software of the same complexity.

Most Blockchains, especially public ones like Ethereum, implement the intrinsic concept of immutability, which in theory, does not allow anyone to change the blockchain’s “past”. The immutability is applied to all transactions in the blockchain, including transactions used to deploy smart contracts and the associated code. In other words, once the smart contract’s code is deployed to the blockchain, it will “live” forever “AS IS” — no one can change it. If a bug is found or a new functionality needs to be added, we cannot replace the code of a deployed contract.

If a smart contract is immutable, how are you able to upgrade it to newer versions? The answer lies in deploying a new smart contract to the blockchain.

But this approach raises a couple of challenges that need to be addressed. The most basic and common ones are:

  • All users that use the smart contract need to reference the address of the new contract’s version
  • The first contract’s version should be disabled, enforcing every user to use the new version
  • Usually, you need to make sure the data (state) from the old version is migrated or somehow available to the new version. In the most simple scenario, this means you need to copy/migrate the state from the old version to the new contract’s version

The sections below describe these challenges in more detail. To better illustrate it, we’ll use the two versions below of MySmartContract as a reference:

Version 1

contract MySmartContract {
uint32 public counter; constructor() public {
counter = 0;
} function incrementCounter() public {
counter += 2; // This "bug" is intentional.
}
}

Version 2

contract MySmartContract {
uint32 public counter; constructor(uint32 _counter) public {
counter = _counter;
} function incrementCounter() public {
counter++;
}
}

Users to Reference the New Contract’s Address

When deployed to the blockchain, every instance of the smart contract is assigned to a unique address. This address is used to reference the smart contract’s instance in order to invoke its methods and read/write data from/to the contract’s storage (state).

When you deploy an updated version of the contract to the blockchain, the new instance of the contract will be deployed at a new address. This new address is different from the first contract’s address. This means that all users, other smart contracts and/or dApps (decentralized applications) that interact with the smart contract will need to be updated so they use the address of the updated version. Spoiler: there are some options to avoid this issue, that you’ll see at the end of this section.

So, let’s consider the following scenario:

You created MySmartContract using the code of Version 1 above. It is deployed to the blockchain at address A1 (this is not a real Ethereum address – used only for illustration purposes). All users that want to interact with Version 1 need to use the address A1 to reference it.

Now, after a while, we noticed the bug in the method incrementCounter: it is incrementing the counter by 2, instead of incrementing it by 1. A fix is implemented, resulting in Version 2 of MySmartContract. This new contract’s version is deployed to the blockchain at address D5. At this point, if a user wants to interact with Version 2, it needs to use the address D5, not A1. This is the reason why all users that are interacting with MySmartContract will need to update so they refer to the new address D5.

You probably agree that forcing users to update is not the best approach, considering that updating a smart contract’s version should be as transparent as possible to users using it.

There are different strategies that can be used to address this problem. Some design patterns like Registry, different types of Proxies can be used to make it easier to upgrade and provide transparency to users. Another great option is to use the Ethereum Name Service and register a user-friend name that resolves to your contract’s address. With this option, users of the contract don’t need to know the contract’s address, only its user-friendly name. As a result, upgrading to a new address would be transparent to your contract’s users. The specific strategy adopted depends on the scenario in which the smart contract will be used.

Disabling Old Versions of the Contract

We learned in the section above that all users would need an update to use Version 2‘s address (D5) or our contract should implement some mechanism to make this process transparent to users. Despite that, if you’re the owner of the contract, you probably want to enforce that all users use only the most up to date version D5. If a user inadvertently or not uses A1, you want to guarantee that Version 1 is deprecated and unavailable for usage.

In such scenarios, you could implement a technique to stop MySmartContract‘s Version 1. This technique is implemented by a Design Pattern named Circuit Breaker. It’s also commonly referred to as Pausable Contracts or Emergency Stop.

A Circuit Breaker, in general terms, stops a smart contract functionalities. Additionally, it can enable specific functionalities that will be available only when the contract is stopped. This pattern commonly implements some sort of access restriction, so only allowed actors (like an admin or owner) have the required permission to trigger the Circuit Breaker and stop the contract.

Some scenarios where this pattern can be used are:

  • Stopping a contract’s functionalities when a bug is found
  • Stop some contract’s functionalities after a certain state is reached (frequently used together with a State Machine pattern)
  • Stop the contract’s functionalities during upgrades processes, so external actors cannot change the contract’s state during the upgrade;
  • Stop a deprecated version of a contract after a new version is deployed

Now let’s see how you could implement a Circuit Breaker to stop MySmartContract‘s incrementCounter function, so counter wouldn’t change during the migration process. This modification would need to be in place in Version 1, when it was first deployed.

// Version 1 implementing a Circuit Breaker with access restriction to owner
contract MySmartContract {
uint32 public counter;
bool private stopped = false;
address private owner; /**
@dev Checks if the contract is not stopped; reverts if it is.
*/
modifier isNotStopped {
require(!stopped, 'Contract is stopped.');
_;
} /**
@dev Enforces the caller to be the contract's owner.
*/
modifier isOwner {
require(msg.sender == owner, 'Sender is not owner.');
_;
} constructor() public {
counter = 0;
// Sets the contract's owner as the address that deployed the contract.
owner = msg.sender;
} /**
@notice Increments the contract's counter if contract is active.
@dev It will revert if the contract is stopped. See modifier "isNotStopped"
*/
function incrementCounter() isNotStopped public {
counter += 2; // This is an intentional bug.
} /**
@dev Stops / Unstops the contract.
*/
function toggleContractStopped() isOwner public {
stopped = !stopped;
}
}

In the code above you can see that Version 1 of MySmartContract now implements a modifier isNotStopped. This modifier will revert the transaction if the contract is stopped. The function incrementCounter was changed to use the modifier isNotStopped, so it will only execute when the contract is NOT stopped.

With this implementation, right before the migration starts, the owner of the contract can invoke the function toggleContractStopped and stop the contract. Note that this function uses the modifier isOwner to restrict access to the contract’s owner.

To learn more about Circuit Breakers, make sure you check Consensys’ post about Circuit Breakers and OpenZeppelin’s reference implementation of Pausable contracts.

Contract’s Data (State) Migration

Most smart contracts need to keep some sort of state in its internal storage. The number of state variables required by each contract varies greatly depending on the use case. In our example, the original MySmartContract‘s Version 1 has a single state variable counter.

Now consider that Version 1 of MySmartContract has been in use for a while. By the time you find the bug in incrementCounter function, the value of counter is already at 100. This scenario would raise some questions:

  • What will you do with the state of MySmartContract Version 2?
  • Can you reset the counter to 0 (zero) in Version 2 or should you migrate the state from Version 1 to make sure counter is initialized with 100 in Version 2?

The answers to these questions will depend on the use case. In the example of this article, which is a really simple scenario and counter has no important usage, you wouldn’t have any issues if counter is reset to 0. But, this is not the desired approach in most cases.

Let’s suppose you cannot reset the value to 0 and need to set counter to 100 in Version 2. In a simple contract as MySmartContract this wouldn’t be difficult. You could change the constructor of Version 2 to receive the initial value of counter as a parameter. At deployment, you would pass the value 100 to the constructor, and this would solve your problem.

After implementing this approach, the constructor of MySmartContract Version 2 would look like this:

constructor(uint32 _counter) public {
counter = _counter;
}

If your use case is as simple as presented above (or similar), this is probably the way to go from a data migration perspective. The complexity of implementing other approaches wouldn’t be worth it. But, bear in mind that most production-ready smart contracts are not as simple as MySmartContract and frequently have a more complex state.

Now consider a contract that uses multiple structs, mappings, and arrays. If you need to copy data between contract versions with such complex storage, you would probably face one or more the challenges below:

  • A bunch of transactions to be processed on the blockchain, which may take a considerable amount of time, depending on the data set
  • Additional code to handle reading data from “Version 1” and writing it to “Version 2” (unless done manually)
  • Spend real money to pay for gas. Remember that you need to pay gas to process transactions in the blockchain. According to the Ethereum Yellow Paper — Appendix G. Fee Schedule, the SSTORE operation, upcode used to write data to Ethereum, costs 20000 gas units “when the storage value is set to non-zero from zero” and 5000 gas units “when storage value’s zeroness remains unchanged”.
  • Freeze Version 1‘s state by using some mechanism (like a Circuit Breaker) to make sure no more data is appended to Version 1 during the migration.
  • Implement access restriction mechanisms to avoid external parties (not related to the migration) from invoking functions of Version 2 during the migration. This would be required to make sure Version 1‘s data could be copied/migrated to Version 2 without being compromised and/or corrupted in Version 2;

In contracts with more complex state, the work required to perform an upgrade is quite significant and can incur considerable gas costs to copy data over the blockchain. Using Libraries and Proxies can help you develop smart contracts that are easier to upgrade. With this approach, the data would be kept in a contract that stores the state but bears no logic (state contract). The second contract or library implements the logic, but bears no state (logic contract). So when a bug is found in the logic, you only need to upgrade the logic contract, without worrying about migrating the state stored in the state contract (see Note below).

Note: This approach generally uses Delegatecall. The state contract invokes the functions in the logic contract using delegatecall. The logic contract then executes its logic in the context of state contract, which means that “storage, current address and balance still refer to the calling contract, only the code is taken from the called address.” (from Solidity docs referenced above).

Making MySmartContract Easier to Upgrade

Below you can see how Version 1 and Version 2 would look like if we implement the changes described here in this article. It’s important to mention again that the strategies used for MySmartContract are acceptable considering its simplicity: state variables and logic.

First, let’s see Version 1 changes:

Version 1 — Without Upgradable Mechanisms

contract MySmartContract {
uint32 public counter; constructor() public {
counter = 0;
} function incrementCounter() public {
counter += 2; // This "bug" is intentional.
}
}

In the code below, Version 1 implements a Circuit Breaker with an Access Restriction mechanism that allows the owner to stop the contract once it is deprecated.

Version 1 — With Deprecation Mechanism

contract MySmartContract {
uint32 public counter;
bool private stopped = false;
address private owner; /**
@dev Checks if the contract is not stopped; reverts if it is.
*/
modifier isNotStopped {
require(!stopped, 'Contract is stopped.');
_;
} /**
@dev Enforces the caller to be the contract's owner.
*/
modifier isOwner {
require(msg.sender == owner, 'Sender is not owner.');
_;
} constructor() public {
counter = 0;
// Sets the contract's owner as the address that deployed the contract.
owner = msg.sender;
} /**
@notice Increments the contract's counter if contract is active.
@dev It will revert is the contract is stopped. See modifier "isNotStopped"
*/
function incrementCounter() isNotStopped public {
counter += 2; // This is an intentional bug.
} /**
@dev Stops / Unstops the contract.
*/
function toggleContractStopped() isOwner public {
stopped = !stopped;
}
}

Now let’s see how Version 2 would look like: Version 2 – Without Upgradable Mechanisms

contract MySmartContract {
uint32 public counter; constructor(uint32 _counter) public {
counter = _counter;
} function incrementCounter() public {
counter++;
}
}

In the code below Version 2 implements the same Circuit Breaker and Access Restriction mechanisms as Version 1. In addition, it implements a constructor that allows setting the initial value of counter during deployment. This mechanism can be used, which can be used during an upgrade to copy data from an old version.

Version 2 — With Simple Upgradable Mechanism

contract MySmartContract {
uint32 public counter;
bool private stopped = false;
address private owner; /**
@dev Checks if the contract is not stopped; reverts if it is.
*/
modifier isNotStopped {
require(!stopped, 'Contract is stopped.');
_;
} /**
@dev Enforces the caller to be the contract's owner.
*/
modifier isOwner {
require(msg.sender == owner, 'Sender is not owner.');
_;
} constructor(uint32 _counter) public {
counter = _counter; // Allows setting counter's initial value on deployment.
// Sets the contract's owner as the address that deployed the contract.
owner = msg.sender;
} /**
@notice Increments the contract's counter if contract is active.
@dev It will revert is the contract is stopped. See modifier "isNotStopped"
*/
function incrementCounter() isNotStopped public {
counter++; // Fixes bug introduced in version 1.
} /**
@dev Stops / Unstops the contract.
*/
function toggleContractStopped() isOwner public {
stopped = !stopped;
}
}

Although the changes above implement some mechanisms that help upgrading smart contracts, the first challenge described in the beginning of this article, Users to Reference the New Contract’s Address, is not solved with these simple techniques. More advanced patterns like Proxies and Registry, or using the ENS to register a user-friendly name to your contract, would be required to avoid all users from upgrading to reference the new address of Version 2.

Conclusion

The principle of upgradable smart contracts is described in the Ethereum white paper’s DAO section that reads:

“*Although code is theoretically immutable, one can easily get around this and have de-facto mutability by having chunks of the code in separate contracts, and having the address of which contracts to call stored in the modifiable storage. *”

Although it is achievable, upgrading smart contracts can be quite challenging. The immutability of the blockchain adds more complexity to smart contract’s upgrades because it forces you to carefully analyze the scenario in which the smart contract will be used, understand the available mechanisms, and then decide which mechanisms are a good fit to your contract, so a potential and probable upgrade will be smooth.

Smart Contract upgradability is an active area of research. Related patterns, mechanisms and best practices are still under continuous discussion and development. Using Libraries and some Design Patterns like Circuit Breaker, Access Restriction, Proxies and Registry can help you to tackle some of the challenges. However, in more complex scenarios, these mechanisms alone may not be able to address all the issues, and you may need to consider more complex patterns like Eternal Storage, not mentioned in this article.

You can check the full source code, including related unit tests (not mentioned in this article for simplicity reasons) in this github repository.

Standard
life is fun, programming

Longest Palindrom Subsequence via Dynamic Programming


/**
* @author Victor Fang, 20140728
* finding longest palindrom subsequence.
* there can be extra letters within.
* example: BBABCBCAB has 7 char palindrom: babcbcb
*
* Dynamic Programming solution via Recursion:
* input X[]
* interim: L[i,j] = length of longest palindrom between i,j
* Base 1:
* L[i,i] = 1;
*
* Base 2:
* L[i, i+1] = 2;
*
* Recursion:
* if( X[i] == X[j] ) { L[i,j] = 2 + L[i+1, j-1] }
*
* if( X[i] != X[j] ) { L[i,j] = max(L[i+1, j], L[i, j-1]) }
*
*/
public class PalindromDP {
public static int max(int a, int b){ return (a>b)?a:b; }

public static int lps(char x[], int i, int j){

// base 1: lps[]
if(i == j) return 1;
if((j == i+1) && (x[i] == x[j])) return 2;

if( x[i] == x[j]) {
return( 2 + lps(x, i+1, j-1) ) ;
}
if( x[i] != x[j]) {
return( max(lps(x, i+1, j) , lps(x, i, j-1)) );
}
return 0;

}

public static void main(String[] args) {

String x = “abigailfang”;
//String x = “BBABCBCAB”;
System.out.println(x);
System.out.println(lps(x.toCharArray(), 0, x.length()-1));
}

}

 

Standard
programming, research resources

Simulation Solution to the Passenger Seat Assignment Probability Problem

Today I solved a pretty interesting probability problem, of course, as what hardcore data scientist usually does, using a computer. 🙂

The problem statement embedded in the code below, and the convergence rate as well as the distribution of the simulations are posted at the end.

% Victor Fang, 20140705, www.VictorFang.com
% Simulation Solution to the passenger seat probability problem in Matlab.
% Problem:
% 100 passengers have queued up to board a plane, and are lined up in the
% order of the seats on the plane (n=1..100). However, the first person
% lost his ticket and selects a random seat. The remaining passengers will
% occupy their assigned seat if it is available, or a random seat
% otherwise.
% Question : What is the probability that passenger 100 sits in seat 100?

clear all;
clc

numP = 100;
numItr = 1000;
numMeta = 100;

probList = zeros(1, numMeta);
countList = zeros(1, numMeta*numItr);

%% simulation
pc = 0;
for meta = 1:numMeta
 count = 0;

 % each simulation, one independent experiment.
 for itr = 1:numItr
 seat = zeros(1,numP);

 % 1st passenger randomly picks a seat
 r = randi(numP, 1,1);
 seat(r) = 1;

 % simulate the rest
 for j = 2:numP

 % if his seat is empty
 if(seat(j) == 0)
 seat(j) = j;
 else
 % if its seat occupied, randomly picks an empty one.
 idx = find(seat == 0);
 p = randi(length(idx), 1,1);
 r = idx(p);
 seat(r) = j;
 end
 end

 pc = pc + 1;
 if(seat(numP) == numP)
 count = count +1;
 countList(pc) = 1;
 end

 end
 prob = count/numItr;
 probList(meta) = prob;

end
%% analysis
m = mean(probList)
s = std(probList)

figure(1)
hist(probList);
xlim([0,1])
grid on;
title(['Distribution of probability, \mu=', num2str(m) , ...
 ' ; \sigma=' , num2str(s) , ' ; #sim=', num2str(numItr*numMeta)])

figure(2)
tmp = cumsum(countList)./(1:length(countList));
plot(tmp,'LineWidth',5 )
ylim([0,1])
grid on;
title(['Convergence rate of probability of assigned seat'])
xlabel('# Simulations')
ylabel('Probability')

%% results
% numItr = 1000; numMeta = 100:
% m =0.5010
% s =0.0177

dist_seat
convergence_seat

Standard
programming

column-oriented DBMS

column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data, like most relational DBMSs. This has advantages for data warehousescustomer relationship management (CRM) systems, and library card catalogs, and other ad-hoc inquiry systems[1] where aggregates are computed over large numbers of similar data items.

 

Benefits

Comparisons between row-oriented and column-oriented systems are typically concerned with the efficiency of hard-disk access for a given workload, as seek time is incredibly long compared to the other delays in computers. Sometimes, reading a megabyte of sequentially stored data takes no more time than one random access.[3] Further, because seek time is improving much slower than CPU power (see Moore’s Law), this focus will likely continue on systems that rely on hard-disks for storage. Following is a set of over-simplified observations which attempt to paint a picture of the trade-offs between column- and row-oriented organizations. Unless, of course, the application can be reasonably assured to fit most/all data into memory, in which case huge optimizations are available from in-memory database systems.

  1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
  2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
  3. Row-oriented systems are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
  4. Row-oriented systems are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row-oriented architectures are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column stores are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possiblyterabytes). However, there are a number of proven row-based OLAP RDBMS that handles terabytes, or even petabytes of data, such as Teradata.

Standard
programming, research resources

Understanding Maximum Likelihood Estimation MLE

MLE: Given a set of random samples; and a model with unknown parameters,  find the most likely parameters that can fit the data best.

离散分布,离散有限参数空间

考虑一个抛硬币的例子。假设这个硬币正面跟反面轻重不同。我们把这个硬币抛80次(即,我们获取一个采样x_1=\mbox{H}, x_2=\mbox{T}, \ldots, x_{80}=\mbox{T}并把正面的次数记下来,正面记为H,反面记为T)。并把抛出一个正面的概率记为p,抛出一个反面的概率记为1-p(因此,这裡的p即相当于上边的\theta)。假设我们抛出了49个正面,31个反面,即49次H,31次T。假设这个硬币是我们从一个装了三个硬币的盒子里头取出的。这三个硬币抛出正面的概率分别为p=1/3p=1/2p=2/3.这些硬币没有标记,所以我们无法知道哪个是哪个。使用最大似然估计,通过这些试验数据(即采样数据),我们可以计算出哪个硬币的可能性最大。这个似然函数取以下三个值中的一个:

\begin{matrix}<br /><br /><br />
\mathbb{P}(\mbox{H=49, T=31 }\mid p=1/3) & = & \binom{80}{49}(1/3)^{49}(1-1/3)^{31} \approx 0.000 \\<br /><br /><br />
&&\\<br /><br /><br />
\mathbb{P}(\mbox{H=49, T=31 }\mid p=1/2) & = & \binom{80}{49}(1/2)^{49}(1-1/2)^{31} \approx 0.012 \\<br /><br /><br />
&&\\<br /><br /><br />
\mathbb{P}(\mbox{H=49, T=31 }\mid p=2/3) & = & \binom{80}{49}(2/3)^{49}(1-2/3)^{31} \approx 0.054 \\<br /><br /><br />
\end{matrix}

我们可以看到当\widehat{p}=2/3时,似然函数取得最大值。这就是p的最大似然估计。

离散分布,连续参数空间

现在假设例子1中的盒子中有无数个硬币,对于0\leq p \leq 1中的任何一个p, 都有一个抛出正面概率为p的硬币对应,我们来求其似然函数的最大值:

\begin{matrix}<br /><br /><br />
\mbox{lik}(\theta) & = & f_D(\mbox{H=49,T=80-49}\mid p) = \binom{80}{49} p^{49}(1-p)^{31} \\<br /><br /><br />
\end{matrix}

其中0\leq p \leq 1. 我们可以使用微分法来求最值。方程两边同时对p微分,并使其为零。

\begin{matrix}<br /><br /><br />
0 & = & \frac{d}{dp} \left( \binom{80}{49} p^{49}(1-p)^{31} \right) \\<br /><br /><br />
  &   & \\<br /><br /><br />
  & \propto & 49p^{48}(1-p)^{31} - 31p^{49}(1-p)^{30} \\<br /><br /><br />
  &   & \\<br /><br /><br />
  & = & p^{48}(1-p)^{30}\left[ 49(1-p) - 31p \right] \\<br /><br /><br />
\end{matrix}

在不同比例参数值下一个二项式过程的可能性曲线t = 3, n = 10;其最大似然估计值发生在其众数并在曲线的最大值处。

其解为p=0p=1,以及p=49/80.使可能性最大的解显然是p=49/80(因为p=0p=1这两个解会使可能性为零)。因此我们说最大似然估计值\widehat{p}=49/80.

这个结果很容易一般化。只需要用一个字母t代替49用以表达伯努利试验中的被观察数据(即样本)的“成功”次数,用另一个字母n代表伯努利试验的次数即可。使用完全同样的方法即可以得到最大似然估计值:

\widehat{p}=\frac{t}{n}

对于任何成功次数为t,试验总数为n的伯努利试验。

连续分布,连续参数空间

最常见的连续概率分布正态分布,其概率密度函数如下:

f(x\mid \mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

现在有n个正态随机变量的采样点,要求的是一个这样的正态分布,这些采样点分布到这个正态分布可能性最大(也就是概率密度积最大,每个点更靠近中心点),其n个正态随机变量的采样的对应密度函数(假设其独立并服从同一分布)为:

f(x_1,\ldots,x_n \mid \mu,\sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}

或:

f(x_1,\ldots,x_n \mid \mu,\sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left(-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}\right),

这个分布有两个参数:\mu,\sigma^2.有人可能会担心两个参数与上边的讨论的例子不同,上边的例子都只是在一个参数上对可能性进行最大化。实际上,在两个参数上的求最大值的方法也差不多:只需要分别把可能性\mbox{lik}(\mu,\sigma) = f(x_1,,\ldots,x_n \mid \mu, \sigma^2)在两个参数上最大化即可。当然这比一个参数麻烦一些,但是一点也不复杂。使用上边例子同样的符号,我们有\theta=(\mu,\sigma^2).

最大化一个似然函数同最大化它的自然对数是等价的。因为自然对数log是一个连续且在似然函数的值域严格递增的上凸函数。[注意:可能性函数(似然函数)的自然对数跟信息熵以及Fisher信息联系紧密。]求对数通常能够一定程度上简化运算,比如在这个例子中可以看到:

\begin{matrix}<br /><br />
0 & = & \frac{\partial}{\partial \mu} \log \left( \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}} \right) \\<br /><br />
  & = & \frac{\partial}{\partial \mu} \left( \log\left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} - \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2} \right) \\<br /><br />
  & = & 0 - \frac{-2n(\bar{x}-\mu)}{2\sigma^2} \\<br /><br />
\end{matrix}

这个方程的解是\widehat{\mu} = \bar{x} = \sum^{n}_{i=1}x_i/n .这的确是这个函数的最大值,因为它是\mu里头惟一的一阶导数等于零的点并且二阶导数严格小于零。

同理,我们对\sigma求导,并使其为零。

\begin{matrix}<br /><br />
0 & = & \frac{\partial}{\partial \sigma} \log \left( \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}} \right) \\<br /><br />
  & = & \frac{\partial}{\partial \sigma} \left( \frac{n}{2}\log\left( \frac{1}{2\pi\sigma^2} \right) - \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2} \right) \\<br /><br />
  & = & -\frac{n}{\sigma} + \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{\sigma^3}<br /><br />
\\<br /><br />
\end{matrix}

这个方程的解是\widehat{\sigma}^2 = \sum_{i=1}^n(x_i-\widehat{\mu})^2/n.

因此,其关于\theta=(\mu,\sigma^2)最大似然估计为:

\widehat{\theta}=(\widehat{\mu},\widehat{\sigma}^2) = (\bar{x},\sum_{i=1}^n(x_i-\bar{x})^2/n).
Standard