In this homework, we're going to test empirically some of the statistical ideas we saw in class

Problem 1: Random walk on a 1d lattice

In class we looked at how a "random walk" on a 1d lattice leads to diffusion. Let's show that it's true using data

Problem 1a - generating a trajectory for a random walk

In [1]:
%pylab inline
import numpy as np


#don't make either of these too big, unless you're running on your own machine
number_of_independent_simulations = 10
number_of_steps_to_take = 10000
Populating the interactive namespace from numpy and matplotlib
In [2]:
#this function will plot results given a list of trajectories
def plot_trajectories(trajectory_list):
    for i in range(len(trajectory_list)):
        plot(trajectory_list[i])
    axhline(0,linestyle='--',color='black',lw=2)
    ylabel("Position")
    xlabel("Time")
In [3]:
# in this exercise, we need to take a random number in the range zero to one, and convert it to either a -1 or a 1
def random_to_direction(random_number):
#put a mathematical formula on the right hand sign of this equals sign that
#  makes numbers in the range 0-0.5 into a -1,
#  and numbers in the range 0.5-1.0 into +1
#you may want to use the function np.floor(x), which rounds down a number to the nearest integer
#you can also use an if statement, but it will be slower
    direction = 1-2*np.floor(random_number+0.5)
    return direction


#this list will store the trajectories for independent runs
trajectory_list = []

#first let's run the simulation in the "slow" way
# we need a for loop for independent simulations, and a for loop for making moves
for i in range(number_of_independent_simulations):
    
    #positions of the particle at time i=0 up to max time T = number_of_steps_to_take, 
    # starts as all zeros, but we will fill this in
    position_array = np.zeros(number_of_steps_to_take+1)
    
    for j in range(number_of_steps_to_take):
        # the function np.random.random() generates a number from 0 to 1
        change_in_position = random_to_direction(np.random.random()) 
        position_array[j+1] = position_array[j] + change_in_position
    trajectory_list.append(position_array)
    
plot_trajectories(trajectory_list)
In [4]:
#let's do this is a way that uses the capabilities of numpy to do this calculation much more quickly
#let's also do it as a function that returns the list of trajectories and can change number of independent sims
def run_simulations(number_of_independent_simulations,number_of_steps_to_take):
    trajectory_list = []
    # we need a for loop for independent simulations, and a for loop for making moves
    for i in range(number_of_independent_simulations):

        # generate random numbers with a length number_of_steps_to_take
        random_number_list = np.random.random(size=number_of_steps_to_take)
        # let's generate all the displacements (-1,+1) at once using the same function
        directions = random_to_direction(random_number_list)

        #positions of the particle at time i=1 up to max time T = number_of_steps_to_take,     
        position_array = np.cumsum(directions)

        trajectory_list.append(position_array)
    return trajectory_list

#this list will store the trajectories for independent runs
trajectory_list = run_simulations(5,5000)

plot_trajectories(trajectory_list)

Problem 1b - how many trajectories do you need to see diffusive behavior

Fill in the following code to compute the mean squared displacement as a function of time Then change the number of independent simulations and see if the mean-squared displacement 'converges' to diffusive behavior, i.e. looks like a straight line

In [8]:
number_of_independent_simulations = 200
trajectory_list = run_simulations(number_of_independent_simulations,1000)

#let's turn this into an array with num_independent_simulations as the rows and coordinates as the columns
trajectory_array = np.array(trajectory_list)
# you can see the shape of the array like this. Each one corresponds to a different 'axis' in the array. 
# which one is time? which one is number of independent simulations?
print("array shape:",trajectory_array.shape)

#the axis flag let's you compute a numpy function just over one dimension of the array. 
# fill in the axis on the next line corresponding to an average at a fixed time. value from 0,1,...,N-1
msd = np.var(trajectory_array,axis=0)
#this is a list of numbers from zero to T
times = np.arange(trajectory_array.shape[1])

plot(times,msd,label="mean square distlacement")
plot(times,times,label='linear in time')

#matplotlib supports latex formatting between dollar signs. some slashes need to be replaced by 
# double slash because of python 'escape characters' (e.g. \n means newline without having the double slash)
ylabel("Var(x)=$\\langle (x(t)-\mu)^2 \\rangle$")
xlabel("$t$")
legend()
array shape: (200, 1000)
Out[8]:
<matplotlib.legend.Legend at 0x7f2908543fd0>
In [10]:
# now we will see how variance in the mean final position depends on number of independent simulations
# to estimate the variance, we will have to do more independent simulations on top of the first set
n_repeats_to_get_variance_estimate = 200

#  let's use the np power function to generate a list of number of independent samples from 2 to 256
# power will raise a number a to an exponent p. if you pass it a list of numbers from p=1 to p=pmax, you will
#  get a list 2^1, 2^2, ..., 2^pmax
# put your list of numbers in the parentheses on the next line, hint, use the range or np.arange function
list_of_number_of_independent_samples = np.power(2, np.arange(1,9)      )
print(list_of_number_of_independent_samples)

list_of_mean_position_vs_time = []

for i in range(n_repeats_to_get_variance_estimate):
    avg_position_list = []
    for number_of_independent_simulations in list_of_number_of_independent_samples: 
        trajectory_list = run_simulations(number_of_independent_simulations,1000)
        # use the np.mean function to generate average position vs time, averaged over the independent simulations
        # fill in the right 'axis' here where it is blank
        avg_positions = np.mean(np.array(trajectory_list), axis = 0 )
        avg_position_list.append(avg_positions)
    list_of_mean_position_vs_time.append(np.array(avg_position_list))
    
#use the np.variance function to find the variance over n_repeats in the mean position at every time
# for each number of trials
variance_array = np.var(np.array(list_of_mean_position_vs_time),axis=0)

#plot variance vs number of independent samples at times 10, 30, 100, 300, 1000
figure()
for t in [9,29,99,299,999]:
    plot(list_of_number_of_independent_samples,variance_array[:,t],label='time=%.f'%(t+1))
legend(loc=0)
xscale('log')
yscale('log')
xlabel("$N_{samples}$")
ylabel("Var($\\mu_{N_samples}$)")
[  2   4   8  16  32  64 128 256]
Out[10]:
Text(0,0.5,'Var($\\mu_{N_samples}$)')
In [11]:
# we already showed in class and above that the expected variance is proportional to t, so 
#  if we divide the variance by t, the data should collapse, let's try it, and also see what 1/n_samples looks like

figure()
for t in [9,29,99,299,999]:
    # plot the value of the variance array at time t divided by the time t
    plot(list_of_number_of_independent_samples, variance_array[:,t]/t,label='time=%.f'%(t+1))

plot(list_of_number_of_independent_samples,1/list_of_number_of_independent_samples,label='$1/N_{samples}$',
    linestyle='--',lw=3,color='black')
legend(loc=0)
xscale('log')
yscale('log')
ylabel("Var($\\mu_{N_samples}$)/t")
xlabel("$N_{samples}$")
Out[11]:
Text(0.5,0,'$N_{samples}$')

Problem 1c - central limit theorem (just run this to see what it looks like)

Let's histogram the values at the final time from the previous example to see how the distribution of the mean changes

In [12]:
mean_positions_final_time = np.array(list_of_mean_position_vs_time)[:,:,-1]
for i in range(mean_positions_final_time.shape[1]):
    num_samples = list_of_number_of_independent_samples[i]
    hist,bins = np.histogram(mean_positions_final_time[:,i],bins=10,density=True)
    plot((bins[:-1]+bins[1:])/2.0,hist,label='samples=%i'%num_samples)
legend(loc=0)
Out[12]:
<matplotlib.legend.Legend at 0x7f28f551ca90>
In [ ]: