Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
Contents
1 Pseudo-code for the damped Gauss-Newton vector product
2 Details of the pathological synthetic problems
2.1 The addition, multiplication, and XOR problem
2.2 The temporal order problem
2.3 The 3-bit temporal order problem
2.4 The random permutation problem
2.5 Noiseless memorization
3 Details of the natural problems
3.1 The bouncing balls problem
3.2 The MIDI dataset
3.3 The speech dataset
1 Pseudo-code for the damped Gauss-Newton vector product
Algorithm 1 Computation of the matrix-vector product of the structurally-damped Gauss-Newton matrix with the vector $v$, for the case when $e$ is the tanh non-linearity, $g$ the logistic sigmoid, and $D$ and $L$ are the corresponding matching loss functions. The notation reflects the "convex approximation" interpretation of the GN matrix, so that we are applying the R operator to the forwards-backwards pass through the linearized and structurally damped objective $\tilde{k}$, and the desired matrix-vector product is given by $\mathrm{R}\,\frac{d\tilde{k}}{d\theta}$. All derivatives are implicitly evaluated at $\theta = \theta_n$. The previously defined parameter symbols $W_{ph}$, $W_{hx}$, $W_{hh}$, $b_h$, $b_p$, $b_h^{init}$ will correspond to the parameter vector $\theta_n$ if they have no superscript and to the input parameter vector $v$ if they have the 'v' superscript.
The Rz notation follows Pearlmutter [1994], and for the purposes of reading the pseudo-code can be interpreted as merely defining a new symbol. We assume that intermediate quantities of the network (e.g. $h_i$) have already been computed (from $\theta_n$).
The operator
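
The forwards-backwards structure described in the caption can be illustrated with a minimal sketch. It is not the paper's Algorithm 1: it omits the structural damping term, uses scalar (one-unit) layers so that plain Python suffices, and assumes a fixed initial state h_0 = 0 instead of a learned initial bias. The function names, the parameter ordering (w_hx, w_hh, b_h, w_ph, b_p), and the finite-difference check are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta, xs):
    """Scalar RNN (illustrative): t_i = w_hx*x_i + w_hh*h_{i-1} + b_h,
    h_i = tanh(t_i), s_i = w_ph*h_i + b_p, with h_0 = 0 (assumption)."""
    w_hx, w_hh, b_h, w_ph, b_p = theta
    hs, ss, h = [], [], 0.0
    for x in xs:
        h = math.tanh(w_hx * x + w_hh * h + b_h)
        hs.append(h)
        ss.append(w_ph * h + b_p)
    return hs, ss


def gauss_newton_vec(theta, v, xs):
    """Undamped G v, where G = sum_i J_i^T p_i(1-p_i) J_i, J_i = ds_i/dtheta."""
    w_hx, w_hh, b_h, w_ph, b_p = theta
    v_hx, v_hh, v_bh, v_ph, v_bp = v
    hs, ss = forward(theta, xs)
    # R-forward pass: directional derivatives of h_i and s_i along v.
    Rs, h_prev, Rh_prev = [], 0.0, 0.0
    for i, x in enumerate(xs):
        Rt = v_hx * x + v_hh * h_prev + w_hh * Rh_prev + v_bh
        Rh = (1.0 - hs[i] ** 2) * Rt
        Rs.append(v_ph * hs[i] + w_ph * Rh + v_bp)
        h_prev, Rh_prev = hs[i], Rh
    # Backward pass through the linearized network, seeded at each output
    # with the matching-loss curvature p_i(1-p_i) times Rs_i.
    g = [0.0] * 5
    dh_next = 0.0  # contribution to dh_i arriving from t_{i+1}
    for i in reversed(range(len(xs))):
        p = sigmoid(ss[i])
        ds = p * (1.0 - p) * Rs[i]
        dh = w_ph * ds + dh_next
        dt = (1.0 - hs[i] ** 2) * dh
        h_before = hs[i - 1] if i > 0 else 0.0
        g[0] += dt * xs[i]      # w_hx
        g[1] += dt * h_before   # w_hh
        g[2] += dt              # b_h
        g[3] += ds * hs[i]      # w_ph
        g[4] += ds              # b_p
        dh_next = w_hh * dt
    return g


def gauss_newton_vec_reference(theta, v, xs, eps=1e-6):
    """Brute-force check: build J by central differences, then J^T H J v."""
    _, ss = forward(theta, xs)
    n, T = len(theta), len(xs)
    J = [[0.0] * n for _ in range(T)]
    for k in range(n):
        plus, minus = list(theta), list(theta)
        plus[k] += eps
        minus[k] -= eps
        sp, sm = forward(plus, xs)[1], forward(minus, xs)[1]
        for i in range(T):
            J[i][k] = (sp[i] - sm[i]) / (2.0 * eps)
    out = [0.0] * n
    for i in range(T):
        p = sigmoid(ss[i])
        Jv = sum(J[i][k] * v[k] for k in range(n))
        for k in range(n):
            out[k] += J[i][k] * p * (1.0 - p) * Jv
    return out


if __name__ == "__main__":
    theta = [0.5, -0.8, 0.1, 1.2, -0.3]
    v = [1.0, -2.0, 0.5, 0.3, 0.7]
    xs = [0.2, -1.0, 0.7, 1.5]
    print(gauss_newton_vec(theta, v, xs))
    print(gauss_newton_vec_reference(theta, v, xs))
```

The two printed vectors should agree up to finite-difference error. The R-forward and backward passes cost about as much as one gradient computation and never form the matrix G, which is what makes such products usable inside a conjugate-gradient solver.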



References
J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren. DARPA TIMIT: Acoustic-phonetic Continuous Speech Corpus CD-ROM. US Dept. of Commerce, National Institute of Standards and Technology, 1993.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 1997.
J. Martens. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
B.A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 1994.
