Skip to main content

Time Series & Deep Learning (Part 3 of N): Finalizing the Data Preparation for Training and Evaluation of a LSTM

In the 3rd part of this series I am going to complete the description started in part 2 of the data preparation process for training and evaluation purposes of a LSTM model in time series forecasting. The data set used is the same as for part 1 and part 2. Same as for all of the post of this series, I am referring to Python 3.
In part 2 we have learned how to transform the time series into a supervised model. That isn't enough yet to feed our LSTM model: other two transformations are needed.
The first thing to do is to transform the time series data in order to make it stationary. The input time series used for these posts presents values that are dependent on time. Looking at its plotting in part 1, we can notice an increasing trend up to January 2006, then a decreasing trend up to February 2010 and finally a second increasing trend from there to date. Transforming this data to stationary makes life easier: we can remove the trends from the observed values before training and finally add them back to the forecasts in order to return the prediction to the original scale. Trends can be removed by differencing the data: we subtract the value at time t-1 from the current value (at time t). The pandas DataFrame provides a function, diff(), for this purpose. We need to implement two functions, one which returns the difference time series:

def difference(timeseries, interval=1):
    diff_list = list()
    for idx in range(interval, len(timeseries)):
        diff_value = timeseries[idx] - timeseries[idx - interval]
        diff_list.append(diff_value)
    return Series(diff_list)


and another one to invert the process before making forecasts:

def invert_difference(historical, inverted_scale_ts, interval=1):
    return inverted_scale_ts + historical[-interval]


The last thing to do is to transform the input time series observations to have a specific scale. The neural network model we are going to use is a LSTM. The default activation function for LSTMs is the hyperbolic tangent, which output values are in the range between -1 and 1, so the best range for the time series used for this example is in the same range. The min and max scaling coefficients need to be calculated on the training data set and then used to scale the test data set. The scikit-learn package comes with a specific class for this, MinMaxScaler. We need to implement two functions, one to calculate the scaling coefficients:

from sklearn.preprocessing import MinMaxScaler

def scale_data_set(train_data, test_data):
    min_max_scaler = MinMaxScaler(feature_range=(-1, 1))
    min_max_scaler = min_max_scaler.fit(train_data)
    train_data = train_data.values.reshape(train_data.shape[0], train_data.shape[1])
    scaled_train_data = min_max_scaler.transform(train_data)
    test_data = test_data.values.reshape(test_data.shape[0], test_data.shape[1])
    scaled_test_data = min_max_scaler.transform(test_data)
    return min_max_scaler, scaled_train_data, 
scaled_test_data

and a second one to invert the scaling for the forecasted values:

def invert_scale(min_max_scaler, scaled_array, value):
    row = [elem for elem in scaled_array] + [value]
    array = numpy.array(row)
    array = array.reshape(1, len(array))
    inverted_array = min_max_scaler.inverse_transform(array)
    return inverted_array[0, -1]


Now we can put all together. We first make the input data stationary:

raw_values = series.values
diff_values = difference(raw_values, 1)


then transform the data to make the problem like a supervised learning case:

supervised = tsToSupervised(diff_values, 1)

and finally split the data for training (years from 1992 to 2010) and test (years from 2011 to date):

train, test = supervised[0:-98], supervised[-98:]

and transform the scale of the training data:

scaler, train_scaled, test_scaled = scale(train, test)

At this stage we can now build and train our LSTM. But this will be the core topic of the next post of this series.
The complete example would be released as a Jupyter notebook at the end of the first part of this series.

Comments

  1. I was scrolling the internet like every day, there I found this article which is related to my interest. The way you covered the knowledge about the subject and the 4 BHK Duplex in hoshangabad Road was worth to read, it undoubtedly cleared my vision and thoughts towards B 5 BHK Duplex in hoshangabad Road. Your writing skills and the way you portrayed the examples are very impressive. The knowledge about 5 BHK Duplex in hoshangabad Road is well covered. Thank you for putting this highly informative article on the internet which is clearing the vision about top builders in Bhopal and who are making an impact in the real estate sector by building such amazing townships.


    ReplyDelete

Post a Comment

Popular posts from this blog

Turning Python Scripts into Working Web Apps Quickly with Streamlit

 I just realized that I am using Streamlit since almost one year now, posted about in Twitter or LinkedIn several times, but never wrote a blog post about it before. Communication in Data Science and Machine Learning is the key. Being able to showcase work in progress and share results with the business makes the difference. Verbal and non-verbal communication skills are important. Having some tool that could support you in this kind of conversation with a mixed audience that couldn't have a technical background or would like to hear in terms of results and business value would be of great help. I found that Streamlit fits well this scenario. Streamlit is an Open Source (Apache License 2.0) Python framework that turns data or ML scripts into shareable web apps in minutes (no kidding). Python only: no front‑end experience required. To start with Streamlit, just install it through pip (it is available in Anaconda too): pip install streamlit and you are ready to execute the working de...

jOOQ: code generation in Eclipse

jOOQ allows code generation from a database schema through ANT tasks, Maven and shell command tools. But if you're working with Eclipse it's easier to create a new Run Configuration to perform this operation. First of all you have to write the usual XML configuration file for the code generation starting from the database: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <configuration xmlns="http://www.jooq.org/xsd/jooq-codegen-2.0.4.xsd">   <jdbc>     <driver>oracle.jdbc.driver.OracleDriver</driver>     <url>jdbc:oracle:thin:@dbhost:1700:DBSID</url>     <user>DB_FTRS</user>     <password>password</password>   </jdbc>   <generator>     <name>org.jooq.util.DefaultGenerator</name>     <database>       <name>org.jooq.util.oracle.OracleDatabase</name>     ...

Load testing MongoDB using JMeter

Apache JMeter ( http://jmeter.apache.org/ ) added support for MongoDB since its 2.10 release. In this post I am referring to the latest JMeter release (2.13). A preliminary JMeter setup is needed before starting your first test plan for MongoDB. It uses Groovy as scripting reference language, so Groovy needs to be set up for our favorite load testing tool. Follow these steps to complete the set up: Download Groovy from the official website ( http://www.groovy-lang.org/download.html ). In this post I am referring to the Groovy release 2.4.4, but using later versions is fine. Copy the groovy-all-2.4.4.jar to the $JMETER_HOME/lib folder. Restart JMeter if it was running while adding the Groovy JAR file. Now you can start creating a test plan for MongoDB load testing. From the UI select the MongoDB template ( File -> Templates... ). The new test plan has a MongoDB Source Config element. Here you have to setup the connection details for the database to be tested: The Threa...