Meryl Marie

A Fix for My ARIMA: Frequency in Time Series Data

I recently attempted an Autoregressive Integrated Moving Average (ARIMA) model on my time series data: COVID-19 cases in New Jersey prisons. After creating a successful linear time series regression, I moved on to the slightly more complex ARIMA. Step by step, I differenced the data, fit the model, and tried to run predictions. But it kept coming back with the error:

KeyError: 'The `start` argument could not be matched to a location related to the index of the data.'

I could not figure out why. The index referenced a date, and it was in the correct format, because I had used this code to reindex:

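	import pandas as pd
	# Convert the column to datetime (assigning the result back), then make it the index
	df['date_column'] = pd.to_datetime(df['date_column'])
	df.set_index('date_column', inplace=True)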

With the help of my good friend Stack Overflow, I learned that the datetime index has to have a consistent frequency. For example, if the first record is on March 26, 2020, the next is on April 1, 2020 (6 days later), and the third is on April 10, 2020 (9 days later), the time elapsed between Index[0], Index[1], and Index[2] is not constant, so the index has no single frequency. This can cause an error down the line in the ARIMA model.
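To see the problem in miniature, here is a small sketch using the three example dates above. The `cases` column and its values are made up for illustration; `pd.infer_freq` is the pandas helper that tries to guess a frequency from a set of dates.

```python
import pandas as pd

# Hypothetical records on the three example dates; the counts are invented.
df = pd.DataFrame(
    {"cases": [3, 5, 2]},
    index=pd.to_datetime(["2020-03-26", "2020-04-01", "2020-04-10"]),
)

# Days between consecutive records: 6, then 9.
gaps = df.index.to_series().diff().dropna().dt.days.tolist()
print(gaps)

# With unequal gaps, pandas cannot infer a frequency, so this is None.
inferred = pd.infer_freq(df.index)
print(inferred)

# The index itself carries no frequency either.
print(df.index.freq)
```

Checking `df.index.freq` before fitting is a quick way to spot this situation early.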


To fix this, I used resampling. For time series data, resampling works like a df.groupby(): it aggregates the data by time period, such as year (Y), month (M), or day (D) (newer pandas releases prefer YE and ME for year-end and month-end). I used the weekly (W) frequency and aggregated the data to a consistent frequency with the code below:

df = df.resample('W').sum()
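As a rough illustration of what that line does, the sketch below resamples a few irregularly dated records. The dates reuse the example from earlier plus one later record, and the `cases` values are invented, so treat it as a toy version rather than the real dataset.

```python
import pandas as pd

# Hypothetical, irregularly spaced records (counts are invented).
df = pd.DataFrame(
    {"cases": [3, 5, 2, 4]},
    index=pd.to_datetime(
        ["2020-03-26", "2020-04-01", "2020-04-10", "2020-04-28"]
    ),
)

# Aggregate into weekly bins; by default 'W' means weeks ending on Sunday.
weekly = df.resample("W").sum()
print(weekly)

# The resampled index now carries an explicit weekly frequency.
print(weekly.index.freqstr)
```

Note that weeks with no records still appear in the result, and `sum()` fills them with 0. Whether 0 is the right value depends on what the column measures, so it is worth checking the output before modeling.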

After I made sure the data was in the correct format, I chose my target column, parameters, and train/test split. Finally, I was able to make predictions based on my trained model.


I hope this information helps some troubled data scientist or data science student someday. It took hours of research and rerunning models, but at least I now know how to create an accurate and consistent time series dataset for future models!
