The First Step: Pulling Data
The NEW Method: You can now download data directly in R!
You can now download data directly in R without having to install Python or a Python package. This feature is new, so there may be issues and not everything is properly documented yet. If you come across any issues, please post a bug report or contribute a fix!
library(gtrendR)
Sys.setenv(GOOGLE_TRENDS_KEY="YOUR_API_KEY") # Save your API key as an environment variable
getTimelinesForHealth(
batch_size = 1, # How many terms to process at once
year_batch = "1 year", # How much time to process at once
time.startDate = "2019-06-15", # When to start collecting data
time.endDate = "2020-01-01", # When to stop collecting data
timelineResolutions = c(
"month" # can be month, year, week, or day
),
terms = c(
# The terms you want to pull
'hand washing + hand soap',
'social isolation'
),
names = c(
# The aliases for those pulls (what you want the CSV files
# to be named, in the same order as the terms)
"handwashing",
"socialisolation"
),
geoRestriction.regions = c( # Which regional geographies you want
"US-NY",
"US-CA"
),
geoRestriction.countries = c( # What countries you want
"GB",
"US"
),
geoRestriction.dmas = c( # What dmas you want
),
output_directory = "../output" # Where you want the output data saved
)
This will save CSV files to the output directory with the filename format {name}_{timelineResolution}.csv
. You can then use those CSVs for the rest of the functions in this package. The CSVs will have the form:
|timestamp |US |US_AL |US_CA |US_NY |
|----------|-----------|-----------|-----------|-----------|
|2020-01-02|642.8568888|636.164136 |262.0138526|991.5688604|
|2020-01-03|969.2211805|696.3971518|578.4875232|248.9556789|
|2020-01-04|232.1583943|655.6860359|189.5345507|279.1872892|
|2020-01-05|488.0699387|471.8936588|953.0010047|131.028145 |
|2020-01-06|758.2366717|997.2484335|740.3822249|558.1017193|
|2020-01-07|443.525007 |211.6926334|358.489257 |240.2757544|
|2020-01-08|947.7052461|664.2961719|346.3216015|907.9927533|
|2020-01-09|415.2533228|448.5096531|222.1345994|333.3310304|
|2020-01-10|919.4877736|254.382975 |811.7631744|134.159574 |
The OLD Method: Possibly More Reliable But Probably Less Convenient
Before you begin using this package, pull the Google Trends data using the gtrendspy package for Python3.
Basic Download
We will use the following data pull to demonstrate the features of the package. Unfortunately, I cannot share the raw data.
theo_timeline
from gtrendspy import timeline
timeline.theo_timeline(
terms = ['hand washing', 'social isolation'],
names = ['handwashing', 'socialisolation'],
start = '2019-01-01',
end = '2020-06-01',
timeframe_list = ['day'],
geo_country_list = ['US'],
us_states = True,
worldwide = False,
timestep_years = 1,
batch_size = 2,
outpath = "/path/to/ROOTPATH/input",
creds = "/path/to/info.py"
)
To use the gtrendspy package for Python3, you'll need to request an API Key from Google. Don't let that discourage you -- it's easy! Complete the short application here. If you do not wish to use the gtrends package for Python, you'll need to format your data to match the following and save it as a CSV:
|timestamp |US |US_AL |US_CA |US_NY |
|----------|-----------|-----------|-----------|-----------|
|2020-01-02|642.8568888|636.164136 |262.0138526|991.5688604|
|2020-01-03|969.2211805|696.3971518|578.4875232|248.9556789|
|2020-01-04|232.1583943|655.6860359|189.5345507|279.1872892|
|2020-01-05|488.0699387|471.8936588|953.0010047|131.028145 |
|2020-01-06|758.2366717|997.2484335|740.3822249|558.1017193|
|2020-01-07|443.525007 |211.6926334|358.489257 |240.2757544|
|2020-01-08|947.7052461|664.2961719|346.3216015|907.9927533|
|2020-01-09|415.2533228|448.5096531|222.1345994|333.3310304|
|2020-01-10|919.4877736|254.382975 |811.7631744|134.159574 |
Notice that the column with dates is titled "timestamp" and all other column names correspond to geographies. For example, the search value for the US on 2020-01-02 is 642.9. The search value for the same date for Alabama (US_AL) is 636.2.
(NOTE: These are randomly generated values that do not correspond to actual search volumes for anything.)
Download with Related Terms
You may be interested not just in a particular search term but in a series of related search terms. In this case, you may consider using Google Trends' built-in "Top Queries" feature. You can implement that through the following function:
theo_timeline_top
from gtrendspy import topterms
topterms.theo_timeline_top(
root_terms = ['commit suicide', 'how suicide', 'depression help', 'suicide help'], # a list of the root terms you're interested in
num_terms_per_root = 10, # how many additional terms you want per root term
start = '2019-01-01', # the start date
end = '2020-04-10', # the end date
timeframe_list = ['week'], # the timeframe you want
outpath = "/path/to/ROOTPATH/input/individual",
creds = "/path/to/creds.txt",
geo_country_list = ['US'], # the region you're interested in. ONLY CHOOSE 1 or None
batch_size = 5, # how many terms you want in each batch
timestep_years = 1, # how many years you want to pull at once
get_all = True, # if True, pull a data file for all of hte terms together
all_path = "/path/to/ROOTPATH/input" # where you want the total file
)
This will automatically pull the top 10 search queries related to each of your root terms and pull the appropriate timeline files. Right now it only works with one region at a time. This could be useful for creating a multi-term barplot or spaghetti plot.