Filter and Resize
Filter
SLiDE.filter_with — Function
filter_with(df::DataFrame, set::Any; kwargs...)
Arguments
- df::DataFrame to filter.
- set::Dict or set::NamedTuple: Values to keep in the DataFrame.
Keywords
- extrapolate::Bool = false: Add missing regions/years to the DataFrame? If extrapolate is set to true, the following kwargs become relevant:
  - When extrapolating over years,
    - backward::Bool = true: Do we extrapolate backward in time?
    - forward::Bool = true: Do we extrapolate forward in time?
  - When extrapolating across regions,
    - r::Pair = "md" => "dc": Pair indicating a region (r.first) to extrapolate to another region (r.second). A suggested regional extrapolation: MD data will be used to approximate DC data in the event that it is missing.
    - overwrite::Bool = false: If data in the target region r.second is already present, should it be overwritten?
Returns
- df::DataFrame with only the desired keys.
Examples
julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","filter_use.csv"))
14×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1 │ 2015 │ agr │ agr │ 69.42 │
│ 2 │ 2015 │ agr │ fbp │ 277.179 │
│ 3 │ 2015 │ fbp │ agr │ 49.132 │
│ 4 │ 2015 │ fbp │ fbp │ 210.998 │
│ 5 │ 2015 │ uti │ agr │ 4.846 │
│ 6 │ 2015 │ uti │ fbp │ 10.102 │
│ 7 │ 2015 │ uti │ uti │ 35.093 │
│ 8 │ 2016 │ agr │ agr │ 60.197 │
│ 9 │ 2016 │ agr │ fbp │ 264.173 │
│ 10 │ 2016 │ fbp │ agr │ 47.739 │
│ 11 │ 2016 │ fbp │ fbp │ 205.21 │
│ 12 │ 2016 │ uti │ agr │ 4.548 │
│ 13 │ 2016 │ uti │ fbp │ 9.152 │
│ 14 │ 2016 │ uti │ uti │ 27.47 │
julia> df = filter_with(df, (i = ["agr","fbp"], j = ["agr","fbp"]))
8×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1 │ 2015 │ agr │ agr │ 69.42 │
│ 2 │ 2015 │ agr │ fbp │ 277.179 │
│ 3 │ 2015 │ fbp │ agr │ 49.132 │
│ 4 │ 2015 │ fbp │ fbp │ 210.998 │
│ 5 │ 2016 │ agr │ agr │ 60.197 │
│ 6 │ 2016 │ agr │ fbp │ 264.173 │
│ 7 │ 2016 │ fbp │ agr │ 47.739 │
│ 8 │ 2016 │ fbp │ fbp │ 205.21 │
julia> filter_with(df, (yr = 2016,); drop = true)
4×3 DataFrame
│ Row │ i │ j │ value │
│ │ String │ String │ Float64 │
├─────┼────────┼────────┼─────────┤
│ 1 │ agr │ agr │ 60.197 │
│ 2 │ agr │ fbp │ 264.173 │
│ 3 │ fbp │ agr │ 47.739 │
│ 4 │ fbp │ fbp │ 205.21 │
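The filtering rule in these examples can be sketched without SLiDE or DataFrames. The snippet below is a minimal illustration only: rows are modeled as NamedTuples, and keep_rows is a hypothetical helper, not part of SLiDE's API. It assumes every entry in set is a collection of allowed values.

```julia
# Rows are modeled as NamedTuples; this is not SLiDE's implementation.
rows = [
    (yr = 2015, i = "agr", j = "agr", value = 69.42),
    (yr = 2015, i = "uti", j = "agr", value = 4.846),
    (yr = 2016, i = "agr", j = "fbp", value = 264.173),
    (yr = 2016, i = "uti", j = "uti", value = 27.47),
]

# Hypothetical helper: keep a row only if, for every key in `set`,
# the row's value is one of the allowed values for that key.
function keep_rows(rows, set::NamedTuple)
    allowed(row) = all(getfield(row, k) in v for (k, v) in pairs(set))
    return filter(allowed, rows)
end

kept = keep_rows(rows, (i = ["agr", "fbp"], j = ["agr", "fbp"]))
```

Only rows whose i and j values both appear in the allowed lists survive, which is the same rule the first filter_with call applies to the i and j columns.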
SLiDE.extrapolate_year — Function
extrapolate_year(df::DataFrame, yr::Array{Int64,1}; kwargs...)
extrapolate_year(df::DataFrame, set::Any; kwargs...)
Arguments
- df::DataFrame that might be in need of extrapolation.
- yr::Array{Int64,1}: List of years over which extrapolation is possible (depending on the kwargs).
- set::Dict or set::NamedTuple containing the list of years, identified by the key :yr.
Keywords
- backward::Bool = true: Do we extrapolate backward in time?
- forward::Bool = true: Do we extrapolate forward in time?
Returns
- df::DataFrame extrapolated in time.
Example
Continuing with the DataFrame from SLiDE.filter_with,
julia> df
8×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1 │ 2015 │ agr │ agr │ 69.42 │
│ 2 │ 2015 │ agr │ fbp │ 277.179 │
│ 3 │ 2015 │ fbp │ agr │ 49.132 │
│ 4 │ 2015 │ fbp │ fbp │ 210.998 │
│ 5 │ 2016 │ agr │ agr │ 60.197 │
│ 6 │ 2016 │ agr │ fbp │ 264.173 │
│ 7 │ 2016 │ fbp │ agr │ 47.739 │
│ 8 │ 2016 │ fbp │ fbp │ 205.21 │
julia> extrapolate_year(df, Dict(:yr => 2014:2017))
16×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1 │ 2014 │ agr │ agr │ 69.42 │
│ 2 │ 2014 │ agr │ fbp │ 277.179 │
│ 3 │ 2014 │ fbp │ agr │ 49.132 │
│ 4 │ 2014 │ fbp │ fbp │ 210.998 │
│ 5 │ 2015 │ agr │ agr │ 69.42 │
│ 6 │ 2015 │ agr │ fbp │ 277.179 │
│ 7 │ 2015 │ fbp │ agr │ 49.132 │
│ 8 │ 2015 │ fbp │ fbp │ 210.998 │
│ 9 │ 2016 │ agr │ agr │ 60.197 │
│ 10 │ 2016 │ agr │ fbp │ 264.173 │
│ 11 │ 2016 │ fbp │ agr │ 47.739 │
│ 12 │ 2016 │ fbp │ fbp │ 205.21 │
│ 13 │ 2017 │ agr │ agr │ 60.197 │
│ 14 │ 2017 │ agr │ fbp │ 264.173 │
│ 15 │ 2017 │ fbp │ agr │ 47.739 │
│ 16 │ 2017 │ fbp │ fbp │ 205.21 │
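The output above suggests that years before the first available year receive a copy of the earliest data, and years after the last available year receive a copy of the latest data. A minimal sketch of that rule follows, assuming rows are modeled as NamedTuples with a yr field; copy_years is a hypothetical helper, not SLiDE's implementation.

```julia
# Hypothetical helper, not SLiDE's implementation: copy the earliest available
# year backward and the latest available year forward to cover `years`.
function copy_years(rows, years; backward = true, forward = true)
    have = sort(unique(r.yr for r in rows))
    first_yr, last_yr = first(have), last(have)
    out = collect(rows)
    for yr in years
        if backward && yr < first_yr
            src = first_yr          # fill earlier years from the first year
        elseif forward && yr > last_yr
            src = last_yr           # fill later years from the last year
        else
            continue                # year already present, or direction disabled
        end
        for r in rows
            r.yr == src && push!(out, merge(r, (yr = yr,)))
        end
    end
    return sort(out; by = r -> r.yr)
end

rows = [(yr = 2015, i = "agr", value = 69.42),
        (yr = 2016, i = "agr", value = 60.197)]
extended = copy_years(rows, 2014:2017)
forward_off = copy_years(rows, 2014:2017; forward = false)
```

With both directions enabled, 2014 takes the 2015 value and 2017 takes the 2016 value. Disabling forward leaves 2017 out of the result.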
SLiDE.extrapolate_region — Function
extrapolate_region(df::DataFrame; kwargs...)
extrapolate_region(df::DataFrame, r::Pair; kwargs...)
Fills in missing data in the input DataFrame df using existing information in df. Here, "extrapolate" makes a direct copy of the data.
Arguments
- df::DataFrame that might be in need of extrapolation.
- r::Pair = "md" => "dc": Pair indicating a region (r.first) to extrapolate to another region (r.second). A suggested regional extrapolation: MD data will be used to approximate DC data in the event that it is missing. To fill multiple regions with data, use "md" => ["dc","va"].
Keywords
- overwrite::Bool = false: If data in the target region r.second is already present, should it be overwritten?
Returns
- df::DataFrame extrapolated in region.
Example
julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","filter_utd.csv"))
8×5 DataFrame
│ Row │ yr │ r │ s │ t │ value │
│ │ Int64 │ String │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┼───────────┤
│ 1 │ 2015 │ md │ agr │ exports │ 0.0390152 │
│ 2 │ 2015 │ md │ agr │ imports │ 0.778159 │
│ 3 │ 2015 │ va │ agr │ exports │ 1.11601 │
│ 4 │ 2015 │ va │ agr │ imports │ 0.88253 │
│ 5 │ 2016 │ md │ agr │ exports │ 0.0330508 │
│ 6 │ 2016 │ md │ agr │ imports │ 0.762089 │
│ 7 │ 2016 │ va │ agr │ exports │ 1.16253 │
│ 8 │ 2016 │ va │ agr │ imports │ 0.86741 │
julia> extrapolate_region(df)
12×5 DataFrame
│ Row │ r │ yr │ s │ t │ value │
│ │ String │ Int64 │ String │ String │ Float64 │
├─────┼────────┼───────┼────────┼─────────┼───────────┤
│ 1 │ dc │ 2015 │ agr │ exports │ 0.0390152 │
│ 2 │ dc │ 2015 │ agr │ imports │ 0.778159 │
│ 3 │ dc │ 2016 │ agr │ exports │ 0.0330508 │
│ 4 │ dc │ 2016 │ agr │ imports │ 0.762089 │
│ 5 │ md │ 2015 │ agr │ exports │ 0.0390152 │
│ 6 │ md │ 2015 │ agr │ imports │ 0.778159 │
│ 7 │ md │ 2016 │ agr │ exports │ 0.0330508 │
│ 8 │ md │ 2016 │ agr │ imports │ 0.762089 │
│ 9 │ va │ 2015 │ agr │ exports │ 1.11601 │
│ 10 │ va │ 2015 │ agr │ imports │ 0.88253 │
│ 11 │ va │ 2016 │ agr │ exports │ 1.16253 │
│ 12 │ va │ 2016 │ agr │ imports │ 0.86741 │
If we instead want to copy VA data into DC, specify:
julia> extrapolate_region(df, "va" => "dc")
12×5 DataFrame
│ Row │ r │ yr │ s │ t │ value │
│ │ String │ Int64 │ String │ String │ Float64 │
├─────┼────────┼───────┼────────┼─────────┼───────────┤
│ 1 │ dc │ 2015 │ agr │ exports │ 1.11601 │
│ 2 │ dc │ 2015 │ agr │ imports │ 0.88253 │
│ 3 │ dc │ 2016 │ agr │ exports │ 1.16253 │
│ 4 │ dc │ 2016 │ agr │ imports │ 0.86741 │
│ 5 │ md │ 2015 │ agr │ exports │ 0.0390152 │
│ 6 │ md │ 2015 │ agr │ imports │ 0.778159 │
│ 7 │ md │ 2016 │ agr │ exports │ 0.0330508 │
│ 8 │ md │ 2016 │ agr │ imports │ 0.762089 │
│ 9 │ va │ 2015 │ agr │ exports │ 1.11601 │
│ 10 │ va │ 2015 │ agr │ imports │ 0.88253 │
│ 11 │ va │ 2016 │ agr │ exports │ 1.16253 │
│ 12 │ va │ 2016 │ agr │ imports │ 0.86741 │
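The copy semantics, including the overwrite keyword and the multi-region form "md" => ["dc","va"], can be sketched as follows. This is an illustration under stated assumptions: rows are NamedTuples with an r field, and copy_region is a hypothetical helper rather than SLiDE's code.

```julia
# Hypothetical helper, not SLiDE's implementation: copy rows from region
# `r.first` into each region listed in `r.second`.
function copy_region(rows, r::Pair = "md" => "dc"; overwrite = false)
    targets = r.second isa AbstractString ? [r.second] : collect(r.second)
    out = collect(rows)
    for target in targets
        present = any(row.r == target for row in out)
        present && !overwrite && continue     # keep existing target data
        filter!(row -> row.r != target, out)  # drop data being overwritten
        for row in rows
            row.r == r.first && push!(out, merge(row, (r = target,)))
        end
    end
    return out
end

rows = [(yr = 2015, r = "md", value = 0.0390152),
        (yr = 2015, r = "va", value = 1.11601)]

default_fill = copy_region(rows)                   # MD copied into DC
va_fill = copy_region(rows, "va" => "dc")          # VA copied into DC
multi = copy_region(rows, "md" => ["dc", "va"])    # VA already present, so it is kept
forced = copy_region(rows, "md" => ["dc", "va"]; overwrite = true)
```

In this sketch a target region that already has data is left untouched unless overwrite is true, in which case its rows are replaced by copies of the source region.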
SLiDE.map_year — Function
This function returns a DataFrame defining the mapping for a step function.
Keywords
- fun::Function: How to pick the cut-off boundary. By default, this is set to occur between two values. For example, with data available for 2007 and 2012, this would result in using 2007 data for years <= 2009 and 2012 data for years >= 2010.
Returns
- df::DataFrame mapping each year that has data (from) to the years it should be applied to (to).
Example
julia> map_year([2007,2012] => 2005:2015)
11×2 DataFrame
│ Row │ from │ to │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2007 │ 2005 │
│ 2 │ 2007 │ 2006 │
│ 3 │ 2007 │ 2007 │
│ 4 │ 2007 │ 2008 │
│ 5 │ 2007 │ 2009 │
│ 6 │ 2012 │ 2010 │
│ 7 │ 2012 │ 2011 │
│ 8 │ 2012 │ 2012 │
│ 9 │ 2012 │ 2013 │
│ 10 │ 2012 │ 2014 │
│ 11 │ 2012 │ 2015 │
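One rule that reproduces the mapping shown above is "use the nearest year that has data". The sketch below implements that rule with plain vectors; nearest_year_map is a hypothetical helper, and the tie-breaking choice (earlier year wins) is an assumption, since the default fun used by map_year is not shown here.

```julia
# Hypothetical helper, not SLiDE's map_year: map each target year to the
# nearest year that has data (ties go to the earlier year).
function nearest_year_map(p::Pair)
    from, to = sort(collect(p.first)), p.second
    return [(from = from[argmin(abs.(from .- yr))], to = yr) for yr in to]
end

mapping = nearest_year_map([2007, 2012] => 2005:2015)
```

For the pair [2007, 2012] => 2005:2015, the midpoint between the two data years is 2009.5, so 2005 through 2009 map to 2007 and 2010 through 2015 map to 2012.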
Fill
SLiDE.fill_zero — Function
fill_zero(keys_unique::NamedTuple; value_colnames)
fill_zero(keys_unique::NamedTuple, df::DataFrame)
fill_zero(df::DataFrame...)
fill_zero(d::Dict...)
fill_zero(keys_unique, d::Dict)
This function can be used to fill zeros in either a dictionary or DataFrame.
- Options for dictionary editing:
  - If only one or more dictionaries are input, they will be edited so that each contains all permutations of their key values. All dictionaries in the resulting list will be the same length.
  - If a dictionary is input with a list of keys, it will be edited to ensure that it includes all permutations.
  - If only a list of keys is input, a new dictionary will be created, containing all key permutations with values initialized to zero.
- Options for DataFrame editing:
  - If only one or more DataFrames are input, they will be edited so that each contains all permutations of their key values. All DataFrames in the resulting list will be the same length.
  - If a DataFrame is input with a NamedTuple, it will be edited to ensure that it includes all permutations of the NamedTuple's values.
  - If only a NamedTuple is input, a new DataFrame will be created, containing all key permutations with values initialized to zero.
Arguments
- keys_unique::Tuple: A list of arrays whose permutations should be included in the resultant dictionary.
- keys_unique::NamedTuple: A list of arrays whose permutations should be included in the resultant DataFrame. The NamedTuple's keys correspond to the DataFrame columns where they will be stored.
- d::Dict...: The dictionary/ies to edit.
- df::DataFrame...: The DataFrame(s) to edit.
Keywords
- value_colnames::Any = :value: "value" column labels to add and set to zero when creating a new DataFrame. Default is :value.
Returns
- d::Dict... if input included dictionaries and/or Tuples
- df::DataFrame... if input included DataFrames and/or NamedTuples
Initialize a new DataFrame or dictionary.
julia> years = 2015:2016; regions = ["md","va"];
julia> fill_zero((years, regions))
Dict{Tuple{Int64,String},Float64} with 4 entries:
(2015, "va") => 0.0
(2015, "md") => 0.0
(2016, "md") => 0.0
(2016, "va") => 0.0
julia> fill_zero((yr = years, r = regions))
4×3 DataFrame
│ Row │ yr │ r │ value │
│ │ Int64 │ String │ Float64 │
├─────┼───────┼────────┼─────────┤
│ 1 │ 2015 │ md │ 0.0 │
│ 2 │ 2016 │ md │ 0.0 │
│ 3 │ 2015 │ va │ 0.0 │
│ 4 │ 2016 │ va │ 0.0 │
Edit an existing DataFrame or dictionary.
julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","fill_use.csv"))
6×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1 │ 2015 │ agr │ agr │ 69.42 │
│ 2 │ 2015 │ uti │ agr │ 4.846 │
│ 3 │ 2015 │ uti │ uti │ 35.093 │
│ 4 │ 2016 │ agr │ agr │ 60.197 │
│ 5 │ 2016 │ uti │ agr │ 4.548 │
│ 6 │ 2016 │ uti │ uti │ 27.47 │
julia> fill_zero(df)
8×4 DataFrame
│ Row │ yr │ i │ j │ value │
│ │ Int64? │ String? │ String? │ Float64 │
├─────┼────────┼─────────┼─────────┼─────────┤
│ 1 │ 2015 │ agr │ agr │ 69.42 │
│ 2 │ 2016 │ agr │ agr │ 60.197 │
│ 3 │ 2015 │ uti │ agr │ 4.846 │
│ 4 │ 2016 │ uti │ agr │ 4.548 │
│ 5 │ 2015 │ agr │ uti │ 0.0 │
│ 6 │ 2016 │ agr │ uti │ 0.0 │
│ 7 │ 2015 │ uti │ uti │ 35.093 │
│ 8 │ 2016 │ uti │ uti │ 27.47 │
julia> d = convert_type(Dict, df)
Dict{Tuple{Int64,String,String},Float64} with 6 entries:
(2015, "uti", "uti") => 35.093
(2016, "agr", "agr") => 60.197
(2015, "agr", "agr") => 69.42
(2016, "uti", "uti") => 27.47
(2015, "uti", "agr") => 4.846
(2016, "uti", "agr") => 4.548
julia> fill_zero(d)
┌ Warning: Depreciated!
└ @ SLiDE ~/Documents/Git/SLiDE/src/utils/fill_zero.jl:132
Dict{Tuple{Int64,String,String},Float64} with 6 entries:
(2015, "uti", "uti") => 35.093
(2016, "agr", "agr") => 60.197
(2015, "agr", "agr") => 69.42
(2016, "uti", "uti") => 27.47
(2015, "uti", "agr") => 4.846
(2016, "uti", "agr") => 4.548
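The permutation-filling idea described for fill_zero can be sketched for a tuple-keyed dictionary using only Base Julia. fill_missing_keys is a hypothetical helper, not SLiDE's implementation; it assumes a non-empty dictionary whose keys are tuples of equal length and whose values are numeric.

```julia
# Hypothetical helper, not SLiDE's fill_zero: ensure a tuple-keyed dictionary
# contains every permutation of the values seen in each key position.
function fill_missing_keys(d::Dict{K,V}) where {K<:Tuple,V<:Number}
    n = length(first(keys(d)))
    # Unique values observed in each position of the key tuples.
    levels = [unique(k[i] for k in keys(d)) for i in 1:n]
    out = copy(d)
    for k in Iterators.product(levels...)
        get!(out, k, zero(V))   # insert zero only where the key is missing
    end
    return out
end

d = Dict((2015, "agr", "agr") => 69.42,
         (2015, "uti", "agr") => 4.846,
         (2015, "uti", "uti") => 35.093)
filled = fill_missing_keys(d)
```

Here the only missing permutation is (2015, "agr", "uti"), which is added with value 0.0; existing entries keep their values.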
SLiDE.fill_with — Function
Initializes a new DataFrame and fills it with the specified input value.