Filter and Resize

Filter

SLiDE.filter_with — Function

filter_with(df::DataFrame, set::Any; kwargs...)

Arguments

df::DataFrame to filter.
set::Dict or set::NamedTuple: Values to keep in the DataFrame.

Keywords

extrapolate::Bool = false: Add missing regions/years to the DataFrame? If extrapolate is set to true, the following kwargs become relevant:
- When extrapolating over years,
  - backward::Bool = true: Do we extrapolate backward in time?
  - forward::Bool = true: Do we extrapolate forward in time?
  Currently, "extrapolating" means copying data from the closest available year.
- When extrapolating across regions,
  - r::Pair = "md" => "dc": Pair indicating a source region (r.first) whose data is extrapolated to a target region (r.second). The suggested default uses MD data to approximate DC data when DC data is missing.
  - overwrite::Bool = false: If data in the target region r.second is already present, should it be overwritten?

Returns

df::DataFrame with only the desired keys.

Examples

julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","filter_use.csv"))
14×4 DataFrame
│ Row │ yr    │ i      │ j      │ value   │
│     │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1   │ 2015  │ agr    │ agr    │ 69.42   │
│ 2   │ 2015  │ agr    │ fbp    │ 277.179 │
│ 3   │ 2015  │ fbp    │ agr    │ 49.132  │
│ 4   │ 2015  │ fbp    │ fbp    │ 210.998 │
│ 5   │ 2015  │ uti    │ agr    │ 4.846   │
│ 6   │ 2015  │ uti    │ fbp    │ 10.102  │
│ 7   │ 2015  │ uti    │ uti    │ 35.093  │
│ 8   │ 2016  │ agr    │ agr    │ 60.197  │
│ 9   │ 2016  │ agr    │ fbp    │ 264.173 │
│ 10  │ 2016  │ fbp    │ agr    │ 47.739  │
│ 11  │ 2016  │ fbp    │ fbp    │ 205.21  │
│ 12  │ 2016  │ uti    │ agr    │ 4.548   │
│ 13  │ 2016  │ uti    │ fbp    │ 9.152   │
│ 14  │ 2016  │ uti    │ uti    │ 27.47   │

julia> df = filter_with(df, (i = ["agr","fbp"], j = ["agr","fbp"]))
8×4 DataFrame
│ Row │ yr    │ i      │ j      │ value   │
│     │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1   │ 2015  │ agr    │ agr    │ 69.42   │
│ 2   │ 2015  │ agr    │ fbp    │ 277.179 │
│ 3   │ 2015  │ fbp    │ agr    │ 49.132  │
│ 4   │ 2015  │ fbp    │ fbp    │ 210.998 │
│ 5   │ 2016  │ agr    │ agr    │ 60.197  │
│ 6   │ 2016  │ agr    │ fbp    │ 264.173 │
│ 7   │ 2016  │ fbp    │ agr    │ 47.739  │
│ 8   │ 2016  │ fbp    │ fbp    │ 205.21  │

julia> filter_with(df, (yr = 2016,); drop = true)
4×3 DataFrame
│ Row │ i      │ j      │ value   │
│     │ String │ String │ Float64 │
├─────┼────────┼────────┼─────────┤
│ 1   │ agr    │ agr    │ 60.197  │
│ 2   │ agr    │ fbp    │ 264.173 │
│ 3   │ fbp    │ agr    │ 47.739  │
│ 4   │ fbp    │ fbp    │ 205.21  │
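
The filtering rule shown in these examples can be sketched in plain Julia. This is a minimal illustration, not SLiDE's implementation: rows are stored as NamedTuples instead of a DataFrame, and the helper names keep_rows and allowed are hypothetical. It does not model the extrapolate or drop keywords.

```julia
# Hypothetical helpers, not part of SLiDE.
# Treat a scalar such as `2016` as a one-element collection; leave vectors as they are.
allowed(v::AbstractVector) = v
allowed(v) = (v,)

# Keep only the rows whose fields match every entry in `set`.
function keep_rows(rows::Vector{<:NamedTuple}, set::NamedTuple)
    # A row is kept when, for every key in `set`, its value is among the allowed ones.
    matches(row) = all(getfield(row, k) in allowed(v) for (k, v) in pairs(set))
    return filter(matches, rows)
end

rows = [
    (yr = 2015, i = "agr", j = "agr", value = 69.42),
    (yr = 2015, i = "uti", j = "agr", value = 4.846),
    (yr = 2016, i = "agr", j = "fbp", value = 264.173),
    (yr = 2016, i = "uti", j = "uti", value = 27.47),
]

# Keep rows where both i and j are "agr" or "fbp".
kept = keep_rows(rows, (i = ["agr", "fbp"], j = ["agr", "fbp"]))
```

Columns that are not named in the set (here yr and value) place no restriction on which rows are kept.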


SLiDE.extrapolate_year — Function

extrapolate_year(df::DataFrame, yr::Array{Int64,1}; kwargs...)
extrapolate_year(df::DataFrame, set::Any; kwargs...)

Arguments

df::DataFrame that might be in need of extrapolation.
yr::Array{Int64,1}: List of years over which extrapolation is possible (depending on the kwargs).
set::Dict or set::NamedTuple containing list of years, identified by the key :yr.

Keywords

backward::Bool = true: Do we extrapolate backward in time?
forward::Bool = true: Do we extrapolate forward in time?

Returns

df::DataFrame extrapolated in time.

Example

Continuing with the DataFrame from SLiDE.filter_with,

julia> df
8×4 DataFrame
│ Row │ yr    │ i      │ j      │ value   │
│     │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1   │ 2015  │ agr    │ agr    │ 69.42   │
│ 2   │ 2015  │ agr    │ fbp    │ 277.179 │
│ 3   │ 2015  │ fbp    │ agr    │ 49.132  │
│ 4   │ 2015  │ fbp    │ fbp    │ 210.998 │
│ 5   │ 2016  │ agr    │ agr    │ 60.197  │
│ 6   │ 2016  │ agr    │ fbp    │ 264.173 │
│ 7   │ 2016  │ fbp    │ agr    │ 47.739  │
│ 8   │ 2016  │ fbp    │ fbp    │ 205.21  │

julia> extrapolate_year(df, Dict(:yr => 2014:2017))
16×4 DataFrame
│ Row │ yr    │ i      │ j      │ value   │
│     │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1   │ 2014  │ agr    │ agr    │ 69.42   │
│ 2   │ 2014  │ agr    │ fbp    │ 277.179 │
│ 3   │ 2014  │ fbp    │ agr    │ 49.132  │
│ 4   │ 2014  │ fbp    │ fbp    │ 210.998 │
│ 5   │ 2015  │ agr    │ agr    │ 69.42   │
│ 6   │ 2015  │ agr    │ fbp    │ 277.179 │
│ 7   │ 2015  │ fbp    │ agr    │ 49.132  │
│ 8   │ 2015  │ fbp    │ fbp    │ 210.998 │
│ 9   │ 2016  │ agr    │ agr    │ 60.197  │
│ 10  │ 2016  │ agr    │ fbp    │ 264.173 │
│ 11  │ 2016  │ fbp    │ agr    │ 47.739  │
│ 12  │ 2016  │ fbp    │ fbp    │ 205.21  │
│ 13  │ 2017  │ agr    │ agr    │ 60.197  │
│ 14  │ 2017  │ agr    │ fbp    │ 264.173 │
│ 15  │ 2017  │ fbp    │ agr    │ 47.739  │
│ 16  │ 2017  │ fbp    │ fbp    │ 205.21  │
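
The copy-the-closest-year rule in this example can be sketched as follows. This is an assumption-laden illustration, not SLiDE's code: nearest_year is a hypothetical helper, and returning nothing for a missing year inside the available range is a choice made for the sketch, since the documentation does not say how such gaps are handled.

```julia
# Hypothetical helper, not SLiDE's implementation: for a requested year,
# return the year whose data would be copied, or `nothing` if none applies.
function nearest_year(available::AbstractVector{<:Integer}, yr::Integer;
                      backward::Bool = true, forward::Bool = true)
    yr in available && return yr          # data already exists for this year
    lo, hi = extrema(available)
    backward && yr < lo && return lo      # copy the earliest year backward in time
    forward && yr > hi && return hi       # copy the latest year forward in time
    return nothing                        # direction disabled, or a gap inside the range
end

available = [2015, 2016]

# Map each requested year to the year that supplies its data.
sources = Dict(yr => nearest_year(available, yr) for yr in 2014:2017)
```

With both directions enabled, 2014 takes its values from 2015 and 2017 takes its values from 2016, matching the table above.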


SLiDE.extrapolate_region — Function

extrapolate_region(df::DataFrame; kwargs...)
extrapolate_region(df::DataFrame, r::Pair; kwargs...)

Fills in missing regional data in the input DataFrame df using data that already exists in df for another region. Here, "extrapolate" means making a direct copy of the data.

Arguments

df::DataFrame that might be in need of extrapolation.
r::Pair = "md" => "dc": Pair indicating a source region (r.first) whose data is extrapolated to a target region (r.second). The suggested default uses MD data to approximate DC data when DC data is missing. To fill multiple regions with data, use "md" => ["dc","va"].

Keywords

overwrite::Bool = false: If data in the target region r.second is already present, should it be overwritten?

Returns

df::DataFrame extrapolated in region.

Example

julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","filter_utd.csv"))
8×5 DataFrame
│ Row │ yr    │ r      │ s      │ t       │ value     │
│     │ Int64 │ String │ String │ String  │ Float64   │
├─────┼───────┼────────┼────────┼─────────┼───────────┤
│ 1   │ 2015  │ md     │ agr    │ exports │ 0.0390152 │
│ 2   │ 2015  │ md     │ agr    │ imports │ 0.778159  │
│ 3   │ 2015  │ va     │ agr    │ exports │ 1.11601   │
│ 4   │ 2015  │ va     │ agr    │ imports │ 0.88253   │
│ 5   │ 2016  │ md     │ agr    │ exports │ 0.0330508 │
│ 6   │ 2016  │ md     │ agr    │ imports │ 0.762089  │
│ 7   │ 2016  │ va     │ agr    │ exports │ 1.16253   │
│ 8   │ 2016  │ va     │ agr    │ imports │ 0.86741   │

julia> extrapolate_region(df)
12×5 DataFrame
│ Row │ r      │ yr    │ s      │ t       │ value     │
│     │ String │ Int64 │ String │ String  │ Float64   │
├─────┼────────┼───────┼────────┼─────────┼───────────┤
│ 1   │ dc     │ 2015  │ agr    │ exports │ 0.0390152 │
│ 2   │ dc     │ 2015  │ agr    │ imports │ 0.778159  │
│ 3   │ dc     │ 2016  │ agr    │ exports │ 0.0330508 │
│ 4   │ dc     │ 2016  │ agr    │ imports │ 0.762089  │
│ 5   │ md     │ 2015  │ agr    │ exports │ 0.0390152 │
│ 6   │ md     │ 2015  │ agr    │ imports │ 0.778159  │
│ 7   │ md     │ 2016  │ agr    │ exports │ 0.0330508 │
│ 8   │ md     │ 2016  │ agr    │ imports │ 0.762089  │
│ 9   │ va     │ 2015  │ agr    │ exports │ 1.11601   │
│ 10  │ va     │ 2015  │ agr    │ imports │ 0.88253   │
│ 11  │ va     │ 2016  │ agr    │ exports │ 1.16253   │
│ 12  │ va     │ 2016  │ agr    │ imports │ 0.86741   │

If we instead want to copy VA data into DC, specify:

julia> extrapolate_region(df, "va" => "dc")
12×5 DataFrame
│ Row │ r      │ yr    │ s      │ t       │ value     │
│     │ String │ Int64 │ String │ String  │ Float64   │
├─────┼────────┼───────┼────────┼─────────┼───────────┤
│ 1   │ dc     │ 2015  │ agr    │ exports │ 1.11601   │
│ 2   │ dc     │ 2015  │ agr    │ imports │ 0.88253   │
│ 3   │ dc     │ 2016  │ agr    │ exports │ 1.16253   │
│ 4   │ dc     │ 2016  │ agr    │ imports │ 0.86741   │
│ 5   │ md     │ 2015  │ agr    │ exports │ 0.0390152 │
│ 6   │ md     │ 2015  │ agr    │ imports │ 0.778159  │
│ 7   │ md     │ 2016  │ agr    │ exports │ 0.0330508 │
│ 8   │ md     │ 2016  │ agr    │ imports │ 0.762089  │
│ 9   │ va     │ 2015  │ agr    │ exports │ 1.11601   │
│ 10  │ va     │ 2015  │ agr    │ imports │ 0.88253   │
│ 11  │ va     │ 2016  │ agr    │ exports │ 1.16253   │
│ 12  │ va     │ 2016  │ agr    │ imports │ 0.86741   │
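
The effect of the overwrite keyword can be sketched with a dictionary keyed by (region, year). This is a minimal illustration under stated assumptions, not SLiDE's implementation: copy_region is a hypothetical helper, the data is reduced to one value per region and year, and only a single target region is handled.

```julia
# Hypothetical helper, not SLiDE's implementation.
# `data` maps (region, year) to a value; `r` is a source => target pair.
function copy_region(data::AbstractDict, r::Pair; overwrite::Bool = false)
    out = copy(data)
    for ((region, yr), value) in data
        region == r.first || continue     # only copy from the source region
        target = (r.second, yr)
        # Fill the target only if it is missing, unless overwriting was requested.
        if overwrite || !haskey(out, target)
            out[target] = value
        end
    end
    return out
end

data = Dict(("md", 2015) => 0.0390152, ("va", 2015) => 1.11601)

# DC is missing, so it receives a copy of the MD value.
filled = copy_region(data, "md" => "dc")
```

Calling copy_region(filled, "va" => "dc") leaves the DC value unchanged because it is already present, while adding overwrite = true replaces it with the VA value.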


SLiDE.map_year — Function

This function returns a DataFrame defining a mapping for a step function.

Keywords

fun::Function: How to pick the cut-off boundary. By default, the boundary falls between two values. For example, mapping 2007 and 2012 data onto 2005:2015 uses 2007 data for years <= 2009 and 2012 data for years >= 2010.

Returns

df::DataFrame with columns from (the year supplying data) and to (the year it is mapped to).

Example

julia> map_year([2007,2012] => 2005:2015)
11×2 DataFrame
│ Row │ from  │ to    │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2007  │ 2005  │
│ 2   │ 2007  │ 2006  │
│ 3   │ 2007  │ 2007  │
│ 4   │ 2007  │ 2008  │
│ 5   │ 2007  │ 2009  │
│ 6   │ 2012  │ 2010  │
│ 7   │ 2012  │ 2011  │
│ 8   │ 2012  │ 2012  │
│ 9   │ 2012  │ 2013  │
│ 10  │ 2012  │ 2014  │
│ 11  │ 2012  │ 2015  │
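
One way to reproduce the mapping in this table is a nearest-year rule, sketched below in plain Julia. This is an assumption for illustration only: step_map is a hypothetical helper, SLiDE's map_year may choose the boundary differently, and the fun keyword is not modeled.

```julia
# Hypothetical re-implementation for illustration; SLiDE's `map_year` may differ.
# Each target year is mapped to the closest year that has data.
function step_map(from::AbstractVector{<:Integer}, to)
    # `argmin` returns the first minimum, so an exact tie goes to the earlier year.
    return [(from = from[argmin(abs.(from .- yr))], to = yr) for yr in to]
end

mapping = step_map([2007, 2012], 2005:2015)
```

Here 2009 is two years from 2007 and three from 2012, so it maps to 2007, while 2010 is closer to 2012.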


Fill

SLiDE.fill_zero — Function

fill_zero(keys_unique::NamedTuple; value_colnames)
fill_zero(keys_unique::NamedTuple, df::DataFrame)
fill_zero(df::DataFrame...)
fill_zero(d::Dict...)
fill_zero(keys_unique, d::Dict)

This function can be used to fill zeros in either a dictionary or DataFrame.

Options for dictionary editing:
- If only one or more dictionaries are input, they are edited so that each contains all permutations of their key values. All dictionaries in the resulting list have the same length.
- If a dictionary is input with a list of keys, it is edited to ensure that it includes all permutations of those keys.
- If only a list of keys is input, a new dictionary is created containing all key permutations, with values initialized to zero.
Options for DataFrame editing:
- If only one or more DataFrames are input, they are edited so that each contains all permutations of their key values. All DataFrames in the resulting list have the same length.
- If a DataFrame is input with a NamedTuple, it is edited to ensure that it includes all permutations of the NamedTuple's values.
- If only a NamedTuple is input, a new DataFrame is created containing all key permutations, with values initialized to zero.

Arguments

keys_unique::Tuple: A list of arrays whose permutations should be included in the resultant dictionary.
keys_unique::NamedTuple: A list of arrays whose permutations should be included in the resultant DataFrame. The NamedTuple's keys correspond to the DataFrame columns where they will be stored.
d::Dict...: The dictionary/ies to edit.
df::DataFrame...: The DataFrame(s) to edit.

Keywords

value_colnames::Any = :value: "value" column labels to add and set to zero when creating a new DataFrame.

Returns

d::Dict... if input included dictionaries and/or Tuples
df::DataFrame... if input included DataFrames and/or NamedTuples


Initialize a new DataFrame or dictionary.

julia> years = 2015:2016; regions = ["md","va"];

julia> fill_zero((years, regions))
Dict{Tuple{Int64,String},Float64} with 4 entries:
  (2015, "va") => 0.0
  (2015, "md") => 0.0
  (2016, "md") => 0.0
  (2016, "va") => 0.0

julia> fill_zero((yr = years, r = regions))
4×3 DataFrame
│ Row │ yr    │ r      │ value   │
│     │ Int64 │ String │ Float64 │
├─────┼───────┼────────┼─────────┤
│ 1   │ 2015  │ md     │ 0.0     │
│ 2   │ 2016  │ md     │ 0.0     │
│ 3   │ 2015  │ va     │ 0.0     │
│ 4   │ 2016  │ va     │ 0.0     │

Edit an existing DataFrame or dictionary.

julia> df = read_file(joinpath(SLIDE_DIR,"docs","src","assets","data","fill_use.csv"))
6×4 DataFrame
│ Row │ yr    │ i      │ j      │ value   │
│     │ Int64 │ String │ String │ Float64 │
├─────┼───────┼────────┼────────┼─────────┤
│ 1   │ 2015  │ agr    │ agr    │ 69.42   │
│ 2   │ 2015  │ uti    │ agr    │ 4.846   │
│ 3   │ 2015  │ uti    │ uti    │ 35.093  │
│ 4   │ 2016  │ agr    │ agr    │ 60.197  │
│ 5   │ 2016  │ uti    │ agr    │ 4.548   │
│ 6   │ 2016  │ uti    │ uti    │ 27.47   │

julia> fill_zero(df)
8×4 DataFrame
│ Row │ yr     │ i       │ j       │ value   │
│     │ Int64? │ String? │ String? │ Float64 │
├─────┼────────┼─────────┼─────────┼─────────┤
│ 1   │ 2015   │ agr     │ agr     │ 69.42   │
│ 2   │ 2016   │ agr     │ agr     │ 60.197  │
│ 3   │ 2015   │ uti     │ agr     │ 4.846   │
│ 4   │ 2016   │ uti     │ agr     │ 4.548   │
│ 5   │ 2015   │ agr     │ uti     │ 0.0     │
│ 6   │ 2016   │ agr     │ uti     │ 0.0     │
│ 7   │ 2015   │ uti     │ uti     │ 35.093  │
│ 8   │ 2016   │ uti     │ uti     │ 27.47   │

julia> d = convert_type(Dict, df)
Dict{Tuple{Int64,String,String},Float64} with 6 entries:
  (2015, "uti", "uti") => 35.093
  (2016, "agr", "agr") => 60.197
  (2015, "agr", "agr") => 69.42
  (2016, "uti", "uti") => 27.47
  (2015, "uti", "agr") => 4.846
  (2016, "uti", "agr") => 4.548

julia> fill_zero(d)
┌ Warning: Depreciated!
└ @ SLiDE ~/Documents/Git/SLiDE/src/utils/fill_zero.jl:132
Dict{Tuple{Int64,String,String},Float64} with 6 entries:
  (2015, "uti", "uti") => 35.093
  (2016, "agr", "agr") => 60.197
  (2015, "agr", "agr") => 69.42
  (2016, "uti", "uti") => 27.47
  (2015, "uti", "agr") => 4.846
  (2016, "uti", "agr") => 4.548
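
The idea of filling every key permutation with zero can be sketched with Iterators.product from Julia's Base library. This is a minimal illustration, not SLiDE's fill_zero: fill_missing_with_zero is a hypothetical helper, and it returns a Dict{Any,Float64} for simplicity instead of the tightly typed dictionary shown above.

```julia
# Hypothetical helper, not SLiDE's `fill_zero`: make sure a dictionary has an entry
# for every combination of the supplied key values, defaulting to zero.
function fill_missing_with_zero(keys_unique::Tuple, d::AbstractDict = Dict())
    filled = Dict{Any,Float64}(d)
    for key in Iterators.product(keys_unique...)
        get!(filled, key, 0.0)    # existing values are left untouched
    end
    return filled
end

years = 2015:2016
regions = ["md", "va"]

# One entry already exists; the other three combinations are added as zeros.
existing = Dict((2015, "md") => 1.5)
filled = fill_missing_with_zero((years, regions), existing)
```

Calling fill_missing_with_zero((years, regions)) without a dictionary creates a new one with all four combinations set to zero.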

SLiDE.fill_with — Function

Initializes a new DataFrame and fills it with the specified input value.
