Water Harvesting Project

Data Analysis and Visualization using R Programming

This is a sample visualization for an imaginary water harvesting project. All the information provided here is fictional, created solely to simulate a case for developing innovative interventions that strengthen the resilience of the communities concerned (the example country used here is Somalia). The case scenario is based on plausible assumptions and projections but does not reflect the actual situation or needs of any specific location or group. The aim of this exercise is to build a sample portfolio showcasing skills and knowledge in data analysis and visualization.

Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(reshape2)

## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths

library(viridis)

## Loading required package: viridisLite

Setting theme

theme_set(theme_minimal())

Importing the CSV file harvest.csv

harvest <- read.csv("harvest.csv")
head(harvest)

##   SN            Name Gender Age  District Latitude Longitude     Date Land.Size
## 1  1 Kartiile Khadar      M  54 Mogadishu  2.08617   45.2959 7/6/2023         2
## 2  2      Xayd Ladan      M  41 Mogadishu  2.08617   45.2959 7/6/2023         4
## 3  3 Sharmooge Diric      M  26 Mogadishu  2.08617   45.2959 7/6/2023         7
## 4  4  Batuulo Xareed      F  43 Mogadishu  2.08617   45.2959 7/6/2023         6
## 5  5   Siraad Garaar      F  52 Mogadishu  2.08617   45.2959 7/6/2023         1
## 6  6     Hidan Bedri      M  38 Mogadishu  2.08617   45.2959 7/6/2023         3
##   Land.ID  Crops Updated.Yield Updated.Income.from.Agriculture
## 1    MO-1 Tomato         1,600                           2,880
## 2    MO-2   Bean           875                           3,850
## 3    MO-3  Maize           840                           4,410
## 4    MO-4 Tomato         1,500                           8,100
## 5    MO-5   Bean           630                             693
## 6    MO-6   Bean           735                           2,426
##   Updated.Total.income X..Trained Satisfaction Perception
## 1               $4,723          2            5          5
## 2               $7,007          2            5          5
## 3               $8,335          2            1          1
## 4              $12,717          1            4          4
## 5                 $866          2            2          1
## 6               $3,614          1            2          2

str(harvest)

## 'data.frame':    50 obs. of  17 variables:
##  $ SN                             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name                           : chr  "Kartiile Khadar" "Xayd Ladan" "Sharmooge Diric" "Batuulo Xareed" ...
##  $ Gender                         : chr  "M" "M" "M" "F" ...
##  $ Age                            : int  54 41 26 43 52 38 48 19 22 22 ...
##  $ District                       : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Latitude                       : num  2.09 2.09 2.09 2.09 2.09 ...
##  $ Longitude                      : num  45.3 45.3 45.3 45.3 45.3 ...
##  $ Date                           : chr  "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" ...
##  $ Land.Size                      : int  2 4 7 6 1 3 1 7 7 2 ...
##  $ Land.ID                        : chr  "MO-1" "MO-2" "MO-3" "MO-4" ...
##  $ Crops                          : chr  "Tomato" "Bean" "Maize" "Tomato" ...
##  $ Updated.Yield                  : chr  "1,600" "875" "840" "1,500" ...
##  $ Updated.Income.from.Agriculture: chr  "2,880" "3,850" "4,410" "8,100" ...
##  $ Updated.Total.income           : chr  "$4,723" "$7,007" "$8,335" "$12,717" ...
##  $ X..Trained                     : int  2 2 2 1 2 1 1 2 1 1 ...
##  $ Satisfaction                   : int  5 5 1 4 2 2 2 5 2 5 ...
##  $ Perception                     : int  5 5 1 4 1 2 3 4 2 5 ...

Changing data types

Converting Date column to data type Date

harvest$Date

##  [1] "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023"
##  [7] "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" "7/5/2023" "7/5/2023"
## [13] "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023"
## [19] "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023"
## [25] "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023" "7/5/2023"
## [31] "7/2/2023" "7/2/2023" "7/2/2023" "7/2/2023" "7/2/2023" "7/2/2023"
## [37] "7/2/2023" "7/2/2023" "7/2/2023" "7/2/2023" "7/6/2023" "7/6/2023"
## [43] "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023" "7/6/2023"
## [49] "7/6/2023" "7/6/2023"

class(harvest$Date)

## [1] "character"

harvest$Date <- as.Date(harvest$Date,"%m/%d/%Y")
harvest$Date

##  [1] "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06"
##  [6] "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06"
## [11] "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05"
## [16] "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05"
## [21] "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05"
## [26] "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05" "2023-07-05"
## [31] "2023-07-02" "2023-07-02" "2023-07-02" "2023-07-02" "2023-07-02"
## [36] "2023-07-02" "2023-07-02" "2023-07-02" "2023-07-02" "2023-07-02"
## [41] "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06"
## [46] "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06" "2023-07-06"
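A string such as "7/6/2023" is ambiguous between month/day and day/month order, so it can be worth confirming the format before relying on the converted column. The sketch below uses three values copied from the output above. The %m/%d/%Y reading is the same assumption the conversion makes, and the NA check is an extra safeguard that is not part of the original analysis.

```r
# Sample values copied from harvest$Date as printed above
raw_dates <- c("7/6/2023", "7/5/2023", "7/2/2023")

# Month/day/year: the format assumed in this analysis
parsed_mdy <- as.Date(raw_dates, format = "%m/%d/%Y")

# Day/month/year also parses without error, but yields different dates
parsed_dmy <- as.Date(raw_dates, format = "%d/%m/%Y")

# as.Date() returns NA for strings that do not match the format,
# so an NA check catches malformed entries
stopifnot(!anyNA(parsed_mdy))

format(parsed_mdy)
format(parsed_dmy)
```

Since both formats parse cleanly for these values, only knowledge of how the survey dates were recorded can settle which one is correct.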

Converting character columns to numeric

harvest$Updated.Income.from.Agriculture <- as.numeric(gsub(",","",harvest$Updated.Income.from.Agriculture))
harvest$Updated.Income.from.Agriculture

##  [1]  2880  3850  4410  8100   693  2426   656 11025  6738  2200  3150  2205
## [13]   808   844  4950  4252  4219  3600  3600  8085  2520  1012  3038 10080
## [25]  6300  7875  2126  5400  7700  6075   945  8100  3780  6300  2835  5670
## [37]  2126  2700  2880  7700  1540  1575  1312  1575  6300  7200  1980  6738
## [49]  5670   900

class(harvest$Updated.Income.from.Agriculture)

## [1] "numeric"

harvest$Updated.Yield <- as.numeric(gsub(",","",harvest$Updated.Yield))
harvest$Updated.Yield

##  [1] 1600  875  840 1500  630  735  875 1750  875 1000 1050  735  735 1125  750
## [16] 1575 1125 2000 1600 1050  840 1125 1125 1600 1750 1250  945 1000 1400 1350
## [31] 1050 1500 1400 1750 1575 1575  945 1200 1600 1000 1400 1050  875 1050 1400
## [46] 1600  900  875 1575 1000

class(harvest$Updated.Yield)

## [1] "numeric"

harvest$Updated.Total.income <- gsub("\\$", "", harvest$Updated.Total.income)
harvest$Updated.Total.income <- as.numeric(gsub(",","",harvest$Updated.Total.income))
harvest$Updated.Total.income

##  [1]  4723  7007  8335 12717   866  3614   998 18191  8826  2772  5261  2690
## [13]   978   844  8168  6549  5400  4248  6228 15281  3503  1175  5619 10181
## [25]  9324 14963  3168 10098  8778  7594  1588  9477  6766  9828  3033  9752
## [37]  3189  5049  5472 10241  2618  2993  2205  1685 12285 11016  3564 11993
## [49]  5783   990

class(harvest$Updated.Total.income)

## [1] "numeric"
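The three conversions above repeat the same pattern: strip formatting characters, then call as.numeric(). A minimal sketch of a reusable helper follows. The name to_number is hypothetical (it is not part of the original analysis), and the sample strings are copied from the head(harvest) output.

```r
# Hypothetical helper: remove "$" and "," and convert to numeric
to_number <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

# Sample values copied from head(harvest)
income_raw <- c("$4,723", "$7,007", "$12,717")
yield_raw <- c("1,600", "875", "840")

to_number(income_raw)
to_number(yield_raw)

# Anything that is still not a number becomes NA (with a warning),
# so counting NAs afterwards shows whether the cleaning was complete
sum(is.na(suppressWarnings(to_number(c("1,600", "n/a")))))
```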

To show gender distribution of respondents

Creating custom color vector

color_gender <- c("M"= "cornflowerblue","F"="hotpink")

Gender distribution using a lollipop plot

ggplot(harvest, aes(x=Gender, fill=Gender))+
        geom_bar(width = 0.008)+
        geom_point(aes(color=Gender),stat="count", size=10) +
        ylim(0,40) +
        geom_text(aes(label=after_stat(count)), stat="count")+
        labs(title="Respondents' Gender", y="Number of Respondents")+
        scale_x_discrete(labels=c("Female", "Male"))+
        scale_color_manual(values = color_gender)+
        scale_fill_manual(values = color_gender)+
        theme(legend.position = "none",
              plot.title = element_text(size = 15),
              axis.title = element_text(size=12),
              axis.title.y = element_text(margin = margin(r=10)))

Same Gender distribution using a bar plot

ggplot(harvest, aes(x=Gender, fill=Gender)) +
        geom_bar(width = 0.4)+
        labs(title = "Respondents' Gender", y= "Number of Respondents")+
        scale_fill_manual(values=color_gender, guide=NULL)+
        geom_text(aes(label=after_stat(count)), stat="count", vjust=-0.5) +
        ylim(0,40)+
        scale_x_discrete(labels=c("Female","Male"))+
        theme(plot.title = element_text(size = 15),
              axis.title = element_text(size = 12),
              axis.title.y = element_text(margin = margin(r=10)))
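Both gender plots relabel the x axis by position, which works because "F" sorts before "M". One way to make the labels independent of sort order is to recode the column as a factor with explicit labels first. The sketch below assumes this approach (it is not what the plots above do) and uses the gender values of the first six respondents from head(harvest).

```r
# Gender of the first six respondents, copied from head(harvest)
gender <- c("M", "M", "M", "F", "F", "M")

# Explicit level-to-label mapping, independent of alphabetical order
gender_labelled <- factor(gender,
                          levels = c("F", "M"),
                          labels = c("Female", "Male"))

# Counts and percentages that the bar heights should match
counts <- table(gender_labelled)
counts
round(100 * prop.table(counts), 1)
```

A tabulation like this is also a quick way to cross-check the numbers printed on the bars.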

To show age distribution of respondents

Create age group column

harvest_agegroup <- harvest %>% 
        mutate(
                agegroup = case_when(
                        Age <=20 ~ "20 or Under",
                        Age >=21 & Age <=30 ~ "21 to 30",
                        Age >=31 & Age <= 40 ~ "31 to 40",
                        Age >=41 & Age <= 50 ~ "41 to 50",
                        Age > 50 ~ "Over 50",
                        TRUE ~ "Unknown"
                )
        )
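The same binning can be sketched with base R's cut(), which returns an ordered set of factor levels directly. This is offered as an alternative under stated assumptions, not as the method used above: the break points and labels mirror the case_when() rules, and the ages are the first ten values shown by str(harvest).

```r
# First ten ages, copied from the str(harvest) output
ages <- c(54, 41, 26, 43, 52, 38, 48, 19, 22, 22)

# Right-closed intervals: (-Inf,20], (20,30], (30,40], (40,50], (50,Inf)
age_breaks <- c(-Inf, 20, 30, 40, 50, Inf)
age_labels <- c("20 or Under", "21 to 30", "31 to 40", "41 to 50", "Over 50")

agegroup <- cut(ages, breaks = age_breaks, labels = age_labels, right = TRUE)
table(agegroup)
```

For whole-number ages the two approaches agree. They differ for missing values: cut() returns NA, whereas the case_when() version assigns "Unknown".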

To reorder age group levels by count

agegroup_counts <- count(harvest_agegroup, agegroup)
agegroup_counts

##      agegroup  n
## 1 20 or Under  3
## 2    21 to 30 12
## 3    31 to 40 16
## 4    41 to 50 12
## 5     Over 50  7

harvest_agegroup$agegroup <- factor(
        harvest_agegroup$agegroup,
        levels = agegroup_counts$agegroup[order(agegroup_counts$n, decreasing = TRUE)]
)
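Two groups above share the same count (12), so the level order depends on how ties are resolved. A minimal sketch of making the tie-break explicit follows, using the counts printed above. Falling back to the natural age order for ties is an assumption made for illustration.

```r
# Counts copied from the agegroup_counts output above
agegroup_counts <- data.frame(
  agegroup = c("20 or Under", "21 to 30", "31 to 40", "41 to 50", "Over 50"),
  n = c(3, 12, 16, 12, 7),
  stringsAsFactors = FALSE
)

# Sort by descending count; ties fall back to the original (age) order
ord <- order(-agegroup_counts$n, seq_len(nrow(agegroup_counts)))
level_order <- agegroup_counts$agegroup[ord]
level_order

# The result can then be passed to factor(..., levels = level_order)
```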

To create a bar chart showing age group distribution

ggplot(harvest_agegroup, aes(x=agegroup, fill=agegroup)) +
        geom_bar(width=0.6)+
        scale_fill_viridis(discrete = T, option = "E", guide=NULL)+
        geom_text(aes(label=after_stat(count)), stat="count", vjust=-0.5)+
        labs(title="Respondents' age group distribution", y="Number of Respondents", x="Age Group")+
        theme(legend.position = "none",
              plot.title = element_text(size=15),
              axis.title = element_text(size = 12),
              axis.title.y = element_text(margin = margin(r=10)))

To create a boxplot of age distribution by gender

ggplot(harvest, aes(x=Gender, y=Age)) +
        geom_boxplot(aes(fill=Gender), alpha=0.5)+
        geom_point(position = position_jitter(width = .2, seed = 1), size=2, alpha=0.4)+
        scale_fill_manual(values = color_gender, guide=NULL)+
        labs(title = "Age Distribution by Gender")+
        scale_x_discrete(labels=c("Female","Male"))+
        theme(plot.title = element_text(size = 15),
              axis.title = element_text(size = 12))

Yield comparison (Baseline vs Current)

Importing baseline data

baseline <- read.csv("baseline.csv")
head(baseline)

##   SN            Name Gender Age  District Latitude Longitude     Date Land.Size
## 1  1 Kartiile Khadar      M  54 Mogadishu  2.08617   45.2959 7/6/2022         2
## 2  2      Xayd Ladan      M  41 Mogadishu  2.08617   45.2959 7/6/2022         4
## 3  3 Sharmooge Diric      M  26 Mogadishu  2.08617   45.2959 7/6/2022         7
## 4  4  Batuulo Xareed      F  43 Mogadishu  2.08617   45.2959 7/6/2022         6
## 5  5   Siraad Garaar      F  52 Mogadishu  2.08617   45.2959 7/6/2022         1
## 6  6     Hidan Bedri      M  38 Mogadishu  2.08617   45.2959 7/6/2022         3
##   Land.ID  Crops Baseline.Yield Baseline.Income.from.Agriculture
## 1    MO-1 Tomato            800                            1,440
## 2    MO-2   Bean            500                            2,200
## 3    MO-3  Maize            800                            4,200
## 4    MO-4 Tomato          1,000                            5,400
## 5    MO-5   Bean            600                              660
## 6    MO-6   Bean            700                            2,310
##   Baseline.Total.income
## 1                 2,362
## 2                 4,004
## 3                 7,938
## 4                 8,478
## 5                   825
## 6                 3,442

Merging baseline and current data with left_join

merged_data <- left_join(harvest, baseline, by="Land.ID", suffix=c("_c", "_b"))

str(merged_data)

## 'data.frame':    50 obs. of  30 variables:
##  $ SN_c                            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name_c                          : chr  "Kartiile Khadar" "Xayd Ladan" "Sharmooge Diric" "Batuulo Xareed" ...
##  $ Gender_c                        : chr  "M" "M" "M" "F" ...
##  $ Age_c                           : int  54 41 26 43 52 38 48 19 22 22 ...
##  $ District_c                      : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Latitude_c                      : num  2.09 2.09 2.09 2.09 2.09 ...
##  $ Longitude_c                     : num  45.3 45.3 45.3 45.3 45.3 ...
##  $ Date_c                          : Date, format: "2023-07-06" "2023-07-06" ...
##  $ Land.Size_c                     : int  2 4 7 6 1 3 1 7 7 2 ...
##  $ Land.ID                         : chr  "MO-1" "MO-2" "MO-3" "MO-4" ...
##  $ Crops_c                         : chr  "Tomato" "Bean" "Maize" "Tomato" ...
##  $ Updated.Yield                   : num  1600 875 840 1500 630 735 875 1750 875 1000 ...
##  $ Updated.Income.from.Agriculture : num  2880 3850 4410 8100 693 ...
##  $ Updated.Total.income            : num  4723 7007 8335 12717 866 ...
##  $ X..Trained                      : int  2 2 2 1 2 1 1 2 1 1 ...
##  $ Satisfaction                    : int  5 5 1 4 2 2 2 5 2 5 ...
##  $ Perception                      : int  5 5 1 4 1 2 3 4 2 5 ...
##  $ SN_b                            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name_b                          : chr  "Kartiile Khadar" "Xayd Ladan" "Sharmooge Diric" "Batuulo Xareed" ...
##  $ Gender_b                        : chr  "M" "M" "M" "F" ...
##  $ Age_b                           : int  54 41 26 43 52 38 48 19 22 22 ...
##  $ District_b                      : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Latitude_b                      : num  2.09 2.09 2.09 2.09 2.09 ...
##  $ Longitude_b                     : num  45.3 45.3 45.3 45.3 45.3 ...
##  $ Date_b                          : chr  "7/6/2022" "7/6/2022" "7/6/2022" "7/6/2022" ...
##  $ Land.Size_b                     : int  2 4 7 6 1 3 1 7 7 2 ...
##  $ Crops_b                         : chr  "Tomato" "Bean" "Maize" "Tomato" ...
##  $ Baseline.Yield                  : chr  "800" "500" "800" "1,000" ...
##  $ Baseline.Income.from.Agriculture: chr  "1,440" "2,200" "4,200" "5,400" ...
##  $ Baseline.Total.income           : chr  "2,362" "4,004" "7,938" "8,478" ...
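A left join silently adds rows when the key is duplicated on the right side and fills NA when a key has no match, so a few checks on Land.ID can be worthwhile before merging. The sketch below uses small toy data frames built from the first three rows shown above and base R's merge() as a stand-in for left_join(). The checks are a suggestion, not part of the original workflow.

```r
# Toy versions of the two tables (first three plots from the outputs above)
harvest_toy <- data.frame(
  Land.ID = c("MO-1", "MO-2", "MO-3"),
  Updated.Yield = c(1600, 875, 840),
  stringsAsFactors = FALSE
)
baseline_toy <- data.frame(
  Land.ID = c("MO-1", "MO-2", "MO-3"),
  Baseline.Yield = c(800, 500, 800),
  stringsAsFactors = FALSE
)

# 1. The key should be unique in the baseline table
stopifnot(anyDuplicated(baseline_toy$Land.ID) == 0)

# 2. Every current plot should have a baseline record
unmatched <- setdiff(harvest_toy$Land.ID, baseline_toy$Land.ID)
stopifnot(length(unmatched) == 0)

# 3. A left join should keep exactly one row per current plot
merged_toy <- merge(harvest_toy, baseline_toy, by = "Land.ID", all.x = TRUE)
stopifnot(nrow(merged_toy) == nrow(harvest_toy))
merged_toy
```

Keeping only Land.ID and the Baseline columns from the baseline table before joining would also avoid the duplicated _c and _b columns seen in the structure above.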

Creating a data subset

merged_yield <- select(merged_data, District_c, Baseline.Yield, Updated.Yield)
head(merged_yield)

##   District_c Baseline.Yield Updated.Yield
## 1  Mogadishu            800          1600
## 2  Mogadishu            500           875
## 3  Mogadishu            800           840
## 4  Mogadishu          1,000          1500
## 5  Mogadishu            600           630
## 6  Mogadishu            700           735

Converting data type from character to numeric

merged_yield$Baseline.Yield <- as.numeric(gsub(",","",merged_yield$Baseline.Yield))
str(merged_yield)

## 'data.frame':    50 obs. of  3 variables:
##  $ District_c    : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Baseline.Yield: num  800 500 800 1000 600 700 700 1000 700 500 ...
##  $ Updated.Yield : num  1600 875 840 1500 630 735 875 1750 875 1000 ...

Plotting Baseline vs Current yield with bar plot

ggplot(merged_yield, aes(x=District_c)) + 
        geom_col(aes(y=Baseline.Yield, fill="Baseline.Yield"),width = .4, just = 1.05) +
        geom_col(aes(y=Updated.Yield, fill="Updated.Yield"), width = .4, just = 0) +
        scale_y_continuous(breaks = c(3000, 6000, 9000, 12000, 15000))+
        labs(title="Baseline vs Current Yield", x="District", y = "Yield", fill="Stage")+
        scale_fill_brewer(type="qual", palette=3, labels=c("Baseline", "Current"))+
        theme(plot.title = element_text(size=15),
              axis.title = element_text(size=12),
              axis.title.x = element_text(margin = margin(t=10)))
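geom_col() stacks one segment per row, so each bar's height is the sum of the yields of all plots in that district. The totals can be computed directly to cross-check the chart. The sketch below does this for the six Mogadishu rows shown by head(merged_yield), which cover only part of one district. The percentage-change column is an addition for illustration.

```r
# First six rows of merged_yield, copied from the output above
yield_toy <- data.frame(
  District_c = rep("Mogadishu", 6),
  Baseline.Yield = c(800, 500, 800, 1000, 600, 700),
  Updated.Yield = c(1600, 875, 840, 1500, 630, 735),
  stringsAsFactors = FALSE
)

# Sum both yield columns per district (what the stacked bars display)
totals <- aggregate(cbind(Baseline.Yield, Updated.Yield) ~ District_c,
                    data = yield_toy, FUN = sum)

# Relative change from baseline to current, in percent
totals$pct_change <- 100 * (totals$Updated.Yield - totals$Baseline.Yield) /
  totals$Baseline.Yield
totals
```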

Baseline vs Current Yield based on crops

Converting data type from character to numeric

merged_data$Baseline.Yield <- as.numeric(gsub(",","", merged_data$Baseline.Yield))

Creating a data subset

merged_data_facet <- merged_data %>% 
        select(District_c, Updated.Yield, Baseline.Yield,  Crops_c)

merged_data_facet <- merged_data_facet %>% 
        rename("District" = "District_c",
               "Current" = "Updated.Yield",
               "Baseline" = "Baseline.Yield",
               "Crops" = "Crops_c")

head(merged_data_facet)

##    District Current Baseline  Crops
## 1 Mogadishu    1600      800 Tomato
## 2 Mogadishu     875      500   Bean
## 3 Mogadishu     840      800  Maize
## 4 Mogadishu    1500     1000 Tomato
## 5 Mogadishu     630      600   Bean
## 6 Mogadishu     735      700   Bean

merged_data_facet1 <- melt(merged_data_facet, id.vars = c("District","Crops"))
head(merged_data_facet1)

##    District  Crops variable value
## 1 Mogadishu Tomato  Current  1600
## 2 Mogadishu   Bean  Current   875
## 3 Mogadishu  Maize  Current   840
## 4 Mogadishu Tomato  Current  1500
## 5 Mogadishu   Bean  Current   630
## 6 Mogadishu   Bean  Current   735
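The reshape2 package is retired, and the same wide-to-long step can be sketched with tidyr::pivot_longer(), which is already attached through the tidyverse. The example below is an alternative under stated assumptions rather than the method used above. It uses the first three rows of merged_data_facet, and the column names variable and value are chosen to match melt()'s defaults.

```r
library(tidyr)

# First three rows of merged_data_facet, copied from the output above
yield_wide <- data.frame(
  District = rep("Mogadishu", 3),
  Crops = c("Tomato", "Bean", "Maize"),
  Current = c(1600, 875, 840),
  Baseline = c(800, 500, 800),
  stringsAsFactors = FALSE
)

# Stack Current and Baseline into one name column and one value column
yield_long <- pivot_longer(yield_wide,
                           cols = c(Current, Baseline),
                           names_to = "variable",
                           values_to = "value")
yield_long
```

Two differences from melt() are worth noting: rows come out interleaved per plot rather than grouped by variable, and variable is a character column rather than a factor, so facets would sort alphabetically (Baseline before Current) unless it is converted to a factor with explicit levels.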

Plotting Baseline vs Current yield based on crops

ggplot(merged_data_facet1, aes(x=District, fill=factor(District))) + 
        geom_col(aes(y=value, group=factor(variable)))+
        facet_wrap(vars(Crops, variable), ncol = 2)+
        scale_fill_viridis(discrete = T, option = "E", guide=NULL)+
        labs(title = "Baseline and Current Yield per Crops", x="", y="Yield")+
        theme(aspect.ratio = 0.5,
              axis.text.x = element_text(angle = 90),
              panel.spacing.x = unit(1,"cm"),
              strip.background.x = element_rect(fill = alpha("grey", 0.1), colour = alpha("darkgrey", 0.3)))

Agricultural income comparison (Baseline vs Current)

Creating a data subset

merged_income <- select(merged_data, District_c, 
                        Updated.Income.from.Agriculture, 
                        Baseline.Income.from.Agriculture, 
                        Updated.Total.income,
                        Baseline.Total.income,
                        Crops_c)
head(merged_income)

##   District_c Updated.Income.from.Agriculture Baseline.Income.from.Agriculture
## 1  Mogadishu                            2880                            1,440
## 2  Mogadishu                            3850                            2,200
## 3  Mogadishu                            4410                            4,200
## 4  Mogadishu                            8100                            5,400
## 5  Mogadishu                             693                              660
## 6  Mogadishu                            2426                            2,310
##   Updated.Total.income Baseline.Total.income Crops_c
## 1                 4723                 2,362  Tomato
## 2                 7007                 4,004    Bean
## 3                 8335                 7,938   Maize
## 4                12717                 8,478  Tomato
## 5                  866                   825    Bean
## 6                 3614                 3,442    Bean

merged_income <- merged_income %>% 
        rename("District"="District_c",
               "Current"="Updated.Income.from.Agriculture",
               "Baseline"="Baseline.Income.from.Agriculture",
               "Current_total"= "Updated.Total.income",
               "Baseline_total"= "Baseline.Total.income",
               "Crops" = "Crops_c")

head(merged_income)

##    District Current Baseline Current_total Baseline_total  Crops
## 1 Mogadishu    2880    1,440          4723          2,362 Tomato
## 2 Mogadishu    3850    2,200          7007          4,004   Bean
## 3 Mogadishu    4410    4,200          8335          7,938  Maize
## 4 Mogadishu    8100    5,400         12717          8,478 Tomato
## 5 Mogadishu     693      660           866            825   Bean
## 6 Mogadishu    2426    2,310          3614          3,442   Bean

str(merged_income)

## 'data.frame':    50 obs. of  6 variables:
##  $ District      : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Current       : num  2880 3850 4410 8100 693 ...
##  $ Baseline      : chr  "1,440" "2,200" "4,200" "5,400" ...
##  $ Current_total : num  4723 7007 8335 12717 866 ...
##  $ Baseline_total: chr  "2,362" "4,004" "7,938" "8,478" ...
##  $ Crops         : chr  "Tomato" "Bean" "Maize" "Tomato" ...

Converting data type from character to numeric

merged_income$Baseline <- as.numeric(gsub(",", "",merged_income$Baseline))
merged_income$Baseline_total <- as.numeric(gsub(",", "",merged_income$Baseline_total))

Comparing Baseline vs Current agricultural income with bar plot

ggplot(merged_income, aes(x=District)) + 
        geom_col(aes(y=Baseline, fill="Baseline"),width = .4, just = 1.05) +
        geom_col(aes(y=Current, fill="Current"), width = .4, just = 0) +
        scale_y_continuous(n.breaks = 5)+
        labs(title="Baseline vs Current Agricultural Income", x="District", y = "Agricultural Income", fill="Stage")+
        scale_fill_brewer(type="qual", palette=3)+
        theme(plot.title = element_text(size=15),
              axis.title = element_text(size=12),
              axis.title.x = element_text(margin = margin(t=10)),
              axis.title.y = element_text(margin = margin(r=10)))

Baseline vs Current agricultural income based on crops

Creating a subset of data

agriincome_facet <- merged_income %>% 
        select(District,
               Current,
               Baseline,
               Crops)
str(agriincome_facet)

## 'data.frame':    50 obs. of  4 variables:
##  $ District: chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Current : num  2880 3850 4410 8100 693 ...
##  $ Baseline: num  1440 2200 4200 5400 660 2310 525 6300 5390 1100 ...
##  $ Crops   : chr  "Tomato" "Bean" "Maize" "Tomato" ...

Unpivoting columns to create a single variable for Baseline and Current

agriincome_facet <- melt(agriincome_facet, id.vars = c("District","Crops"))
head(agriincome_facet)

##    District  Crops variable value
## 1 Mogadishu Tomato  Current  2880
## 2 Mogadishu   Bean  Current  3850
## 3 Mogadishu  Maize  Current  4410
## 4 Mogadishu Tomato  Current  8100
## 5 Mogadishu   Bean  Current   693
## 6 Mogadishu   Bean  Current  2426

Plotting Baseline vs Current agricultural income based on crops

ggplot(agriincome_facet, aes(x=District, fill=factor(District))) + 
        geom_col(aes(y=value, group=factor(variable)))+
        facet_wrap(vars(Crops, variable), ncol = 2)+
        scale_fill_viridis(discrete = T, option = "E", guide=NULL)+
        labs(title = "Baseline and Current agricultural income per Crops", x="", y="Agricultural Income")+
        theme(aspect.ratio = 0.5,
              axis.text.x = element_text(angle = 90),
              axis.title.y = element_text(margin = margin(r=10)),
              panel.spacing.x = unit(1,"cm"),
              strip.background.x = element_rect(fill = alpha("grey", 0.1), colour = alpha("darkgrey", 0.3)))

Total income comparison (Baseline vs Current)

Comparing Baseline vs Current total income with bar plot

ggplot(merged_income, aes(x=District)) + 
        geom_col(aes(y=Baseline_total, fill="Baseline"),width = .4, just = 1.05) +
        geom_col(aes(y=Current_total, fill="Current"), width = .4, just = 0) +
        scale_y_continuous(n.breaks = 7)+
        labs(title="Baseline vs Current Total Income", x="District", y = "Total Income", fill="Stage")+
        scale_fill_brewer(type="qual", palette=3)+
        theme(plot.title = element_text(size=15),
              axis.title = element_text(size=12),
              axis.title.x = element_text(margin = margin(t=10)),
              axis.title.y = element_text(margin = margin(r=10)))

Baseline vs Current total income based on crops

Creating a subset of data

str(merged_income)

## 'data.frame':    50 obs. of  6 variables:
##  $ District      : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Current       : num  2880 3850 4410 8100 693 ...
##  $ Baseline      : num  1440 2200 4200 5400 660 2310 525 6300 5390 1100 ...
##  $ Current_total : num  4723 7007 8335 12717 866 ...
##  $ Baseline_total: num  2362 4004 7938 8478 825 ...
##  $ Crops         : chr  "Tomato" "Bean" "Maize" "Tomato" ...

totalincome_facet <- merged_income %>% 
        select(District,
               Current_total,
               Baseline_total,
               Crops)

str(totalincome_facet)

## 'data.frame':    50 obs. of  4 variables:
##  $ District      : chr  "Mogadishu" "Mogadishu" "Mogadishu" "Mogadishu" ...
##  $ Current_total : num  4723 7007 8335 12717 866 ...
##  $ Baseline_total: num  2362 4004 7938 8478 825 ...
##  $ Crops         : chr  "Tomato" "Bean" "Maize" "Tomato" ...

Unpivoting columns to create a single variable for Baseline and Current

totalincome_facet <- melt(totalincome_facet, id.vars = c("District","Crops"))
head(totalincome_facet)

##    District  Crops      variable value
## 1 Mogadishu Tomato Current_total  4723
## 2 Mogadishu   Bean Current_total  7007
## 3 Mogadishu  Maize Current_total  8335
## 4 Mogadishu Tomato Current_total 12717
## 5 Mogadishu   Bean Current_total   866
## 6 Mogadishu   Bean Current_total  3614

Plotting Baseline vs Current total income based on crops

ggplot(totalincome_facet, aes(x=District, fill=factor(District))) + 
        geom_col(aes(y=value, group=factor(variable)))+
        facet_wrap(vars(Crops, variable), ncol = 2)+
        scale_fill_viridis(discrete = T, option = "E", guide=NULL)+
        labs(title = "Baseline and Current total income per Crops", x="", y="Total Income")+
        theme(aspect.ratio = 0.5,
              axis.text.x = element_text(angle = 90),
              axis.title.y = element_text(margin = margin(r=10)),
              panel.spacing.x = unit(1,"cm"),
              strip.background.x = element_rect(fill = alpha("grey", 0.1), colour = alpha("darkgrey", 0.3)))

Income relation (Agricultural and Total)

ggplot(merged_income, aes(x=Current, y=Current_total)) +
        geom_smooth()+
        labs(title = "Relation Between Current Agricultural and Total Income",
             y="Total Income",
             x="Agricultural Income",
             caption = "Level of confidence interval = 0.95")+
        theme(plot.title = element_text(size=15),
              axis.title = element_text(size = 12),
              axis.title.x = element_text(margin = margin(t=10)),
              axis.title.y = element_text(margin = margin(r=10)))

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
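As the message shows, geom_smooth() falls back to a loess curve here. A numeric companion to the plot can be sketched with a Pearson correlation and a simple linear fit. This is an illustrative addition, not part of the original analysis, and it uses only the six rows shown by head(merged_income), which are far too few to support conclusions about the full data.

```r
# First six rows of merged_income, copied from the output above
income <- data.frame(
  Current = c(2880, 3850, 4410, 8100, 693, 2426),
  Current_total = c(4723, 7007, 8335, 12717, 866, 3614)
)

# Strength of the linear association between the two income measures
pearson_r <- cor(income$Current, income$Current_total)

# Straight-line fit: total income as a function of agricultural income
fit <- lm(Current_total ~ Current, data = income)
slope <- unname(coef(fit)["Current"])

pearson_r
slope
```

Adding geom_point() to the plot would also show the individual observations behind the smoothed line.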

ggplot(merged_income, aes(x=Current, y=Current_total, group=Crops)) +
        geom_smooth()+
        labs(title = "Relation Between Current Agricultural and Total Income by Crops",
             y="Total Income",
             x="Agricultural Income",
             caption = "Level of confidence interval = 0.95")+
        facet_wrap(vars(Crops))+
        theme(plot.title = element_text(size=15),
              axis.title = element_text(size = 12),
              axis.title.x = element_text(margin = margin(t=10)),
              axis.title.y = element_text(margin = margin(r=10)))

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Rohan
