Hello, my name is Yani Bellini Saibene. I did this work with Sandro Camargo, who sends greetings from Brazil. Welcome to “TELL ME WHO YOU HANG OUT WITH, AND I WILL TELL YOU WHO YOU ARE”. In Spanish: “Dime con quién andas y te diré quién eres”. This is a well-known phrase in Argentina. It refers to the groups of people we get together with to do things: the swim team, the book club, the knitters, our college friends or the R-Ladies chapter. In English, you can think of the phrase “Birds of a feather flock together”. In this talk we are going to focus on a special kind of group called a Community of Practice.

Communities of Practice

groups of people who share a passion for something that they know how to do, and who interact regularly in order to learn how to do it better – Etienne Wenger

Why am I talking about this? Because in June last year I became Community Manager of rOpenSci, a community of practice.

rOpenSci

We are a group of people whose passion is open and reproducible research for everyone, built by everyone. We know how to do it by creating technical and social infrastructure:

  • Creating a suite of carefully vetted, federated R software tools.
  • Making the right data, tools and best practices more discoverable.
  • Welcoming and diverse community.
  • Building capacity of software users and developers and fostering a sense of pride in their work.
  • Promoting advocacy for a culture of data sharing and reusable software.

https://ropensci.org

We do this by creating a suite of carefully vetted, federated R software tools. Developers of research software submit their packages to our review process, and once they pass, the packages become part of our suite. Developers get support from our staff and from the community. Users get high-quality software to do science.

We make the right data, tools and best practices more discoverable: our R-universe platform lets you publish and search more than 18,000 R packages.

We build a welcoming and diverse community through a code of conduct, our Champions Program and our multilingual publishing efforts.

We build the capacity of software users and developers, and foster a sense of pride in their work, with the projects I already mentioned, but also by publishing online books, organizing community calls and coworking sessions, highlighting developers through interviews, and creating content for our web pages, forum and newsletter.

Community Manager

Facilitates the activities of a community and the interactions between community members. Community management may be considered as “in-reach” rather than “outreach” or public engagement. - CSCCE

Learn more: What is Community Engagement Within Science?, What does a scientific community manager do?

A community manager in the context of a community of practice is a person who facilitates the activities of a community and the interactions among its members. The role includes responsibilities across technical, interpersonal, communication, program management and program development tasks.

Let’s analyze the rOpenSci community

Why analyze our community?

Essentially, because by knowing your community you can do a better job as community manager. rOpenSci records a lot of data and generates statistics and summaries: for example, how many packages we reviewed, how many blog posts we wrote, how many community calls we organized and how many people attended. These are very useful and give us an overall picture of our community and our activities.

Communities are built on connections.

We need to know our community connectivity to plan targeted and effective interventions to:

  • improve collaborations.
  • improve information flow.
  • improve knowledge reuse.
  • effective knowledge (co)creation.
  • effective knowledge transfer.

Now, communities are built on connections, and those summaries and numbers don’t give us much information about the growth and strength of professional interpersonal connections in our community.

At a given moment in time

  • Who is connected to whom? Who is not connected?
  • Where, and who, are the hubs?
  • Where and about what are the clusters? Are there silos?

Changes over time

  • Are new connections forming?
  • Are new patterns of connectivity forming?
  • How was our network before and after the introduction of an activity?

We will try to answer questions like:

How can we analyze our community connectivity?

Social Networks Analysis

Group of individuals who relate to others for a specific purpose, characterized by the existence of information flows.

This is where Social Networks Analysis comes into play. I’m not talking about Twitter or Instagram here; I’m talking about networks built by individuals or organizations that have some kind of relationship.

Social Networks Analysis - Basic elements

You can map the nodes and edges to explore the connections and patterns that exist, and draw conclusions from that exploration. For example, here we have mapped a network with people as nodes and collaborations as edges, such as writing a blog post together as co-authors.

Social Networks Analysis - Degree

The degree of a node is how many connections it has: for example, this node has 6 connections, so its degree is 6. This other one has 5 connections, so its degree is 5. The higher the degree, the more connected the node.
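As a minimal sketch, assuming the network is stored as an edge list with one row per collaboration (the `from`/`to` columns and the names are hypothetical), degree can be counted in base R by tallying how often each node appears at either end of an edge. The igraph package also offers `degree()` for this.

```r
# Hypothetical undirected edge list: one row per collaboration
edges <- data.frame(
  from = c("Ana", "Ana", "Ana", "Bruno"),
  to   = c("Bruno", "Carla", "Dario", "Carla")
)

# Degree = number of edge endpoints touching each node
deg <- table(c(edges$from, edges$to))

# Most connected nodes first
sort(deg, decreasing = TRUE)
```

If the same pair collaborated more than once, each collaboration is counted here; deduplicate the pairs first if you want the number of distinct neighbours instead.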

Social Networks Analysis - Multiplexity

Multiplexity shows the number of connections between two nodes: for example, when two people co-author more than one blog post.
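A minimal base-R sketch, assuming a hypothetical edge list where each row is one co-authored post: because the network is undirected, each pair is first put in a canonical order, and then the rows per pair are counted.

```r
# Hypothetical edge list: each row is one co-authored blog post
edges <- data.frame(
  from = c("Ana", "Bruno", "Ana", "Ana", "Carla"),
  to   = c("Bruno", "Ana", "Bruno", "Carla", "Bruno")
)

# Undirected network: order each pair so that "Ana -- Bruno" and
# "Bruno -- Ana" are treated as the same pair
pair <- paste(pmin(edges$from, edges$to), pmax(edges$from, edges$to), sep = " -- ")

# Multiplexity = number of connections between the same two nodes
multiplexity <- table(pair)
multiplexity
```

These counts can then be used as edge weights when drawing or analyzing the network.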

Social Networks Analysis - Betweenness centrality

Betweenness centrality measures the number of times a node lies on the shortest path between other nodes. What it tells us: this measure shows which nodes act as ‘bridges’ between other nodes in the network.
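Below is a minimal base-R sketch of Brandes' algorithm for an unweighted, undirected network, assuming a hypothetical edge list. When several shortest paths join a pair, each path counts fractionally. It returns raw (unnormalized) scores; for real data, igraph's `betweenness()` computes the same quantity and is the practical choice.

```r
# Betweenness centrality (Brandes' algorithm) for an unweighted,
# undirected network given as an edge list with `from` and `to` columns
betweenness_brandes <- function(edges) {
  nodes <- sort(unique(c(edges$from, edges$to)))
  empty_list <- function() setNames(vector("list", length(nodes)), nodes)
  zeros <- function() setNames(numeric(length(nodes)), nodes)

  # Adjacency list: neighbours of each node
  adj <- empty_list()
  for (i in seq_len(nrow(edges))) {
    a <- edges$from[i]
    b <- edges$to[i]
    adj[[a]] <- union(adj[[a]], b)
    adj[[b]] <- union(adj[[b]], a)
  }

  score <- zeros()
  for (s in nodes) {
    # Breadth-first search from s, counting shortest paths (sigma)
    order_seen <- character(0)
    pred <- empty_list()
    sigma <- zeros()
    sigma[s] <- 1
    depth <- setNames(rep(-1, length(nodes)), nodes)
    depth[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      order_seen <- c(order_seen, v)
      for (w in adj[[v]]) {
        if (depth[w] < 0) {
          depth[w] <- depth[v] + 1
          queue <- c(queue, w)
        }
        if (depth[w] == depth[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    # Accumulate dependencies, visiting nodes farthest from s first
    delta <- zeros()
    for (w in rev(order_seen)) {
      for (v in pred[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) score[w] <- score[w] + delta[w]
    }
  }
  score / 2  # each pair was counted once from each end
}

# Hypothetical network: B links A and E to the C-D branch
edges <- data.frame(from = c("A", "B", "C", "B"), to = c("B", "C", "D", "E"))
bc <- betweenness_brandes(edges)
bc
```

Here B gets the highest score because every path from A or E to the C-D branch has to pass through it: B is the bridge.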

Social Networks Analysis - Closeness centrality

Closeness centrality scores each node based on how close it is to all other nodes in the network. It is useful for finding the individuals who are best placed to influence the entire network most quickly.
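A minimal sketch, assuming a connected, unweighted network given as a hypothetical edge list: breadth-first search gives the shortest-path distance from each node to every other node, and closeness is computed here as (n - 1) divided by the sum of those distances. Tools differ in normalization; igraph's `closeness()`, for instance, returns 1 / sum of distances unless `normalized = TRUE`.

```r
# Hypothetical connected network: a chain A - B - C - D
edges <- data.frame(from = c("A", "B", "C"), to = c("B", "C", "D"))
nodes <- sort(unique(c(edges$from, edges$to)))

# Neighbours of v in an undirected edge list
neighbours <- function(v) unique(c(edges$to[edges$from == v], edges$from[edges$to == v]))

# Shortest-path distance (number of edges) from `source` to every node
bfs_distances <- function(source) {
  d <- setNames(rep(NA_real_, length(nodes)), nodes)
  d[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in neighbours(v)) {
      if (is.na(d[w])) {
        d[w] <- d[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  d
}

# Closeness = (n - 1) / sum of distances to the other nodes
# (NA if some node is unreachable, i.e. the network is not connected)
closeness <- sapply(nodes, function(v) {
  d <- bfs_distances(v)
  (length(nodes) - 1) / sum(d[names(d) != v])
})
closeness
```

The two middle nodes, B and C, score higher than the ends of the chain because they can reach everyone in fewer steps.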

Social Networks Analysis - Clusters

Clusters or communities are groups that work together: their nodes have a high number of connections among themselves. In a clique cluster, all members are connected to each other, and a silo has no connections to other clusters in the network.
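The sketch below, assuming a small hypothetical edge list, labels connected components with breadth-first search: a component with no edges to the rest of the network is a silo in the sense above. It then checks whether each component is a clique (k members and k(k - 1)/2 distinct edges). Finding denser groups inside a single connected network is a different task, community detection, for which igraph offers functions such as `cluster_louvain()`.

```r
# Hypothetical network with two groups that never collaborate:
# a triangle A-B-C and a chain D-E-F
edges <- data.frame(from = c("A", "B", "C", "D", "E"),
                    to   = c("B", "C", "A", "E", "F"))
nodes <- sort(unique(c(edges$from, edges$to)))

# Label connected components with breadth-first search
component <- setNames(rep(NA_integer_, length(nodes)), nodes)
n_components <- 0L
for (start in nodes) {
  if (!is.na(component[start])) next
  n_components <- n_components + 1L
  component[start] <- n_components
  queue <- start
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nbrs <- unique(c(edges$to[edges$from == v], edges$from[edges$to == v]))
    for (w in nbrs) {
      if (is.na(component[w])) {
        component[w] <- n_components
        queue <- c(queue, w)
      }
    }
  }
}

# A component with k members is a clique if it has k * (k - 1) / 2 distinct edges
pairs <- unique(data.frame(a = pmin(edges$from, edges$to), b = pmax(edges$from, edges$to)))
size <- tabulate(component, nbins = n_components)
n_edges <- tabulate(component[pairs$a], nbins = n_components)
is_clique <- n_edges == size * (size - 1) / 2

component
is_clique
```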

How can we collect the data?

For this type of analysis we need data that reveal some kind of connection between the actors in a network.

The most common data collection methods used in social network analysis are surveys and interviews with members of the network. As you can imagine, this can be costly in time and money.

The data could also come from existing sources, such as social media connections, or from your own knowledge of the relationships that exist in the network.

So we wondered: do we already have that data in another format that we could adapt to analyze the connectivity of our community? Could we collect that data in an automated or semi-automated way, so we can repeat the analysis?

Path to contribute at rOpenSci

  • Write a blog post
  • Review a package
  • Maintain a package
  • Speak at a Comm Call
  • Become a champion
  • Host a coworking session

Learn more: rOpenSci Community Contributing Guide and How to Participate with rOpenSci

Fortunately, at rOpenSci we have a contributing guide: a whole book describing the different ways you can contribute to the community. For example, you can write a blog post, review a package, maintain a package, speak at a Community Call, become a champion, or host a coworking session.

Many of these forms of contribution can be done together with other people, and there we have our nodes and the connections between them.

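Contributions in a network and data for the network

  • Write a blog post. Nodes: authors. Edges: coauthorship. Data: webpage.
  • Review a package. Nodes: authors, editors, reviewers. Edges: peer review. Data: GitHub, database.
  • Maintain a package. Nodes: developers. Edges: codevelopment. Data: GitHub, R-universe.
  • Speak at a Comm Call. Nodes: speakers. Edges: cospeaking, coorganization. Data: webpage.
  • Become a champion. Nodes: mentors, mentees. Edges: mentorship. Data: webpage, database.
  • Host a coworking session. Nodes: participants. Edges: coorganization, coattendance. Data: webpage.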
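Once each kind of contribution has its own edge list, they can be stacked into a single network that remembers the type of every edge. A minimal base-R sketch, assuming hypothetical edge lists with `from`, `to` and `contribution_type` columns (the same column name the blog script below uses):

```r
# Hypothetical edge lists, one per contribution type
blog <- data.frame(from = c("Ana", "Bruno"), to = c("Bruno", "Carla"),
                   contribution_type = "blog post")
review <- data.frame(from = "Ana", to = "Dario",
                     contribution_type = "peer review")

# Stack them into a single network, keeping the type of each edge
all_edges <- rbind(blog, review)

# How many different kinds of contribution connect each person to others
endpoints <- data.frame(
  node = c(all_edges$from, all_edges$to),
  type = rep(all_edges$contribution_type, times = 2)
)
types_per_node <- tapply(endpoints$type, endpoints$node,
                         function(x) length(unique(x)))
types_per_node
```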

Let’s see an example with the Blog

library(fs)
library(rmarkdown)
library(tibble)
library(lubridate)
library(readr)

# 1. List every Markdown file under content/blog/ (recursively)
file_list <- fs::dir_ls(path = "content/blog/",
                        recurse = TRUE,
                        type = "file",
                        glob = "*.md")

# 2. Empty tibble with one row per (post, author)
datos <- tibble(fecha = character(),
                titulo = character(),
                autor = character(),
                year = character(),
                contribution_type = character())

# 3-5. Read each post's YAML header and add its metadata;
# a post with several authors adds one row per author
for (documento in file_list) {
  doc <- rmarkdown::yaml_front_matter(input = documento)
  datos <- tibble::add_row(datos,
                           fecha = as.character(doc$date),
                           titulo = doc$title,
                           autor = doc$author,
                           year = as.character(lubridate::year(lubridate::as_date(doc$date))),
                           contribution_type = "blog post")
}

# 6. Save the dataset to a CSV file ;-)
write_csv(datos, "blog_post_authors_2023.csv")

  1. Read all the files in the content/blog/ folder with the .md extension.
  2. Create a tibble with the variables to store: date, title, author, year and contribution_type.
  3. For each Markdown document:
  4. read the YAML header, extract the value of each variable,
  5. and add a row to the dataset with that information.
  6. After processing all the documents, save the dataset to a CSV file ;-)

library(dplyr)

# One row per pair of co-authors of the same post (an undirected edge list).
# reframe() (dplyr >= 1.1.0) returns several rows per group; summarise() no longer should.
results <- datos |>
  group_by(titulo, year) |>
  filter(n() > 1) |>                               # keep posts with 2+ authors
  reframe(as.data.frame(t(combn(autor, 2)))) |>    # all unique author pairs
  select(titulo, year, from = V1, to = V2)

The next step is to transform the list of authors of each blog post into a table in network (edge list) format.

This code takes the list we created in the previous step, groups it by title and year, and keeps all the blog posts that have two or more authors. Then, for each group, the combn function creates a matrix with two rows and one column for each unique combination of two authors. We transpose this matrix to get two columns that become from and to, representing the two nodes joined by each edge.
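To see what combn and the transposition produce for a single post, here is a small base-R sketch with three hypothetical authors:

```r
# Hypothetical authors of one blog post
authors <- c("Ana", "Bruno", "Carla")

# combn() returns a 2 x 3 matrix: one column per unique pair of authors
combn(authors, 2)

# Transposing gives one row per pair, i.e. one edge per row
pairs <- as.data.frame(t(combn(authors, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("from", "to")
pairs
```

In general, a post with k authors contributes k(k - 1)/2 edges, so this post adds 3 edges to the network.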

Blog-Post full network 2013-2023

We can analyze it annually

We can also build a network for each year and see how it changes over time. This time we also label each node with the author’s name.
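Because the edge list keeps the year of each post, one network per year is just a split of that table. A minimal base-R sketch, assuming a hypothetical edge list with `year`, `from` and `to` columns, summarising each yearly network by its size:

```r
# Hypothetical edge list with the year of each collaboration
edges <- data.frame(
  year = c("2021", "2021", "2022", "2022", "2022"),
  from = c("Ana", "Ana", "Ana", "Bruno", "Carla"),
  to   = c("Bruno", "Carla", "Bruno", "Dario", "Dario")
)

# One sub-network per year, summarised by its number of nodes and edges
by_year <- lapply(split(edges, edges$year), function(e) {
  data.frame(year = e$year[1],
             n_nodes = length(unique(c(e$from, e$to))),
             n_edges = nrow(e))
})
yearly <- do.call(rbind, by_year)
rownames(yearly) <- NULL
yearly
```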

All contributions together

Champions first cohort

R-universe Stars Interview Team

Tell me who you hang out with, and I will tell you who you are ;-)

We can also identify characteristics of network members. For example, Maëlle Salmon has the most contributions as an active member, and Noam Ross has the highest degree and the highest centrality: Noam is the most connected member. Among members who are not staff, Laura DeCicco has the highest degree and Kara Woo has the highest centrality. I have the most contributions in languages other than English, and Ale Bellini and Lucio Casalla have the most contributions in languages other than English among non-staff members.
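Questions like “who has the highest degree among non-staff members?” need node attributes joined to the network metrics. A minimal base-R sketch, assuming a hypothetical edge list and a hypothetical members table with a `staff` flag:

```r
# Hypothetical edge list
edges <- data.frame(from = c("A", "A", "A", "A", "B", "B"),
                    to   = c("B", "C", "D", "E", "C", "D"))
deg <- table(c(edges$from, edges$to))
nodes <- data.frame(member = names(deg), degree = as.vector(deg))

# Hypothetical node attribute, kept in a separate members table
members <- data.frame(member = c("A", "B", "C", "D", "E"),
                      staff  = c(TRUE, FALSE, FALSE, FALSE, FALSE))
nodes <- merge(nodes, members, by = "member")

# Most connected member overall, and among non-staff members
top_overall <- nodes$member[which.max(nodes$degree)]
non_staff <- nodes[!nodes$staff, ]
top_non_staff <- non_staff$member[which.max(non_staff$degree)]
```

The same join works for any other metric (betweenness, closeness, number of contributions) or attribute (language of the contribution).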

What if you wanted to do the same for your community?

My tips

  • Define the nodes in your network (people, countries, organizations, …)
  • Define the type(s) of connection you have in your network.
    • Start with your paths for contributions.
    • Identify which contributions can be done in teams.
  • You are probably already recording information about those types of connections.

My tips

  • You can automate part of the data collection.
    • Formalize the workflow (code ;-)) so you can repeat & reproduce it.
  • It is hard to capture all types of interactions.
    • Take into account whether the data is open or closed, and its privacy.
  • Knowing the nodes helps you understand the clusters and the interactions.
    • Lean on the people who have been in the network for the longest time.

My tips

  • You can take snapshots of the network model …
  • … so you can compare it at different times.
  • … so you can use it for evaluating the impact of interventions and programs.
  • Share what you find with your community
  • … and other community managers.
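Comparing two snapshots can start with simple set operations on the edge lists. A minimal base-R sketch, assuming two hypothetical snapshots of an undirected network (for example, before and after a program):

```r
# Hypothetical snapshots of the same network at two moments
before <- data.frame(from = c("A", "A"), to = c("B", "C"))
after  <- data.frame(from = c("A", "C", "D"), to = c("B", "B", "A"))

# Canonical key for an undirected edge, so "A -- B" equals "B -- A"
edge_key <- function(e) paste(pmin(e$from, e$to), pmax(e$from, e$to), sep = " -- ")
node_set <- function(e) unique(c(e$from, e$to))

new_edges  <- setdiff(edge_key(after), edge_key(before))   # connections formed
lost_edges <- setdiff(edge_key(before), edge_key(after))   # connections not repeated
new_nodes  <- setdiff(node_set(after), node_set(before))   # newcomers
```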

¡Gracias, Thank you, Obrigada!

  • Slides: https://bit.ly/csvconf2023
  • GitHub: https://github.com/yabellini/CSVConfv7
  • The pictures are adaptations by my 7-year-old son and me of images from Freepik’s hand-drawn style stickman set.
  • We used R, Gephi, Excalidraw and Quarto to build this talk.
  • Thanks to the rOpenSci Staff Team, Elio, Ale and my English Conversation Club for their feedback.

This talk is at https://bit.ly/csvconf2023
