The coorsim R package detects and analyzes coordinated social media manipulation (CSMM). It lets researchers identify suspiciously similar patterns of social media behavior that may indicate coordinated efforts to spread content. By comparing post embeddings, coorsim detects posts that share themes and semantics even when they use different vocabulary or languages. This makes the approach particularly relevant for identifying manipulation that relies on AI-generated content.
You can install the development version of coorsim from GitHub with:
# install.packages("devtools")
devtools::install_github("thieled/coorsim")
Below is an example of how to use coorsim to detect and analyze coordinated behavior in a Twitter data set.
Prepare two data sets: posts, containing the tweets, and users, containing the user metadata. A matrix of post embeddings must also be available.
posts <- readRDS("/path/to/file")
users <- readRDS("/path/to/user_file")
post_embedding_matrix <- readRDS("/path/to/embedding_file")
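For orientation, the inputs could look roughly like the toy objects below. This is a minimal sketch and not taken from the package documentation: the column names simply mirror the arguments used in the calls later in this example (`tweet_id`, `user_id`, `created_at`, `text`), and the layout of the embedding matrix (one row per post, in the same order as the posts, with post IDs as row names) is an assumption.

```r
# Toy inputs for illustration only; real data would be loaded with readRDS().
set.seed(42)

toy_posts <- data.frame(
  tweet_id   = c("t1", "t2", "t3", "t4"),
  user_id    = c("u1", "u2", "u3", "u1"),
  created_at = as.POSIXct(
    c("2024-01-01 12:00:00", "2024-01-01 12:00:20",
      "2024-01-01 12:00:45", "2024-01-01 18:30:00"),
    tz = "UTC"
  ),
  text = c("Vote now!", "Go vote today", "Time to vote", "Nice weather"),
  stringsAsFactors = FALSE
)

toy_users <- data.frame(
  user_id = c("u1", "u2", "u3"),
  name    = c("Alice", "Bob", "Carol"),
  stringsAsFactors = FALSE
)

# Assumption: one embedding row per post, row names = post IDs.
# Random numbers stand in for real sentence embeddings.
toy_embeddings <- matrix(
  rnorm(nrow(toy_posts) * 8),
  nrow = nrow(toy_posts),
  dimnames = list(toy_posts$tweet_id, NULL)
)

# Basic sanity checks before running any detection step.
stopifnot(
  nrow(toy_embeddings) == nrow(toy_posts),
  identical(rownames(toy_embeddings), toy_posts$tweet_id),
  all(toy_posts$user_id %in% toy_users$user_id),
  !anyDuplicated(toy_posts$tweet_id)
)
```

Checks like these are cheap and catch misaligned embeddings early, before any similarity scores are computed on the wrong rows.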
Run co-similarity detection to find posts published within a 60-second time window whose embeddings have a cosine similarity of at least 0.95.
sim_dt <- coorsim::detect_cosimilarity(
  data = posts,
  vector_matrix = post_embedding_matrix,
  time_window = 60,
  min_simil = 0.95,
  min_participation = 3,
  post_id = "tweet_id",
  account_id = "user_id",
  time = "created_at",
  content = "text",
  verbose = TRUE
)
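To make the two thresholds concrete, the base-R sketch below flags pairs of posts that were published within `time_window` seconds of each other and whose embeddings have a cosine similarity of at least `min_simil`. It illustrates the general idea only and is not coorsim's implementation: the helper `flag_similar_pairs()` and the toy data are hypothetical, dropping pairs from the same account is an assumption of this sketch, and `min_participation` is not modeled.

```r
# Hypothetical helper: pairwise cosine similarity plus a time-window filter.
flag_similar_pairs <- function(emb, times, accounts,
                               time_window = 60, min_simil = 0.95) {
  # Scale rows to unit length so the cross-product equals cosine similarity.
  unit <- emb / sqrt(rowSums(emb^2))
  sim  <- tcrossprod(unit)

  # Absolute time difference in seconds between every pair of posts.
  secs <- as.numeric(times)
  dt   <- abs(outer(secs, secs, "-"))

  # Keep each unordered pair once (upper triangle), different accounts only.
  keep <- upper.tri(sim) &
    sim >= min_simil &
    dt <= time_window &
    outer(accounts, accounts, "!=")

  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    post_a     = unname(idx[, "row"]),
    post_b     = unname(idx[, "col"]),
    similarity = sim[idx],
    seconds    = dt[idx]
  )
}

# Toy data: posts 1, 2, and 4 are near-duplicates, but post 4 is an hour late.
emb <- rbind(
  c(1.00, 0.00, 0.00),
  c(0.99, 0.05, 0.00),
  c(0.00, 1.00, 0.00),
  c(1.00, 0.01, 0.00)
)
times    <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC") + c(0, 20, 30, 3600)
accounts <- c("u1", "u2", "u3", "u4")

pairs <- flag_similar_pairs(emb, times, accounts)
print(pairs)
```

In this toy example only posts 1 and 2 are flagged: post 4 is just as similar to post 1 but falls outside the 60-second window, and post 3 is timely but semantically unrelated.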
Identify communities of accounts using the FSA_V method to reveal groups with coordinated posting behavior.
comm_dt <- coorsim::coorsim_detect_groups(
  simdt = sim_dt,
  user_data = users,
  cluster_method = "FSA_V",
  account_id = "user_id",
  theta = 0.7,
  verbose = TRUE
)
Sample post content and user metadata from each community. These samples serve as the input for generating concise community labels.
comm_dt <- coorsim::prepare_community_texts(
  groups_data = comm_dt,
  sample_n = 5,
  min_n_char = 10,
  verbose = TRUE
)
Use a language model to generate a label for each identified community.
instruction <- "Generate a concise label in English and a one-sentence description that summarizes the themes, tone, and regional focus of this community of Twitter users. The account names, locations, short bios, and sampled posts are provided below. Use '[LABEL:]' for the label and '[DESCRIPTION:]' for the description. Provide no additional output."
label_res <- coorsim::label_communities(
  groups_data = comm_dt,
  instruction = instruction,
  llm = "llama3.1:8b",
  retries = 3
)
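The instruction above asks the model to mark its answer with `[LABEL:]` and `[DESCRIPTION:]`. How coorsim handles the response internally is not shown here. As a rough sketch, tagged output of that shape could be split with base-R regular expressions as follows. The helper `parse_label_response()` and the example response are hypothetical.

```r
# Hypothetical helper: extract label and description from a tagged LLM response.
parse_label_response <- function(response) {
  extract <- function(tag) {
    # Capture the text following the tag, up to the next "[" or end of string.
    pattern <- paste0("\\[", tag, ":\\]\\s*([^\\[]*)")
    m <- regmatches(response, regexec(pattern, response, perl = TRUE))[[1]]
    # No match (tag missing): return NA so the caller can decide to retry.
    if (length(m) < 2) return(NA_character_)
    trimws(m[2])
  }
  list(label = extract("LABEL"), description = extract("DESCRIPTION"))
}

# Made-up response in the format requested by the instruction.
response <- paste0(
  "[LABEL:] Election boosters\n",
  "[DESCRIPTION:] Accounts urging followers to vote, mostly in an upbeat tone."
)

parsed <- parse_label_response(response)
print(parsed$label)
print(parsed$description)
```

Returning `NA` for a missing tag makes malformed responses easy to detect, which is one plausible reason to allow several retries. Note that this simple pattern truncates any label or description that itself contains a `[` character.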
Plot the labeled communities and the coordinated posts.
p1 <- coorsim::plot_communities(network_data = label_res, component_size_threshold = 3)
p2 <- coorsim::plot_coordinated_posts(network_data = label_res, by_community = TRUE)