Project - ChatGPT — Yuting Huang

Social Media Content Analysis

Understand how the industry of use and objective of use impact how individuals feel about ChatGPT.

Overview

While some have welcomed the growing use of Artificial Intelligence, others have expressed concern about whether increased access to technologies such as ChatGPT will lead to further automation of jobs across multiple industries.

With this in mind, this project focuses on how the industry of use and objective of use impact how an individual perceives ChatGPT, as evidenced by the content of public-facing Twitter posts.

Research Questions & Hypotheses

Research Questions

RQ1: How does the purpose behind the use of ChatGPT influence attitude toward the chatbot?

H1: There is no significant difference in attitude toward ChatGPT between users who utilize it for information and entertainment purposes and those who use it for professional or educational reasons.

RQ2: When ChatGPT is used to produce industry-specific content, how does the industry specified influence the user's attitude?

H2: Participants who use ChatGPT for STEM purposes will have a more positive attitude toward ChatGPT than those who use it for humanities, art, or creative purposes.

A total of 10,300 tweets were collected through the Brandwatch platform using pre-identified keywords.

The data was cleaned to remove titles, replace "n/a" categories with "unknown," and include "real ID" and "Sample ID" categories.
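The record-level cleaning described above can be sketched in base R as follows. This is a minimal illustration, not the project's actual script: the data frame `raw`, its column names, and its values are hypothetical, and `sample_id` stands in for the "Sample ID" category.

```r
# Hypothetical raw export; column names and values are illustrative only.
raw <- data.frame(
  text     = c("ChatGPT wrote my essay", "n/a", "Using ChatGPT to debug code"),
  industry = c("education", "n/a", "STEM"),
  stringsAsFactors = FALSE
)

# Replace literal "n/a" entries with "unknown" in every column.
cleaned <- raw
cleaned[cleaned == "n/a"] <- "unknown"

# Add a sequential sample identifier.
cleaned$sample_id <- seq_len(nrow(cleaned))

print(cleaned)
```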

To address the research questions, a supervised machine learning model was employed to label the sentiment of each tweet.

Method

library(tm)         # VCorpus, DocumentTermMatrix, stopwords
library(SnowballC)  # tm's stemming relies on SnowballC

# Replace every character outside printable ASCII (emojis included) with a space
sample_text <- gsub("[^\x20-\x7e]", " ", sample_text)

# Remove @mentions, http links, punctuation, and digits, then collapse repeated whitespace
sample_text <- gsub("(@|http)[^[:blank:]]*|[[:punct:]]|[[:digit:]]", " ", sample_text)
sample_text <- gsub("\\s+", " ", sample_text)

# Define stopwords: the English list plus residual markup terms and "rt"
myStopwords <- c(stopwords('english'), 'description', 'null', 'text', 'url', 'href', 'rel', 'nofollow', 'false', 'true', 'rt')

# Create a document-term matrix for later analysis
text_corpus <- VCorpus(VectorSource(sample_text))

textDTM <- DocumentTermMatrix(text_corpus, control = list(tolower = TRUE, removePunctuation = TRUE, removeNumbers = TRUE, stopwords = myStopwords, stemming = TRUE))

Data Cleaning

Lowercase the text and remove English stop words, punctuation, and numbers, as well as words that recurred throughout the set but were not significant to the analysis.
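To make the effect of each cleaning step concrete, here is a small walk-through on a single made-up tweet. It reuses the same regular expressions as the cleaning code, but the tweet and the short `demo_stopwords` list are assumptions introduced only for illustration; the project itself uses `stopwords('english')` from tm plus its own additions.

```r
# Illustrative tweet; not taken from the project data.
tweet <- "RT @user: ChatGPT is AMAZING!!! 100% recommend \U0001F600 https://t.co/abc123"

# Step 1: replace characters outside printable ASCII (the emoji here) with a space.
step1 <- gsub("[^\x20-\x7e]", " ", tweet)

# Step 2: drop @mentions and links, then punctuation and digits.
step2 <- gsub("(@|http)[^[:blank:]]*|[[:punct:]]|[[:digit:]]", " ", step1)

# Step 3: collapse whitespace, trim the ends, and lowercase.
clean <- tolower(trimws(gsub("\\s+", " ", step2)))

# Step 4: drop stop words. This short list is a stand-in for
# c(stopwords('english'), ...) so that the sketch needs no packages.
demo_stopwords <- c("is", "rt")
tokens <- strsplit(clean, " ", fixed = TRUE)[[1]]
tokens <- tokens[!tokens %in% demo_stopwords]

print(clean)
print(tokens)
```

Note that the mention and the link disappear as whole tokens, while punctuation and digits are removed character by character.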

# Create two document-term matrices -- one for training and the other for testing.
trainingDTM <- textDTM[train_index, ]
testingDTM <- textDTM[-train_index, ]

# Convert the training and testing DTMs into matrices, then into data frames.
training <- as.matrix(trainingDTM)
testing <- as.matrix(testingDTM)

train <- as.data.frame(training)
test <- as.data.frame(testing)

# Attach the sentiment label (column V3 of df) to the training set as a factor.
train$V3 <- as.factor(df$V3[train_index])

Text Mining

Create two document-term matrices (DTMs), one for training and one for testing, each recording how often every word occurs in each tweet.

Convert the DTMs to data frames for further analysis.
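For readers unfamiliar with the structure, the following base-R sketch builds a tiny document-term matrix by hand and converts it to a data frame. It is a simplified stand-in for tm's `DocumentTermMatrix` (no stemming or stop-word removal), and the three example documents are invented.

```r
# Three invented, already-cleaned documents.
docs <- c("chatgpt helps me code",
          "chatgpt wrote my poem",
          "code review with chatgpt")

# Tokenize on single spaces and collect the sorted vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per term; each cell is a term count.
dtm <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Convert to a data frame, the form a classifier expects.
dtm_df <- as.data.frame(dtm)
print(dtm_df)
```

Each row can then be treated as a feature vector, with the sentiment label added as an extra column.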

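library(caret)  # createDataPartition, train, confusionMatrix

# Randomly split the samples into training and testing parts (70/30, stratified by label)
set.seed(235)

train_index <- createDataPartition(df$V3, p = .7, list = FALSE)

# Train a linear SVM on the training set
set.seed(235)

svm_model <- train(V3 ~ ., data = train, method = "svmLinear3")

# Predict sentiment labels for the held-out test set
test_pred <- predict(svm_model, newdata = test)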

Supervised Machine Learning

The data was randomly split into training and testing sets at a .7 ratio, and the model was trained with the linear SVM classifier from the caret package.
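The idea behind a stratified 70/30 split can be sketched in base R as below. This is an approximation of what `createDataPartition` does, not its implementation, and the label vector (three sentiment classes with made-up counts) is hypothetical.

```r
set.seed(235)

# Hypothetical sentiment labels: 20 negative, 30 neutral, 50 positive.
labels <- factor(rep(c("negative", "neutral", "positive"), times = c(20, 30, 50)))

# Within each class, draw 70% of the row indices for training,
# so the class proportions are preserved in both sets.
by_class <- split(seq_along(labels), labels)
train_index <- sort(unname(unlist(lapply(by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))))

train_labels <- labels[train_index]
test_labels <- labels[-train_index]

print(table(train_labels))
print(table(test_labels))
```

Sampling within each class keeps rare sentiment labels from being left out of either set by chance.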

df_testing <- df[-train_index, ] 

# Extract the true labels (V3) of the test set for comparison.
test_truth <- df_testing$V3 

table(test_pred,test_truth) 

test_pred<-factor(test_pred) 
test_truth<-factor(test_truth)

confusionMatrix(test_pred, test_truth)

Check the accuracy of Prediction

Generate a confusion matrix and check the true positive rate (TPR) and balanced accuracy, making sure that the F1 score of the predicted outcome is above .7.
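The metrics named above can be computed by hand from a confusion matrix, which helps when reading the `confusionMatrix` output. The sketch below uses a made-up binary example ("pos" vs. "neg"); the project's labels may have more classes, in which case caret reports these statistics per class.

```r
# Hypothetical predictions and ground truth for ten tweets.
lv <- c("neg", "pos")
truth <- factor(c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"), levels = lv)
pred  <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"), levels = lv)

# Rows are predictions, columns are true labels.
cm <- table(pred, truth)

tp <- cm["pos", "pos"]  # predicted pos, truly pos
fn <- cm["neg", "pos"]  # predicted neg, truly pos
fp <- cm["pos", "neg"]  # predicted pos, truly neg
tn <- cm["neg", "neg"]  # predicted neg, truly neg

recall      <- tp / (tp + fn)  # true positive rate (TPR)
specificity <- tn / (tn + fp)  # true negative rate
precision   <- tp / (tp + fp)

balanced_accuracy <- (recall + specificity) / 2
f1 <- 2 * precision * recall / (precision + recall)

# The acceptance check described in the text: F1 above .7.
meets_threshold <- f1 > 0.7

print(c(TPR = recall, balanced_accuracy = balanced_accuracy, F1 = f1))
print(meets_threshold)
```

In this invented example the F1 score falls below the .7 threshold, so the model would be revisited before its labels were used.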

Result

NOT SUPPORTED

H1: Those who use ChatGPT for information and entertainment purposes will have a more positive attitude toward ChatGPT than those who use it for professional or educational reasons.

SUPPORTED

H2: Participants who use ChatGPT for STEM purposes will have a more positive attitude toward ChatGPT than those who use it for humanities, art, or creative purposes.

People who use ChatGPT for STEM purposes will have a more positive attitude toward ChatGPT than those who use it for creative purposes.

Since the knowledge that ChatGPT was built on originates in the latest technology and computational advancements, those in STEM fields are more likely to view the chatbot much as they would a calculator. To them, it is an instrument that helps in processing and interpreting information faster. Those who engage in creative pursuits have been quick to highlight the potential negative ramifications and threats of widespread use of AI, with some artists taking to Instagram in December of 2022 to demand that AI be prevented from generating images based on their work (Babbs, 2023).

This pushback from creatives highlights the need for increased transparency from AI engineers on how such technology is trained and what parameters they have or will implement to ensure creative agency. Restructuring our first hypothesis with this in mind could lead to more exciting and accurate insights into how the industry of use impacts feelings toward ChatGPT.

Want more details?

Explore the full academic paper, the original code for the supervised machine learning model, and the results slides for a comprehensive view.

Delving into the details is always a pleasure.
