A Python pipeline that generates responses for a dataset of 250 datapoints (gender, age, ethnicity) using GPT-3 (`text-davinci-003`), maps them to a 768-dimensional dense vector space with the T5-XXL sentence transformer, reduces the dimensionality of the embeddings with PCA and UMAP, and then produces Plotly visualizations and TextBlob sentiment analysis.
- Add your OpenAI API key to `keys.py` (a minimal sketch follows this list).
- Run `pip install matplotlib seaborn umap-learn sentence_transformers openai plotly textblob`.
- Run `python3 gen.py`, changing the `PROMPT` global variable if you want to change the dataset in `fake_data/fake_people.csv` (see the generation sketch after this list).
- Change the `STORY_START` and `STORY_END` global variables in `stories.py` to control what kind of answers GPT-3 generates (see the story sketch after this list).
- Run `python3 stories.py`.
- Run `python3 strans.py` (CAUTION: this will likely use significant GPU resources while the sentence transformer is running; see the embedding sketch after this list).
- Run `python3 vis.py` to generate the Plotly dimensionality-reduction plots and sentence-level sentiment analysis graphs (see the visualization sketch after this list).
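The only thing `keys.py` needs to provide is the OpenAI key. A minimal sketch (the variable name `OPENAI_API_KEY` is an assumption; match whatever name the scripts actually import):

```python
# keys.py -- minimal sketch; OPENAI_API_KEY is an assumed name,
# match whatever gen.py / stories.py import.
OPENAI_API_KEY = "sk-..."  # your OpenAI API key
```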
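How the generation step roughly works, as a hedged sketch rather than the repository's exact code: `PROMPT` drives a legacy Completions call to `text-davinci-003` (pre-1.0 `openai` client assumed), and the results are written to `fake_data/fake_people.csv`. The prompt wording, parsing, and CSV layout below are illustrative assumptions.

```python
# Sketch of the dataset-generation step (gen.py). Assumes the pre-1.0 openai
# client, since text-davinci-003 is a legacy Completions model.
import csv
import os
import openai
from keys import OPENAI_API_KEY  # variable name is an assumption

openai.api_key = OPENAI_API_KEY

# PROMPT drives what kind of fake people are generated; edit it to change the
# dataset written to fake_data/fake_people.csv.
PROMPT = "Generate a fake person as 'gender, age, ethnicity':"

def generate_person():
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=PROMPT,
        max_tokens=32,
        temperature=1.0,
    )
    # Naive parsing: assumes the model answers in "gender, age, ethnicity" form.
    return [field.strip() for field in response.choices[0].text.split(",")]

if __name__ == "__main__":
    os.makedirs("fake_data", exist_ok=True)
    with open("fake_data/fake_people.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["gender", "age", "ethnicity"])
        for _ in range(250):
            writer.writerow(generate_person())
```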
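Similarly, `STORY_START` and `STORY_END` wrap each datapoint into the prompt sent to GPT-3, so editing them changes what kind of answer the model writes. A hedged sketch of that framing (the example values, file names, and output format are assumptions):

```python
# Sketch of the story-generation step (stories.py). STORY_START / STORY_END
# frame each datapoint into the prompt; adjust them to steer the answers.
import csv
import openai
from keys import OPENAI_API_KEY  # variable name is an assumption

openai.api_key = OPENAI_API_KEY

STORY_START = "Write a short story about a person who is "
STORY_END = ". Focus on their daily life."

def generate_story(gender, age, ethnicity):
    prompt = f"{STORY_START}{gender}, {age} years old, {ethnicity}{STORY_END}"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
    )
    return response.choices[0].text.strip()

if __name__ == "__main__":
    with open("fake_data/fake_people.csv") as f:
        people = list(csv.DictReader(f))
    stories = [generate_story(p["gender"], p["age"], p["ethnicity"]) for p in people]
    # Keep one story per line so later scripts can read them back easily.
    with open("fake_data/stories.txt", "w") as f:
        f.write("\n".join(s.replace("\n", " ") for s in stories))
```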
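The embedding step maps each generated story to a 768-dimensional vector with the sentence-transformers T5-XXL model; this is the GPU-heavy part. A minimal sketch, assuming the stories are stored one per line and the embeddings are saved as a NumPy array:

```python
# Sketch of the embedding step (strans.py). The sentence-t5-xxl model maps each
# story to a 768-dimensional dense vector. File names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/sentence-t5-xxl")  # large model; runs best on a GPU

with open("fake_data/stories.txt") as f:
    stories = f.read().splitlines()

embeddings = model.encode(stories, show_progress_bar=True)  # shape: (n_stories, 768)
np.save("fake_data/embeddings.npy", embeddings)
```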
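Finally, the visualization step reduces the 768-dimensional embeddings to 2-D with PCA and UMAP, plots them with Plotly, and scores each story's sentiment with TextBlob. A sketch under the same file-name assumptions as above (coloring by gender is illustrative):

```python
# Sketch of the visualization step (vis.py): PCA and UMAP projections of the
# embeddings plotted with Plotly, plus TextBlob sentiment polarity per story.
import csv
import numpy as np
import plotly.express as px
import umap
from sklearn.decomposition import PCA
from textblob import TextBlob

embeddings = np.load("fake_data/embeddings.npy")
with open("fake_data/stories.txt") as f:
    stories = f.read().splitlines()
with open("fake_data/fake_people.csv") as f:
    people = list(csv.DictReader(f))

# Two 2-D projections of the same 768-d embeddings.
pca_2d = PCA(n_components=2).fit_transform(embeddings)
umap_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)

# TextBlob polarity in [-1, 1] for each generated story.
polarity = [TextBlob(s).sentiment.polarity for s in stories]

genders = [p["gender"] for p in people]
px.scatter(x=pca_2d[:, 0], y=pca_2d[:, 1], color=genders, title="PCA").show()
px.scatter(x=umap_2d[:, 0], y=umap_2d[:, 1], color=genders, title="UMAP").show()
px.histogram(x=polarity, color=genders, title="Sentiment polarity").show()
```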