Merge pull request #7 from jsingh811/report

Report

jsingh811 authored Jun 30, 2021
2 parents 9e3d9db + ae037c8 commit 87cbea4
Showing 13 changed files with 457 additions and 17 deletions.
54 changes: 38 additions & 16 deletions README.md
@@ -1,5 +1,5 @@
# pyYouTubeAnalysis
Interaction with the YouTube API to pull data and run analysis using statistics and Natural Language Processing (NLP). Contains NLP implementations of text cleaning specific to social media data noise, key-phrase extraction using NLTK and Named-entity Recognition (NER) on a list of strings.
Interaction with the YouTube API to pull data and run analysis using statistics and Natural Language Processing (NLP). Contains NLP implementations of text cleaning specific to social media data noise, key-phrase extraction using NLTK and Named-entity Recognition (NER) on a list of strings. Also contains automatic generation of plots, wordclouds, and a PDF analysis report.

# Setup
Clone the project and get it setup
@@ -21,6 +21,8 @@ To see Key-phrase extraction examples, see the section [Extracting Keyphrases fr

To see data cleaning examples for removing emojis and URLs from text, see the section [Removing Emojis and URLs from Text](https://github.com/jsingh811/pyYouTubeAnalysis#removing-emojis-and-urls-from-text).
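The cleaner's actual API isn't shown in this diff; as a rough illustration of the kind of social-media noise removal described (URLs and emojis), here is a regex-based sketch with hypothetical helper names, not pyYouTubeAnalysis's own functions:

```python
import re

# Matches http(s) links and bare www. links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Covers common emoji code-point blocks; real-world emoji handling is broader.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002700-\U000027BF\U0001F1E6-\U0001F1FF]+"
)

def remove_urls(text):
    """Strip http(s) and www links from a string."""
    return URL_PATTERN.sub("", text).strip()

def remove_emojis(text):
    """Strip characters in common emoji code-point ranges."""
    return EMOJI_PATTERN.sub("", text).strip()

print(remove_urls("great vlog! watch https://youtu.be/abc123"))  # great vlog! watch
print(remove_emojis("loved it \U0001F600"))  # loved it
```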

To see report generation with statistical and NLP analysis, see the section [Report Generation](https://github.com/jsingh811/pyYouTubeAnalysis#report-generation).


# YouTube Data Fetching

@@ -162,27 +164,47 @@ url_removed = cleaner.remove_urls(document)
```

# Report Generation

This functionality crawls YouTube and gathers statistics plots, wordclouds, and location analysis in a single PDF report. The files generated by this step can be found in [this folder](https://github.com/jsingh811/pyYouTubeAnalysis/blob/master/samples/report).

## Command Line Usage

```
cd pyYouTubeAnalysis
python report.py -path "/Users/abc/Documents" -k "travel vlog" -sd "2020-01-01T00:00:00Z" -ed "2021-03-31T00:00:00Z" -analysis "monthly,yearly" -t "<YouTube API key (39 chars long)>"
```

## Import and Use

```
from pyYouTubeAnalysis.report import ReportGenerator
from pyYouTubeAnalysis import run_crawl, crawler

keyword = "travel vlog"
start_date = "2020-01-01T00:00:00Z"
end_date = "2021-03-31T00:00:00Z"
analysis_type = ["yearly", "monthly"]
api_token = "<YouTube API key (39 chars long)>"
path = "/Users/abc/Documents"

rgen = ReportGenerator(path, keyword, start_date, end_date, analysis_type)
api = crawler.YouTubeCrawler(key=api_token)

# Fetch data from the api
[videos, comments] = run_crawl.get_videos_and_comments(
    api, keyword=keyword, start_date=start_date, end_date=end_date, comment_limit=10
)
print("\nFetched data\n")

# Build the plots, then export everything into a single pdf
rgen.get_and_plot_stats(videos)
rgen.plot_trending_tags(videos)
rgen.plot_comment_locations(comments)
print("\nFetched plots\n")

output_path = rgen.export_to_pdf()
print("\nGenerated pdf here {}\n".format(output_path))
```

# Citation

Please cite this software as below.

## APA

```
Singh, J. (2021). jsingh811/pyYouTubeAnalysis: pyYouTubeAnalysis: YouTube data requests and NER on text (v1.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.4915746
```

## BibTex

```
@misc{https://doi.org/10.5281/zenodo.4915746,
  doi = {10.5281/ZENODO.4915746},
  url = {https://zenodo.org/record/4915746},
  author = {Singh, Jyotika},
  keywords = {YouTube, NER, NLP},
  title = {jsingh811/pyYouTubeAnalysis: pyYouTubeAnalysis: YouTube data requests and NER on text},
  publisher = {Zenodo},
  year = {2021},
  copyright = {Open Access}
}
```
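The flags accepted by report.py are not defined anywhere in this diff; the following is a minimal argparse sketch of a command line compatible with the example above. The flag names mirror that example, but the parser itself, the long aliases, and the defaults are assumptions, not the project's actual implementation:

```python
import argparse

def build_parser():
    # Hypothetical parser; flag names follow the README's command-line example.
    parser = argparse.ArgumentParser(description="Generate a YouTube analysis PDF report.")
    parser.add_argument("-path", required=True, help="Directory for the generated report")
    parser.add_argument("-k", "--keyword", required=True, help="Search keyword, e.g. 'travel vlog'")
    parser.add_argument("-sd", "--start-date", required=True, help="Start time, e.g. 2020-01-01T00:00:00Z")
    parser.add_argument("-ed", "--end-date", required=True, help="End time, e.g. 2021-03-31T00:00:00Z")
    parser.add_argument("-analysis", default="monthly", help="Comma-separated: monthly,yearly")
    parser.add_argument("-t", "--token", required=True, help="YouTube API key")
    return parser

args = build_parser().parse_args([
    "-path", "/tmp/reports", "-k", "travel vlog",
    "-sd", "2020-01-01T00:00:00Z", "-ed", "2021-03-31T00:00:00Z",
    "-analysis", "monthly,yearly", "-t", "FAKE_KEY",
])
analysis_types = args.analysis.split(",")  # ['monthly', 'yearly']
```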

9 changes: 9 additions & 0 deletions pyYouTubeAnalysis/extract_locations.py
@@ -21,6 +21,15 @@ def read_comment_text(filepath):
"""
with open(filepath, "r") as f:
data = json.load(f)

comment_text = get_comments_list(data)

return comment_text

def get_comments_list(data):
"""
Convert comments from YT response to a list of comments
"""
comment_text = []
for video_id in data:
if data[video_id]:
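The diff truncates `get_comments_list` after the first two lines of its loop. A sketch of a plausible completion, assuming each video id maps to a list of comment dicts carrying a `"text"` field — the actual response shape is not shown in this diff:

```python
def get_comments_list(data):
    """Flatten a {video_id: [comment, ...]} mapping into a list of comment strings."""
    comment_text = []
    for video_id in data:
        if data[video_id]:
            for comment in data[video_id]:
                # "text" is an assumed field name; the real key isn't visible here.
                comment_text.append(comment["text"])
    return comment_text

print(get_comments_list({"vid1": [{"text": "nice"}], "vid2": []}))  # ['nice']
```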