Cut null heds
palewire committed Jul 24, 2024
1 parent baa629c commit e02d831
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions newshomepages/analyze/drudge.py
@@ -44,6 +44,10 @@ def drudge_entities(output_dir: str = "./"):
print("Filtering down to stories")
story_df = drudge_df[drudge_df.is_story].copy()

# Remove any links without text
print("Removing any links without text")
story_df = story_df[~pd.isnull(story_df.text)].copy()

# Cut `...`
print("Sanitizing text")
story_df.text = story_df.text.str.replace(r"\.{2,}", "", regex=True)
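
To see what the added filter does alongside the existing sanitizing step, here is a minimal sketch run on made-up data. The sample headlines are hypothetical and not taken from the repository; only the two `story_df` statements mirror lines shown in the diff.

```python
import pandas as pd

# Hypothetical sample rows; only the `text` column name comes from the diff.
story_df = pd.DataFrame(
    {
        "text": [
            "Storm nears coast...",
            None,  # a link with no text, which the new filter should drop
            "Markets rally..",
            "Vote set for Tuesday.",
        ]
    }
)

# Remove any links without text (the lines added in this commit).
story_df = story_df[~pd.isnull(story_df.text)].copy()

# Cut runs of two or more periods; a single trailing period is left alone.
story_df.text = story_df.text.str.replace(r"\.{2,}", "", regex=True)

cleaned = story_df.text.tolist()
print(cleaned)
```

On this sample, the null row is dropped and the remaining headlines lose their trailing ellipses, while the single period in the last headline survives.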