This project takes input files as command-line arguments and parses them to extract information about cybersecurity policies that matches a given keyword. Work currently focuses on the parser for .docx files. A .docx file is a ZIP archive, so it is relatively simple to extract the XML file that holds the text contents (word/document.xml) and search through it for each input file.
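A minimal sketch of the .docx approach, using only the Python standard library: `zipfile` opens the archive, `xml.etree.ElementTree` parses `word/document.xml`, and the `<w:t>` text runs of each paragraph are joined before searching. The function names `docx_paragraphs` and `search_docx` are hypothetical, and the case-insensitive match is an assumption about how keywords should be compared.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph (<w:p>) in a .docx file, in order."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text can be split across several runs; join every <w:t>.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def search_docx(path, keyword):
    """Yield (paragraph_index, paragraph_text) for paragraphs containing keyword.

    The index is 0-based; matching is case-insensitive (an assumption).
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for index, text in enumerate(docx_paragraphs(path)):
        if pattern.search(text):
            yield index, text
```

For example, `search_docx("policy.docx", "encryption")` (a hypothetical file and keyword) yields the index and text of every paragraph that mentions encryption. Joining runs per paragraph matters because Word can split a single word across several `<w:t>` elements, so searching element by element may miss matches.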
Next, the parser should handle .pdf files as well as .docx files in the same batch of arguments. We also need to decide how the output should be formatted: should we print the whole sentence containing the keyword, and how can we quickly find the location of the word in the original file? Finally, consider automating the process of extracting this information and putting it into a Word or Excel document.
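One possible way to tie these open questions together, sketched under several assumptions. Files are dispatched to a reader by extension. PDF text comes from the third-party `pypdf` package (`PdfReader(...).pages` and `page.extract_text()`), imported only when a PDF is actually read. Each match is reported as the file, a paragraph or page number, and the full sentence. Results are written as CSV, which Excel opens directly, rather than as a native Word or Excel file. The sentence splitter is a naive regex, and `read_docx`, `read_pdf`, `find_matches`, `write_csv` and `main` are hypothetical names.

```python
import csv
import re
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx(path):
    """Yield ("paragraph N", text) for each paragraph of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for n, p in enumerate(root.iter(W_NS + "p"), start=1):
        yield f"paragraph {n}", "".join(t.text or "" for t in p.iter(W_NS + "t"))


def read_pdf(path):
    """Yield ("page N", text) for each page; assumes pypdf is installed."""
    from pypdf import PdfReader  # lazy import: .docx-only runs don't need pypdf
    for n, page in enumerate(PdfReader(path).pages, start=1):
        yield f"page {n}", page.extract_text() or ""


# Map file extensions to reader functions; unknown extensions are rejected.
READERS = {".docx": read_docx, ".pdf": read_pdf}

# Naive sentence splitter: break on whitespace that follows ., ! or ?.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_matches(path, keyword):
    """Yield (file, location, sentence) for every sentence containing keyword."""
    reader = READERS.get(Path(path).suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported file type: {path}")
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for location, text in reader(path):
        for sentence in SENTENCE_END.split(text):
            if pattern.search(sentence):
                # Collapse whitespace (PDF text often has hard line breaks mid-sentence).
                yield str(path), location, " ".join(sentence.split())


def write_csv(rows, out):
    """Write matches as CSV with a header row; Excel can open the result."""
    writer = csv.writer(out)
    writer.writerow(["file", "location", "sentence"])
    writer.writerows(rows)


def main(argv):
    """Command-line entry point: argv is [KEYWORD, FILE, ...]."""
    keyword, paths = argv[0], argv[1:]
    rows = [match for p in paths for match in find_matches(p, keyword)]
    write_csv(rows, sys.stdout)
```

Wired up with `if __name__ == "__main__": main(sys.argv[1:])` in a script (the name `extract.py` is hypothetical), `python extract.py password policy.docx report.pdf > matches.csv` would produce one row per matching sentence across both file types. Reporting a paragraph or page number alongside the sentence gives a quick way back to the word's location in the original document; writing a true .xlsx file instead of CSV would need a third-party library such as openpyxl.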