Cyber Security Text Analytics with Machine Learning

Xiaye Xu

by Xiaye Xu 02/27/2018 11:16 PM GMT


Description

We hope to take on the cyber security project and improve the accuracy with which statements in the policy documents are assigned to categories and sub-categories. By implementing a carefully designed machine learning algorithm, and by building on the work of previous participants in this project, we plan to carry out the following steps using Python and scikit-learn:

  • Data collection (training and testing)
  • Convert PDF documents to processable text files
  • Text file processing (NLTK tokenization, stop-word removal, etc.)
  • Pair each sentence with its corresponding category and store the pairs in a processable data structure (the training set)
  • Train machine learning models on the training set
  • Generate output
  • Adjust as needed
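
As a rough illustration of how the processing, training, and output steps might fit together, here is a minimal sketch. It is not our final implementation: the sentences, the category names (access_control, incident_response), and the helper train_classifier are hypothetical placeholders. To keep the example self-contained, it uses scikit-learn's built-in tokenizer and English stop-word list in place of NLTK, and logistic regression as one possible model among many.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training set: (sentence, category) pairs.
# In the real project these would come from the converted policy documents.
training_set = [
    ("All users must change their passwords every 90 days.", "access_control"),
    ("Passwords must contain at least twelve characters.", "access_control"),
    ("Accounts are locked after five failed login attempts.", "access_control"),
    ("Security incidents must be reported within 24 hours.", "incident_response"),
    ("The response team investigates every reported incident.", "incident_response"),
    ("A breach report is filed after each security incident.", "incident_response"),
]


def train_classifier(pairs):
    """Fit a text classifier on (sentence, category) pairs."""
    sentences = [sentence for sentence, _ in pairs]
    labels = [category for _, category in pairs]
    model = make_pipeline(
        # Tokenizes, lowercases, drops English stop words, and builds TF-IDF features.
        TfidfVectorizer(lowercase=True, stop_words="english"),
        # Assumption: logistic regression stands in for whichever model we settle on.
        LogisticRegression(max_iter=1000),
    )
    model.fit(sentences, labels)
    return model


model = train_classifier(training_set)

# Generate output: predict the category of an unseen statement.
predictions = model.predict(["Users must reset passwords after a lockout."])
print(predictions[0])
```

Keeping vectorization and the classifier in a single pipeline means the same preprocessing is applied at training and prediction time, which should make the later adjustment step easier to carry out.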

Please do not hesitate to leave us a comment.

Co-authors to your solution

Xin Li, Minghan Wang, Yunzhou Jiang

Link to your concept design and documentation (Required by the final day of the Submission & Collaboration phase)

Link to an online working solution or prototype (Required by the final day of the Submission & Collaboration phase):

Link to a video or screencast of your solution or prototype (Required by the final day of the Submission & Collaboration phase):

Link to source code of your solution or prototype above. (If you submitted a link to an online solution or prototype, or to a video of your solution or prototype, you must provide a link to the source code. This item is required by the final day of the submission phase):


Comments