TY - GEN
T1 - Synthesis of social media profiles using a probabilistic context-free grammar
AU - Ade-Ibijola, Abejide
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - One helpful resource to have when presenting delicate or sensitive information is hypothetical data, or placeholders that conceal the identity of concerned parties. This is crucial in environments such as medicine and criminology, as volunteers in medical research, patients of dreaded diseases, and convicts of certain crimes often prefer to remain anonymous, even when they agree to their records being shared. Recently, research based on social media has raised similar ethical concerns about privacy and the use of real users' profiles. In this paper, we present a new application of a type of formal grammar, the probabilistic (stochastic) context-free grammar, in the automatic generation of social media profiles, using Facebook as a test case. First, we present a grammar-based formalism for describing the rules governing the formulation of reasonable user attributes (e.g. full names, dates of birth, addresses, phone numbers, etc.). These grammar rules are specified with associated probabilistic weights that decide when (if at all) a rule is chosen. Secondly, we describe the implementation of these grammar rules. Our implementation produced one million unique Facebook profiles within three hours of execution time, with a near-zero probability that a profile will recur. 100,000 of these synthesised profiles can be viewed at: tinyurl.com/synthesisedprofiles2017. These profiles may find applications in role-playing games in health and in social media research, and the described technique may find wider application in the generation of hypothetical profiles for data anonymisation in different domains.
KW - Data Anonymisation
KW - Facebook Profile Synthesis
KW - Privacy Protection
KW - Probabilistic Context-Free Grammars
UR - http://www.scopus.com/inward/record.url?scp=85049454731&partnerID=8YFLogxK
U2 - 10.1109/RoboMech.2017.8261131
DO - 10.1109/RoboMech.2017.8261131
M3 - Conference contribution
AN - SCOPUS:85049454731
T3 - 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, PRASA-RobMech 2017
SP - 104
EP - 109
BT - 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, PRASA-RobMech 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th Annual Symposium of the Pattern Recognition Association of South Africa and the 10th Robotics and Mechatronics International Conference, PRASA-RobMech 2017
Y2 - 29 November 2017 through 1 December 2017
ER -