Knowing Personality Traits on Facebook Status Using the Naïve Bayes Classifier

With the growing use of social media among students, Facebook has become a place where students communicate and express what they feel in the form of status updates. Personality is the character, or set of characteristics, of a person; it shapes how a person adjusts to the surrounding environment so that communication runs smoothly. Psychological theory offers many ways to categorize personality. This study uses the Big Five theory, which describes personality in five traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. The Naïve Bayes Classifier is used to find the personality category with the highest probability for each status. Two data sets are used, training data and testing data, both obtained from the Facebook statuses of students. Testing the system on these data gives an accuracy of 88%.

looking at Twitter analysis research to find out the character of someone using the Naïve Bayes Classifier. This study builds on previous research that used the Probabilistic Neural Network algorithm to classify personality through Facebook [6].
This study draws on several previous studies related to its problem background. The first is Big Five Personality Profile and Collecting Behavior of LPs Collectors [7]. Collectors generally score high on openness to experience, and they tend to be wholehearted and persistent in hunting for and caring for their collections. Collectors also join and socialize with others easily, because collecting related to music can reduce anxiety. The second is Twitter Analysis to Know Someone's Character Using the Naïve Bayes Classifier Algorithm [8]. That study used the MBTI psychological test and showed that tweets or posts on Twitter can be used to analyze someone's personality accurately, and that the Naïve Bayes classifier reaches a good level of accuracy. The third is Implementation of Thesis Text Mining Classification Using the Naïve Bayes Classifier Method [9]. It shows that the Naïve Bayes Classifier can classify data accurately: theses in the library of an informatics engineering study program were classified into three categories, namely software engineering, network-based computing, and visual intelligent computing, with 90% accuracy. The fourth is a sentiment analysis that classifies categories of public figures on Twitter [10].
The Influence of Social Media on the Students' Personality STEBIS IGM Palembang [11] explains that technological developments in the era of globalization strongly influence students, especially through social media, whose influence can no longer be denied because it invites users to participate openly. Social media also influences its users in matters ranging from clothing to the personality of IGM students. The study adds that, with these modern developments, one of the factors that shapes personality is culture.

II. METHOD
Fig. 1 shows the flowchart of the system design and the steps that must be taken. The flow explains how the system determines personality from two data sets, training data and testing data. For the training data, the process starts by taking documents (Facebook statuses), each from a different user ID. Once the data is obtained as text, it is processed with text mining through the stages of text preprocessing, feature selection, and stemming. Text preprocessing yields data in the form of standard words, which are then entered into the database. The testing data follows the same flow: the data is taken from Facebook and processed in the same way to obtain standard words. The data entered into the database is then analyzed with the Naïve Bayes classifier method to obtain the results.

A. Personality
In the past, masks were used in theatrical shows to represent the characters being played, so the word personality carried a meaning similar to a mask. Personality is often explained through character: speaking of a person's character means applying norms to that person, so that attitude and behavior are viewed in terms of norms. Allport explained that "Character is personality evaluated and personality is character devaluated" (Allport, 1973, p. 52), meaning that character and personality are the same. The psychoanalytic theory pioneered by Sigmund Freud views personality as consisting of three components, namely the id (instinct), the ego (consciousness or 'I'), and the superego (conscience); the interaction between the three components is manifested in behavior. Personality is a unique aspect of a person's behavior that can affect the person's ability to adapt to the environment (Adisti, 2010, p. 19). Personality includes patterns of thought, feeling, and behavior that are unique to each person and distinguish one person from another. Based on the experts above, personality in general can be defined as the characteristics and tendencies that individuals show in viewing themselves and interacting with others, formed by heredity and by external factors such as the environment, society, and culture [1].

B. Facebook
Facebook is a social networking website where users can join communities such as cities, workplaces, schools, and regions to connect and interact with others. Facebook's CEO announced that the site had passed 100 million active users. Facebook is currently among the top five best-known sites in the world and is known for its large membership. Entering 2006, Friendster and MySpace began to be displaced by Facebook. With its more modern appearance, the site allows people to get acquainted and to access information as widely as possible. Facebook is a free internet social networking service where users can form a network by inviting their friends.

C. Big Five
The personality theory used in this study is the Big Five Personality model, which describes personality in terms of five basic factors that underlie one another and cover the most significant variations in human personality. The Big Five is one of the methods known in psychology for interpreting a person's personality, especially for finding the relationship between personality and the work environment. The Big Five consists of openness (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N) [5].

Extraversion
Extraversion describes someone who tends to be affectionate, cheerful, talkative, sociable, and fun (McCrae and Costa, in Feist & Feist, 2010), as well as passionate, enthusiastic, dominant, friendly, and communicative. Someone with a high level of extraversion makes friends faster than someone with a low level of extraversion. Extraverts are easily motivated by change and variety in life and are also easily bored, while someone with a low level of extraversion tends to be calm and to withdraw from their surroundings.

Agreeableness or Agreement
Agreeableness can be characterized as the ability to adapt well socially. Individuals with a high level of agreeableness are soft-hearted, kind, and warm; they tend to trust people easily, are friendly, like to help others, are forgiving, and go straight to the problem. This trait distinguishes soft-hearted individuals from those who show no compassion. Someone with low agreeableness tends to be suspicious, miserly, unfriendly, easily offended, more aggressive, critical of others, and less cooperative. Agreeableness is also called social adaptability or likability, which characterizes someone who is friendly, who always yields, and who avoids conflict.

Conscientiousness
Conscientiousness describes someone who tends to be organized and cautious (Friedman & Schustack, 2006; McCrae and Costa, in Feist & Feist, 2010), reliable, responsible, hardworking, punctual, and persevering. Conscientiousness is also referred to as dependability, impulse control, and will to achieve. Someone with a high level of conscientiousness is hardworking, careful, punctual, and determined, whereas someone with a low level of conscientiousness tends to be disorganized, neglectful, and lazy, lacks purpose, and gives up easily when encountering difficulties.

Neuroticism
Neuroticism describes someone who tends to be nervous, sensitive, tense, temperamental, anxious, self-loving, very self-aware, emotional, and prone to stress-related disorders.

Openness to Experience or Openness
Openness to experience describes someone who tends to be imaginative, creative, fun, artistic, full of curiosity, open, and fond of variety. Openness describes the complexity or breadth of a person's life. Individuals with high openness are generally creative, imaginative, curious, and have broad interests. Besides being fun and artistic, openness distinguishes someone who prefers variety from someone who is closed off and finds comfort in relationships with the things and people they already know. To make the Big Five personalities easier to understand, the following table shows the difference in character between low scores and high scores.

D. Text mining
Text mining is a variation of data mining that seeks to find interesting patterns in a large collection of textual data. Other sources describe text mining as a process of combining information extracted from various sources. Text mining is generally completed in several stages [9]. Text preprocessing is the initial stage, which prepares the text to become data that will be processed at a later stage. In text mining, the raw data that contains the information has an arbitrary structure, so it must be converted into structured data as needed, usually numeric values. This process is called text preprocessing.
Tokenizing is the process of separating a sentence into single words. Each word in the sentence is separated using the space character. Besides separating the words, this process also lowercases them: all letters in the sentence are changed to lowercase, so any capital letters are converted automatically. Only the letters "a" to "z" are accepted; all other characters, such as numbers and punctuation, are removed.
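The tokenizing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual code: the function name `tokenize` and the sample status are hypothetical, and characters outside "a" to "z" are simply treated as separators.

```python
import re


def tokenize(status):
    """Split a status into lowercase words that contain only the letters a-z."""
    lowered = status.lower()
    # Replace every run of characters outside a-z (digits, punctuation,
    # whitespace) with a single space, then split on whitespace.
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    return letters_only.split()


# Hypothetical sample status, used only for illustration.
print(tokenize("Happy 2day!! Going to CAMPUS :)"))
# ['happy', 'day', 'going', 'to', 'campus']
```

Treating non-letter characters as separators keeps the sketch short; a system that removes them without splitting the word would need a slightly different rule.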
Feature selection aims to delete words that are considered unimportant. It has two stages, stopword removal and stemming.
1. Stopword. Stopwords are common words that appear often in sentences or documents but carry no meaning, so they contribute very little to choosing the documents that suit a need. In this process, common words are erased to reduce the number of occurrences of words that have no meaning. Examples of words that appear often are "that", "and", "like", "often", "in", "will", "this", "no", "yes", "is", "ie", "where", "for", "of", "again", "you", "I", "then", "can", "to", "a", "said", "so", and "with", all of which are considered to have no meaning.
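As a rough sketch of the stopword stage, the snippet below filters a token list against a small stopword set. The set contains only a few of the example words listed above, and the names `STOPWORDS` and `remove_stopwords` are hypothetical; the actual stopword list used by the system is not given here.

```python
# A few of the example stopwords from the text; the full list is an assumption.
STOPWORDS = {"that", "and", "in", "will", "this", "is", "for", "of", "to", "a", "with"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not in the stopword set, keeping their order."""
    return [token for token in tokens if token not in stopwords]


# Hypothetical tokenized status, used only for illustration.
print(remove_stopwords(["going", "to", "campus", "with", "friends"]))
# ['going', 'campus', 'friends']
```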

E. Naïve Bayes Classifier
Naïve Bayes Classifier is an algorithm that finds the highest probability value in order to assign test data to the most appropriate category. It is based on the theorem put forward by Thomas Bayes, as in Equation (1) [9][10][12].

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \tag{1}$$

Where:
X is the data with an unknown class.
H is the hypothesis that data X belongs to a specific class.
P(H | X) is the probability of hypothesis H given condition X (posterior probability).
P(H) is the probability of hypothesis H (prior probability).
P(X | H) is the probability of X given the conditions in hypothesis H.
P(X) is the probability of X.

The process of the Naïve Bayes Classifier method is as follows:
• Equation (2) is used in the Naïve Bayes Classifier algorithm,

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{2}$$

• The probability of event A given condition B is determined from the probability of B given A, the probability of A, and the probability of B. In the application it becomes Equation (3),

$$P(V_j \mid a_1, a_2, \ldots, a_n) = \frac{P(a_1, a_2, \ldots, a_n \mid V_j)\,P(V_j)}{P(a_1, a_2, \ldots, a_n)} \tag{3}$$

• The Naïve Bayes Classifier, commonly referred to as multinomial Naïve Bayes, is a simplified model of Bayes' theorem that is suited to classifying Facebook status categories. This gives Equation (4),

$$V_{MAP} = \arg\max_{V_j \in V} P(V_j \mid a_1, a_2, \ldots, a_n) \tag{4}$$

• Substituting Equation (3) into Equation (4) gives Equation (5),

$$V_{MAP} = \arg\max_{V_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid V_j)\,P(V_j)}{P(a_1, a_2, \ldots, a_n)} \tag{5}$$

Because P(a_1, a_2, …, a_n) is constant for every class, Equation (5) can be written as Equation (6),

$$V_{MAP} = \arg\max_{V_j \in V} P(a_1, a_2, \ldots, a_n \mid V_j)\,P(V_j) \tag{6}$$

Where V_j ∈ V, V_MAP is the category with the highest probability, P(V_j) is the probability of the j-th class (category), and P(a_1, a_2, …, a_n | V_j) is the probability of the attributes given class V_j.

However, because P(a_1, a_2, …, a_n | V_j) is difficult to calculate, each word is assumed to be unrelated to the others, as indicated by Equation (7).

$$V_{MAP} = \arg\max_{V_j \in V} P(V_j) \prod_{i} P(a_i \mid V_j) \tag{7}$$

The Naïve Bayes classifier calculation then uses Equation (8), whose terms are the probability P(a_i | V_j), the number of parameters (total words), and the number of word records in each class category.

$$V_{MAP} = \arg\max_{V_j \in V} P(V_j) \prod_{i} P(a_i \mid V_j) \tag{8}$$

Equation (8) is resolved through the following calculation:
1. Determine the value of n_c for each class.
2. Calculate the value of P(a_i | V_j) and the value of P(V_j) using Equation (8).
3. Calculate P(V_j) ∏ P(a_i | V_j) for each class.
4. Determine the classification result, namely the class with the largest product.
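The four steps above can be illustrated with a small Python sketch. It is not the system's implementation: the function names, the toy training statuses, and the labels are hypothetical, and add-one (Laplace) smoothing is assumed for estimating P(a_i | V_j) because the exact estimate used in the study is not recoverable from the text. Log-probabilities are summed instead of multiplying raw probabilities, which selects the same class while avoiding numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (tokens, label). Returns priors, per-class word counts, vocabulary."""
    docs_per_class = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {word for tokens, _ in docs for word in tokens}
    priors = {label: n / len(docs) for label, n in docs_per_class.items()}  # P(Vj)
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the class maximizing log P(Vj) + sum_i log P(ai | Vj)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokens:
            if word not in vocab:
                continue  # assumption: words never seen in training are ignored
            # Assumption: add-one smoothing for P(ai | Vj).
            p_word = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p_word)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already preprocessed training statuses.
training = [
    (["party", "friends", "fun"], "Extraversion"),
    (["friends", "talk", "happy"], "Extraversion"),
    (["anxious", "stress", "tired"], "Neuroticism"),
]
model = train(training)
print(classify(["fun", "friends"], *model))     # Extraversion
print(classify(["stress", "anxious"], *model))  # Neuroticism
```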

III. RESULT AND DISCUSSION
Personality data that has been entered appears on the admin personality menu page shown in Fig. 2(a), where it can also be edited or deleted from the database. The personality data menu page is used to enter the personality data used as classification categories. On the personality data edit page in Fig. 2(b), only the type field can be changed; the personality ID cannot be changed because it is generated automatically. On the Facebook data menu page in Fig. 3(a), the admin can add Facebook data by selecting the add button at the top. In the Facebook data input form, the admin enters the Facebook name according to the user's ID name on Facebook. For the status, the admin enters the original status text without deleting or adding anything: nonstandard spelling and symbols do not need to be removed, because the status will be processed with text mining to produce base words. The admin then chooses the personality, which was entered earlier in the personality input form, by selecting the matching item from the dropdown, as seen in Fig. 3(b). On the right of the Facebook data menu page there are edit and delete links. The edit link cannot be used, because the data entered is valid and its personality category was obtained from the questionnaire results and calculated with the psychological method. The delete link can be used; it deletes the whole record, so the data is no longer in the database. The Status Data page in Fig. 4 contains the words from all statuses in the training data after text processing, which produces the standard or important words used to analyze a person's personality.

Fig.4. Data Status Menu Page
The evaluation results in Table I show that the Naïve Bayes classifier has an accuracy of 88%: 22 users were classified correctly, in accordance with their personalities, and three users were classified incorrectly. Accuracy = (number of correctly classified data) / (total number of data) × 100% = 22/25 × 100% = 88%.
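The accuracy calculation can be checked with a short Python snippet. The function name `accuracy_percent` is hypothetical; the counts 22 and 25 are the ones reported above.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage: correct predictions over all predictions."""
    return 100 * correct / total


print(accuracy_percent(22, 25))  # 88.0
```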

IV. CONCLUSION
The Big Five psychological theory, which describes five personality categories, can determine the personality category of each student from the results of the completed questionnaire. From these results, students can understand more precisely how to adapt to the university environment. The Naïve Bayes classifier is appropriate to use because, on Facebook data divided into training data and testing data, it produces a good accuracy of 88%. For further development, the newer Facebook API might be used to make data collection easier and to obtain more data, so that the resulting accuracy is higher. Future studies might also use other social media, or use uploaded images, to find out a person's character or personality.