The Group Project provides us with a chance to analyse the Social Web using knowledge obtained from this unit, with assistance from a computer-based statistical package. For this project, we will focus on identifying a chosen Public Figure’s Twitter image.
To complete this project:
1. Read through this specification.
2. Form a group and register your group using the Project Groups section of vUWS.
3. Choose a public figure who is active on Twitter and check that they are not already on the list of Group Project Twitter Handles. Then submit the Twitter handle of the public figure using the same link. Note that a given person cannot be allocated to more than one group. If duplicate names are found on the list, the group with the later time stamp will be asked to find a new public figure.
4. Complete the data analysis required by the specification.
5. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and presented well.
6. Include the student declaration text on the front page of your report. Please make sure that the names and student numbers of each group member are clearly displayed on the front page. If a group member did not contribute to any part of the project, do not put their name on the cover (no contribution means a mark of 0).
7. Submit the report as a PDF by the due date using Submit Group Project. More detailed screenshots of your code should be in the Appendix of the assignment; include comments in the code to explain what you tried to do.
3 Group Size and Organisation
Students in groups of size 4 or 5 are to work together to complete this project. One project report is to be submitted per group.
The group must be formed by signing up to a group within the Project section of 300958 in vUWS. Lone submissions will be awarded 0 marks.
Groups must be formed by week 7. Once the group is formed, one person should be nominated within the group to be responsible for submitting the report.
4 Due date and Submission
The project report Part A is due by 11:59 p.m. on the Monday of week 10. The report must be submitted as a PDF file using the assignment submission facilities in the Project section of 300958 in vUWS. Only one student from each group needs to submit the assignment.
5 Report Format
Once the required analysis is performed by the group, the members of the group are to write up the analysis as a report. Remember that the assessor will only see the group's report and will be marking the group's analysis based on that report. Therefore, the report should contain a clear and concise description of the procedures carried out, comments on the code, explanations of what you tried to do, the analysis of results and any conclusions reached from the analysis.
The required analysis in this specification covers the material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.
This project is worth 30% (Part A 16% + Part B 14%) of your final grade, and so the project will be marked out of 30. The project consists of five investigations (3 sections in Part A, 2 sections in Part B) and will be marked using the following criteria:
If a report is submitted late, the maximum mark it can achieve will be reduced by 10% per day. There are also two marks allocated to presentation (based on the report formatting, style, grammar, clarity and mathematical notation). If the report looks like something that would be submitted to an employer, then the full two marks will be awarded.
The following declaration must be included in a clearly visible and readable place on the first page of the report.
By including this statement, we the authors of this work, verify that:
· We hold a copy of this assignment that we can produce if the original is lost or damaged.
· We hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
· No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
· We are aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
· We hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.
Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.
A well-known public figure is investigating their public image and has approached your team to identify what the public associates with them. They want five pieces of analysis to be performed.
8.1 Analysis of Twitter language about the Public Figure
In this section, we want to examine the language used in tweets. Use the rtweet package to download tweets.
1. Use the search_tweets function from the rtweet library to search for 1000 tweets about the person you selected. Save these tweets as “tweets”.
2. Construct a document-term matrix or term-document matrix using TF-IDF weighting.
3. Construct a word cloud of the words in your document-term matrix. Make sure you remove all non-informative words by updating your stop list.
4. Sum the frequencies of each term across all documents to obtain a vector of term frequencies summed over all tweets.
5. Compute the proportion of each term in the tweets from the vector of term frequencies. Visualize the top 20 words and their proportions using a bar plot.
7. Create a dendrogram of the words in your document-term matrix. You do not need to visualize all words in your dendrogram; set up appropriate boundaries to improve the visualization.
i. Try single and complete linkage clustering.
ii. Which one do you think performed better?
What do these words tell us about the Person? Comment on what people are saying about this person.
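The weighting, term-proportion and dendrogram steps above can be sketched in base R on a toy corpus. This is only an illustration under stated assumptions: the four example tweets and the stop list are invented, TF-IDF is computed as count × log(N / document frequency) (one common variant among several), and the proportions are taken from raw counts rather than weighted values. A package-based workflow on real rtweet output would differ in detail.

```r
# Toy stand-in for the text of the downloaded tweets (invented for illustration).
tweets_text <- c(
  "great speech by the senator today",
  "the senator gave a great interview",
  "new policy speech draws criticism",
  "criticism of the new policy grows"
)

# A tiny illustrative stop list; a real one would be longer and would be
# updated as non-informative words show up in the word cloud.
stop_words <- c("the", "by", "a", "of")

# Tokenise on whitespace and drop stop words.
tokens <- lapply(strsplit(tolower(tweets_text), "\\s+"),
                 function(w) w[!w %in% stop_words])
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix of raw counts: one row per tweet, one column per term.
counts <- vapply(tokens,
                 function(w) as.numeric(table(factor(w, levels = vocab))),
                 numeric(length(vocab)))
dtm <- t(counts)
colnames(dtm) <- vocab

# TF-IDF weighting (assumed variant): count * log(N / document frequency).
idf <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(dtm, 2, idf, `*`)

# Term frequencies summed over all tweets, and each term's proportion.
term_freq <- colSums(dtm)
term_prop <- term_freq / sum(term_freq)

# Bar plot of the most frequent terms (top 20, or fewer for a small vocabulary).
top_terms <- head(sort(term_prop, decreasing = TRUE), 20)
barplot(top_terms, las = 2, ylab = "Proportion of all terms")

# Hierarchical clustering of terms (columns of the weighted matrix),
# once with single linkage and once with complete linkage.
term_dist <- dist(t(tfidf))
hc_single <- hclust(term_dist, method = "single")
hc_complete <- hclust(term_dist, method = "complete")
plot(hc_single, main = "Single linkage")
plot(hc_complete, main = "Complete linkage")
```

With a real corpus, restricting the dendrogram to the most frequent terms (for example, the columns named in top_terms) keeps the plot readable.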
8.2 Clustering the Users Who Posted Tweets About the Public Figure
We want to categorize (cluster) the users of the tweets about the Public Figure based on the descriptions provided in their Twitter account to figure out what kind of users are tweeting about them. Descriptions in the users' Twitter profiles give a short piece of information about the Twitter handle. To cluster users, build a document-term matrix using the user descriptions from the tweets you downloaded in Section 8.1.
1. Compute the most appropriate number of clusters using the elbow method. Make sure an appropriate metric is used.
2. Cluster the users and visualize the clusters in two-dimensional vector space.
3. List the top 10 words associated with each user cluster and manually determine the type of user profiles for each cluster (e.g. journalist, organisation, personal).
4. Comment on your findings.
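A minimal sketch of the elbow method, clustering and two-dimensional visualisation follows, using only base R. The matrix user_dtm is synthetic (random weights for six made-up terms), standing in for a weighted document-term matrix of user descriptions. The choice of unit-length rows with k-means (so that Euclidean distance behaves like cosine distance) and of classical multidimensional scaling for the 2-D plot are assumptions, not requirements stated in this specification.

```r
set.seed(42)

# Synthetic stand-in for a weighted document-term matrix of user descriptions:
# 30 users x 6 terms, in three groups that each favour two terms.
terms <- c("news", "reporter", "official", "company", "fan", "love")
make_group <- function(n, favoured) {
  m <- matrix(runif(n * length(terms), 0, 0.3), nrow = n)
  m[, favoured] <- m[, favoured] + runif(n * length(favoured), 1, 3)
  m
}
user_dtm <- rbind(make_group(10, 1:2), make_group(10, 3:4), make_group(10, 5:6))
colnames(user_dtm) <- terms

# Scale each row to unit length. Squared Euclidean distance between unit
# vectors equals 2 * (1 - cosine similarity), so k-means on these rows
# effectively clusters by cosine distance.
unit_rows <- user_dtm / sqrt(rowSums(user_dtm^2))

# Elbow method: total within-cluster sum of squares for k = 1, ..., max_k.
max_k <- 8
wss <- sapply(1:max_k, function(k) {
  kmeans(unit_rows, centers = k, nstart = 20)$tot.withinss
})
plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")

# Number of clusters read off the elbow plot (3 for this synthetic data).
n_clusters <- 3
fit <- kmeans(unit_rows, centers = n_clusters, nstart = 20)

# Project the users into two dimensions with classical MDS and colour by cluster.
coords <- cmdscale(dist(unit_rows), k = 2)
plot(coords, col = fit$cluster, pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")

# Top terms per cluster: the terms with the largest mean weight in the cluster
# (top 2 here; the task asks for the top 10 on real data).
top_terms_by_cluster <- lapply(seq_len(n_clusters), function(j) {
  centre <- colMeans(unit_rows[fit$cluster == j, , drop = FALSE])
  names(sort(centre, decreasing = TRUE))[1:2]
})
print(top_terms_by_cluster)
```

On real descriptions the elbow is rarely as sharp as in this synthetic example, so the chosen number of clusters should be justified in the report.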
8.3 Retweet Analysis of the User Clusters
We want to examine if the number of retweeted tweets is independent of which user cluster posted the tweet.
1. Find the tweets posted by the users in each cluster in your tweets file and examine how many of them are retweeted.
2. Construct a 2×M table where M is the number of user clusters you found in Section 8.2. The two rows should hold the total number of retweeted tweets and non-retweeted tweets, respectively, in each cluster.
3. Is retweeting independent of user groups? Perform an appropriate test to answer this question.
4. Interpret your results in context.
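One way to sketch the table and the test is shown below, using a chi-squared test of independence from base R. The data frame tweet_info, its column names and all counts are hypothetical, chosen only so that the example runs; they are not results. The chi-squared test is one appropriate choice when expected counts are not too small, which is checked through the expected component of the result.

```r
# Hypothetical per-tweet data: the cluster of the posting user and whether the
# tweet is retweeted. All counts are invented for illustration.
tweet_info <- data.frame(
  cluster = rep(c("1", "2", "3"), times = c(40, 35, 25)),
  is_retweet = c(rep(c(TRUE, FALSE), times = c(30, 10)),
                 rep(c(TRUE, FALSE), times = c(15, 20)),
                 rep(c(TRUE, FALSE), times = c(10, 15)))
)

# 2 x M contingency table: rows are non-retweeted / retweeted, columns are clusters.
tab <- table(retweet = tweet_info$is_retweet, cluster = tweet_info$cluster)
print(tab)

# Chi-squared test of independence between retweet status and user cluster.
# H0: retweeting is independent of the user cluster.
result <- chisq.test(tab)

# Expected counts under H0; the usual rule of thumb asks for these to be at least 5.
print(result$expected)
print(result)
```

If some expected counts are small, `chisq.test(tab, simulate.p.value = TRUE)` computes the p-value by simulation instead of relying on the chi-squared approximation.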
By combining all your findings, what can we say about the person’s image on Twitter? Draw a conclusion from your report.
The person wants the above three parts of analysis to be written up in a professional report. Each part should have its own section of the report and all questions should have thoughtful answers. Include only the relevant piece of code along with its output in the body of your assignment. More detailed code should be in the Appendix part of the assignment.
PART B (due Week 13, Friday 11:59 pm)
The topic is any actor or game on Twitter, and the work must be done in RStudio.
“Names and Student IDs of all group members who contributed to the project”
8 Project Description
PART A (due Week 10, Monday 11:59 pm)