This assignment from Singapore University of Social Sciences covers extract, load, transformation, and calculation operations.
Objectives:
● Perform simple exploratory data analysis.
● Design computation logic and routines in Python.
● Assess use of Python only and Python data structures to perform extract, load, transformation, and calculation operations.
● Assess use of Pandas dataframes to perform extract, load, transformation, and calculation operations.
● Assess the design and use of database ORM and methods to perform extract, load, transformation, and calculation operations.

Use ORM (unless stated otherwise) to compute tf–idf (https://en.wikipedia.org/wiki/Tf%E2%80%93idf), which can be used as a feature to classify tweets or to search tweets by user queries.

(a) Create a new SQLite table called tweet_word_pairs with 2 columns, tweet_id and word. Break the words column obtained in Q2(b) into multiple rows to form (tweet_id, word) pairs. Insert the (tweet_id, word) pairs computed in the previous step into the tweet_word_pairs table.
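The pairing step in part (a) can be sketched as follows. This is a minimal illustration, not a model answer: it uses Python's standard-library sqlite3 module directly instead of an ORM, so that it runs without extra dependencies. The sample rows and the in-memory database are assumptions made for the example. Only the table name tweet_word_pairs and its two columns come from the question.

```python
import sqlite3

# Hypothetical sample data standing in for the Q2(b) dataframe:
# each entry is (tweet_id, list of pre-processed words).
rows = [
    (1, ["data", "python", "data"]),
    (2, ["pandas", "python"]),
]

# Break each words list into one (tweet_id, word) pair per word.
pairs = [(tweet_id, word) for tweet_id, words in rows for word in words]

# An in-memory database is used here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_word_pairs (tweet_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO tweet_word_pairs (tweet_id, word) VALUES (?, ?)", pairs
)
conn.commit()

# Read the rows back in insertion order to check the result.
stored = conn.execute(
    "SELECT tweet_id, word FROM tweet_word_pairs ORDER BY rowid"
).fetchall()
print(stored)
```

A word that occurs twice in one tweet produces two identical rows. The sketch keeps these duplicates so that per-tweet word counts can still be computed from the table later.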
Each row in the dataframe generated in Q2(b) corresponds to a tweet, and the words field of each row contains the list of pre-processed words computed from the full_text. One of the rows in the dataframe generated in Q2(b) is shown in Figure 1. Taking that row as the example, the expected result is shown in Figure 2. (6 marks)

Figure 1: One of the rows in the dataframe generated in Q2(b)

ICT233 Copyright © 2021 Singapore University of Social Sciences (SUSS), TMA – July Semester 2021

Figure 2: Expected result of the given row

(b) Compute the number of times each word appears in each tweet and store the value ftd(word, tweet) in a column called ftd. Example output format is shown in Figure.
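The count in part (b) can be sketched with a GROUP BY over the (tweet_id, word) pairs. This is a hedged illustration under stated assumptions: the sample pairs are invented, the database is in memory, and plain sqlite3 is used where the assignment asks for an ORM. An ORM query that groups by both columns and counts rows would compute the same values.

```python
import sqlite3

# Hypothetical (tweet_id, word) pairs, as the table from part (a) might hold.
pairs = [(1, "data"), (1, "python"), (1, "data"), (2, "pandas"), (2, "python")]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE tweet_word_pairs (tweet_id INTEGER, word TEXT)")
conn.executemany("INSERT INTO tweet_word_pairs VALUES (?, ?)", pairs)

# ftd(word, tweet): the number of times a word appears in a given tweet.
ftd_rows = conn.execute(
    """
    SELECT tweet_id, word, COUNT(*) AS ftd
    FROM tweet_word_pairs
    GROUP BY tweet_id, word
    ORDER BY tweet_id, word
    """
).fetchall()

for tweet_id, word, ftd in ftd_rows:
    print(tweet_id, word, ftd)
```

Grouping by both tweet_id and word keeps the counts per tweet, so the same word appearing in two tweets yields two separate rows.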