Sunday 27 August 2023

Airline Sentiment Analysis with a Transformer Artificial Neural Network


Ali Riza SARAL

 

This is a report on my project for airline sentiment analysis using a Transformer artificial neural network.

 

I took the 'airline twitter sentiment' data from Data World (https://data.world/datasets/sentiment). This dataset includes customer comments and their associated sentiments. There are 14641 comments in this Excel file. Its fields are: _unit_id, _trusted_judgments, airline_sentiment, airline_sentiment:confidence, negativereason, negativereason:confidence, airline, text, tweet_coord. I only needed the airline_sentiment and text fields from this Excel file.

 

The implementation of sentiment analysis requires significant effort to prepare the input data for learning. It is also difficult to come up with an appropriate artificial neural network architecture. As a starting point I took the 'Text classification with Transformer' example from the Keras site: https://keras.io/examples/nlp/text_classification_with_transformer/

This Transformer example uses IMDB movie reviews as data and classifies them as positive or negative. Although it is a good, working example, it does not have a prediction part. The difficulty begins with the IMDB dataset itself: the preprocessing it applies is not well documented.

 

To make predictions you have to take a text and process it in the same way as the inputs used in the learning phase: convert to lowercase, remove links, remove all sorts of marks (commas, dots, etc.) and also some frequent, meaningless words such as 'a'. Then you have to convert the words to word2vec numbers so they can be processed in the network. The problem is that you have to apply the same word-to-number conversion as IMDB for your network to make a sound prediction. Otherwise, for example, the word 'go' ends up as 234 whereas IMDB assigns 459 to it. This is the reason I decided to use the airline sentiment Twitter data from Data World to test Keras's Transformer architecture.

 

I did the preprocessing in a short Python function:

-------------------------------------------------------------------------------------------------------------

import re

def clean_links(text):
    # Remove http/https links from the text
    cleaned_text = re.sub(r'http[s]?://\S+', '', text)
    return cleaned_text

def clean_string(input_str):
    # Remove links and newlines, then keep only alphanumeric characters,
    # apostrophes and whitespace
    input_str = clean_links(input_str)
    input_str = input_str.replace('\n', '')
    clean_chars = []
    for char in input_str:
        if char.isalnum() or char == "'" or char.isspace():
            clean_chars.append(char)
    return ''.join(clean_chars)

# A few of the strings used to test the function
input_string = "driver's licence"
input_string = "https://t.co/mWpG7grEZP"
input_string = "http://t.co/Y7O0uNxTQP"
input_string = "http://t.co/4gr39s91Dl‰Û_Ù÷â"
input_string = "http://t.co/GsB2J3c4gM"
cleaned_string = clean_string(input_string)
print(cleaned_string)

-------------------------------------------------------------------------------------------------------------

To convert the texts to numbers for the learning process, I first built a dictionary in which each word has a number assigned to it. To build the dictionary, I wrote a program that goes through the whole dataset, extracts each word, cleans it with the above function and saves the result to a file:

--------------------------------------------------------------------------------------------------------------

virginamerica

plus

you've

added

commercials

to

the

experience

tacky

virginamerica

i

didn't

today

must

mean

--------------------------------------------------------------------------------------------------------------------

There are 253076 words, and hence lines, in this file, including repetitions. I used a regex utility that I had written earlier to remove the duplicates in this file:

--------------------------------------------------------------------------------------------------------------------

absolute

absolutely

absorb

absorber

absoulutely

absurd

absurdity

absurdly

abt

abundance

abuse

abused

abysmal

ac

-----------------------------------------------------------------------------------------------------------------

There are 13786 words, and hence lines, in this file. This is the dictionary size: the number of distinct words in the dictionary.

 

The next step is to preprocess the inputs to the learning phase.  This phase extracts the airline_sentiment values and preprocesses the text values.

--------------------------------------------------------------------------------------------------------------------

1!positive& virginamerica plus you've added commercials to the experience tacky

2!neutral& virginamerica i didn't today must mean i need to take another trip

3!negative& virginamerica it's really aggressive to blast obnoxious entertainment in your guests' faces amp they have little recourse

4!negative& virginamerica and it's a really big bad thing about it

5!negative& virginamerica seriously would pay 30 a flight for seats that didn't have this playing it's really the only bad thing about flying va

6!positive&UnicodeEncodeError

7!neutral& virginamerica really missed a prime opportunity for men without hats parody there

8!positive& virginamerica well i didn'tûbut now i do d

9!positive& virginamerica it was amazing and arrived an hour early you're too good to me

--------------------------------------------------------------------------------------------------------------------

The program handles the case where there is a Unicode error in the text; later on, the lines with Unicode errors are not included in the network data.
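A minimal sketch of how such a file might be produced is below; the output file name and the exact cleaning details are assumptions on my part, but the '!'/'&' separators and the UnicodeEncodeError marker follow the sample lines above.

--------------------------------------------------------------------------------------------------------------------

import pandas as pd
from saPreprocessClean import clean_string  # the cleaning function shown earlier

df = pd.read_excel('Airline-SentimentARS1.xlsx')  # path is an assumption

with open("saPreprocessedSentiments.txt", "w") as out:  # hypothetical file name
    for i, (sentiment, text) in enumerate(zip(df['airline_sentiment'], df['text']), start=1):
        try:
            out.write(f"{i}!{sentiment}& {clean_string(str(text)).lower()}\n")
        except UnicodeEncodeError:
            # Mark the line so it can be skipped when building the network data
            out.write(f"{i}!{sentiment}&UnicodeEncodeError\n")

--------------------------------------------------------------------------------------------------------------------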

The next program creates a word2vec model for the dictionary with

------------------------------------------------------------------------------------------------------------------------

model.build_vocab(initial_sentences)

updates it for each word in the preprocessed output of the dictionary (about 13800 words), and saves the model:

model.save("word2vec_dict_model.model")

model.wv.save_word2vec_format("saW2Vdict_vectors.txt", binary=False)

A few lines of the resulting saW2Vdict_vectors.txt file:

------------------------------------------------------------------------------------------------------------------------

aboout 0.9614743

abounds 0.49820578

about 0.92331433

above -0.8157917

abq 0.44957983

abroad -0.4137076

absolute 0.08245361

absolutely 0.849862

absorb -0.44621766

absorber 0.45175004

-------------------------------------------------------------------------------------------

The next step does the same for each sentence in the data and creates the x_train and y_train data. Here x_train holds the text and y_train the sentiment. This program converts all the sentences, word by word, to numbers, and converts 'positive' sentiments to 1 and 'negative' sentiments to 0.

-------------------------------------------------------------------------------------------------------------------------

0? southwestair i continue to be amazed by the amazing customer service thank you swa![[12252], [677], [18600], [13555], [6673], [7008], [1069], [1150], [9228], [4573], [17659], [17086], [13689], [2772]]&1

1? americanair golfwithwoody don't buy it woody they're making it much worse with understaffing rudeness and prerookie mistakes![[14348], [3087], [10883], [196], [13835], [16713], [4748], [6870], [13835], [18504], [17594], [12636], [7400], [16721], [3502], [10051], [13862]]&0

2? usairways thanks to betty working gate at ilm and lovely gate agents here in clt helping me get home 2 phx tonight instead of tomorrow![[9362], [17243], [13555], [3706], [16524], [8672], [18355], [17395], [3502], [12246], [8672], [14530], [7500], [17457], [16153], [6352], [15003], [16996], [13628], [6159], [2349], [2897], [18895], [6965]]&1

3? southwestair i'll stick with flying for free any where that southwest goes my son works for this wonderful company and moms fly free![[12252], [10557], [2938], [12636], [6970], [4161], [6466], [3275], [11195], [17114], [8426], [7158], [17815], [4710], [3650], [4161], [4333], [4784], [15239], [3502], [16000], [11957], [6466]]&1

4? southwestair is there a way to know who checked my bag on the curb she was awesome and want to be sure she gets a high five![[12252], [14324], [2347], [3685], [13555], [12020], [17067], [11717], [17815], [12527], [12424], [1150], [10389], [2426], [734], [2224], [3502], [6253], [13555], [6673], [19303], [2426], [4642], [2905], [14056]]&1

5? united you're welcome![[6378], [3488], [16333]]&1

------------------------------------------------------------------------------------------------------------------------
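The numbers above come from discretizing each word's word2vec value into an integer bucket (the prediction log further below shows the same saDiscretize step). A minimal sketch of this conversion follows; the bucket count and the exact binning formula are assumptions, since the project's saDiscretize module is not listed here.

------------------------------------------------------------------------------------------------------------------------

from gensim.models import Word2Vec

w2v_model = Word2Vec.load("word2vec_dict_model.model")
NUM_BINS = 20000  # assumed resolution; must match the vocab_size used by the network

def discretize(value, num_bins=NUM_BINS):
    # Map a word2vec value in roughly [-1, 1] to an integer bucket in [0, num_bins)
    return int((value + 1.0) / 2.0 * (num_bins - 1))

def sentence_to_numbers(sentence):
    # One single-element list per word, as in the sample lines above
    return [[discretize(float(w2v_model.wv[word][0]))]
            for word in sentence.split() if word in w2v_model.wv]

------------------------------------------------------------------------------------------------------------------------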

It divides the data into learning and validation parts:

-------------------------------------------------------------------------------------------------------------------------

x_train= x_values_array[:14000]

y_train= y_values_array[:14000]

x_val= x_values_array[14000:]

y_val= y_values_array[14000:]

---------------------------------------------------------------------------------------------------------------------------

The last program for the learning step is the Transformer architecture as given in the Keras reference above.

But there is still an enormous difficulty: the x_train and y_train data you have prepared do not work with the given architecture as-is. A very important part of creating an artificial neural network is adjusting the format of the input data to the chosen architecture.

 

First you have to decide the maximum length of the text, and then pad your input texts to this size. After some processing to flatten the data and convert it to NumPy arrays and then to tensors, the network accepts the input.
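A minimal sketch of this step, assuming a maximum length of 100 (the (None, 100) input shape in the model summary below) and left-padding with zeros as in the padded output shown later; the project's padSequences and flattenList2 helpers may differ in detail.

------------------------------------------------------------------------------------------------------------------------

import numpy as np
import tensorflow as tf

MAXLEN = 100  # maximum text length, matching the (None, 100) input layer

def pad_and_flatten(sequences, maxlen=MAXLEN):
    rows = []
    for seq in sequences:
        # seq is a list of single-element lists, e.g. [[8442], [466], ...]
        flat = [v[0] for v in seq]
        # left-pad with zeros so every row has exactly maxlen entries
        rows.append([0.0] * (maxlen - len(flat)) + flat[:maxlen])
    return np.array(rows, dtype="float32")

# x_train / y_train: the sentence index lists and labels from the previous step
x_train_padded = pad_and_flatten(x_train)            # shape: (num_samples, 100)
x_train_tensor = tf.convert_to_tensor(x_train_padded)
y_train_tensor = tf.convert_to_tensor(np.array(y_train, dtype="float32"))

------------------------------------------------------------------------------------------------------------------------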

 

There is an iterative performance improvement phase after the network begins to work and produce some accuracy: learning rate scheduling based on val_accuracy, layer normalization, dropouts, etc.
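As an example, learning rate scheduling keyed to val_accuracy can be done with a standard Keras callback; this is only a sketch of the idea (batch size and reduction factor are assumptions), not necessarily the exact mechanism that produced the 'Reduced learning rate' messages in the log below.

------------------------------------------------------------------------------------------------------------------------

import tensorflow as tf

# Reduce the learning rate when val_accuracy stops improving
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy",
    factor=0.1,      # multiply the learning rate by 0.1
    patience=1,      # after 1 epoch without improvement
    min_lr=1e-10,
)

# x_val_tensor / y_val_tensor are prepared the same way as the training tensors
history = model.fit(
    x_train_tensor, y_train_tensor,
    batch_size=128,
    epochs=2,
    validation_data=(x_val_tensor, y_val_tensor),
    callbacks=[lr_schedule],
)

------------------------------------------------------------------------------------------------------------------------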

The result for learning is:

Model: "model"

_________________________________________________________________

 Layer (type)                Output Shape              Param #  

=================================================================

 input_1 (InputLayer)        [(None, 100)]             0        

                                                                 

 token_and_position_embeddi  (None, 100, 32)           643200   

 ng_8 (TokenAndPositionEmbe                                     

 dding)                                                         

                                                                 

 transformer_block_16 (Tran  (None, 100, 32)           10656    

 sformerBlock)                                                  

                                                                

 transformer_block_17 (Tran  (None, 100, 32)           10656    

 sformerBlock)                                                  

                                                                

 global_average_pooling1d (  (None, 32)                0        

 GlobalAveragePooling1D)                                         

                                                                

 layer_normalization_36 (La  (None, 32)                64       

 yerNormalization)                                              

                                                                 

 dropout_36 (Dropout)        (None, 32)                0        

                                                                

 dense_36 (Dense)            (None, 20)                660      

                                                                 

 layer_normalization_37 (La  (None, 20)                40       

 yerNormalization)                                              

                                                                

 dropout_37 (Dropout)        (None, 20)                0        

                                                                

 dense_37 (Dense)            (None, 2)                 42       

                                                                 

=================================================================

Total params: 665318 (2.54 MB)

Trainable params: 665318 (2.54 MB)

Non-trainable params: 0 (0.00 Byte)

_________________________________________________________________

Epoch 1/2

110/110 [==============================] - ETA: 0s - loss: 0.5824 - accuracy: 0.6614  

 

Reduced learning rate: 1.0000000474974512e-06

val_accuracy = 0.8716470003128052

110/110 [==============================] - 53s 409ms/step - loss: 0.5824 - accuracy: 0.6614 - val_loss: 0.3216 - val_accuracy: 0.8716

Epoch 2/2

110/110 [==============================] - ETA: 0s - loss: 0.2878 - accuracy: 0.8939

 

Reduced learning rate: 9.999999974752428e-10

val_accuracy = 0.8799197673797607

110/110 [==============================] - 45s 410ms/step - loss: 0.2878 - accuracy: 0.8939 - val_loss: 0.2996 - val_accuracy: 0.8799

new reviews:

 ["virginamerica it was amazing and arrived an hour early you're too good to me", 'virginamerica your chat support is not working on your site']

************************************TESTTTTTTTTTTTT

 

y_train-------------> tf.Tensor([1. 0. 1. 1. 1. 1. 0. 0. 0. 1.], shape=(10,), dtype=float32)

 

y_val-------------> tf.Tensor([1. 1. 0. 0. 0. 0. 0. 0. 0. 1.], shape=(10,), dtype=float32)

1/1 [==============================] - 1s 863ms/step

&&&&&&&&&&&&&&&&Input 0:

Predicted probabilities: [[0.1222235 0.8777765]]

Predicted class: [1]

Actual class: [1.]

 

1/1 [==============================] - 0s 42ms/step

&&&&&&&&&&&&&&&&Input 1:

Predicted probabilities: [[0.12541966 0.8745803 ]]

Predicted class: [1]

Actual class: [1.]

 

1/1 [==============================] - 0s 111ms/step

&&&&&&&&&&&&&&&&Input 2:

Predicted probabilities: [[0.80272907 0.19727091]]

Predicted class: [0]

Actual class: [0.]

 

1/1 [==============================] - 0s 40ms/step

&&&&&&&&&&&&&&&&Input 3:

Predicted probabilities: [[0.95607054 0.04392945]]

Predicted class: [0]

Actual class: [0.]

 

1/1 [==============================] - 0s 41ms/step

&&&&&&&&&&&&&&&&Input 4:

Predicted probabilities: [[0.9571833  0.04281674]]

Predicted class: [0]

Actual class: [0.]

 

------------------------------------------------------------------------------------------------
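The prediction script applies the same chain to a single sentence: clean, convert words to word2vec values, discretize, pad, then call model.predict. The project's customSplitText, padSequences, flattenList2 and saDiscretize helpers are not listed in this post, so the following is only a rough sketch with hypothetical helpers and an assumed binning formula.

------------------------------------------------------------------------------------------------

import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec
from saPreprocessClean import clean_string

w2v_model = Word2Vec.load("word2vec_dict_model.model")
MAXLEN = 100      # input length expected by the network
NUM_BINS = 20000  # assumed discretization resolution

def predict_sentiment(model, sentence):
    # clean -> word2vec -> discretize -> pad, mirroring the training preprocessing
    cleaned = clean_string(sentence.lower())
    values = [float(w2v_model.wv[w][0]) for w in cleaned.split() if w in w2v_model.wv]
    indices = [int((v + 1.0) / 2.0 * (NUM_BINS - 1)) for v in values]
    padded = [0.0] * (MAXLEN - len(indices)) + indices[:MAXLEN]
    x = tf.convert_to_tensor(np.array([padded], dtype="float32"))
    probs = model.predict(x)                       # e.g. [[0.12, 0.88]]
    return int(np.argmax(probs, axis=1)[0]), probs

------------------------------------------------------------------------------------------------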

The result for prediction is:

 

runfile('C:/Users/ars/ARStensorflow/sentimentAnalysis/saSTEP9/saPredictSingleTST.py', wdir='C:/Users/ars/ARStensorflow/sentimentAnalysis/saSTEP9')

this

word_vector= [-0.5667038]

is

word_vector= [0.43231726]

my

word_vector= [0.78149366]

book

word_vector= [0.48123372]

 

 

word_vectors= [array([-0.5667038], dtype=float32), array([0.43231726], dtype=float32), array([0.78149366], dtype=float32), array([0.48123372], dtype=float32)]

 

 

discretized_sentence= [[ 2986]

 [ 9869]

 [12275]

 [10206]]

Reloaded modules: customSplitText, padSequences, flattenList2, saDiscretize

this

word_vector= [-0.5667038]

is

word_vector= [0.43231726]

my

word_vector= [0.78149366]

book

word_vector= [0.48123372]

 

 

word_vectors= [array([-0.5667038], dtype=float32), array([0.43231726], dtype=float32), array([0.78149366], dtype=float32), array([0.48123372], dtype=float32)]

 

 

discretized_sentence= [[ 2986]

 [ 9869]

 [12275]

 [10206]]

line= 4901!positive& southwestair i continue to be amazed by the amazing customer service thank you swa

result= ['4901', 'positive', ' southwestair i continue to be amazed by the amazing customer service thank you swa']

i=0 sentence= southwestair i continue to be amazed by the amazing customer service thank you swa w2v=[[8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]] sentiment=1

 

line= 14471!negative& americanair golfwithwoody don't buy it woody they're making it much worse with understaffing rudeness and prerookie mistakes

result= ['14471', 'negative', " americanair golfwithwoody don't buy it woody they're making it much worse with understaffing rudeness and prerookie mistakes"]

i=1 sentence= americanair golfwithwoody don't buy it woody they're making it much worse with understaffing rudeness and prerookie mistakes w2v=[[9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]] sentiment=0

 

line= 11384!positive& usairways thanks to betty working gate at ilm and lovely gate agents here in clt helping me get home 2 phx tonight instead of tomorrow

result= ['11384', 'positive', ' usairways thanks to betty working gate at ilm and lovely gate agents here in clt helping me get home 2 phx tonight instead of tomorrow']

i=2 sentence= usairways thanks to betty working gate at ilm and lovely gate agents here in clt helping me get home 2 phx tonight instead of tomorrow w2v=[[6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]] sentiment=1

 

line= 4668!positive& southwestair i'll stick with flying for free any where that southwest goes my son works for this wonderful company and moms fly free

result= ['4668', 'positive', " southwestair i'll stick with flying for free any where that southwest goes my son works for this wonderful company and moms fly free"]

i=3 sentence= southwestair i'll stick with flying for free any where that southwest goes my son works for this wonderful company and moms fly free w2v=[[8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]] sentiment=1

 

line= 6172!positive& southwestair is there a way to know who checked my bag on the curb she was awesome and want to be sure she gets a high five

result= ['6172', 'positive', ' southwestair is there a way to know who checked my bag on the curb she was awesome and want to be sure she gets a high five']

i=4 sentence= southwestair is there a way to know who checked my bag on the curb she was awesome and want to be sure she gets a high five w2v=[[8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], [793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]] sentiment=1

 

line= 3402!positive& united you're welcome

result= ['3402', 'positive', " united you're welcome"]

i=5 sentence= united you're welcome w2v=[[4394], [2404], [11254]] sentiment=1

 

0? southwestair i continue to be amazed by the amazing customer service thank you swa![[8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]]&1

 

1? americanair golfwithwoody don't buy it woody they're making it much worse with understaffing rudeness and prerookie mistakes![[9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]]&0

 

2? usairways thanks to betty working gate at ilm and lovely gate agents here in clt helping me get home 2 phx tonight instead of tomorrow![[6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]]&1

 

3? southwestair i'll stick with flying for free any where that southwest goes my son works for this wonderful company and moms fly free![[8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]]&1

 

4? southwestair is there a way to know who checked my bag on the curb she was awesome and want to be sure she gets a high five![[8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], [793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]]&1

 

5? united you're welcome![[4394], [2404], [11254]]&1

 

x_values------------

 

 

 

['[[8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]]'

 '[[9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]]'

 '[[6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]]'

 '[[8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]]'

 '[[8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], [793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]]'

 '[[4394], [2404], [11254]]']

 

y_values*********

['1' '0' '1' '1' '1' '1']

------------

 

 

 

 

 

 

 

?????????????????????????????x_train

['[[8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]]'

 '[[9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]]'

 '[[6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]]'

 '[[8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]]'

 '[[8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], [793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]]'

 '[[4394], [2404], [11254]]']

 

 

 

 

y_train

['1' '0' '1' '1' '1' '1']

Shape of x_train: (6,)

Shape of y_train: (6,)

:::::::::::::::::::::::::::::::::

 

x_train_loaded=============== [[[8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]], [[9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]], [[6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]], [[8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]], [[8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], [793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]], [[4394], [2404], [11254]]]

Sublist 1 length: 14

Sublist 2 length: 17

Sublist 3 length: 24

Sublist 4 length: 23

Sublist 5 length: 25

Sublist 6 length: 3

 

x_train padded=============== [[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [8442], [466], [12816], [9340], [4598], [4829], [737], [793], [6358], [3151], [12167], [11772], [9432], [1910]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [9886], [2127], [7499], [135], [9532], [11515], [3271], [4734], [9532], [12749], [12123], [8706], [5099], [11521], [2413], [6925], [9551]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [6451], [11881], [9340], [2553], [11385], [5975], [12647], [11985], [2413], [8438], [5975], [10011], [5168], [12028], [11130], [4376], [10337], [11710], [9390], [4244], [1618], [1996], [13019], [4799]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [8442], [7274], [2025], [8706], [4803], [2867], [4455], [2257], [7714], [11791], [5805], [4932], [12275], [3245], [2515], [2867], [2986], [3296], [10500], [2413], [11024], [8239], [4455]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [8442], [9869], [1617], [2539], [9340], [8282], [11759], [8073], [12275], [8631], [8560], 
[793], [7158], [1672], [506], [1532], [2413], [4309], [9340], [4598], [13300], [1672], [3199], [2002], [9685]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [4394], [2404], [11254]]]

 x_train padded length= 6

Sublist 1 length: 100

Sublist 2 length: 100

Sublist 3 length: 100

Sublist 4 length: 100

Sublist 5 length: 100

Sublist 6 length: 100

 

x_train flatened=============== [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8442, 466, 12816, 9340, 4598, 4829, 737, 793, 6358, 3151, 12167, 11772, 9432, 1910], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9886, 2127, 7499, 135, 9532, 11515, 3271, 4734, 9532, 12749, 12123, 8706, 5099, 11521, 2413, 6925, 9551], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6451, 11881, 9340, 2553, 11385, 5975, 12647, 11985, 2413, 8438, 5975, 10011, 5168, 12028, 11130, 4376, 10337, 11710, 9390, 4244, 1618, 1996, 13019, 4799], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8442, 7274, 2025, 8706, 4803, 2867, 4455, 2257, 7714, 11791, 5805, 4932, 12275, 3245, 2515, 2867, 2986, 3296, 10500, 2413, 11024, 8239, 4455], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8442, 9869, 1617, 2539, 9340, 8282, 11759, 8073, 12275, 8631, 8560, 793, 7158, 1672, 506, 1532, 2413, 4309, 9340, 4598, 13300, 1672, 3199, 2002, 9685], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4394, 2404, 11254]]

 x_train length= 6

-------------->>>>>>>>>>>>> tf.Tensor(

[[    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.  8442.   466. 12816.  9340.

   4598.  4829.   737.   793.  6358.  3151. 12167. 11772.  9432.  1910.]

 [    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.  9886.  2127.  7499.   135.  9532. 11515.  3271.

   4734.  9532. 12749. 12123.  8706.  5099. 11521.  2413.  6925.  9551.]

 [    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.  6451. 11881.  9340.  2553.

  11385.  5975. 12647. 11985.  2413.  8438.  5975. 10011.  5168. 12028.

  11130.  4376. 10337. 11710.  9390.  4244.  1618.  1996. 13019.  4799.]

 [    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.  8442.  7274.  2025.

   8706.  4803.  2867.  4455.  2257.  7714. 11791.  5805.  4932. 12275.

   3245.  2515.  2867.  2986.  3296. 10500.  2413. 11024.  8239.  4455.]

 [    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.  8442.  9869.  1617.  2539.  9340.

   8282. 11759.  8073. 12275.  8631.  8560.   793.  7158.  1672.   506.

   1532.  2413.  4309.  9340.  4598. 13300.  1672.  3199.  2002.  9685.]

 [    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.     0.     0.     0.

      0.     0.     0.     0.     0.     0.     0.  4394.  2404. 11254.]], shape=(6, 100), dtype=float32)

----------x_train_tensor---->>>>---->>>>> tf.Tensor(2, shape=(), dtype=int32)

----------x_train---->>>>---->>>>> tf.Tensor(2, shape=(), dtype=int32)

y_train=['1' '0' '1' '1' '1' '1']

y_train=[1, 0, 1, 1, 1, 1]

-------------->>>>>>>>>>>>> tf.Tensor([1. 0. 1. 1. 1. 1.], shape=(6,), dtype=float32)

y_train=<tf.Variable 'Variable:0' shape=(6,) dtype=float32, numpy=array([1., 0., 1., 1., 1., 1.], dtype=float32)>

:::::::::::::::::::::::::::::::::

 

y_train-------------> tf.Tensor([1. 0. 1. 1. 1. 1.], shape=(6,), dtype=float32)

1/1 [==============================] - 1s 954ms/step

&&&&&&&&&&&&&&&&Input 0:

Predicted probabilities: [[0.49673203 0.50326794]]

Predicted class: [1]

Actual class: [1.]

 

1/1 [==============================] - 0s 33ms/step

&&&&&&&&&&&&&&&&Input 1:

Predicted probabilities: [[0.49673203 0.50326794]]

Predicted class: [1]

Actual class: [0.]

 

1/1 [==============================] - 0s 32ms/step

&&&&&&&&&&&&&&&&Input 2:

Predicted probabilities: [[0.49673203 0.50326794]]

Predicted class: [1]

Actual class: [1.]

 

1/1 [==============================] - 0s 31ms/step

&&&&&&&&&&&&&&&&Input 3:

Predicted probabilities: [[0.49673203 0.50326794]]

Predicted class: [1]

Actual class: [1.]

 

1/1 [==============================] - 0s 30ms/step

&&&&&&&&&&&&&&&&Input 4:

Predicted probabilities: [[0.49673203 0.50326794]]

Predicted class: [1]

Actual class: [1.]

 

Sunday 13 August 2023

How to Prepare Data for a Neural Network: A Step-by-Step Guide


 

Introduction

In this guide, I'll walk you through the steps I took to prepare airline sentiment data for a neural network. The aim is to create a model that predicts whether new comments are positive or negative using word embeddings and a transformer neural network architecture.

 

Step 0: Data Collection

I began by obtaining the 'airline twitter sentiment' data from Data World (https://data.world/datasets/sentiment). This dataset includes customer comments and their associated sentiments.

  

Step 1: Data Cleaning and Text Extraction

First, I extracted customer comments from the 'text' field of the dataset and cleaned them by removing punctuation, numbers, and other irrelevant elements. The cleaned comments were then written to a text file called "saPreprocessSentences.txt." This process was implemented using the following Python code:

 

# saPreprocessClean.py
import pandas as pd

from saPreprocessClean import clean_string

# Load the dataset
df = pd.read_excel('/Users/ARS/ARStensorflow/Airline-SentimentARS1.xlsx')

# Clean and preprocess the comments (one comment per line,
# since the later steps read this file line by line)
cleaned_comments = ""
for value in df['text']:
    if isinstance(value, str):
        cleaned = clean_string(value)  # avoid shadowing the imported function
        cleaned_comments += cleaned + '\n'

# Write cleaned comments to file
with open("saPreprocessSentences.txt", "w") as file:
    file.write(cleaned_comments)

 

Step 2: Word Extraction and Cleaning

Next, I extracted individual words from the cleaned comments, further cleaned them, and saved them to a file called "saPreprocessWords.txt." The code to achieve this is as follows:

 

# saPreprocessWords.py
import pandas as pd

from saPreprocessClean import clean_string

# Load the dataset
df = pd.read_excel('/Users/ARS/ARStensorflow/sentimentAnalysis/Airline-SentimentARS1.xlsx')

# Extract and preprocess words from comments
with open("saPreprocessWords.txt", "w") as file:
    for value in df['text']:
        if isinstance(value, str):
            words = value.split()
            for word in words:
                clean_word = clean_string(word)
                file.write(f"word={word} cleaned={clean_word}\n")

 

Step 3: Removing Duplicate Entries

To ensure data integrity, I created a batch program called "removeDUP" to remove any duplicate entries from the "saPreprocessWords.txt" file. The cleaned output was saved to "saRemoveDUPOutput.txt."
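The original removeDUP utility is a separate batch program and is not listed here; a minimal Python sketch of the same deduplication step, under the assumption that each input line holds one entry, could look like this:

# Remove duplicate lines while keeping the entries sorted
with open("saPreprocessWords.txt", "r") as infile:
    unique_words = sorted(set(line.strip() for line in infile if line.strip()))

with open("saRemoveDUPOutput.txt", "w") as outfile:
    outfile.write("\n".join(unique_words))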

 

Step 4: Creating Word Embeddings

I converted each unique word into a float value using word2vec embeddings and built a dictionary model to map words to their corresponding vectors. This model was saved as "word2vec_dict_model.model," and the vectors were stored in "saW2Vdict_vectors.txt."

 

# saCreateW2VDictModel.py
from gensim.models import Word2Vec

# Load the cleaned word list
with open("saRemoveDUPOutput2.txt", "r", encoding="utf-16-le") as file:
    words = [line.strip().split()[1] for line in file]

# Load the existing word2vec dictionary model and update it word by word
model = Word2Vec.load("word2vec_dict_model.model")
for new_word in words:
    if any(char.isdigit() for char in new_word):
        print(f"includes number --> {new_word}")
    else:
        # gensim expects a list of tokenised sentences, so wrap the word in a list
        model.build_vocab([[new_word]], update=True)
        model.train([[new_word]], total_examples=1, epochs=1)

model.save("word2vec_dict_model.model")
model.wv.save_word2vec_format("saW2Vdict_vectors.txt", binary=False)

 

Step 5: Data Transformation and Labeling

I transformed the sentences in "saPreprocessSentences.txt" into word2vec vectors using the dictionary model. I also labeled each sentence based on its sentiment, appending it to the "saW2VXtrainYtrainData.txt" file.

 

import pandas as pd
from gensim.models import Word2Vec

# Transform sentences to word2vec vectors and label
def get_w2v_sentence(sentence):
    word_vectors = [model.wv[word] for word in sentence.split() if word in model.wv]
    return word_vectors

# Load the word2vec model
model = Word2Vec.load("word2vec_model_updated.model")

# Load sentiment data
df = pd.read_excel('/Users/ARS/ARStensorflow/Airline-SentimentARS1.xlsx')
sentiments = []

for sentiment_value in df['airline_sentiment']:
    if sentiment_value == "positive":
        sentiments.append(1)
    else:
        # 'negative' and 'neutral' are both mapped to 0, as in the output below
        sentiments.append(0)

with open("saPreprocessSentences.txt", "r") as file:
    with open("saW2VXtrainYtrainData.txt", "w") as file2:
        i = 0
        for line in file:
            sentence = line.strip()
            w2v_sentence_vectors = get_w2v_sentence(sentence)
            w2v_sentence_lists = [vector.tolist() for vector in w2v_sentence_vectors]
            print(f"i={i} w2v={w2v_sentence_lists} sentiment={sentiments[i]}", file=file2)
            i += 1

 

import ast
import numpy as np

# Reading and formatting the data
x_values = []
y_values = []

with open("saW2VXtrainYtrainData.txt", "r") as file:
    for line in file:
        # Each line looks like: i=0 w2v=[[...], ...] sentiment=1
        body = line.strip().split("w2v=", 1)[1]
        w2v_part, sentiment_part = body.rsplit(" sentiment=", 1)
        x_value = ast.literal_eval(w2v_part)   # nested list of floats
        y_value = int(sentiment_part)
        x_values.append(x_value)
        y_values.append(y_value)

x_values_array = np.array(x_values, dtype=object)  # ragged: sentences differ in length
y_values_array = np.array(y_values)

Step 6: Splitting Data for Training and Validation

I split the data into training and validation sets using a train-validation ratio of 80-20. The resulting arrays were saved as "saXtrainYtrainData.npz."

 


import numpy as np
from sklearn.model_selection import train_test_split

# Split data into training and validation sets (80-20)
val_ratio = 0.2
x_train, x_val, y_train, y_val = train_test_split(
    x_values_array, y_values_array, test_size=val_ratio, random_state=42
)

# Save the arrays to a file
np.savez("saXtrainYtrainData.npz", x_train=x_train, x_val=x_val, y_train=y_train, y_val=y_val)
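As a quick check, the saved arrays can be loaded back like this (allow_pickle is needed because the ragged sentence arrays are stored as object arrays):

import numpy as np

data = np.load("saXtrainYtrainData.npz", allow_pickle=True)
x_train, y_train = data["x_train"], data["y_train"]
x_val, y_val = data["x_val"], data["y_val"]
print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)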

 

Conclusion

By following these steps, I successfully prepared the airline sentiment data for training a transformer neural network. The data, which includes word2vec-transformed sentences and corresponding sentiment labels, is ready to be used for building and training the neural network model. This process showcases the power of chatGPT in aiding and accelerating the programming process.

#neuralNetworks #dataPrepare 


runfile('C:/Users/ars/ARStensorflow/sentimentAnalysis/saSTEP7/saProduceXtrainYtrainDataNEW.py', wdir='C:/Users/ars/ARStensorflow/sentimentAnalysis/saSTEP7')

sentiment = neutral -->0

sentiment = positive -->1

sentiment = neutral -->0

sentiment = negative -->0

sentiment = negative -->0

sentiment = negative -->0

sentiment = positive -->1

sentiment = neutral -->0

sentiment = positive -->1

sentiment = positive -->1

sentiment = neutral -->0

i=0 = 

i=0 w2v=[] sentiment=0

i=1 = virginamerica plus you've added commercials to the experience tacky

i=1 w2v=[[0.0012580156326293945], [-0.6498260498046875], [-0.5153262615203857], [-0.020553112030029297], [-0.6892403364181519], [0.3554692268371582], [-0.8850120306015015], [-0.2642437219619751], [0.10036587715148926]] sentiment=1

i=2 = virginamerica i didn't today must mean i need to take another trip

i=2 w2v=[[0.0012580156326293945], [-0.932395339012146], [0.2726396322250366], [-0.6015644073486328], [-0.01267862319946289], [-0.4461408853530884], [-0.932395339012146], [-0.946296215057373], [0.3554692268371582], [0.02413642406463623], [0.9904971122741699], [0.6381527185440063]] sentiment=0

i=3 = virginamerica it's really aggressive to blast obnoxious entertainment in your guests' faces amp they have little recourse

i=3 w2v=[[0.0012580156326293945], [-0.7756330966949463], [-0.5128778219223022], [-0.5935169458389282], [0.3554692268371582], [-0.942463755607605], [0.03211188316345215], [-0.1338428258895874], [0.7456883192062378], [0.9946664571762085], [-0.31516337394714355], [-0.22687816619873047], [0.2923257350921631], [0.6583085060119629], [0.6221826076507568], [-0.5303181409835815], [0.7077651023864746]] sentiment=0

i=4 = virginamerica and it's a really big bad thing about it

i=4 w2v=[[0.0012580156326293945], [-0.6498693227767944], [-0.7756330966949463], [-0.5128778219223022], [0.98853600025177], [0.7764592170715332], [0.9213329553604126], [0.9233143329620361], [0.3834136724472046]] sentiment=0

i=5 = virginamerica seriously would pay 30 a flight for seats that didn't have this playing it's really the only bad thing about flying va

i=5 w2v=[[0.0012580156326293945], [0.48972034454345703], [0.81987464427948], [-0.03148186206817627], [-0.7918610572814941], [-0.5839154720306396], [-0.767822265625], [0.7113058567047119], [0.2726396322250366], [0.6221826076507568], [-0.5667037963867188], [0.7418577671051025], [-0.7756330966949463], [-0.5128778219223022], [-0.8850120306015015], [0.8083392381668091], [0.7764592170715332], [0.9213329553604126], [0.9233143329620361], [-0.3030076026916504], [-0.29737353324890137]] sentiment=0

i=6 = virginamerica really missed a prime opportunity for men without hats parody there

i=6 w2v=[[0.0012580156326293945], [-0.5128778219223022], [-0.9224623441696167], [-0.2657853364944458], [-0.10833430290222168], [-0.5839154720306396], [0.42132532596588135], [0.7914493083953857], [0.4627121686935425], [-0.3580136299133301], [-0.7653474807739258]] sentiment=1

i=7 = virginamerica well i didn'tûbut now i do d

i=7 w2v=[[0.0012580156326293945], [0.28537607192993164], [-0.932395339012146], [-0.4359729290008545], [0.019797325134277344], [-0.932395339012146], [-0.1655644178390503], [0.6342606544494629]] sentiment=0

i=8 = virginamerica it was amazing and arrived an hour early you're too good to me

i=8 w2v=[[0.0012580156326293945], [0.3834136724472046], [-0.9266262054443359], [-0.0772627592086792], [-0.6498693227767944], [0.427449107170105], [0.07871925830841064], [0.5342621803283691], [-0.6033754348754883], [-0.6512038707733154], [-0.5308046340942383], [-0.4651916027069092], [0.3554692268371582], [0.500235915184021]] sentiment=1

i=9 = virginamerica did you know that suicide is the second leading cause of death among teens 1024

i=9 w2v=[[0.0012580156326293945], [-0.12914776802062988], [0.36882483959198], [0.20193088054656982], [0.7113058567047119], [-0.4647252559661865], [0.43231725692749023], [-0.8850120306015015], [0.8175731897354126], [0.1814650297164917], [0.9735549688339233], [0.8894059658050537], [-0.048635125160217285], [0.7589428424835205], [-0.8305487632751465]] sentiment=1

i=10 = virginamerica i lt3 pretty graphics so much better than minimal iconography d

i=10 w2v=[[0.0012580156326293945], [-0.932395339012146], [-0.6855369806289673], [0.13977575302124023], [-0.11634397506713867], [0.8503241539001465], [0.973355770111084], [-0.6604783535003662], [-0.5532848834991455], [0.3296027183532715], [0.6342606544494629]] sentiment=0





['[]'

 '[[0.0012580156326293945], [-0.6498260498046875], [-0.5153262615203857], [-0.020553112030029297], [-0.6892403364181519], [0.3554692268371582], [-0.8850120306015015], [-0.2642437219619751], [0.10036587715148926]]'

 '[[0.0012580156326293945], [-0.932395339012146], [0.2726396322250366], [-0.6015644073486328], [-0.01267862319946289], [-0.4461408853530884], [-0.932395339012146], [-0.946296215057373], [0.3554692268371582], [0.02413642406463623], [0.9904971122741699], [0.6381527185440063]]'

 '[[0.0012580156326293945], [-0.7756330966949463], [-0.5128778219223022], [-0.5935169458389282], [0.3554692268371582], [-0.942463755607605], [0.03211188316345215], [-0.1338428258895874], [0.7456883192062378], [0.9946664571762085], [-0.31516337394714355], [-0.22687816619873047], [0.2923257350921631], [0.6583085060119629], [0.6221826076507568], [-0.5303181409835815], [0.7077651023864746]]'

 '[[0.0012580156326293945], [-0.6498693227767944], [-0.7756330966949463], [-0.5128778219223022], [0.98853600025177], [0.7764592170715332], [0.9213329553604126], [0.9233143329620361], [0.3834136724472046]]'

 '[[0.0012580156326293945], [0.48972034454345703], [0.81987464427948], [-0.03148186206817627], [-0.7918610572814941], [-0.5839154720306396], [-0.767822265625], [0.7113058567047119], [0.2726396322250366], [0.6221826076507568], [-0.5667037963867188], [0.7418577671051025], [-0.7756330966949463], [-0.5128778219223022], [-0.8850120306015015], [0.8083392381668091], [0.7764592170715332], [0.9213329553604126], [0.9233143329620361], [-0.3030076026916504], [-0.29737353324890137]]'

 '[[0.0012580156326293945], [-0.5128778219223022], [-0.9224623441696167], [-0.2657853364944458], [-0.10833430290222168], [-0.5839154720306396], [0.42132532596588135], [0.7914493083953857], [0.4627121686935425], [-0.3580136299133301], [-0.7653474807739258]]'

 '[[0.0012580156326293945], [0.28537607192993164], [-0.932395339012146], [-0.4359729290008545], [0.019797325134277344], [-0.932395339012146], [-0.1655644178390503], [0.6342606544494629]]'

 '[[0.0012580156326293945], [0.3834136724472046], [-0.9266262054443359], [-0.0772627592086792], [-0.6498693227767944], [0.427449107170105], [0.07871925830841064], [0.5342621803283691], [-0.6033754348754883], [-0.6512038707733154], [-0.5308046340942383], [-0.4651916027069092], [0.3554692268371582], [0.500235915184021]]'

 '[[0.0012580156326293945], [-0.12914776802062988], [0.36882483959198], [0.20193088054656982], [0.7113058567047119], [-0.4647252559661865], [0.43231725692749023], [-0.8850120306015015], [0.8175731897354126], [0.1814650297164917], [0.9735549688339233], [0.8894059658050537], [-0.048635125160217285], [0.7589428424835205], [-0.8305487632751465]]'

 '[[0.0012580156326293945], [-0.932395339012146], [-0.6855369806289673], [0.13977575302124023], [-0.11634397506713867], [0.8503241539001465], [0.973355770111084], [-0.6604783535003662], [-0.5532848834991455], [0.3296027183532715], [0.6342606544494629]]']

['0' '1' '0' '0' '0' '0' '1' '0' '1' '1' '0']

------------




Shape of x_train: (8,)

Shape of y_train: (8,)

Shape of x_val: (3,)

Shape of y_val: (3,)

Shape of x_train_loaded: (8,)

Shape of x_val_loaded: (3,)

Shape of y_train_loaded: (8,)

Shape of y_val_loaded: (3,)