Dutch Product Price Analysis

In the following notebook an analysis method using Python is presented for exploring and understanding the changes in prices of different items on different websites over time. Product prices are known to vary both on- and offline, depending on the characteristics of potential customers. However, in addition to purposeful manual price manipulation, during which human sellers actively modify prices (e.g. for sales), recent years have seen an increase in online automatic price updates by AI technologies. Algorithmic pricing aims to maximise seller profits by using probabilistic concepts, competition details, user information and user history to exploit business opportunities. For example, the technologies of a company could monitor an item's price across multiple platforms and then optimise the the company's item price by updating it to be lower than all the prices on the other websites. Algorithmic pricing is not a strategy but rather a toolkit for ensuring pricing strategies remain optimal and efficient. However, the exact effect of these technologies on the price management process is not always straightforward. As automated technolgies are employed more and more, expecially by large and established brands, certain markets can become monopolised, restricing access to new and innovative additions. The exact existence and the effect of algorithmic pricing have not properly been researched; thus this notebook provides an initial analysis as well as methods that can be used to assist further investigations into the phenomenon.

The analysis is divided into five main sections:

  1. Data Extraction
  2. Data reading and processing
  3. Descriptive statistics
  4. Investigation into a specific website
  5. Investigation into a specific item
  6. Investigation into a specific product

Key Insights Summary

This specific analysis was conducted on data extracted from x websites between date and date 2021.

1. Data Extraction

The price data for the various products across websites was recorded over time and extracted as follows.

In [1]:
# Add scraper: !python <name>.py

2. Data Reading and Processing

The next step required in order to read the data in Python, is the importation of Pandas, a free, open source data manipulation tool. Furthermore, in preparation for the rest of the analysis, import numpy (which provides access to many mathematical functions and further data manipulation) and matplotlib (an important visualization library).

In [2]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt

The data - in this case as a csv file - is loaded using Pandas and the appropriate access path. The delimiter is added here to assist with the correct reading of the data.

In [3]:
dataset = pd.read_csv(r"/Users/lsr/Documents/dutch_product_prices.csv", delimiter=';')

The 'head' function provides an overview of the data's different variables (columns). The 'shape' command returns the number of rows and columns in the dataset.

In [4]:
dataset.head()
Out[4]:
website item product date time price
0 Coolblue laptop product1 2021-06-01 00:00:00 morning 29,44
1 Coolblue steelstofzuiger product1 2021-06-01 00:00:00 morning 16,93
2 Coolblue vaatwasser product1 2021-06-01 00:00:00 morning 29,43
3 Coolblue printer product1 2021-06-01 00:00:00 morning 35,84
4 Coolblue loopband product1 2021-06-01 00:00:00 morning 37,06
In [5]:
dataset.shape
Out[5]:
(22320, 6)

Next, we confirm the variable values have the appropriate data types for our analysis. We can see below that all variables have the 'object' type. We want 'price' to be recognised as a float (decimal point number) and 'date' to be recognised as a datetime data type. Therefore, we use the following commands to reset the data types of these variables.

In [6]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22320 entries, 0 to 22319
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   website  22320 non-null  object
 1   item     22320 non-null  object
 2   product  22320 non-null  object
 3   date     22320 non-null  object
 4   time     22320 non-null  object
 5   price    22320 non-null  object
dtypes: object(6)
memory usage: 1.0+ MB
In [7]:
def conversion(value):
    new_value=value.replace(',','.')
    return float(new_value)
dataset['price']=dataset['price'].apply(conversion)
In [8]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22320 entries, 0 to 22319
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   website  22320 non-null  object 
 1   item     22320 non-null  object 
 2   product  22320 non-null  object 
 3   date     22320 non-null  object 
 4   time     22320 non-null  object 
 5   price    22320 non-null  float64
dtypes: float64(1), object(5)
memory usage: 1.0+ MB
In [9]:
dataset['date']= pd.to_datetime(dataset['date'])
In [10]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22320 entries, 0 to 22319
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   website  22320 non-null  object        
 1   item     22320 non-null  object        
 2   product  22320 non-null  object        
 3   date     22320 non-null  datetime64[ns]
 4   time     22320 non-null  object        
 5   price    22320 non-null  float64       
dtypes: datetime64[ns](1), float64(1), object(4)
memory usage: 1.0+ MB

3. Descriptive Statistics

The following section focuses on the statistical methodologies used to understand the nature of the data and identify patterns within it. The aim is to use the insights generated to specify further investigations into price differences.

To begin, examine the descriptive statistics of the different product prices.

In [11]:
dataset.groupby([dataset['website'], dataset['item']]).describe()
Out[11]:
price
count mean std min 25% 50% 75% max
website item
AH bier 372.0 10.036909 4.804548 0.30 6.8750 9.955 13.5200 22.76
boter 372.0 10.473871 4.845388 0.15 6.9425 10.390 13.6675 24.03
brood 372.0 9.996909 4.889011 0.03 6.7375 9.830 13.1750 26.71
kaas 372.0 10.108683 4.532393 0.02 7.2600 10.140 12.9750 23.57
melk 372.0 10.527151 4.977114 0.11 7.3050 10.230 13.5725 30.24
rijst 372.0 9.759892 4.877919 0.14 6.5475 9.295 12.8825 23.39
shampoo 372.0 9.701720 4.434210 0.04 6.6650 9.535 12.8925 22.50
tandpasta 372.0 10.490081 4.592223 0.11 7.3525 10.280 13.6175 28.40
toiletpapier 372.0 10.179462 4.975054 0.05 6.7025 10.155 13.6450 25.26
waspoeder 372.0 10.094274 4.942366 0.04 6.7075 9.980 13.4550 21.72
Bonprix badpak 372.0 10.162177 5.810105 0.00 5.0700 10.215 15.1800 19.91
jurk 372.0 9.889382 5.721596 0.05 4.6475 10.010 14.3575 19.99
overhemd 372.0 10.279866 5.641259 0.09 5.5425 10.045 15.2675 19.98
pyjama 372.0 10.185860 5.694142 0.02 5.5250 10.050 15.2275 19.95
rok 372.0 9.982500 5.759994 0.02 5.2100 9.755 14.7600 19.98
rugzak 372.0 10.661505 5.487109 0.23 6.4150 11.155 15.2900 19.99
slaapzak 372.0 10.235242 5.737792 0.00 5.4150 10.440 15.2875 20.00
spiegel 372.0 10.080242 5.876740 0.08 5.0800 9.625 15.3825 19.95
tafellaken 372.0 10.104167 5.712087 0.04 5.0225 9.795 14.9075 19.99
vloerkleed 372.0 9.833011 5.719746 0.10 4.6750 9.855 14.7000 19.96
Coolblue grasmaaier 372.0 36.397903 14.588670 1.03 26.8400 36.580 45.6525 86.68
koelkast 372.0 34.925645 14.204794 1.17 25.4250 34.170 45.4225 79.75
laptop 372.0 34.463925 15.297022 0.36 23.9500 34.750 44.1475 84.83
loopband 372.0 34.988817 14.814872 0.38 24.8175 35.855 45.3725 71.90
magnetron 372.0 34.650242 14.622001 0.78 25.0000 34.185 43.7375 75.61
printer 372.0 35.301613 14.502443 1.42 25.3300 35.040 45.4850 74.99
steelstofzuiger 372.0 34.335672 14.812585 0.73 24.8350 33.895 44.8000 74.63
tondeuse 372.0 35.081855 14.471116 1.25 25.2000 34.070 44.4275 81.58
vaatwasser 372.0 34.385780 15.410738 0.21 23.6700 33.920 45.3850 84.17
wasmachine 372.0 34.252419 14.581221 0.84 25.1025 34.795 42.9075 81.12
Kruidvat babyvoeding 372.0 10.044220 5.862755 0.07 5.4550 9.785 15.4150 19.97
douchegel 372.0 10.400672 5.659698 0.03 5.5975 10.615 15.2725 19.98
haarborstel 372.0 10.062769 5.627118 0.04 4.8675 10.155 14.7850 19.91
haarspeld 372.0 10.281505 5.676836 0.13 5.4975 10.090 14.8225 19.97
mascara 372.0 10.036183 5.940321 0.06 4.9250 9.995 15.5825 19.96
scheerapparaat 372.0 10.386263 5.780597 0.16 5.3950 10.615 15.2100 19.95
shampoo 372.0 10.084247 5.978820 0.01 5.2225 9.935 15.3775 19.99
toiletpapier 372.0 9.914140 6.032613 0.09 4.4300 9.880 15.1325 19.96
waspoeder 372.0 10.355188 5.664762 0.11 5.6725 10.435 15.0875 19.88
wasverzachter 372.0 9.593629 5.711350 0.02 4.4100 9.350 14.1150 19.79
Lidl autostoel 372.0 10.282473 4.691365 0.01 7.0175 10.280 13.5425 24.22
badkamerkast 372.0 10.178925 4.934339 0.03 6.6725 9.910 13.4125 23.44
haardroger 372.0 10.034812 4.769037 0.09 6.8050 10.080 13.0150 23.93
koelkast 372.0 10.177231 4.688803 0.11 7.0400 10.050 13.3275 23.92
koptelefoon 372.0 10.179032 4.946515 0.13 6.7050 10.030 13.3225 27.54
ladekast 372.0 10.173629 4.898529 0.08 6.6400 10.345 13.5100 27.35
loopfiets 372.0 10.279220 4.521176 0.04 7.2550 10.235 13.0775 22.18
rugzak 372.0 10.393360 4.640009 0.58 7.0925 10.500 13.4650 23.10
slaapzak 372.0 10.422392 4.896548 0.04 6.7925 10.725 13.9400 23.32
vloerkleed 372.0 10.115430 4.912830 0.10 6.7075 9.820 13.4700 28.23
Mediamarkt grasmaaier 372.0 35.036129 13.905600 2.76 23.8775 35.370 45.2525 73.90
koelkast 372.0 36.024462 15.268357 0.49 26.3675 36.445 45.6475 89.54
laptop 372.0 35.258548 16.528527 1.33 23.6375 35.230 45.9400 88.13
loopband 372.0 34.748091 14.785125 2.81 24.1650 35.075 45.1150 82.49
magnetron 372.0 35.816183 15.679800 1.52 25.7300 35.935 45.2150 95.76
printer 372.0 35.865565 14.819318 0.45 25.4125 36.440 46.3550 73.43
steelstofzuiger 372.0 35.071022 15.490778 1.11 24.7950 34.985 44.8900 83.19
tondeuse 372.0 35.356747 14.708947 1.46 24.4125 35.325 44.9650 81.98
vaatwasser 372.0 35.694677 14.358997 0.60 25.2650 35.780 45.1100 75.94
wasmachine 372.0 34.378199 14.712262 0.98 22.9950 33.555 44.0125 89.45

A nice visualisation of the descriptive statistics can be seen obtained using boxplots. Here we can visually compare the distribution of prices for different items on each website.

In [12]:
ax = dataset.groupby(dataset['website']).boxplot(by = 'item', vert=False, figsize = (19,19), fontsize=8)
plt.tight_layout()

Every item has its own price range. To create a new variable for the range, calculated as max-min, once first needs to save the descriptive statistics as a manipulatable table and then make a few changes regarding indices and data types.

In [13]:
summary_table = dataset.groupby([dataset['website'], dataset['item']]).describe().reset_index()
In [14]:
print(summary_table.columns)
MultiIndex([('website',      ''),
            (   'item',      ''),
            (  'price', 'count'),
            (  'price',  'mean'),
            (  'price',   'std'),
            (  'price',   'min'),
            (  'price',   '25%'),
            (  'price',   '50%'),
            (  'price',   '75%'),
            (  'price',   'max')],
           )

As seen above, the columns have MultiIndex indices. It is more convenient to have single indices.

In [15]:
summary_table.columns = ['_'.join(col) for col in summary_table.columns.values]
In [16]:
print(summary_table.columns)
Index(['website_', 'item_', 'price_count', 'price_mean', 'price_std',
       'price_min', 'price_25%', 'price_50%', 'price_75%', 'price_max'],
      dtype='object')
In [17]:
summary_table['price_max'] = summary_table['price_max'].astype(float)
summary_table['price_min'] = summary_table['price_min'].astype(float)

Now the variable 'range' can be calcuated and added.

In [18]:
summary_table.insert(10,"range", (summary_table['price_max']-summary_table['price_min']))
In [19]:
summary_table.head()
Out[19]:
website_ item_ price_count price_mean price_std price_min price_25% price_50% price_75% price_max range
0 AH bier 372.0 10.036909 4.804548 0.30 6.8750 9.955 13.5200 22.76 22.46
1 AH boter 372.0 10.473871 4.845388 0.15 6.9425 10.390 13.6675 24.03 23.88
2 AH brood 372.0 9.996909 4.889011 0.03 6.7375 9.830 13.1750 26.71 26.68
3 AH kaas 372.0 10.108683 4.532393 0.02 7.2600 10.140 12.9750 23.57 23.55
4 AH melk 372.0 10.527151 4.977114 0.11 7.3050 10.230 13.5725 30.24 30.13

It is interesting to separate the products with price ranges of zero, meaning their prices remain exactly constant over time, from the products whose price ranges are non-zero, meaning their prices fluctuate over time. First we count and identify the products with constant prices:

In [20]:
constant_prices = summary_table
constant_prices = constant_prices.loc[constant_prices["range"] == 0]
len(constant_prices) # This counts the number of products
Out[20]:
0
In [21]:
constant_prices
Out[21]:
website_ item_ price_count price_mean price_std price_min price_25% price_50% price_75% price_max range

Now we examine the products with fluctuating prices over time:

In [22]:
changing_prices = summary_table
changing_prices = changing_prices.loc[changing_prices["range"] != 0]
len(changing_prices)
Out[22]:
60
In [23]:
changing_prices
Out[23]:
website_ item_ price_count price_mean price_std price_min price_25% price_50% price_75% price_max range
0 AH bier 372.0 10.036909 4.804548 0.30 6.8750 9.955 13.5200 22.76 22.46
1 AH boter 372.0 10.473871 4.845388 0.15 6.9425 10.390 13.6675 24.03 23.88
2 AH brood 372.0 9.996909 4.889011 0.03 6.7375 9.830 13.1750 26.71 26.68
3 AH kaas 372.0 10.108683 4.532393 0.02 7.2600 10.140 12.9750 23.57 23.55
4 AH melk 372.0 10.527151 4.977114 0.11 7.3050 10.230 13.5725 30.24 30.13
5 AH rijst 372.0 9.759892 4.877919 0.14 6.5475 9.295 12.8825 23.39 23.25
6 AH shampoo 372.0 9.701720 4.434210 0.04 6.6650 9.535 12.8925 22.50 22.46
7 AH tandpasta 372.0 10.490081 4.592223 0.11 7.3525 10.280 13.6175 28.40 28.29
8 AH toiletpapier 372.0 10.179462 4.975054 0.05 6.7025 10.155 13.6450 25.26 25.21
9 AH waspoeder 372.0 10.094274 4.942366 0.04 6.7075 9.980 13.4550 21.72 21.68
10 Bonprix badpak 372.0 10.162177 5.810105 0.00 5.0700 10.215 15.1800 19.91 19.91
11 Bonprix jurk 372.0 9.889382 5.721596 0.05 4.6475 10.010 14.3575 19.99 19.94
12 Bonprix overhemd 372.0 10.279866 5.641259 0.09 5.5425 10.045 15.2675 19.98 19.89
13 Bonprix pyjama 372.0 10.185860 5.694142 0.02 5.5250 10.050 15.2275 19.95 19.93
14 Bonprix rok 372.0 9.982500 5.759994 0.02 5.2100 9.755 14.7600 19.98 19.96
15 Bonprix rugzak 372.0 10.661505 5.487109 0.23 6.4150 11.155 15.2900 19.99 19.76
16 Bonprix slaapzak 372.0 10.235242 5.737792 0.00 5.4150 10.440 15.2875 20.00 20.00
17 Bonprix spiegel 372.0 10.080242 5.876740 0.08 5.0800 9.625 15.3825 19.95 19.87
18 Bonprix tafellaken 372.0 10.104167 5.712087 0.04 5.0225 9.795 14.9075 19.99 19.95
19 Bonprix vloerkleed 372.0 9.833011 5.719746 0.10 4.6750 9.855 14.7000 19.96 19.86
20 Coolblue grasmaaier 372.0 36.397903 14.588670 1.03 26.8400 36.580 45.6525 86.68 85.65
21 Coolblue koelkast 372.0 34.925645 14.204794 1.17 25.4250 34.170 45.4225 79.75 78.58
22 Coolblue laptop 372.0 34.463925 15.297022 0.36 23.9500 34.750 44.1475 84.83 84.47
23 Coolblue loopband 372.0 34.988817 14.814872 0.38 24.8175 35.855 45.3725 71.90 71.52
24 Coolblue magnetron 372.0 34.650242 14.622001 0.78 25.0000 34.185 43.7375 75.61 74.83
25 Coolblue printer 372.0 35.301613 14.502443 1.42 25.3300 35.040 45.4850 74.99 73.57
26 Coolblue steelstofzuiger 372.0 34.335672 14.812585 0.73 24.8350 33.895 44.8000 74.63 73.90
27 Coolblue tondeuse 372.0 35.081855 14.471116 1.25 25.2000 34.070 44.4275 81.58 80.33
28 Coolblue vaatwasser 372.0 34.385780 15.410738 0.21 23.6700 33.920 45.3850 84.17 83.96
29 Coolblue wasmachine 372.0 34.252419 14.581221 0.84 25.1025 34.795 42.9075 81.12 80.28
30 Kruidvat babyvoeding 372.0 10.044220 5.862755 0.07 5.4550 9.785 15.4150 19.97 19.90
31 Kruidvat douchegel 372.0 10.400672 5.659698 0.03 5.5975 10.615 15.2725 19.98 19.95
32 Kruidvat haarborstel 372.0 10.062769 5.627118 0.04 4.8675 10.155 14.7850 19.91 19.87
33 Kruidvat haarspeld 372.0 10.281505 5.676836 0.13 5.4975 10.090 14.8225 19.97 19.84
34 Kruidvat mascara 372.0 10.036183 5.940321 0.06 4.9250 9.995 15.5825 19.96 19.90
35 Kruidvat scheerapparaat 372.0 10.386263 5.780597 0.16 5.3950 10.615 15.2100 19.95 19.79
36 Kruidvat shampoo 372.0 10.084247 5.978820 0.01 5.2225 9.935 15.3775 19.99 19.98
37 Kruidvat toiletpapier 372.0 9.914140 6.032613 0.09 4.4300 9.880 15.1325 19.96 19.87
38 Kruidvat waspoeder 372.0 10.355188 5.664762 0.11 5.6725 10.435 15.0875 19.88 19.77
39 Kruidvat wasverzachter 372.0 9.593629 5.711350 0.02 4.4100 9.350 14.1150 19.79 19.77
40 Lidl autostoel 372.0 10.282473 4.691365 0.01 7.0175 10.280 13.5425 24.22 24.21
41 Lidl badkamerkast 372.0 10.178925 4.934339 0.03 6.6725 9.910 13.4125 23.44 23.41
42 Lidl haardroger 372.0 10.034812 4.769037 0.09 6.8050 10.080 13.0150 23.93 23.84
43 Lidl koelkast 372.0 10.177231 4.688803 0.11 7.0400 10.050 13.3275 23.92 23.81
44 Lidl koptelefoon 372.0 10.179032 4.946515 0.13 6.7050 10.030 13.3225 27.54 27.41
45 Lidl ladekast 372.0 10.173629 4.898529 0.08 6.6400 10.345 13.5100 27.35 27.27
46 Lidl loopfiets 372.0 10.279220 4.521176 0.04 7.2550 10.235 13.0775 22.18 22.14
47 Lidl rugzak 372.0 10.393360 4.640009 0.58 7.0925 10.500 13.4650 23.10 22.52
48 Lidl slaapzak 372.0 10.422392 4.896548 0.04 6.7925 10.725 13.9400 23.32 23.28
49 Lidl vloerkleed 372.0 10.115430 4.912830 0.10 6.7075 9.820 13.4700 28.23 28.13
50 Mediamarkt grasmaaier 372.0 35.036129 13.905600 2.76 23.8775 35.370 45.2525 73.90 71.14
51 Mediamarkt koelkast 372.0 36.024462 15.268357 0.49 26.3675 36.445 45.6475 89.54 89.05
52 Mediamarkt laptop 372.0 35.258548 16.528527 1.33 23.6375 35.230 45.9400 88.13 86.80
53 Mediamarkt loopband 372.0 34.748091 14.785125 2.81 24.1650 35.075 45.1150 82.49 79.68
54 Mediamarkt magnetron 372.0 35.816183 15.679800 1.52 25.7300 35.935 45.2150 95.76 94.24
55 Mediamarkt printer 372.0 35.865565 14.819318 0.45 25.4125 36.440 46.3550 73.43 72.98
56 Mediamarkt steelstofzuiger 372.0 35.071022 15.490778 1.11 24.7950 34.985 44.8900 83.19 82.08
57 Mediamarkt tondeuse 372.0 35.356747 14.708947 1.46 24.4125 35.325 44.9650 81.98 80.52
58 Mediamarkt vaatwasser 372.0 35.694677 14.358997 0.60 25.2650 35.780 45.1100 75.94 75.34
59 Mediamarkt wasmachine 372.0 34.378199 14.712262 0.98 22.9950 33.555 44.0125 89.45 88.47

To obtain a comparable visualization for each product's price range, we can create a bar graph.

In [24]:
changing_prices['product'] = changing_prices['website_'] + ' ' + changing_prices['item_']
In [25]:
ax = changing_prices.plot(kind = 'bar', x='product', y='range', figsize = (19,8))
ax.set_ylabel('Price Range')
ax.set_xlabel('Product')
ax.set_title('Price ranges for different products with fluctuating prices')
Out[25]:
Text(0.5, 1.0, 'Price ranges for different products with fluctuating prices')

One might want to compare the average price ranges recorded for the different websites.

In [26]:
changing_prices["range"].groupby(changing_prices["website_"]).mean()
Out[26]:
website_
AH            24.759
Bonprix       19.907
Coolblue      78.709
Kruidvat      19.864
Lidl          24.602
Mediamarkt    82.030
Name: range, dtype: float64

This already gives some indication into the websites most and least likely to be employing algorithmic pricing techologies. However, the collection of items recorded for each website are not the same across websites - for example, CoolBlue's recorded items include a laptop while Kruidvat's recorded items do not. Thus, the comparison between overall average price range for the various websites should be interpreted carefully. It may be more meaningful to compare the average price ranges for the same items across different websites, rather than different items across the same websites.

In [27]:
changing_prices["range"].groupby(changing_prices["item_"]).mean()
Out[27]:
item_
autostoel          24.210000
babyvoeding        19.900000
badkamerkast       23.410000
badpak             19.910000
bier               22.460000
boter              23.880000
brood              26.680000
douchegel          19.950000
grasmaaier         78.395000
haarborstel        19.870000
haardroger         23.840000
haarspeld          19.840000
jurk               19.940000
kaas               23.550000
koelkast           63.813333
koptelefoon        27.410000
ladekast           27.270000
laptop             85.635000
loopband           75.600000
loopfiets          22.140000
magnetron          84.535000
mascara            19.900000
melk               30.130000
overhemd           19.890000
printer            73.275000
pyjama             19.930000
rijst              23.250000
rok                19.960000
rugzak             21.140000
scheerapparaat     19.790000
shampoo            21.220000
slaapzak           21.640000
spiegel            19.870000
steelstofzuiger    77.990000
tafellaken         19.950000
tandpasta          28.290000
toiletpapier       22.540000
tondeuse           80.425000
vaatwasser         79.650000
vloerkleed         23.995000
wasmachine         84.375000
waspoeder          20.725000
wasverzachter      19.770000
Name: range, dtype: float64

Here we can see that certain items have large price fluctuations, while other items' prices are far more stable.

It may be suspected that there is a correlation between the price levels of the items and the corresponding price ranges. For example, the more expensive an item, the larger its price range (or vice versa). To determine this, we calculate the correlation between the prices (first maxima and then minima) and the ranges.

In [28]:
x = changing_prices["range"].values
y = changing_prices["price_max"].values
r = np.corrcoef(x, y)
r
Out[28]:
array([[1.        , 0.99989205],
       [0.99989205, 1.        ]])
In [29]:
x = changing_prices["range"].values
y = changing_prices["price_min"].values
r = np.corrcoef(x, y)
r
Out[29]:
array([[1.        , 0.74321392],
       [0.74321392, 1.        ]])

Comment on the correlation

4. Investigation into a specific website

Based on the previous section, or perhaps independently, one might want to investigate one specific website and the fluctuation in prices of items offered by it. To begin, specifiy the website's name. This can be changed at any point and the analysis rerun.

In [30]:
individual_website = dataset.loc[(dataset['website']== 'Coolblue')]

We can take a look again at the boxplots to visualize and compare the overall price differences for each item on the website.

In [31]:
ax = individual_website.boxplot(by = 'item', figsize = (19,8))
ax.set_ylabel('Price')
ax.set_xlabel('Item')
Out[31]:
Text(0.5, 0, 'Item')

\ A meaningful investigation explores how the website's prices for each item changed over time; thus time series plots are included. These can be created for all items individually, or all items can be plotted on one set of axes.

In [32]:
item_array = dataset["item"].unique()
In [33]:
web_items = ["0"]* (len(item_array))
for i in range (len(item_array)):
    item_it = individual_website.loc[(individual_website["item"]== item_array[i])]
    web_items[i] = item_it
In [34]:
for i in range (len(item_array)):
    plt.plot(web_items[i]["date"],  web_items[i]["price"])
plt.xlabel('Date')
plt.ylabel('Prices')
plt.title('Time Series plots for Multiple Items')
# Change plot size?
Out[34]:
Text(0.5, 1.0, 'Time Series plots for Multiple Items')
In [35]:
ay = individual_website.groupby(individual_website['item']).plot(kind='scatter', x='date', y='price', figsize =(20,6))
# How to set titles to show the items for each scatterplot?

az = individual_website.groupby(individual_website['item']).plot(kind='line', x='date', y='price', figsize =(20,6))

\ Comment on the time series trends.

5. Investigation into a specific item

Certain items appear on different websites, such as a laptop or grasmaaier. It is interesting to explore the level and consistency of the prices of these items across different brands.

Specify the item of interest:

In [36]:
individual_item = dataset.loc[(dataset['item']== 'laptop')]
In [37]:
individual_item.describe()
Out[37]:
price
count 744.000000
mean 34.861237
std 15.918929
min 0.360000
25% 23.650000
50% 34.930000
75% 45.095000
max 88.130000

To compare the item's price distributions on different websites, one can consider boxplots, violin plots and side-by-side histograms.

In [38]:
ax = individual_item.boxplot(by = 'website', figsize = (19,8))
ax.set_ylabel('Price')
ax.set_xlabel('Website')
Out[38]:
Text(0.5, 0, 'Website')
In [39]:
web_array = individual_item["website"].unique()

item_webs = ["0"]* (len(web_array))

for i in range (len(web_array)):
    web_it = individual_item.loc[(individual_item["website"]== web_array[i])]
    item_webs[i] = web_it
In [40]:
for i in range (len(item_webs)):
    x = item_webs[i]["price"]
    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    bp = ax.violinplot(x)
    plt.show()
In [41]:
x = ["0"]* (len(item_webs))
for i in range (len(item_webs)):
    x[i] = item_webs[i]["price"]
    plt.hist(x[i])
    plt.xlabel('Prices')
    plt.ylabel('Frequency')
    plt.title('Side-by-Side Histogram with Multiple Websites')
    
# Hmmmm this should not overlap but rather be side-by-side
In [42]:
# Want the above to look like this
x1 = list(individual_item[individual_item["website"]== "Coolblue"]["price"])
x2 = list(individual_item[individual_item["website"]== "Mediamarkt"]["price"])
names = ["Coolblue", "Mediamarkt"]
plt.hist([x1, x2], label=names)
plt.legend()
plt.xlabel('Prices')
plt.ylabel('Frequency')
plt.title('Side-by-Side Histogram with Multiple Items')
Out[42]:
Text(0.5, 1.0, 'Side-by-Side Histogram with Multiple Items')

Time series plots are again meaningful to notice trends and patterns over time.

In [43]:
for i in range (len(web_array)):
    plt.plot(item_webs[i]["date"],  item_webs[i]["price"])
plt.xlabel('Date')
plt.ylabel('Prices')
plt.title('Time Series plots for Multiple Websites')
# Change plot size?
Out[43]:
Text(0.5, 1.0, 'Time Series plots for Multiple Websites')

\ Or if one wants all the time series plots on separate axes:

In [44]:
ay = individual_item.groupby(individual_item['website']).plot(kind='scatter', x='date', y='price', figsize =(20,6))

az = individual_item.groupby(individual_item['website']).plot(kind='line', x='date', y='price', figsize =(20,6))

5. Investigation into a specific product

Certain items on one website might warrant some individual investigation. To begin we consider a descriptive summary.

In [45]:
dataset['product'] = dataset['website'] + ' ' + dataset['item']
individual_product = dataset.loc[(dataset['product']== 'Coolblue laptop')]
individual_product.describe()
Out[45]:
price
count 372.000000
mean 34.463925
std 15.297022
min 0.360000
25% 23.950000
50% 34.750000
75% 44.147500
max 84.830000

For the product, we calculate the product's average price for each day:

In [50]:
daily_mean = individual_product.groupby(individual_product["date"]).mean()
dates_array = dataset["date"].unique()
plt.plot(dates_array, daily_mean)
Out[50]:
[<matplotlib.lines.Line2D at 0x12757c850>]

Then, using the average prices each day, we calculate the rates of change from one day to the next, and plot these.

In [58]:
# I think the rates calculation could probably be simplified...
mean = [0]* len(dates_array)
for i in range (len(dates_array)):
    prod_date = individual_product.loc[(individual_product["date"]== dates_array[i])]
    mean[i] = prod_date["price"].mean()
rates = [0]* (len(dates_array)-1)
for i in range (len(dates_array)-1):
    each_rate = mean[i+1]/mean[i]*100
    rates[i] = each_rate
plt.plot(dates_array[1:len(dates_array)],rates)
Out[58]:
[<matplotlib.lines.Line2D at 0x127852c90>]

Again we can also consider boxplots, violin plots, histograms and density graphs to understand the price distribution.

In [59]:
ax = individual_product.boxplot(figsize = (19,8))
ax.set_ylabel('Price')
Out[59]:
Text(0, 0.5, 'Price')
In [60]:
x = individual_product["price"]
data_to_plot = [x]
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
bp = ax.violinplot(data_to_plot)
plt.show()
In [61]:
plt.hist(individual_product["price"])
plt.xlabel('Prices')
plt.ylabel('Frequency')
Out[61]:
Text(0, 0.5, 'Frequency')
In [62]:
product_mean = individual_product['price'].mean()
product_median = individual_product['price'].median()
ax = individual_product['price'].plot(kind='density', figsize = (14,6))
ax.axvline(product_mean, color = 'red')
ax.axvline(product_median, color = 'green')
Out[62]:
<matplotlib.lines.Line2D at 0x127560fd0>
In [63]:
ay = individual_product.plot(kind='scatter', x='date', y='price', figsize =(20,6))
az = individual_product.plot(kind='line', x='date', y='price', figsize =(20,6))