4 April 2021

Facebook Data Leak and Phone Numbers

Last Saturday, a post on a hacker forum claimed that Facebook data containing 533 million rows of personal data had been leaked. People soon began requesting the data through messaging applications such as Telegram, and the sample data that was shared substantiated the claim. The forum screenshots we examined clearly showed user data.

The data was first distributed on the forum for a fee (initially around 3-5 euros, paid by buying forum credits) and was later made available free of charge from many download sources. Our examination showed that it contains personal data: primarily phone numbers, along with email addresses, location, gender and similar information.

The data was shared in compressed form and, once downloaded, turns out to be organized by country. It covers 106 countries, Turkey among them, and totals 533,313,128 rows.

Number of rows: 533,313,128

Number of countries: 106

Data size: 15 GB

The Turkey data comes as a compressed file named Turkey.zip of 641 MB (672,381,545 bytes). Extracted to text, it becomes a file named Turkey.txt of 2.91 GB (3,129,423,237 bytes). This dataset of approximately 3 GB contains the personal data of Facebook users.
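The quoted sizes line up with the byte counts when MB and GB are read as binary units (1 MB = 1,048,576 bytes, 1 GB = 1,073,741,824 bytes). The sketch below redoes that conversion and also derives the compression ratio; the function names are ours and purely illustrative.

```shell
# Convert a byte count to binary megabytes / gigabytes with two decimals.
to_mib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'; }
to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'; }

zip_bytes=672381545    # Turkey.zip, as quoted in the text
txt_bytes=3129423237   # Turkey.txt, as quoted in the text

to_mib "$zip_bytes"    # 641.23
to_gib "$txt_bytes"    # 2.91

# Compressed size as a percentage of the extracted size.
awk -v z="$zip_bytes" -v t="$txt_bytes" 'BEGIN { printf "%.1f%%\n", 100 * z / t }'
```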

Across all 106 countries, the data contains the personal information of more than 533 million Facebook users. It includes phone numbers, Facebook identifiers (ID), first and last names, locations, dates of birth, biographies and, for the users who have one on record, email addresses.

Posts first appeared on the same forum in January, advertising a bot that claimed to pull this data for a set price. According to these posts, a hacker could obtain the data and had automated the process through a bot they had written. Many people doubted that such a tool could exist, but some cyber security experts made statements confirming the bot's existence.

If you recall the first incident in which the phone numbers of Facebook users were exposed online, we are facing a similar situation. The phone number vulnerability that emerged in 2019 had already shown that the data of millions of people could be obtained through their phone numbers.

Facebook issued a statement saying that this vulnerability was fixed in August 2019. The leak, however, indicates that user data had been obtained by this method before the vulnerability was remediated.

Even though Facebook has fixed the vulnerability, the dataset itself points to collection through phone numbers: many users are missing some fields, yet the phone number column is filled in for every row.

A database of this size, containing private information such as phone numbers and dates of birth, is certain to be of use to malicious attackers. Such information is especially valuable in social engineering and targeted attack scenarios, where it is used to lure victims into traps.

The format of the data in the dataset is as follows.

id, phone, first_name, last_name, email, birthday, gender, locale, hometown, location, link

The country files are not packaged uniformly: some were published in zip, some in rar and some in 7z format. This suggests that the data was collected at intervals or at different times, possibly even by different people. The naming of some files also caught our attention, for example “Singapore1 File Size: 74.0 MB” and “Japan A File Size: 12.3 MB”. These names hint that the Singapore and Japan data may have been split into several parts, but only these files were included in the dataset and no further parts were seen.

Although there is not yet clear information about this dataset, it appears to be older data collected through the mobile phone vulnerability reported in 2019. Data collected then will not have aged much by 2021, and because it is critical data it may still be valuable.

It is striking that mobile phone numbers are the only field present in every row of the dataset. Looking closely at how the numbers are written, all of them follow the +90 555 555 5555 format rather than a mix of styles, which also supports this explanation.

Most likely, the attacker or attackers exploited the vulnerability reported to Facebook in 2019 in an automated way and kept downloading data until Facebook closed it. It is also possible that this data has been exploited from 2019 until today and has been used in many cyber attacks.

We are continuing our analysis. In the meantime, note that the leaked personal data may be misused for spam email or SMS, spam calls, robocalls, extortion attempts, threats, harassment and more. Sale of the data to betting or e-commerce sites is also among the likely possibilities.

Row counts of the data files by country (the period is used as the thousands separator).

1 Afghanistan 558.393
2 Africa 14.323.766
3 Angola 50.889
4 Albania 506.602
5 Algeria 11.505.898
6 Argentina 2.347.553
7 Austria 1.249.388
8 Australia 7.320.478
9 Azerbaijan 99.472
10 Bahrain 1.450.124
11 Bangladesh 3.816.339
12 Belgium 3.183.584
13 Bolivia 2.959.209
14 Botswana 240.606
15 Brazil 8.064.916
16 Brunei 213.795
17 Bulgaria 432.473
18 Burkina Faso 6.413
19 Burundi 15.709
20 Cambodia 2.838
21 Cameroon 1.997.658
22 Canada 3.494.385
23 Chile 6.889.083
24 China 670.334
25 Colombia 17.957.908
26 Costa Rica 1.464.002
27 Croatia 659.115
28 Cyprus 152.321
29 Czech Republic 1.375.988
30 Denmark 639.841
31 Djibouti 14.327
32 Ecuador 310.259
33 Egypt 44.823.547
34 El Salvador 4.779
35 Estonia 87.533
36 Ethiopia 12.753
37 Fiji 5.364
38 Finland 1.381.569
39 France 19.848.559
40 Georgia 95.193
41 Germany 6.054.423
42 Ghana 1.027.969
43 Greece 617.722
44 Guatemala 1.645.068
45 Haiti 15.407
46 Honduras 16.142
47 Hong Kong 2.937.841
48 Hungary 377.045
49 Iceland 31.343
50 India 6.162.450
51 Indonesia 130.331
52 Iran 301.723
53 Iraq 17.116.398
54 Ireland 1.449.919
55 Israel 3.956.428
56 Italy 35.677.323
57 Jamaica 385.890
58 Japan 428.625
59 Jordan 3.105.988
60 Kazakhstan 3.214.990
61 Kuwait 4.468.134
62 Lebanon 1.829.661
63 Libya 4.204.514
64 Lithuania 220.160
65 Luxembourg 188.201
66 Macao 414.228
67 Malaysia 11.675.894
68 Maldives 86.337
69 Malta 115.366
70 Mauritius 848.558
71 Mexico 13.330.561
72 Moldova 46.237
73 Morocco 18.939.198
74 Namibia 409.356
75 Netherlands 5.430.388
76 Nigeria 9.000.131
77 Norway 475.809
78 Oman 5.048.532
79 Palestine 3.367.576
80 Panama 1.502.310
81 Peru 8.075.317
82 Philippines 879.699
83 Poland 2.669.381
84 Portugal 2.277.361
85 Puerto Rico 130.586
86 Qatar 2.526.694
87 Russia 9.996.405
88 Saudi Arabia 28.804.686
89 Serbia 162.898
90 Singapore 3.073.009
91 Slovenia 229.039
92 South Korea 121.744
93 Spain 10.894.206
94 Sudan 9.464.772
95 Sweden 1.092.140
96 Switzerland 1.592.039
97 Syria 6.939.528
98 Taiwan 734.807
99 Tunisia 39.526.412
100 Turkey 19.638.821
101 Turkmenistan 16.279
102 United Arab Emirates 6.978.927
103 United Kingdom 11.522.328
104 Uruguay 1.509.317
105 USA 32.315.282
106 Yemen 4.617.359

Total row count: 533,313,128

The Leaked Turkey Facebook Data

The leaked Turkey data consists of 19,638,821 rows. Because the first row holds the column names, that leaves 19,638,820 user records.

root@PriviaSec:/home/PriviaSec# wc -l Turkey.txt
19.638.820 Turkey.txt

Counting the rows that contain a Turkish mobile number with grep gives 19,638,819, one fewer again, possibly because of a shifted first row or some other reason.

root@PriviaSec:/home/PriviaSec# cat Turkey.txt | grep "+905" -c
19.638.819
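The three figures (all lines, lines without the header, lines containing a +905 number) can be reproduced on any file of this shape. Below is a minimal sketch on a made-up sample; the rows and the function names are assumptions for illustration and are not taken from the leaked data.

```shell
# Build a tiny stand-in for Turkey.txt (fabricated placeholder rows).
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,phone,first_name,last_name,email,birthday,gender,locale,hometown,location,link
1,+905300000001,A,B,None,None,male,tr_TR,None,None,https://www.facebook.com/a
2,+905370000002,C,D,None,None,female,tr_TR,None,None,https://www.facebook.com/profile.php?id=2
EOF

# All lines, including the header (what `wc -l` reports).
count_lines() { wc -l < "$1" | tr -d ' '; }

# Data rows only: skip the first (header) line.
count_records() { tail -n +2 "$1" | wc -l | tr -d ' '; }

# Rows containing a Turkish mobile prefix; -F matches "+905" as a literal string.
count_mobile() { grep -cF '+905' "$1"; }

count_lines "$sample"     # 3
count_records "$sample"   # 2
count_mobile "$sample"    # 2
```

Note that wc -l counts newline characters, so a file whose last line has no trailing newline reports one line fewer than it visibly contains. That is one possible source of an off-by-one between such counts.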

When we count these nearly twenty million mobile numbers by the first three digits of the subscriber number, the breakdown below emerges. Combined with information such as gender and date of birth, a mobile phone number is data of critical importance.

root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90530" -c
1.231.329
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90531" -c
1.989.590
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90532" -c
1.991.915
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90533" -c
1.474.257
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90534" -c
1.973.911
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90535" -c
2.374.716
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90536" -c
2.210.386
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90537" -c
2.335.725
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90538" -c
2.157.846
root@TEAkolikPC:/home/PriviaSec# cat Turkey.txt | grep "+90539" -c
1.899.144

As shown above, every prefix from 0530 to 0539 is present. The ten counts add up to 19,638,819, so together they account for the whole dataset.
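The claim that the ten prefixes cover the whole file can be checked by adding up the per-prefix counts. A small sketch using only the figures from the grep output above:

```shell
# Per-prefix counts for +90530 .. +90539, copied from the grep output above.
counts="1231329 1989590 1991915 1474257 1973911 2374716 2210386 2335725 2157846 1899144"

total=0
for n in $counts; do
  total=$((total + n))
done

echo "$total"   # 19638819, equal to the count of rows matching "+905"
```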

What stands out, however, is that all of these numbers belong to the Turkcell block; there are no numbers from the Avea or Vodafone blocks. Just as prefixes such as 0540, 0541, 0542, 0543 and 0544 are absent from the records, so are prefixes such as 0554, 0555 and 0556. Only the number block belonging to Turkcell appears in the dataset.
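Rather than running one grep per prefix, the prefixes present in a file can be listed in a single pass, which also makes absent blocks such as 054x or 055x obvious at a glance. The following is a sketch under the assumption that every number is written as +90 followed directly by the digits; the sample rows and the function name are fabricated for illustration.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,+905300000001,A,B
2,+905300000002,C,D
3,+905370000003,E,F
EOF

# Extract "+90" plus the next three digits from each line, then count each prefix.
prefix_tally() {
  grep -oE '\+90[0-9]{3}' "$1" | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

prefix_tally "$sample"
# +90530 2
# +90537 1
```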

This clearly shows that the attackers pulled the data using the number blocks of the most-used operator in our country.

It also supports the conclusion that the vulnerability involved is the one that emerged in 2019, which allowed personal data to be pulled through phone numbers. Because the other operators' blocks were not tried, no user data from those blocks appears. To confirm this, we examined the Azerbaijan data, which likewise shows that only the number block of the most popular mobile operator was used.

In short, the attackers exploited a vulnerability they discovered in Facebook’s “find your friend by phone number” feature and used the most-used phone number block in order to reach personal data.

Gender Distribution

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep ",male," -c
13.338.168
root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep ",female," -c
5.463.127

Gender is recorded in the data as male or female. Counting each value as above shows that, of the nearly 20 million records, 13,338,168 are male profiles and 5,463,127 are female profiles.
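The two counts do not add up to the full file. Taking the 19,638,819 rows that contain a +905 number as the base (our choice of denominator, so treat the shares as approximate), the remainder and the percentages can be worked out as follows.

```shell
total=19638819   # rows with a +905 number, from the earlier count
male=13338168
female=5463127

# Rows matched by neither pattern (gender missing or recorded differently).
other=$((total - male - female))
echo "$other"    # 837524

# Percentage of the base, with one decimal place.
share() { awk -v n="$1" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * n / t }'; }

share "$male"
share "$female"
share "$other"
```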

Username Usage

10000XXXXXXXX,+90537XXXXXX,MXXXXX,AXXXX,None,None,male,tr_TR,Aksehir,Location*,Aksehir,link*,https://www.facebook.com/username,,,,,,,,,,,,,,

10000XXXXXXXX,+90537XXXXXX,MXXXXX,TXXXX,None,None,male,tr_TR,None,Location*,None,link*,https://www.facebook.com/profile.php?id=1000XXXXXX

In the profile data, the last column of each row holds the profile link, which contains either a numeric profile id or a username. From the ?id= form we infer that some people have set a username while others have not.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "?id=" -c
5.679.125

Filtering on this shows that 5,679,125 users have not set a username. The remaining profiles, roughly 14 million, appear in the dataset as facebook.com/username.
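The split between the two link forms can be sketched as a pair of counts over the data rows. The sample below is fabricated and shortened to three columns, and the function names are hypothetical.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,phone,link
1,+905300000001,https://www.facebook.com/someuser
2,+905370000002,https://www.facebook.com/profile.php?id=2
3,+905350000003,https://www.facebook.com/profile.php?id=3
EOF

# Links of the form profile.php?id=... mean no username is set.
# -F keeps "?" and "." literal instead of treating them as regex syntax.
count_id_links() { tail -n +2 "$1" | grep -cF 'profile.php?id='; }

# Every other data row carries a username in its link.
count_username_links() { tail -n +2 "$1" | grep -vcF 'profile.php?id='; }

count_id_links "$sample"        # 2
count_username_links "$sample"  # 1
```

The header line is skipped with tail so that it is not counted among the rows without ?id=.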

Location Information

The profile data also includes the location and the hometown stated on Facebook. These fields appear in the order locale, hometown, location, link.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep ",None,Location*,None,link*," -c
9.920.320

Filtering on “None,Location*,None,link*,” shows that 9,920,320 profiles contain no location information, while the remaining portion, close to 10 million profiles, contains at least one piece of location information.
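One caveat about this pattern: without -F, grep reads * as a repetition operator, so Location* means "Locatio" followed by any number of n characters, not the word followed by a literal asterisk. If the asterisks in the sample rows are literal characters in the file rather than marks added for this write-up (an assumption either way), a fixed-string search is the safer form. The sketch below shows the difference on two fabricated lines.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,+905300000001,A,B,None,None,male,tr_TR,None,Location*,None,link*,https://www.facebook.com/a
2,+905370000002,C,D,None,None,male,tr_TR,Aksehir,Location*,Aksehir,link*,https://www.facebook.com/b
EOF

pattern=',None,Location*,None,link*,'

# Regex reading: "*" repeats the preceding character, so a literal "*" in the data never matches.
# grep exits non-zero when the count is 0, hence the "|| true".
as_regex=$(grep -c "$pattern" "$sample" || true)

# Fixed-string reading: every character, including "*", must match literally.
as_fixed=$(grep -cF "$pattern" "$sample")

echo "regex: $as_regex  fixed: $as_fixed"   # regex: 0  fixed: 1
```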

Email Addresses

The dataset also contains email addresses for some profiles. Because the @ sign appears only in email addresses within the dataset, filtering on it gives the following count.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@" -c
179.803

Counting the @ signs with the command above yields 179,803 email addresses. Finding about 179 thousand emails in a dataset of nearly 20 million rows is itself a notable result. The most popular service providers among these addresses are listed below.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@hotmail" -c
135.130
root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@gmail" -c
25.069
root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@yahoo" -c
2.967
root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@mynet" -c
2.059
root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "@yandex" -c
345

This filtering shows that the large majority of the addresses are Hotmail accounts, followed by Gmail and Yahoo: 135,130 people use Hotmail, while 25,069 use Gmail. Old as the service is, 2,967 Yahoo addresses are still registered with Facebook.
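The per-provider greps can be replaced by one pass that ranks every domain by frequency. The following is a sketch on fabricated rows with a hypothetical function name. Unlike a search for "@hotmail", it keeps hotmail.com apart from variants such as hotmail.com.tr.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,+905300000001,A,B,a@hotmail.com,None
2,+905370000002,C,D,b@Hotmail.com,None
3,+905350000003,E,F,c@gmail.com,None
4,+905360000004,G,H,None,None
EOF

# Pull out "@domain" from each address, lower-case it, drop the "@",
# then count each domain and list the most frequent first.
domain_tally() {
  grep -oE '@[A-Za-z0-9.-]+' "$1" | tr 'A-Z' 'a-z' | cut -c2- |
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 | awk '{ print $2, $1 }'
}

domain_tally "$sample"
# hotmail.com 2
# gmail.com 1
```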

TR Extension Email Addresses

Filtering for email addresses under the .tr domain gives the results below. Addresses ending in gov.tr and edu.tr are also present in the dataset. Note that edu.tr addresses are commonly used by students as well, whereas gov.tr addresses are used only by state institutions.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt |egrep -E "@.{1,99}.com.tr" -c
2725
root@TEAkolikPC:/home/teakolik# cat Turkey.txt |egrep -E "@.{1,99}.net.tr" -c
28
root@TEAkolikPC:/home/teakolik# cat Turkey.txt |egrep -E "@.{1,99}.org.tr" -c
32
root@TEAkolikPC:/home/teakolik# cat Turkey.txt |egrep -E "@.{1,99}.edu.tr" -c
155
root@TEAkolikPC:/home/teakolik# cat Turkey.txt |egrep -E "@.{1,99}.gov.tr" -c
15
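In the patterns above the dots before com, tr and so on are unescaped, so each one matches any character, and nothing ties the suffix to the end of the address. A stricter variant, sketched below on fabricated lines, escapes the dots and requires the suffix to close the email field. Run against the real file it may give slightly different counts from those shown; the function name is hypothetical.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,+905300000001,A,B,user@example.com.tr,None
2,+905370000002,C,D,user@welcomXtr.example,None
3,+905350000003,E,F,user@uni.edu.tr,None
EOF

# Count addresses whose domain ends in the given suffix, e.g. "com.tr".
count_tr_suffix() {
  suffix=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape the dots in the suffix
  grep -cE "@[A-Za-z0-9.-]+\.${suffix}(,|\$)" "$2"
}

loose=$(grep -cE '@.{1,99}.com.tr' "$sample")     # pattern as used above
strict=$(count_tr_suffix com.tr "$sample")

echo "loose: $loose  strict: $strict"   # loose: 2  strict: 1
```

The loose pattern also counts the second line, because "welcomXtr" satisfies "any character, com, any character, tr".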

Facebook Language Encoding

The dataset also records the language setting (locale) of each profile. Filtering on tr_TR as below gives the number of people who use Facebook in Turkish.

root@TEAkolikPC:/home/teakolik# cat Turkey.txt | grep "tr_TR" -c
17.586.130

This figure shows that about 17.6 million people in the dataset use Facebook in Turkish, while the remaining roughly 2 million profiles use it in other languages.
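To see which other languages make up the remainder, the locale column can be tallied directly instead of searching for one value. This is a sketch that assumes locale is the eighth comma-separated field, as in the format line given earlier, and that no earlier field contains a comma; the sample rows and the function name are illustrative.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,phone,first_name,last_name,email,birthday,gender,locale,hometown,location,link
1,+905300000001,A,B,None,None,male,tr_TR,None,None,https://www.facebook.com/a
2,+905370000002,C,D,None,None,female,tr_TR,None,None,https://www.facebook.com/b
3,+905350000003,E,F,None,None,male,en_US,None,None,https://www.facebook.com/c
EOF

# Tally the values of one comma-separated column (1-based), skipping the header,
# and list the most frequent value first.
column_tally() {
  tail -n +2 "$2" | cut -d, -f"$1" | LC_ALL=C sort | uniq -c |
    LC_ALL=C sort -k1,1nr -k2 | awk '{ print $2, $1 }'
}

column_tally 8 "$sample"
# tr_TR 2
# en_US 1
```

The same function applied to column 7 gives a column-aware gender breakdown.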

Cities

Among the profiles that provide location information, we filtered for three of our cities as below.

root@TEAkolikPC:/home/teakolik# egrep -iE "[I,İ,i,ı]stanbul" Turkey.txt -c
2.503.593
root@TEAkolikPC:/home/teakolik# egrep -iE Ankara Turkey.txt -c
596.453
root@TEAkolikPC:/home/teakolik# egrep -iE "[I,İ,i,ı]zm[I,İ,ı,i]r" Turkey.txt -c
513.927
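A note on the bracket expressions above: inside [...] the commas are ordinary members of the set, so [I,İ,i,ı] also accepts a comma in that position. Listing the four Turkish i forms as alternatives avoids this and does not rely on how grep handles multi-byte characters inside brackets. Below is a sketch on fabricated lines, with a hypothetical function name.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,Istanbul
2,İstanbul
3,istanbul
4,ıstanbul
5,Ankara
EOF

# Alternation spells out each form of the first letter; -i covers the ASCII letters that follow.
count_istanbul() { grep -ciE '(I|İ|i|ı)stanbul' "$1"; }

count_istanbul "$sample"   # 4
```

As with the original commands, this searches the whole line, so a match in a name field is counted the same as a match in hometown or location.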

According to this filtering, Istanbul leads the location data, followed by Ankara and then İzmir.
