.eu website categorisation
In December 2019 EURid conducted an internal study to find out how people and businesses are using the .eu domain and its variants in other scripts. In a random sample of 100,000 domain names, the research team found that more than 70% of the domains had active web services. Of those active domains, more than 50% hosted user-generated content: sites with multiple pages and content clearly created by the user.
More than 20 European languages were found in the data sample, with English, German, Dutch, French, Polish, Italian and Czech among the most common languages for web content. The research team also categorised the high-quality, user-generated content by industry and found that four industry groups accounted for nearly half of the websites in the data sample: manufacturing, trade, publishing and information technology. Other categories included community groups, project management, leisure and entertainment, and tourism.
As a next step, the research team will refine its machine learning training data sets so that web pages in languages other than English can also be classified automatically.