Data Architect - Freelance & Remote Job

  • Freelancing
  • Remote
  • 2 November 2020
  • 1 position

Unfortunately, this offer is no longer available.

Presentation of the Company

Our client is a video game development company based in Paris and Montreal. Its game is an MMO space-opera first-person RPG in which players can harvest materials by digging voxels, create buildings and ships, trade materials and other items with any player, explore planets and satellites, join communities and engage in spaceship combat. Every action performed by players happens in a persistent open world hosted on a single-shard shared server. After an alpha-testing phase that began in 2019, the game has been in open beta since August 2020.

Description of the mission

Current architecture

Our client generates high volumes of data for two reasons: players play in a large persistent open world formed by multiple planets, and the emergent gameplay allows a broad variety of playstyles. Moreover, developing the game involves numerous technical challenges in rendering, balancing and design, which require precise analysis of data that is complete, up to date and of high quality.

As of now, data is generated by three main sources:

  • Client data;
  • Server data;
  • Web data.

1. Client data

Client data is generated and sent by the game running on players’ PCs. Here are the steps of the data pipeline between the PC clients and the company’s data warehouse:

  • The game client generates JSON files containing the attributes of tracked gameplay events. Only a fraction of all events happening in the game engine are tracked;
  • JSON files are sent in batches to an AWS S3 bucket every 10 minutes to avoid overly frequent send queries;
  • JSON files are cleaned and concatenated by a Python script (a sketch of such a script follows this list);
  • JSON files are loaded into the Snowflake data warehouse by a Snowpipe process.
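For illustration, the cleaning-and-concatenation step could look like the minimal Python sketch below. The directory layout, file names and required event attributes are assumptions rather than the client’s actual implementation; the script writes newline-delimited JSON, one valid event per line, a format Snowpipe can ingest directly.

```python
import json
from pathlib import Path

# Assumed layout: one JSON file per client batch lands in RAW_DIR;
# the cleaned, concatenated output is staged for Snowpipe ingestion.
RAW_DIR = Path("raw_events")
OUT_FILE = Path("clean/events.ndjson")

# Hypothetical minimal set of attributes a tracked event must carry.
REQUIRED_KEYS = {"event_type", "player_id", "timestamp"}


def clean_and_concatenate() -> int:
    """Drop malformed files/events and write newline-delimited JSON,
    one valid event per line."""
    written = 0
    OUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with OUT_FILE.open("w", encoding="utf-8") as out:
        for path in sorted(RAW_DIR.glob("*.json")):
            try:
                events = json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                continue  # skip files truncated in transit
            if not isinstance(events, list):
                events = [events]
            for event in events:
                if isinstance(event, dict) and REQUIRED_KEYS.issubset(event):
                    out.write(json.dumps(event) + "\n")
                    written += 1
    return written


if __name__ == "__main__":
    print(f"wrote {clean_and_concatenate()} events to {OUT_FILE}")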

2. Server data

Server data is stored in multiple locations and comes from multiple sources. It includes the following (a query sketch follows the list):

  • Gameplay events describing the states of open-world components (transactions, voxels, character skills) on a PostgreSQL server;
  • Time series of server health metrics (load balancing, queue time, concurrent users) stored in an InfluxDB instance and visualized using Grafana;
  • Server logs stored in an Elasticsearch instance and visualized using Kibana.
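To make these sources concrete, here is a hedged sketch of how they might be queried from Python, using psycopg2 for PostgreSQL and the influxdb 1.x client for InfluxDB. All connection details, table and measurement names are assumptions, not the client’s actual schema.

```python
import psycopg2                      # PostgreSQL driver
from influxdb import InfluxDBClient  # client for InfluxDB 1.x

# Illustrative connection string; not the client's real server.
PG_DSN = "host=localhost dbname=game user=analyst"


def count_transactions(day: str) -> int:
    """Count gameplay transaction events recorded on a given day
    (assumes a 'transactions' table with a 'created_at' column)."""
    with psycopg2.connect(PG_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT count(*) FROM transactions WHERE created_at::date = %s",
                (day,),
            )
            return cur.fetchone()[0]


def peak_concurrent_users(start: str, end: str) -> float:
    """Read the peak of an assumed 'concurrent_users' measurement
    between two RFC3339 timestamps, e.g. '2020-11-02T00:00:00Z'."""
    client = InfluxDBClient(host="localhost", port=8086, database="server_health")
    result = client.query(
        f"SELECT max(value) FROM concurrent_users "
        f"WHERE time >= '{start}' AND time < '{end}'"
    )
    points = list(result.get_points())
    return points[0]["max"] if points else 0.0
```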

3. Web data

Web data includes:

  • Anonymized personal information from gamer profiles on the game website
  • Payment data from the Xsolla platform
  • Marketing data from social networks and other platforms

Web data is stored on the same PostgreSQL server as the server data.

Below is an outline of the current architecture:

Current issues

The current use of data in analytics has some flaws that create efficiency issues:

  • No ETL tool is plugged into Snowflake, so there is no reliable way to create, edit or schedule processes that transform data inside the Snowflake data warehouse (a sketch of what such a scheduled transformation could look like follows this list).
  • The Snowflake data warehouse features only a single fact table containing all client events.
  • There is no dimension or reference table yet.
  • As a result, all analytics dashboards are based on client events and were created while the game was in alpha. Updating all analytics dashboards for the beta phase would require time to fix all broken visualizations.
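Purely to illustrate the missing capability, the sketch below uses Snowflake’s native tasks, one lightweight way to schedule an in-warehouse transformation, driven through the official Python connector. All connection parameters and object names are hypothetical; selecting a proper ETL tool remains part of the mission.

```python
import snowflake.connector  # official Snowflake Python connector

# Illustrative only: connection parameters and object names are hypothetical.
conn = snowflake.connector.connect(
    account="xy12345.eu-west-1",
    user="etl_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="GAME",
    schema="ANALYTICS",
)

# A native Snowflake task can schedule an in-warehouse transformation
# when no dedicated ETL tool is available.
conn.cursor().execute("""
    CREATE OR REPLACE TASK refresh_daily_event_counts
      WAREHOUSE = ANALYTICS_WH
      SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- every day at 02:00 UTC
    AS
      INSERT INTO daily_event_counts
      SELECT event_type, DATE_TRUNC('day', event_time) AS day, COUNT(*)
      FROM client_events
      GROUP BY 1, 2
""")

# Tasks are created suspended; they must be resumed to start running.
conn.cursor().execute("ALTER TASK refresh_daily_event_counts RESUME")
```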

Main Goals of the mission

The study will tackle two bottlenecks of the current architecture:

  • Rebuilding a more reliable and scalable data pipeline between the game clients and the Snowflake data warehouse. The designed solution should handle a much heavier load than the current one, with good speed and cost performance.
  • Choosing and deploying a data transfer tool between the PostgreSQL server and the Snowflake data warehouse (a minimal sketch of such a transfer appears at the end of this section). The tool should fit the technical architecture designed so far and ideally also allow transferring selected data from the InfluxDB instance to Snowflake.

Should the suggested solutions comply with the reliability, scalability, speed and cost constraints, the mission will include deploying them into the production environment.

Eventually, it would be best to merge all data useful for analytics, business intelligence and data science in one place. The Snowflake data warehouse seems like a natural choice for this central hosting, and in that case the target architecture is outlined below.
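To make the second goal concrete, below is a minimal, purely illustrative sketch of a batch transfer from PostgreSQL to Snowflake via S3. Every name in it (DSN, bucket, stage, tables, credentials) is a placeholder, and a real transfer tool would add incremental bookmarks, retries and monitoring.

```python
import csv
import boto3
import psycopg2
import snowflake.connector


def transfer_profiles(day: str) -> None:
    """Move one day of (assumed) profile rows from PostgreSQL to Snowflake."""
    # 1. Extract the day's rows from PostgreSQL.
    with psycopg2.connect("host=localhost dbname=game user=etl") as pg:
        with pg.cursor() as cur:
            cur.execute(
                "SELECT id, country, created_at FROM profiles "
                "WHERE created_at::date = %s",
                (day,),
            )
            rows = cur.fetchall()

    # 2. Stage the extract as CSV on S3.
    local = f"profiles_{day}.csv"
    with open(local, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    boto3.client("s3").upload_file(local, "analytics-staging", f"profiles/{local}")

    # 3. Load into Snowflake with COPY INTO, via an external stage
    #    assumed to already point at s3://analytics-staging/.
    sf = snowflake.connector.connect(account="...", user="...", password="...")
    sf.cursor().execute(
        f"COPY INTO game.analytics.profiles "
        f"FROM @staging_s3/profiles/{local} "
        f"FILE_FORMAT = (TYPE = CSV)"
    )
```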

Profile

  • Data Architect with at least 2 years of experience on this kind of project/stack
  • Technical stack: Python, AWS (Redshift, S3), PostgreSQL, Snowflake, InfluxDB
  • Remote position
  • Proficiency in English is required
