Data Analysis using AWS Redshift, Matillion, Kinesis Streams, etc.
£10-20 GBP
Paid on delivery
Two scenario-based data engineering/data analytics tasks
Task 1 - Process
What I’d like you to come up with is around 5 slides on the process and steps you would take in the following situation:
A retailer has agreed to do business with our company. We’ve not worked with them previously and do not know what their data is like. The work will involve product advertising on their website. We need to link this advertising back to their sales data (which will share a common UserID with the advertising data) and report in two areas:
1. … reporting, including sales uplift
2. … operational reports
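As an illustration only (the brief does not define it), one common way to express sales uplift is the relative difference in average sales value per user between users exposed to a campaign (group E) and a comparable unexposed group (group U), counting only sales inside the campaign window. The grouping and notation below are assumptions:

```latex
\text{Uplift} = \frac{\bar{s}_E - \bar{s}_U}{\bar{s}_U},
\qquad
\bar{s}_G = \frac{1}{|G|} \sum_{u \in G} \sum_{k \in \mathrm{Sales}(u)} \mathrm{SaleValue}_k
```

For example, if exposed users spend £12 on average and unexposed users £10, the uplift is (12 - 10) / 10 = 0.2, i.e. 20%. How the unexposed group is chosen (e.g. a holdout) largely determines whether this figure is meaningful.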
The data structures are as follows:
Table: Impressions
ImpressionID (PK)
CampaignID
ProductID
ImpressionDatetime
PageViewID (which page it was shown on)

Table: PageViews
PageViewID
PageName
UserID

Table: Clicks
ImpressionID
ClickDatetime

Table: AllSales
SaleID
UserID
ProductID
SaleDatetime
SaleValue

Table: Products
ProductID
ProductName

Table: Campaigns
CampaignID
ProductID
CampaignName
CampaignStartDatetime
CampaignEndDatetime
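To make the join path concrete: impressions only reach a UserID through PageViews, and sales link back via UserID (and, plausibly, ProductID). The sketch below assumes a hypothetical attribution rule (a sale counts for a campaign if the same user saw an impression for the same product at or before the sale) and ignores campaign start/end dates; the class names, function name and sample rows are all illustrative, not part of the brief.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Impression:
    impression_id: int
    campaign_id: int
    product_id: int
    impression_datetime: datetime
    page_view_id: int


@dataclass(frozen=True)
class Sale:
    sale_id: int
    user_id: int
    product_id: int
    sale_datetime: datetime
    sale_value: float


def attributed_sales(impressions, page_view_users, sales):
    """Return {campaign_id: total SaleValue} under a hypothetical attribution rule.

    page_view_users maps PageViewID -> UserID (the PageViews table).
    Each sale is counted at most once per campaign.
    """
    # Impressions -> PageViews gives the UserID that saw each impression.
    exposures = {}  # (user_id, product_id) -> [(impression time, campaign id)]
    for imp in impressions:
        user_id = page_view_users.get(imp.page_view_id)
        if user_id is None:
            continue  # orphaned impression: a data-quality question for the retailer
        exposures.setdefault((user_id, imp.product_id), []).append(
            (imp.impression_datetime, imp.campaign_id))

    totals = {}
    for sale in sales:
        # Campaigns whose impression for this user/product preceded the sale.
        campaigns = {
            campaign_id
            for seen_at, campaign_id in exposures.get((sale.user_id, sale.product_id), [])
            if seen_at <= sale.sale_datetime
        }
        for campaign_id in campaigns:
            totals[campaign_id] = totals.get(campaign_id, 0.0) + sale.sale_value
    return totals


if __name__ == "__main__":
    imps = [Impression(1, 10, 100, datetime(2024, 1, 1, 9), 500)]
    sales = [
        Sale(1, 42, 100, datetime(2024, 1, 1, 10), 25.0),  # after the impression: attributed
        Sale(2, 42, 100, datetime(2023, 12, 31), 30.0),    # before the impression: not attributed
        Sale(3, 7, 100, datetime(2024, 1, 2), 15.0),       # different user: not attributed
    ]
    print(attributed_sales(imps, {500: 42}, sales))  # {10: 25.0}
```

In a Redshift implementation the same logic would be a join of Impressions to PageViews to AllSales; the real attribution window and rules are exactly the kind of question to agree with the business first.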
As guidance, I’m not looking for code. What we want to see is a high-level set of steps you would go through on receipt of such a dataset (which may include questions about it), and consideration of the objectives we’d be trying to achieve. This should include an Entity Relationship Diagram (ERD) and a target data architecture. A further consideration is that we may want to repeat this work for other retailers in a similar position, so repeatability and scalability are important.
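One early, repeatable step on receipt of a new retailer’s data could be automated key profiling: checking that assumed primary keys are unique and that foreign keys resolve (e.g. every Clicks.ImpressionID exists in Impressions). The helper names below are hypothetical and the rows are made up; a minimal sketch, assuming each table arrives as a list of dicts keyed by column name:

```python
from collections import Counter


def profile_keys(rows, key):
    """Count rows, distinct non-null keys, duplicate keys and missing keys for one column."""
    values = [row.get(key) for row in rows]
    counts = Counter(v for v in values if v is not None)
    return {
        "rows": len(rows),
        "distinct": len(counts),
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "missing": sum(1 for v in values if v is None),
    }


def orphans(child_rows, parent_rows, key):
    """Return the set of non-null child key values with no matching parent row."""
    parent_keys = {row.get(key) for row in parent_rows}
    return {
        row.get(key)
        for row in child_rows
        if row.get(key) is not None and row.get(key) not in parent_keys
    }


if __name__ == "__main__":
    impressions = [{"ImpressionID": 1}, {"ImpressionID": 2}, {"ImpressionID": 2}]
    clicks = [{"ImpressionID": 2}, {"ImpressionID": 9}]
    print(profile_keys(impressions, "ImpressionID"))
    # {'rows': 3, 'distinct': 2, 'duplicates': 1, 'missing': 0}
    print(orphans(clicks, impressions, "ImpressionID"))  # {9}
```

Because the checks are driven by table and column names, the same profile could be rerun for each new retailer, which supports the repeatability goal.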
Task 2 – Skills Test
What I’d like from you is an approach to cleansing/filtering streaming data. I’d like to see one (possibly two) approach(es) including:
Reasoning for choosing an approach
Considerations that go into making the decision (including risks and GDPR, if appropriate)
A relevant technical data flow architecture
The hypothetical scenario is as follows:
A third-party tech partner provides us with an advertising PaaS. From their platform, they will send us impression and click data via an AWS Kinesis stream. They have informed us that they cannot filter the data to just our instance of the platform, so the stream will also contain impressions and clicks from other platform users, which they have asked us to remove from our dataset. We have a defined list of users to use as a whitelist for this filtering.
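One possible approach is a stream consumer (for example an AWS Lambda function subscribed to the Kinesis stream) that decodes each record and keeps it only if its user is on the whitelist. The sketch below assumes the Lambda-style Kinesis event shape, where each record’s payload is base64-encoded under `record["kinesis"]["data"]`, and assumes the payload is JSON with a user identifier in a field called `user_id`; that field name and the function name are hypothetical.

```python
import base64
import json


def filter_kinesis_event(event, allowed_user_ids, user_field="user_id"):
    """Keep only records whose payload user is on the whitelist.

    Returns (kept_payloads, stats). `user_field` is a hypothetical payload field.
    """
    kept = []
    stats = {"kept": 0, "dropped": 0, "malformed": 0}
    for record in event.get("Records", []):
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            allowed = payload[user_field] in allowed_user_ids
        except (KeyError, TypeError, ValueError):
            # Undecodable payload or missing user field: count it and skip it
            # (a real pipeline might quarantine these for inspection).
            stats["malformed"] += 1
            continue
        if allowed:
            kept.append(payload)
            stats["kept"] += 1
        else:
            # Another platform user's data: discarded here and not persisted.
            stats["dropped"] += 1
    return kept, stats
```

Filtering before anything is written to storage means other platform users’ data is never persisted on our side, which supports GDPR data minimisation; the `stats` counts give an operational signal if the proportion of dropped or malformed records changes unexpectedly.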
Again, this should be no more than 5 slides, but should be a bit more detailed than the previous task.
Project ID: #17433612