Downloading data from Integrated Data Lake

This section describes how to download the data from Integrated Data Lake.

Prerequisites

Which method to use depends on your requirement. You can download data from Integrated Data Lake using either of the following methods:

  1. Generate signed URL
  2. Cross account access

Generate Signed URL

To use this method, follow these steps:

  1. Generate a signed URL for the object you want to download.

Endpoint:

POST /generateDownloadObjectUrls

Content-Type: application/json

Request example:

{
  "paths": [
    {
      "path": "myfolder/mysubfolder/myobject.objext"
    }
  ]
}
Response example:

{
    "objectUrls":[
        {
            "signedUrl":"https://datalake-integ-dide2-5234525690573.s3.eu-central-1.amazonaws.com/data/ten%3Ddide2/myfolder/mysubfolder/myobject.objext?X-Amz-Security-Token=Awervzdg23452xvbxd3434ddg&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credentials=ASIATCES50453sdf&X-Amz-Signature=2e2342sfgsdfgsdgh",
            "path":"myfolder/mysubfolder/myobject.objext"
        }
    ]
}
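
As a rough illustration, the request above could be assembled and its response parsed in Python as follows. This is a minimal sketch, not an official client: the base URL, the bearer-token `Authorization` header, and the helper names `build_signed_url_request` and `extract_signed_urls` are assumptions made for this example. Only the endpoint path, the `Content-Type` header, and the JSON field names are taken from the examples above.

```python
import json
import urllib.request


def build_signed_url_request(base_url, token, paths):
    """Build (but do not send) the POST /generateDownloadObjectUrls request."""
    body = {"paths": [{"path": p} for p in paths]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generateDownloadObjectUrls",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API is called with a bearer token.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def extract_signed_urls(response_body):
    """Map each object path to its signed URL, given the JSON response text."""
    parsed = json.loads(response_body)
    return {entry["path"]: entry["signedUrl"] for entry in parsed["objectUrls"]}
```

Passing the request object to `urllib.request.urlopen` would send it; the response text can then be handed to `extract_signed_urls`.
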
  2. Use the signed URL(s) returned in the response to download one or more objects. Each signed URL is valid for 120 minutes (X-Amz-Expires=7200 seconds). Once it has expired, you need to generate a new signed URL.

Endpoint:

GET https://datalake-integ-dide2-5234525690573.s3.eu-central-1.amazonaws.com/data/ten%3Ddide2/myfolder/mysubfolder/myobject.objext?X-Amz-Security-Token=Awervzdg23452xvbxd3434ddg&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credentials=ASIATCES50453sdf&X-Amz-Signature=2e2342sfgsdfgsdgh

Response example:

This is sample text in the file being uploaded.
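
The download itself is a plain HTTP GET on the signed URL. The sketch below shows one way to do this with the Python standard library and to read the validity period from the `X-Amz-Expires` query parameter. The helper names `signed_url_lifetime_minutes` and `download_signed_url` and the 60-second timeout are illustrative choices, not part of the service.

```python
import shutil
import urllib.request
from urllib.parse import parse_qs, urlsplit


def signed_url_lifetime_minutes(signed_url):
    """Return the validity period encoded in X-Amz-Expires, in minutes."""
    query = parse_qs(urlsplit(signed_url).query)
    return int(query["X-Amz-Expires"][0]) / 60


def download_signed_url(signed_url, destination):
    """Stream the object behind a signed URL into a local file."""
    # No Authorization header is added: the signature in the query string
    # is what authorizes the request.
    with urllib.request.urlopen(signed_url, timeout=60) as response, \
            open(destination, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return destination
```

For the example URL above, `signed_url_lifetime_minutes` yields 120, matching the stated validity. If a download fails with an HTTP error after that period, the URL has probably expired and needs to be regenerated.
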

Cross account access

Use this method if you need continuous access to a folder for downloads. For example, you have an AWS account that hosts an application, and this application needs to access an Integrated Data Lake (IDL) folder continuously. In such scenarios, cross account access is useful.

To use this method, follow these steps:

  1. Create a cross account for the AWS account that needs access.

POST /crossAccounts
Content-Type: application/json

Request example:

{
  "name": "testCrossAccount",
  "accessorAccountId": "960568630345",
  "description": "Cross Account Access for Testing",
  "subtenantId": "204a896c-a23a-11e9-a2a3-2a2ae2dbcce4"
}

Response example:

{
  "id": "20234sd34a23a-11e9-a2a3-2a2sdfw34ce4",
  "name": "testCrossAccount",
  "accessorAccountId": "960768132345",
  "description": "Cross Account Access for Testing",
  "timestamp": "2019-09-06T21:23:32.000Z",
  "subtenantId": "204a896c-a23a-11e9-a2a3-2a2ae2dbcce4",
  "eTag": 1
}
  2. Once the cross account is created, create an access on it to grant the required permission on the required prefix. The id of the cross account is part of the endpoint path.

POST /crossAccounts/20234sd34a23a-11e9-a2a3-2a2sdfw34ce4/accesses
Content-Type: application/json

Request example:

{
  "description": "Access to read from mysubfolder",
  "path": "myfolder/mysubfolder",
  "permission": "READ"
}

Response example:

{
  "id": "781c8b90-c7b6-4b1c-993c-b51a00b35be2",
  "description": "Access to read from mysubfolder",
  "storageAccount": "dlbucketname",
  "storagePath": "data/ten=tenantname/myfolder/mysubfolder",
  "path": "myfolder/mysubfolder",
  "permission": "READ",
  "status": "ENABLED",
  "timestamp": "2019-11-04T19:19:25.866Z",
  "eTag": 1
}
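
The two calls chain together: the `id` returned when the cross account is created becomes part of the path of the accesses request. A minimal sketch of that hand-off is shown below. The helper names are hypothetical, and the sketch only builds the payloads and the path; it does not send anything or handle authentication. The field names and the `READ` permission value are taken from the examples above.

```python
from urllib.parse import quote


def cross_account_payload(name, accessor_account_id, description, subtenant_id):
    """Body for POST /crossAccounts, using the fields from the request example."""
    return {
        "name": name,
        "accessorAccountId": accessor_account_id,
        "description": description,
        "subtenantId": subtenant_id,
    }


def access_endpoint(cross_account_id):
    """Path of the accesses endpoint for a previously created cross account."""
    return "/crossAccounts/" + quote(cross_account_id, safe="") + "/accesses"


def access_payload(path, permission, description):
    """Body for POST /crossAccounts/{id}/accesses."""
    return {"description": description, "path": path, "permission": permission}
```

With the response from step 1 parsed into a dictionary `created`, `access_endpoint(created["id"])` gives the path used in step 2.
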
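  3. Once the access is granted, you can download data from the granted prefix using the AWS CLI or an AWS SDK, with credentials of the accessor AWS account.

Use a command like the following to download a file from the S3 bucket. Replace the bucket name and object key with the values that apply to your access; the storageAccount and storagePath fields of the response in step 2 appear to provide them.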

$ aws s3 cp s3://tgsbucket/myobject.objext .

download: s3://tgsbucket/myobject.objext to ./myobject.objext
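
The same download can be done from application code with an AWS SDK. The following sketch uses boto3, the AWS SDK for Python. It assumes that `storageAccount` in the access response is the S3 bucket name and that `storagePath` is the key prefix, which the field names suggest but the text above does not state explicitly. The helper names `s3_location` and `download_object` are hypothetical.

```python
def s3_location(access, relative_key):
    """Derive (bucket, key) for an object below the granted prefix.

    Assumption: storageAccount is the bucket name and storagePath is the
    key prefix of the granted folder.
    """
    prefix = access["storagePath"].rstrip("/")
    return access["storageAccount"], prefix + "/" + relative_key.lstrip("/")


def download_object(access, relative_key, destination):
    """Download one object using the accessor account's AWS credentials."""
    # Imported here so that s3_location stays usable without boto3 installed.
    import boto3

    bucket, key = s3_location(access, relative_key)
    # boto3 resolves credentials through its default credential chain.
    boto3.client("s3").download_file(bucket, key, destination)
    return destination
```

Under these assumptions, the access response from step 2 and the file name `myobject.objext` resolve to the bucket `dlbucketname` and the key `data/ten=tenantname/myfolder/mysubfolder/myobject.objext`.
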

Except where otherwise noted, content on this site is licensed under the MindSphere Development License Agreement.