SDI Data Management Service Sample Application

Creating a Data Registry for two cross-domain data sources and uploading files

The customer wants to analyze Design and Plant data. The customer first creates a data registry for each of these two sources. Assume that the Design data consists of XML files and the Plant data consists of CSV files, so one data tag is created for each source. For files generated by the design team, the customer wants to append data: a new version is created when the input XML file has a different structure, and data is appended when only the data points change and the schema stays the same. For files provided by the Plant source, the customer wants to replace both the schema and the data.

Prerequisites

  • You have the required role assigned or have technical user credentials.
  • The file to be uploaded is in a supported file format.
  • Replace sdi_version with v3 or v4 depending on the SDI API version in all sample endpoints below.

Create a Data Registry for two cross-domain data

One data registry is created for each of the two sources. This is done using the endpoint:

POST /api/sdi/sdi_version/dataRegistries

For the Design data source, the body of the request is:

  {
    "dataTag": "classification",
    "defaultRootTag": "ClassificationCode",
    "filePattern": "[a-zA-Z]+.xml",
    "fileUploadStrategy": "append",
    "sourceName": "Design",
    "metaDataTags": ["teamcenter"]
  }

The result can be verified by checking the response:

{
    "registryId": "24537F02B61706A223F9D764BD0255C8",
    "sourceName": "design",
    "dataTag": "classification",
    "xmlProcessRules": [],
    "metaDataTags": ["teamcenter"],
    "defaultRootTag": "ClassificationCode",
    "filePattern": "[a-z_A-Z0-9]+.xml",
    "createdDate": "2019-10-21T16:16:08.783Z",
    "updatedDate": "2019-10-21T16:16:08.783Z",
    "mutable": false,
    "fileUploadStrategy": "append",
    "category": "ENTERPRISE"
}

For the Plant data source, the body of the request is:

  {
    "dataTag": "plantprocess",
    "filePattern": "[a-zA-Z]+.csv",
    "fileUploadStrategy": "replace",
    "sourceName": "Plant",
    "metaDataTags": ["USAPlant"]
  }

The result can be verified by checking the response:

{
    "registryId": "3ADBD1C08D3625C5C0B2AEE9D06CC294",
    "sourceName": "plant",
    "dataTag": "plantprocess",
    "xmlProcessRules": [],
    "metaDataTags": ["USAPlant"],
    "defaultRootTag": null,
    "filePattern": "[a-z_A-Z0-9]+.csv",
    "createdDate": "2019-10-21T16:16:56.466Z",
    "updatedDate": "2019-10-21T16:16:56.466Z",
    "mutable": false,
    "fileUploadStrategy": "replace",
    "category": "ENTERPRISE"
}

Once the data registries are created, the customer can upload files based on the generated data registries.
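
The two registry requests above can also be prepared programmatically. The following is a minimal Python sketch, not an official client: the gateway host, the bearer token header and the helper name build_registry_request are assumptions made for illustration. Only the endpoint path and the body fields are taken from the examples above (the key spelling dataTag follows the response example). The sketch builds the request objects without sending them.

```python
import json
import urllib.request

# Assumed values for illustration only; replace them with your environment's details.
BASE_URL = "https://gateway.example.com"  # hypothetical gateway host
SDI_VERSION = "v4"                        # v3 or v4, see Prerequisites
TOKEN = "<access-token>"                  # assumed bearer token

DESIGN_REGISTRY = {
    "dataTag": "classification",
    "defaultRootTag": "ClassificationCode",
    "filePattern": "[a-zA-Z]+.xml",
    "fileUploadStrategy": "append",
    "sourceName": "Design",
    "metaDataTags": ["teamcenter"],
}

PLANT_REGISTRY = {
    "dataTag": "plantprocess",
    "filePattern": "[a-zA-Z]+.csv",
    "fileUploadStrategy": "replace",
    "sourceName": "Plant",
    "metaDataTags": ["USAPlant"],
}


def build_registry_request(payload, base_url=BASE_URL, version=SDI_VERSION, token=TOKEN):
    """Build (but do not send) a POST request for the dataRegistries endpoint."""
    url = f"{base_url}/api/sdi/{version}/dataRegistries"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
        },
        method="POST",
    )


requests_to_send = [build_registry_request(p) for p in (DESIGN_REGISTRY, PLANT_REGISTRY)]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

To send a request, pass it to urllib.request.urlopen and read the registryId from the JSON response.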

Upload a file for two cross-domain data

After the data registries have been created for the two sources, the files can be uploaded using the following endpoint:

For files of type XML

  POST /dataUpload:
  File = designxml.xml

For files of type CSV

  POST /dataUpload:
  File = process.csv

Once the files have been uploaded successfully for a given tenant, the user can start data ingestion with a POST request to the following endpoint:

POST /ingestJobs

For the Design data source, the body of the request is:

  {
    "sourceName": "Design",
    "dataTag": "classification",
    "rootTag": "ClassificationCode",
    "filePath": "designxml.xml"
  }

For the Plant data source, the body of the request is:

  {
    "sourceName": "Plant",
    "dataTag": "plantprocess",
    "filePath": "process.csv"
  }

Once the data ingest process starts, the customer receives a jobId for each POST request. This jobId reflects the current activity of the SDI system for that request. The SDI system creates two different schemas for the uploaded files. Customers can query these schemas using the query service or use them to create a semantic model.
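
The two ingest job bodies differ only in the optional rootTag. As a rough sketch, they could be assembled as follows; the helper name ingest_job_body and the JSON encoding of the body are assumptions, while the field names come from the examples above.

```python
import json


def ingest_job_body(source_name, data_tag, file_path, root_tag=None):
    """Assemble the body for POST /ingestJobs; rootTag is only set for XML sources."""
    body = {
        "sourceName": source_name,
        "dataTag": data_tag,
        "filePath": file_path,
    }
    if root_tag is not None:
        body["rootTag"] = root_tag
    return body


design_job = ingest_job_body("Design", "classification", "designxml.xml",
                             root_tag="ClassificationCode")
plant_job = ingest_job_body("Plant", "plantprocess", "process.csv")

print(json.dumps(design_job, indent=2))
print(json.dumps(plant_job, indent=2))
```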

Creating an advanced Data Registry for XML/PLMXML

Complex or nested XML can be transformed using xmlProcessRules. The user can ignore or flatten nested elements using the ignore or index rule. If the XML contains nested elements, add the corresponding rule to xmlProcessRules when creating the registry.

Consider the following sample XML input:

  <Occurrence id="id33">
    <UserData id="id32" type="AttributesInContext">
      <UserValue value="" title="OccurrenceName"></UserValue>
      <UserValue value="1400" title="SequenceNumber"></UserValue>
      <UserValue value="" title="ReferenceDesignator"></UserValue>
    </UserData>
  </Occurrence>
  <ProductRevision id="id79" name="90214255__001__PART_WF" accessRefs="#id4" subType="ItemRevision" masterRef="#id80" revision="aa">
    <AssociatedDataSet id="id81" dataSetRef="#id78" role="PhysicalRealization"></AssociatedDataSet>
    <AssociatedDataSet id="id181" dataSetRef="#id180" role="PhysicalRealization"></AssociatedDataSet>
    <AssociatedDataSet id="id205" dataSetRef="#id204" role="PhysicalRealization"></AssociatedDataSet>
  </ProductRevision>

For the index rule, the value of the title attribute is used as the name of a transformed (flattened) column rather than being stored as a title value. A valid registry for that transform rule is:

{
  "dataTag": "occ",
  "filePattern": "[a-zA-Z0-9]+.xml",
  "fileUploadStrategy": "append",
  "defaultRootTag": "Occurrence",
  "xmlProcessRules": [
    "Occurrence.UserData.UserValue.index=title"
  ],
  "sourceName": "teamcenter"
}

In this case, the index rule defines the transformation: instead of treating Occurrence.UserData.UserValue_value and Occurrence.UserData.UserValue_title as columns, the system treats Occurrence.UserData.UserValue.OccurrenceName.value as a transformed column.
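
The effect of the index rule can be illustrated offline. The sketch below is not the SDI implementation; it only mimics the flattening described above using Python's standard xml.etree.ElementTree module. The helper name flatten_by_index and the exact column naming are assumptions modelled on the column name pattern given in the text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Occurrence id="id33">
  <UserData id="id32" type="AttributesInContext">
    <UserValue value="" title="OccurrenceName"></UserValue>
    <UserValue value="1400" title="SequenceNumber"></UserValue>
    <UserValue value="" title="ReferenceDesignator"></UserValue>
  </UserData>
</Occurrence>"""


def flatten_by_index(root, path, index_attr):
    """Turn repeated elements at `path` into columns keyed by their `index_attr` value."""
    parts = path.split(".")
    if len(parts) < 2 or parts[0] != root.tag:
        return {}
    # ElementTree paths are relative to the root element, so drop the root tag.
    child_path = "/".join(parts[1:])
    columns = {}
    for elem in root.findall(child_path):
        key = elem.get(index_attr)
        for attr, value in elem.attrib.items():
            if attr != index_attr:
                columns[f"{path}.{key}.{attr}"] = value
    return columns


rule = "Occurrence.UserData.UserValue.index=title"
path, index_attr = rule.split(".index=")
columns = flatten_by_index(ET.fromstring(SAMPLE), path, index_attr)
for name, value in columns.items():
    print(name, "=", repr(value))
```

Each UserValue element contributes one column named after its title, which is the behavior the index rule is described as producing.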

For the ignore rule, the body of the request is:

{
  "dataTag": "productrev",
  "filePattern": "[a-zA-Z0-9]+.xml",
  "fileUploadStrategy": "append",
  "defaultRootTag": "ProductRevision",
  "xmlProcessRules": [
    "ignore=AssociatedDataSet"
  ],
  "sourceName": "teamcenter"
}

In this case, the ignore rule names the element to be excluded. All AssociatedDataSet elements and their sub-elements are skipped during processing.
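
As with the index rule, the ignore rule can be mimicked locally to see which parts of a document would remain. This is only an illustrative sketch built on xml.etree.ElementTree, with a shortened copy of the sample ProductRevision element; the helper name apply_ignore is hypothetical and the code does not reflect how SDI processes the rule internally.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ProductRevision id="id79" name="90214255__001__PART_WF" subType="ItemRevision" revision="aa">
  <AssociatedDataSet id="id81" dataSetRef="#id78" role="PhysicalRealization"></AssociatedDataSet>
  <AssociatedDataSet id="id181" dataSetRef="#id180" role="PhysicalRealization"></AssociatedDataSet>
  <AssociatedDataSet id="id205" dataSetRef="#id204" role="PhysicalRealization"></AssociatedDataSet>
</ProductRevision>"""


def apply_ignore(root, ignored_tag):
    """Remove every element named `ignored_tag`, including its sub-elements."""
    # Materialize the element list first so the tree is not mutated while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == ignored_tag:
                parent.remove(child)
    return root


rule = "ignore=AssociatedDataSet"
_, ignored_tag = rule.split("=", 1)
pruned = apply_ignore(ET.fromstring(SAMPLE), ignored_tag)
remaining_children = [child.tag for child in pruned]
print(pruned.tag, pruned.attrib["id"], remaining_children)
```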

Creating custom data types during schema generation

The customer wants to derive suggested regular expressions from sample data and create custom data types that the SDI system should use during schema generation. In this example, some of the data contains email addresses of employees.

This can be done using the endpoint:

POST /api/sdi/sdi_version/suggestPatterns

With the example values, the URL pattern is as below:

/api/sdi/sdi_version/suggestPatterns?sampleValues=myrealemployee@realemail.com&testValues=anothertrueamployee@realemail.com, notmyemail@notmyemail.com

Two patterns are generated, as shown in the response below. The customer can register a pattern using the register data type endpoints.

[
  {
    "schema": "[a-z]+[@][a-z]+email.com",
    "matches": [false,true],
    "schemaValid": true
  },
  {
    "schema": "[a-z]+[@][a-z]+[\\.][a-z]+",
    "matches": [false, true],
    "schemaValid": true
  }
]
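
The suggestPatterns URL shown earlier contains characters such as @ and a comma that are normally percent-encoded in a query string. The following sketch builds the query with Python's urllib.parse.urlencode. It assumes that multiple test values are passed as one comma-separated parameter, as the example URL suggests; the helper name suggest_patterns_url is hypothetical.

```python
from urllib.parse import urlencode


def suggest_patterns_url(sample_values, test_values, version="v4"):
    """Build the relative suggestPatterns URL with a percent-encoded query string."""
    query = urlencode({
        "sampleValues": ",".join(sample_values),  # assumed comma-separated list
        "testValues": ",".join(test_values),      # assumed comma-separated list
    })
    return f"/api/sdi/{version}/suggestPatterns?{query}"


url = suggest_patterns_url(
    ["myrealemployee@realemail.com"],
    ["anothertrueamployee@realemail.com", "notmyemail@notmyemail.com"],
)
print(url)
```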

Searching schema

Inventory parts data from an ERP system is fed into SDI. The ingested file is a CSV file. The Search Schema POST method returns the schema of this ingested file, including attribute names and data types. Schemas can be retrieved with POST /searchSchemas for files whose ingest job has completed. The request payload can take one of the following forms:

In each form, the elements in the schemas list must be homogeneous: every element must contain the same combination of parameters.

A combination of dataTag, sourceName and schemaName:

{
  "schemas": [
    {
      "dataTag": "string",
      "schemaName": "string",
      "sourceName": "string"
    }
  ]
}

A combination of dataTag, sourceName, metaDataTags and schemaName:

{
  "schemas": [
    {
      "dataTag": "string",
      "schemaName": "string",
      "sourceName": "string",
      "metaDataTags": ["string"]
    }
  ]
}

Only metaDataTags:

{
  "schemas": [
    {
      "metaDataTags": ["string"]
    }
  ]
}
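
The homogeneity requirement can be checked on the client side before a request is sent. The sketch below is an assumption about how such a check might look, not part of the SDI API: the helper name build_search_payload is hypothetical and it only compares the parameter names used by each element.

```python
def build_search_payload(elements):
    """Wrap search elements in a payload after checking they use the same parameters."""
    if not elements:
        raise ValueError("at least one schema element is required")
    expected = set(elements[0])
    for position, element in enumerate(elements[1:], start=1):
        if set(element) != expected:
            raise ValueError(
                f"element {position} uses {sorted(element)}, expected {sorted(expected)}"
            )
    return {"schemas": list(elements)}


payload = build_search_payload([
    {"dataTag": "classification", "sourceName": "Design"},
    {"dataTag": "plantprocess", "sourceName": "Plant"},
])

try:
    build_search_payload([
        {"dataTag": "classification", "sourceName": "Design"},
        {"metaDataTags": ["USAPlant"]},
    ])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True

print(payload, mixed_rejected)
```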

Example of creating a data registry for two cross-domain data sources as an IDL user

The customer is interested in analyzing Design and Plant data. The customer first creates a data registry for each of these two sources. Assume that the Design data consists of XML files and the Plant data consists of CSV files, so the customer creates one data tag for each source. For files generated by the design team, the customer wants to append data: a new version is created when the input XML file has a different structure, and data is appended when only the data points change and the schema stays the same. For files provided by the Plant source, the customer wants to replace both the schema and the data.

POST /api/sdi/sdi_version/dataRegistries

  {
    "dataTag": "classification",
    "defaultRootTag": "ClassificationCode",
    "filePattern": "[a-zA-Z]+.xml",
    "fileUploadStrategy": "append",
    "sourceName": "Design",
    "metaDataTags": ["teamcenter"]
  }

POST /api/sdi/sdi_version/dataRegistries

  {
    "dataTag": "plantprocess",
    "filePattern": "[a-zA-Z]+.csv",
    "fileUploadStrategy": "replace",
    "sourceName": "Plant",
    "metaDataTags": ["USAPlant"]
  }

Once the data registries are created, the customer can upload files using IDL by providing the registryId returned when each registry was created.

Example of Schema Evolution

This section explains how schema evolution works for the given input files and data registry.

Data Registry: NHTSA (source), Vehicle (data tag)

Ingested File Sequence and test data

  • File Name: vehicle_202001.csv contains sample data:

ID     Name               MfgDate
12345  AwesomeCar         12:20:2015
34555  AnotherAwesomeCar  13:01:2016
32131  AnotherAwesomeCar  01:12:2019

GeneratedSchema:

id: integer
name: string
mfgdate: timestamp

  • File Name: vehicle_202002.csv contains sample data:

ID      Name               MfgDate
34-456  OKCar              12:20:2020
34555   AnotherAwesomeCar  13:01:2016
32131   AnotherAwesomeCar  01:12:2019

GeneratedSchema:

id: string
name: string
mfgdate: timestamp

The type of the property id changes from integer to string. Because the record count is still within the limit of 5000, SDI allows the schema to evolve and the data type to change.

  • File Name: vehicle_202003.csv contains sample data:

ID      Name               Price  MfgDate
34-456  OKCar              25000  12:20:2020
34555   AnotherAwesomeCar  50000  13:01:2016
32131   AnotherAwesomeCar  55000  01:12:2019

GeneratedSchema:

id: string
name: string
mfgdate: timestamp
price: integer

In this case, the schema evolves and the new column price is added.

  • File Name: vehicle_202004.csv contains sample data:

ID      Name               Price     MfgDate
34-567  GreatCar           65810.45  Unknown
34555   AnotherAwesomeCar  50000     13:01:2016
32131   AnotherAwesomeCar  55000     01:12:2019

GeneratedSchema:

id: string
name: string
mfgdate: string
price: float

The type of the property mfgdate changes from timestamp to string, and the type of the property price changes from integer to float. Because the record count is still within the limit of 5000, SDI allows the schema to evolve and the data types to change.

  • File Name: vehicle_202005.csv, ingested after 5000 records, contains sample data:

ID      Name               Price        MfgDate
34-789  AwesomeCar         Unavailable  Unknown
34555   AnotherAwesomeCar  50000        13:01:2016
32131   AnotherAwesomeCar  55000        01:12:2019

This results in an error: SDI cannot convert the column price to string because the types are incompatible (the existing type is float, the incoming type is string) and the record limit of 5000 has been reached. A type change is therefore not allowed during schema evolution.

  • File Name: vehicle_202006.csv, ingested after 5000 records, contains sample data:

ID     Name               Price  MfgDate
2356   AwesomeCar         95000  13:01:2024
34555  AnotherAwesomeCar  50000  13:01:2016
32131  AnotherAwesomeCar  55000  01:12:2019

GeneratedSchema:

id: string (the incoming type is changed to the encompassing type string, based on the existing type)
name: string
mfgdate: string (the incoming type is changed to the encompassing type string, based on the existing type)
price: float

The type of the properties id and mfgdate remains string because the existing type is string. The incoming data type is changed to string to keep the schema consistent.
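
The file sequence above can be replayed with a small model of the evolution rules. The sketch below is a simplified reading of the behavior described in this section, not the SDI implementation: the widening table, the helper names common_type and evolve, the record counts passed in, and the choice to always accept new columns are all assumptions. It works on column types only and does not parse the CSV values.

```python
# Assumed widening rules: which types each type can be converted to without loss.
WIDENS_TO = {
    "integer": {"integer", "float", "string"},
    "float": {"float", "string"},
    "timestamp": {"timestamp", "string"},
    "string": {"string"},
}
ORDER = ("integer", "float", "timestamp", "string")


def common_type(a, b):
    """Return the narrowest type that can hold values of both a and b."""
    for candidate in ORDER:
        if candidate in WIDENS_TO[a] and candidate in WIDENS_TO[b]:
            return candidate


def evolve(existing, incoming, records_so_far, limit=5000):
    """Merge an incoming file schema into the existing schema."""
    merged = dict(existing)
    for column, new_type in incoming.items():
        old_type = existing.get(column)
        if old_type is None:
            merged[column] = new_type  # new column (assumed to be always accepted)
            continue
        target = common_type(old_type, new_type)
        if target == old_type:
            continue  # incoming values fit the existing, encompassing type
        if records_so_far >= limit:
            raise TypeError(
                f"cannot change {column} from {old_type} to {target} "
                f"after {limit} records"
            )
        merged[column] = target  # type change allowed within the record limit
    return merged


schema = {}
# vehicle_202001 to vehicle_202004, ingested within the record limit (counts are illustrative).
schema = evolve(schema, {"id": "integer", "name": "string", "mfgdate": "timestamp"}, 0)
schema = evolve(schema, {"id": "string", "name": "string", "mfgdate": "timestamp"}, 3)
schema = evolve(schema, {"id": "string", "name": "string", "price": "integer",
                         "mfgdate": "timestamp"}, 6)
schema = evolve(schema, {"id": "string", "name": "string", "price": "float",
                         "mfgdate": "string"}, 9)

# vehicle_202005: price arrives as string after the limit, which is rejected.
try:
    evolve(schema, {"id": "string", "name": "string", "price": "string",
                    "mfgdate": "string"}, 5000)
    rejected = False
except TypeError:
    rejected = True

# vehicle_202006: narrower incoming types are absorbed by the existing types.
final = evolve(schema, {"id": "integer", "name": "string", "price": "integer",
                        "mfgdate": "timestamp"}, 5000)
print(schema, rejected, final)
```

Under these assumptions, the model reproduces the generated schemas listed for each file, including the rejection of vehicle_202005.csv.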

Except where otherwise noted, content on this site is licensed under the MindSphere Development License Agreement.