Analyzing Security when Querying over Decentralized Environments

Jitse De Smet

How to abstract data updates in a permissioned decentralized environment behind a query abstraction layer?

  • Situate Thesis
  • Research Question and Hypothesis
  • The Past
  • The Future

Situate Thesis

  • Decentralization Efforts (like Solid)
    • Heterogeneity of Interfaces
      (SPARQL-endpoint, LDP, ...)
    • Heterogeneity of Data
      (I might have a smartwatch, you might not)
    • Heterogeneity of Structure
      (I sort pictures by date, you by location)
  • Query Processing using SPARQL

Situate Thesis: Query Processing using SPARQL

Heterogeneity is hard for developers

Example: SPARQL Query for my selfies with Alice

# ex: is an example namespace
PREFIX ex: <http://example.org/>

SELECT * WHERE
{
    ?picture a ex:picture ;
             ex:contains ex:Alice, ex:Bob ;
             ex:taken-by ex:Bob .
}
        
Example: SPARQL Query to add my selfie

PREFIX ex: <http://example.org/>

INSERT DATA
{
    # ex:selfie1 is a placeholder IRI for the new picture
    ex:selfie1 a ex:picture ;
      ex:contains ex:Alice, ex:Bob ;
      ex:taken-by ex:Bob .
}
        

Situate Thesis: Solid Spec

Research Question and Hypothesis

"How to abstract data updates in a permissioned decentralized environment behind a query abstraction layer?"

Data consumers don't interact with the interfaces directly.

Data stores can reject the actions of data consumers.

Data stores are small, distributed, and the owner is in control.

We use a query language (think SPARQL, SQL, ...) to add the abstraction.

Research Question and Hypothesis

  1. The effort a developer needs to update data in a single data store can be significantly lowered by adding a query abstraction layer.
  2. The effort a developer needs to update data in two data stores separately, where the data stores have the same interface but different structures, can be significantly lowered by adding a query abstraction layer.
  3. The effort a developer needs to perform a cross-data-store update, where the data stores can have different interfaces and different structures, can be significantly lowered by adding a query abstraction layer.
  4. Compared to manually POSTing the required resources, an update-query engine will need only a small number of additional HTTP requests (<5).

The Past: step-by-step

  1. Start by leaving the original idea
  2. Read about querying
  3. Think you will work on query optimization based on structural knowledge
  4. Write shape descriptions for SolidBench
  5. Read about it
  6. Meet with promoter, get the "update query" hint
  7. Read about update queries
  8. Solidify the idea
  9. Read some more specs
  10. Get to work

The Past: Getting to Work

  • What is LDP?
  • Can we use Shape Trees for updates?

Example: LDP Container

<http://example.org/c1/>
   a ldp:BasicContainer;
   dcterms:title "A very simple container";
   ldp:contains <r1>, <r2>, <r3>.
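
As a rough illustration of what a developer does by hand when there is no query abstraction layer, the sketch below builds (but does not send) the HTTP request that adds a member to an LDP container. In LDP, a new member is created by POSTing to the container, and the Slug header is only a naming hint to the server. The slug, the Turtle body, and the helper name `build_ldp_post` are hypothetical; the trailing-slash check is an assumption borrowed from how Solid names containers.

```python
from urllib.request import Request


def build_ldp_post(container_url: str, slug: str, turtle: str) -> Request:
    """Build a POST request asking an LDP container to create a new member."""
    # Assumption: container URLs end with '/', as is the convention in Solid.
    if not container_url.endswith("/"):
        raise ValueError("expected a container URL ending with '/'")
    return Request(
        container_url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle", "Slug": slug},
        method="POST",
    )


# Hypothetical new member "r4" for the container shown above.
request = build_ldp_post(
    "http://example.org/c1/",
    "r4",
    "<> a <http://example.org/picture> .",
)
print(request.get_method(), request.full_url)
```

The server decides the final URL of the new resource, so a client that cares about structure still has to pick the right container itself.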
            
Example: LDP Structure

pictures/
  |- Valencia/
  |  |- one.ttl
  |  |- two.ttl
  |- Ghent/
  |  |- one.ttl
  |  |- two.ttl
  |- Paris/
  |  |- one.ttl
  |  |- two.ttl
  |  |- three.ttl
  |- missing.ttl
            
pictures/
  |- 30-01-2024/
  |  |- one.ttl
  |  |- two.ttl
  |- 14-02-2024/
  |  |- one.ttl
  |  |- two.ttl
  |- 17-05-2023/
  |  |- one.ttl
  |  |- two.ttl
  |  |- three.ttl
  |  |- four.ttl
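
The two layouts above store the same kind of resource under different keys. The sketch below shows the per-structure logic a developer would otherwise write by hand for each pod. The `Picture` fields, the layout names "by-city" and "by-date", and the file naming are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Picture:
    name: str
    city: str
    taken_on: date


def target_path(picture: Picture, layout: str) -> str:
    """Return the resource path for a picture under a given pod layout."""
    if layout == "by-city":
        folder = picture.city
    elif layout == "by-date":
        # Matches the DD-MM-YYYY folder names in the example above.
        folder = picture.taken_on.strftime("%d-%m-%Y")
    else:
        raise ValueError(f"unknown layout: {layout}")
    return f"pictures/{folder}/{picture.name}.ttl"


selfie = Picture(name="three", city="Ghent", taken_on=date(2024, 2, 14))
print(target_path(selfie, "by-city"))  # pictures/Ghent/three.ttl
print(target_path(selfie, "by-date"))  # pictures/14-02-2024/three.ttl
```

Every new layout adds another branch, which is the developer effort the hypotheses aim to remove.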
            
Example: SHACL Shape Description

ex:PictureShape
    a sh:NodeShape ;
    sh:targetClass ex:Picture ;
    sh:property [
        sh:path ex:depicts ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path ex:contains ;
        sh:nodeKind sh:IRI ;
    ] .
            
Example: Shape Trees

<#PicturesTree>
  a st:ShapeTree ;
  st:expectsType st:Container ;
  st:shape ex:PicturesShape ;
  st:contains <#PicturesByCityTree> .

<#PicturesByCityTree>
  a st:ShapeTree ;
  st:expectsType st:Container ;
  st:shape ex:PicturesByCityShape ;
  st:contains <#PictureTree> .

<#PictureTree>
  a st:ShapeTree ;
  st:expectsType st:Resource ;
  st:shape ex:PictureShape .
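
To make the nesting above concrete, the sketch below models the three shape trees as a plain dictionary and follows the `st:contains` links to see what is expected at each depth. The dictionary encoding and the function names are assumptions for illustration and not an API of the Shape Trees specification. Each tree is also simplified to contain at most one child tree.

```python
SHAPE_TREES = {
    "#PicturesTree": {"expects": "Container", "contains": "#PicturesByCityTree"},
    "#PicturesByCityTree": {"expects": "Container", "contains": "#PictureTree"},
    "#PictureTree": {"expects": "Resource", "contains": None},
}


def expected_chain(root: str) -> list[str]:
    """Follow st:contains from the root and list the expected type per level."""
    chain: list[str] = []
    seen: set[str] = set()
    current = root
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in shape trees at {current}")
        seen.add(current)
        tree = SHAPE_TREES[current]
        chain.append(tree["expects"])
        current = tree["contains"]
    return chain


def is_leaf_container(chain: list[str], depth: int) -> bool:
    """A container is a leaf when the level below it holds resources."""
    return (
        chain[depth] == "Container"
        and depth + 1 < len(chain)
        and chain[depth + 1] == "Resource"
    )


chain = expected_chain("#PicturesTree")
print(chain)                        # ['Container', 'Container', 'Resource']
print(is_leaf_container(chain, 0))  # False: pictures/ holds city containers
print(is_leaf_container(chain, 1))  # True: a city container holds pictures
```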
            
Is this enough?
To check that, I listed some functional requirements and user stories.
The answer: NO.

The Past: Getting to Work

  1. What if multiple directories match?
    • Do I duplicate?
    • Is one canonical and the other one links to the resource saved in the canonical?
    • And how do I decide which one is canonical?
  2. What if no directories match?
  3. How are resources grouped?
    • Can I infer that the pictures-by-date example groups pictures by date?
    • What if I need to create a new date directory?
  4. Is that new directory I created a leaf?
    • Or should I make even more directories? (Can be inferred from Shape Tree)
  5. What to do if a resource is changed?
    • Should I alter the Shape Tree?
    • Should I move the resource?
    • Do I have a distance metric, and do I move the resource when the distance is too great?
  6. Should all clients abide by the structural information?
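
The first two questions can be phrased as a small decision procedure. The sketch below shows only one possible policy, not a decision made in this thesis. It assumes each container comes with a predicate over resources, treats the first matching container as canonical, records the remaining matches as places to link from, and refuses to guess when nothing matches. All names are hypothetical.

```python
from typing import Callable, NamedTuple


class Placement(NamedTuple):
    canonical: str
    links: list[str]


Matcher = Callable[[dict], bool]


def place(resource: dict, containers: dict[str, Matcher]) -> Placement:
    """Pick a canonical container and list the other matching containers."""
    matches = [path for path, accepts in containers.items() if accepts(resource)]
    if not matches:
        # Question 2: nothing matches, so some fallback policy is needed.
        raise LookupError("no container matches this resource")
    # Question 1: the first match is canonical, the rest would link to it.
    return Placement(canonical=matches[0], links=matches[1:])


containers = {
    "pictures/Ghent/": lambda r: r.get("city") == "Ghent",
    "pictures/14-02-2024/": lambda r: r.get("date") == "14-02-2024",
}
result = place({"city": "Ghent", "date": "14-02-2024"}, containers)
print(result.canonical, result.links)
```

Choosing the canonical container by declaration order is arbitrary here; the open question is precisely what a principled rule would be.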

The Future: Overview

  • Adapt Comunica to allow update queries by interpreting SGV
  • Alter SolidBench, so we can measure
  • Feedback Loop: Measure and Adapt

The Future: Evaluation

Experiments using SolidBench:
  • Extend SolidBench with SGV descriptions
  • Implement manual update scripts for each structure
  • Reason how to generalize the different scripts
  • Evaluate updating a single pod using queries
  • Evaluate updating multiple pods using queries

The Future: Evaluation

Possible metrics:
  • Execution time
  • Number of HTTP requests
  • String difference between queries that express the same modification over different data stores
  • Ratio of queries that leave the data store inconsistent when random server failures are introduced
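
Two of these metrics are simple to compute. As a sketch, the snippet below uses Python's `difflib.SequenceMatcher` as one possible string-difference measure (the slides do not fix a particular one) and derives the inconsistency ratio from a list of per-query outcomes. The example query and the outcome list are made up for illustration and are not measurements.

```python
from difflib import SequenceMatcher


def query_difference(a: str, b: str) -> float:
    """Return 0.0 for identical strings, approaching 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def inconsistency_ratio(left_inconsistent: list[bool]) -> float:
    """Fraction of queries that left the data store inconsistent."""
    if not left_inconsistent:
        raise ValueError("need at least one query outcome")
    return sum(left_inconsistent) / len(left_inconsistent)


query = "INSERT DATA { ex:selfie1 a ex:picture . }"
print(query_difference(query, query))                    # 0.0
print(inconsistency_ratio([False, True, False, False]))  # 0.25
```

If the abstraction layer works as hoped, the same query text could be sent to differently structured data stores, which would give a difference of 0.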

Time for Questions