Gallery doesn't get updated when a package is deleted

Developer
Dec 3, 2010 at 2:19 AM

So I wrote a little test client to delete packages by going straight to the /Package WCF service.  What I found was that I could successfully delete the package from the server, but that it remained in the gallery.  Isn't it supposed to get deleted from there as well?

Coordinator
Dec 3, 2010 at 2:00 PM

Yes, that is an issue we haven't addressed yet. We didn't want the synchronization service in Orchard to be responsible for deleting packages, as that would require a lot of work in that service (it would basically have to query the whole feed and compare it to what it has in Orchard). We're thinking this will be another callback scenario (similar to the authorization) so that when a package is deleted through the WCF service, it will notify the front-end to delete its copy of the package as well.

Developer
Dec 3, 2010 at 5:28 PM
This discussion has been copied to a work item.
Developer
Dec 3, 2010 at 5:36 PM

I talked to Lou and he had ideas on how the synchronization could work to properly take care of deletions by keeping a log of changed packages.  I'll ask him to comment here.

I don't like the idea of it being a callback scenario, as it would make it impossible for someone to run an additional (read-only) instance of the gallery pointing to the same server, which is something the two-server architecture is meant to allow. For example, suppose I want to write my own site that rates the gallery packages because I think I can do it better than the official gallery. I'll still need a way to keep it in sync with the official package list, including deletions. But if deletions are handled via a callback to the main gallery, I'm out of luck.

Coordinator
Dec 3, 2010 at 8:48 PM

Actually, we've been talking about this issue some more since my reply this morning. We decided we didn't like the callback idea either. The idea we're pursuing now is for Gallery Server to keep a separate table of deleted packages. So at the time that you call the Delete method on the server API, the package will be deleted from the gallery server tables, and a record will be added to another table to indicate that the package was deleted (along with the DateTime that it was done). Then the synchronizer in the Orchard gallery can simply request the packages that have been deleted since the last time it was run, and delete the packages from Orchard that were returned from that request. This keeps all the synchronization running in the same direction, and avoids possible race conditions where a delete callback would be getting issued at the same time as a synchronization run.

We're starting on this implementation now, but if Lou has another idea we'll definitely consider it and change course if warranted.
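The pull-based flow described above can be sketched with a toy in-memory model (in Python for illustration; the names `server_delete`, `deleted_since`, and the dictionaries are hypothetical stand-ins for the real tables and API):

```python
# Toy model of the proposed sync direction: the gallery polls the server
# for deletions since its last run, then removes its local copies.
# All names here are illustrative assumptions, not the real API.
from datetime import datetime, timezone

server_packages = {"elmah": "1.0", "routedebugger": "2.0"}
server_deleted = {}            # package id -> deletion time (UTC)

def server_delete(pkg_id):
    """Delete from the server and record when it happened."""
    server_packages.pop(pkg_id, None)
    server_deleted[pkg_id] = datetime.now(timezone.utc)

def deleted_since(t):
    """What the gallery's synchronizer would request on each run."""
    return [pid for pid, when in server_deleted.items() if when > t]

# Gallery side: one synchronizer run.
gallery_packages = {"elmah": "1.0", "routedebugger": "2.0"}
last_run = datetime.min.replace(tzinfo=timezone.utc)

server_delete("elmah")
for pid in deleted_since(last_run):
    gallery_packages.pop(pid, None)
last_run = datetime.now(timezone.utc)

print(sorted(gallery_packages))  # ['routedebugger']
```

Because the gallery always pulls, there is no window where a push callback could race with an in-progress synchronization run.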

Developer
Dec 3, 2010 at 8:56 PM

Ok, sounds like this could work.  You'll have to process the deletes before the add/updates, in case someone deletes a package and instantly re-uploads it.  So that deletion table would be accessed via OData, right?

Coordinator
Dec 3, 2010 at 9:06 PM

Our plan right now is for the server to be responsible for keeping the tables in the correct state when an add or update happens. So when a package is deleted, it's removed from the Packages and PublishedPackages tables and a row is inserted into DeletedPackages. When a package is added, it's added to Packages and deleted from DeletedPackages (if that package ID exists in DeletedPackages).

As for accessing the DeletedPackages table, we were just planning to make that another REST method. Is there a specific reason you need it to be OData?
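The table bookkeeping described above can be sketched as follows (SQLite in Python, purely for illustration; the table names follow the post, but the schema and function names are assumptions):

```python
# Sketch of keeping Packages and DeletedPackages consistent on add/delete.
# Schema and helper names are hypothetical illustrations.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Packages (Id TEXT PRIMARY KEY);
    CREATE TABLE DeletedPackages (Id TEXT PRIMARY KEY, DeletedAt TEXT);
""")

def delete_package(pkg_id):
    # Deleting removes the package row and records the deletion time.
    conn.execute("DELETE FROM Packages WHERE Id = ?", (pkg_id,))
    conn.execute("INSERT OR REPLACE INTO DeletedPackages (Id, DeletedAt) "
                 "VALUES (?, datetime('now'))", (pkg_id,))

def add_package(pkg_id):
    # Re-adding a package clears any pending deletion record, so the
    # synchronizer never sees a delete for a package that now exists.
    conn.execute("INSERT OR REPLACE INTO Packages (Id) VALUES (?)", (pkg_id,))
    conn.execute("DELETE FROM DeletedPackages WHERE Id = ?", (pkg_id,))

delete_package("elmah")   # package deleted -> deletion recorded
add_package("elmah")      # instantly re-uploaded -> record cleared
print(conn.execute("SELECT COUNT(*) FROM DeletedPackages").fetchone()[0])  # 0
```

This is what makes the delete-then-instant-re-upload case from the earlier reply safe: the deletion record never outlives a re-added package.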

Developer
Dec 3, 2010 at 10:09 PM

I see; if you clean up the deleted table when a package is re-added, then it's fine.

Do whatever you think works best for accessing the deleted table.

Developer
Dec 3, 2010 at 11:31 PM

The best way to replicate a data set between disconnected systems is with a change table and an ascending journal id. It's extremely simple to implement and is robust in most failure modes and race conditions (assuming the subscriber isn't re-entrant).


== publisher-side ==

/* create a journal record */

JournalRecord {int Id; string ChangeType; string ItemType; string Item; }


/* add entries at appropriate points in the publisher */
/* these could also be behind an IJournalService for obvious reasons */

_journalRepository.Save(new JournalRecord {ChangeType="Update", ItemType="Package", Item=thePackageName});

_journalRepository.Save(new JournalRecord {ChangeType="Delete", ItemType="Package", Item=thePackageName});


/* create an endpoint on the server that takes a journal id and returns next changes, up to 100, as xml */

/journal/list/8272

List(int id) {
  var items = _journalRepository.Fetch(
    record => record.Id > id,
    order => order.Asc(record => record.Id),
    0, 100);
 
  return <Journal><JournalRecord Id="8273" ChangeType="Update" ItemType="Package" Item="elmah" />...</Journal>
}


== subscriber-side ==

state: last-known-journal-id

background-task pseudo code {
  changes = webget /journal/list/{last-known-journal-id}

  /* only distinct names are important for mirroring - change type is informational */
  packagenames = changes.where(itemtype==package).distinct(item)

  foreach (packagename in packagenames) {
    /* synchronize item - extract to methods as appropriate, of course*/
    pkgLocal = contentget(packagename)
    pkgRemote = webget(packagename)
    if (pkgRemote absent) {
      if (pkgLocal published) {
        pkgLocal.unpublish
      }
    }
    else if (pkgLocal absent) {
      create and publish pkgLocal
    }
    else {
      update pkgLocal
    }
  }

  last-known-journal-id = max(changes.id)
}
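A runnable distillation of the pattern above (in Python; record shapes and the endpoint semantics follow the pseudocode, while the data and everything else are illustrative assumptions):

```python
# Runnable sketch of the journal/change-table replication pattern.
# Journal ids ascend; the subscriber mirrors by distinct item names and
# then advances its cursor to the highest id it has seen.
from dataclasses import dataclass

@dataclass
class JournalRecord:
    id: int
    change_type: str   # "Update" or "Delete" -- informational for mirroring
    item_type: str     # e.g. "Package"
    item: str          # package name

# Publisher side: append-only journal plus current package state.
journal = [
    JournalRecord(8273, "Update", "Package", "elmah"),
    JournalRecord(8274, "Delete", "Package", "oldpkg"),
    JournalRecord(8275, "Update", "Package", "elmah"),
]
packages = {"elmah": "1.1"}     # oldpkg no longer exists on the server

def journal_list(last_id, limit=100):
    """Equivalent of GET /journal/list/{last_id}: next changes after last_id."""
    return sorted((r for r in journal if r.id > last_id),
                  key=lambda r: r.id)[:limit]

# Subscriber side: one background-task run.
local = {"elmah": "1.0", "oldpkg": "0.9"}
last_known_id = 8272

changes = journal_list(last_known_id)
names = {r.item for r in changes if r.item_type == "Package"}
for name in sorted(names):
    if name not in packages:     # absent remotely -> unpublish locally
        local.pop(name, None)
    else:                        # present remotely -> create/update local copy
        local[name] = packages[name]
if changes:
    last_known_id = max(r.id for r in changes)

print(local, last_known_id)  # {'elmah': '1.1'} 8275
```

Note that because only distinct names matter, a crash mid-run is harmless: re-fetching from the old cursor just re-synchronizes the same items.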


Developer
Dec 6, 2010 at 9:04 AM

Thanks Lou.  So the core differences with the current proposal are:

  • Use the journal for all operations instead of special casing deletes
  • Use journal id instead of timestamps to know what needs to be updated.  I can see timestamps being tricky, as they rely on the gallery and the server having synchronized clocks, which could cause edge cases where some entries are missed.


Coordinator
Dec 6, 2010 at 4:08 PM

That's a nice solution. I think we'll go ahead and implement the deletes this way, since it's not much of a change from the idea we had started to implement anyway. We'll change the Creates and Updates to use this system at a later point, which shouldn't be too hard once we have the infrastructure in place for the Deletes.

Coordinator
Dec 8, 2010 at 9:35 PM

The journaling solution has been implemented (only for deletes right now). We ended up using the term "log" instead of "journal" but it's the same idea. The background task in the gallery will now query the log on the server to perform deletes. The infrastructure is in place now to do the same thing with create and update, so we'll do that at some point before v1.