Hi Leif,
I added 2 pipes to builtins.py:
- publish_html: creates static HTML views of IdPs and SPs, using XSLT based on Peter Schober's alternative to MET;
- publish_split: similar to store, but it adds validUntil and creates a signed XML file per EntityDescriptor. This can be consumed dynamically by ADFS in an IdP role.
I put them directly into builtins.py because they share some code with the sign pipe. Is this viable from your PoV? If yes, I would make a PR.
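For context, here is a minimal stdlib sketch of the per-entity split that publish_split describes (function name, the MDQ-style sha1 filenames and the validUntil handling are illustrative assumptions, not pyFF's actual pipe API):

```python
import hashlib
from datetime import datetime, timedelta, timezone
from xml.etree import ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

def split_entities(aggregate_xml, validity=timedelta(days=7)):
    """Split an EntitiesDescriptor into one document per EntityDescriptor,
    stamping each copy with a fresh validUntil attribute."""
    ET.register_namespace("md", MD_NS)
    root = ET.fromstring(aggregate_xml)
    valid_until = (datetime.now(timezone.utc) + validity).strftime("%Y-%m-%dT%H:%M:%SZ")
    docs = {}
    for ed in root.iter("{%s}EntityDescriptor" % MD_NS):
        ed.set("validUntil", valid_until)
        # Illustrative MDQ-style sharded filename: sha1 of the entityID
        fname = hashlib.sha1(ed.get("entityID").encode()).hexdigest() + ".xml"
        docs[fname] = ET.tostring(ed, encoding="unicode")
    return docs
```

Signing each fragment (the part that would share code with the sign pipe) would happen after this step, e.g. via xmlsec.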
Cheers, Rainer
Hi all,
being part of Commons Conservancy brought up yet another subject,
which is whether we should add a header with license information in
every file in the projects under idpy. This is not something done in
an abstract way, there is a specific format modelling this information
(see https://spdx.org/ and https://reuse.software/ - more specifically
https://reuse.software/practices/2.0/). Still, I find it problematic.
We want to open up the question to the wider community and consider
their thoughts on this. The forwarded message below is discussing this
subject. You can see the question we posed, the answer we got and my
comments. Feel free to tell us what you think on this.
---------- Forwarded message ---------
Date: Thu, 16 May 2019 at 09:56
> ---------- Forwarded message ----------
> Date: May 8, 2019, 8:15 AM -0700
>
> > Why does CC think having a single license file per project is
> > insufficient? Our thought is that if we can avoid adding a header to
> > every single file, that would be nice, esp. given we already have this
> > info in the license file and we have the Note Well.
>
>
> this is not just our opinion, but something that is an industry and
> community standard for legal compliance these days. When companies like
> Siemens, Samsung or Honeywell use some code in one of the hundreds or
> thousands of devices and systems in their product line, they need to be
> able to provide the correct license and a download of the exact version.
> This means machine readability too.
>
I've actually observed the opposite. Communities are abandoning the
"license in every file" model and just using a single LICENSE file in
the root of the project. The LICENSE file contains license
information; that is, it is not a single license but may include
exception sections and so on.
> To quote from https://reuse.software/practices/2.0/ :
>
> Scroll to the section "2. Include a copyright notice and license in each
> file"...
>
> "Source code files are often reused across multiple projects, taken from
> their origin and repurposed, or otherwise end up in repositories where
> they are separate from its origin. You should therefore ensure that all
> files in your project have a comment header that convey that file’s
> copyright and license information: Who are the copyright holders and
> under which license(s) do they release the file?
>
Continuing from above, the standardization of package-management
formats and tools has helped with exactly that: avoiding the
distribution of single files and instead providing packages and
modules. Copying files around is bad practice and considered a hack.
Nobody liked that model and everyone is moving away from it; it is
unstructured, it becomes unmanageable, and it causes problems.
> It is highly recommended that you keep the format of these headers
> consistent across your files. It is important, however, that you do not
> remove any information from headers in files of which you are not the
> sole author.
>
> You must convey the license information of your source code file in a
> standardised way, so that computers can interpret it. You can do this
> with an SPDX-License-Identifier tag followed by an SPDX expression
> defined by the SPDX specifications."
>
> (the text goes on for a while after this, to clarify the point but this
> is the basic gist of it)
>
> There is a nice Python tool to check:
>
> https://github.com/fsfe/reuse-tool
>
> I hope this makes sense
>
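For reference, the kind of header the quoted text asks for looks like the example below (the license, year and name are illustrative, not a choice for idpy), and the SPDX tag is what makes it machine-readable:

```python
import re

# Example REUSE-style header (contents are illustrative only)
HEADER = """\
# SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.org>
#
# SPDX-License-Identifier: Apache-2.0
"""

def spdx_license(text):
    """Return the SPDX license expression declared in a file, if any."""
    match = re.search(r"SPDX-License-Identifier:\s*(\S.*?)\s*$", text, re.MULTILINE)
    return match.group(1) if match else None
```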
Well, it does not make complete sense. We're talking about licensing a
project. A project is not just code; there are data files (HTML, XML,
YAML, JSON), binary files (archives, images, audio, video, etc.), and
text files (configs, ini files, etc.), all of them "not-code". How do
you mark those files? Does the LICENSE file itself need a license
header? The JSON format does not define comments, so how do you add a
header there? If a binary file does not get a license header, why
should a file with code get one?
I would expect there to be a way to have the needed information
unified. If the files themselves cannot provide this information, it
has to be external; thus the LICENSE file. If someone is worried about
somebody else re-using single files that do not have license
information (a Python file, a PNG image, etc.), there is really
nothing you can do (the DRM industry has been trying to solve this for
a long time, and still your best bet is "social DRM").
Since we're developing open source with a permissive license, even if
someone does that, should we be happy that someone is actually using
what we built, or sad that the files they copied did not have a
license header? And if they include the license information of that
copied file in their project's LICENSE file, is this solved?
Having pointed out these contradictions, I think the "license in every
file" model is a step backwards. It introduces overhead and does not
really solve the problem, while at the same time it enables a culture
of bad practice (copying files around).
Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
Attendees: Shayna, Mikael, Matthew, Alex Stuart, Enrique
Thank you to Enrique for providing a synopsis of some of the discussion of
PyFF issue 289, below.
0 - Agenda bash
1 - Project review
a. General -
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
c. Satosa - https://github.com/IdentityPython/SATOSA
- Matthew is working on the container refresh - trying to get the default configuration right. He is trying to decide on what to put in place of SAMLtest.id.
d. pySAML2 - https://github.com/IdentityPython/pysaml2
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- There was much discussion on pyFF issue 289
<https://github.com/IdentityPython/pyFF/issues/289>
- There are two issues: distinguishing name collisions, and handling the same entity appearing in two different federations. It should not be hard to distinguish name collisions with heuristics or distance measurements. Gaining confidence that the same entityID in two federations refers to the same entity is harder; they may have the same title or the same description.
- Take subsets of all the entities that the MDQ might know about and create merged data with a merging strategy. It would be good to make that data available, but make it clear that it is merged.
- Mikael states they use pyFF as an authoritative source. If they select which entity is published, then they will be making decisions for their users. He would like a clearer policy on how the merge is done. Right now, the "replace with the latest-read entity id" rule is hidden in the code. There are provisions in the code for how to merge when there are conflicts. We could invent a way to do the merge; Enrique would like to look into this. Mikael states that whenever pyFF is run as a daemon and the update endpoint is called, the provisions are applied. Mikael is not sure how other people are using pyFF. eduGAIN uses pyFF - how do they do aggregation? Alex says there is an algorithm using a precedence rule, but he is not sure if that is using pyFF functionality or if there is something additional written in Python which makes that selection.
- Mikael would like a cleaner way to define how auto merging is done, and Enrique would like to have all the data available. Maybe in the select pipeline there is some way to configure/indicate using different lists. The aggregate is like a hard rule/limit, and the discovery service is more a convenience approach. But if the entity is not part of the discovery flow, it won't be used at all.
- Here are Enrique's notes summarizing the discussion, with a potential solution via Alex. Thank you, Enrique!:
- There is a need to provide SeamlessAccess with filtering capabilities based on entity data that is merged from different sources. For example, we might want to filter IdP data based on registration authority, which will be different in each metadata source, or based on entity categories, which may or may not differ between sources.
- We have been discussing in pyFF's issue #289 the possibility of performing this merge in pyFF. However, we are reaching the consensus that if we take on this responsibility in pyFF, we would need to account for all possible use cases that would benefit from this merged data, and not just SeamlessAccess' use case. This will require more extensive research and discussion among more interested parties, and much care regarding communication and documentation.
- So one possibility to move forward with this, suggested by a comment from Alex Stuart regarding the UK federation, would be to move the responsibility of merging data downstream of pyFF in SeamlessAccess, i.e., to thiss-mdq. This would entail using pyFF to produce one discojson output for each of the metadata sources aggregated by SeamlessAccess, and preparing thiss-mdq to consume all these outputs, merging them in whatever way seems convenient for SeamlessAccess.
- We will discuss this within SeamlessAccess, and we'll be back with whatever we conclude.
- Matthew asked where the pyFF documentation is. Mikael says there are documents exported to readthedocs, and an old version of inline documentation in the code that is not picked up by modern tools. Mikael is trying to improve the documentation and tests without breaking anything.
- Matthew wanted to use pyFF for a private thiss.js deployment but couldn't quite get it working. For the pipelines, pyFF uses a domain-specific language that is not easy to work with. We are stuck with that right now unless there is a new major version. Without a working example it's hard to get something going - but there are examples in the unit tests.
- Instead of pyFF, the UK federation uses the Shibboleth MDA framework. eduGAIN uses both pyFF and Shibboleth MDA. It's a complicated framework to set up. The Spring config files are XML-based. In both cases, the SAML pipelines refresh.
- Mikael is still working on upstreaming the OpenID Federation code bases / Wallet repositories
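On the issue 289 discussion above, the "heuristics or distance measurements" idea for spotting name collisions could be as simple as the following illustrative sketch (not pyFF code; the "title" field and the threshold are made-up assumptions):

```python
from difflib import SequenceMatcher

def looks_like_collision(entity_a, entity_b, threshold=0.6):
    """Same entityID in two federations but very different display
    names suggests a name collision rather than the same entity.
    The field name and threshold are illustrative only."""
    title_a = entity_a.get("title", "")
    title_b = entity_b.get("title", "")
    return SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio() < threshold
```

A real policy would likely combine several signals (title, description, registrationAuthority), as discussed in the meeting.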
2 - AOB
Hi all,
This is what I remember from last Monday's meeting. It was Mikael,
Ivan and me; please both of you correct me on anything I misremember
or forget.
There was talk about pyFF's issue 291 [1] (broken handling of '#' in
filenames and URLs); Mikael said he was working on it (or already had
a PR fixing it?).
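For context, the usual shape of a fix for '#' in URLs is percent-encoding reserved characters wherever an entityID lands in a URL path; this is only an illustrative sketch, not the actual patch for issue 291:

```python
from urllib.parse import quote

def mdq_url(base_url, entity_id):
    """Build an MDQ lookup URL. Reserved characters in the entityID,
    '#' in particular, must be percent-encoded: a bare '#' starts a
    URL fragment and everything after it never reaches the server."""
    return base_url.rstrip("/") + "/entities/" + quote(entity_id, safe="")
```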
Then Ivan told us he had released a couple of pysaml2 versions and was
working on a further couple of PRs, but I forget the details; perhaps
you, @Ivan, can fill them in.
Finally there was some discussion on pyFF's issue 289 [2] (When an
entity is loaded from 2 sources, entity data from the 1st source is
lost). I started by showing some "proof of concept" code (just around 15
lines, see [3]) that addresses this issue. This is just a POC since:
* The issue is only addressed in MemoryStore; it would probably also
need to be addressed in RedisWhooshStore.
* The data is duplicated. In the current version, when entity data is
loaded from the sources, it is kept in 2 structures in the store:
`md`, which is a dictionary of md sources to lists of entityIDs, and
`entities`, which is a dictionary of entityIDs to entity data. In the
POC code, we add `md_entities`, which is a dictionary of md sources to
dictionaries of entityIDs to entity data.
In the POC, the first 2 structures are still used for all the purposes
that they have ever been used for, and the new `md_entities` structure
is only used when the select pipe is configured with the new option
`dedup False` (the option defaults to `dedup True`).
But of course `md_entities` contains all the data that is in `md` and
`entities`, so we might think of removing the latter and using the
former for the purposes the latter currently serve.
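As a toy model of the three structures just described (this is only the shape of the idea, not the actual pyFF store code; class and method names are made up):

```python
from collections import defaultdict

class TinyStore:
    """Toy model: `md` maps source -> [entityID], `entities` maps
    entityID -> data (last write wins, as in current pyFF), and the
    POC's `md_entities` keeps a per-source copy of each entity."""
    def __init__(self):
        self.md = defaultdict(list)
        self.entities = {}
        self.md_entities = defaultdict(dict)

    def update(self, source, entity_id, data):
        self.md[source].append(entity_id)
        self.entities[entity_id] = data             # a 2nd source overwrites...
        self.md_entities[source][entity_id] = data  # ...but the copy survives here

    def select(self, dedup=True):
        if dedup:
            return list(self.entities.values())
        return [e for per_source in self.md_entities.values()
                for e in per_source.values()]
```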
A few concerns with this issue were discussed:
* How do consumers (the MDQ service) deal with duplicates in the
metadata when asked for some particular entity? In thiss-mdq, when
the metadata is loaded, entities are deduplicated, but a number of
(multivalued) entity attributes are merged.
* What happens when one entityID present in 2 md sources corresponds
to different entities (a name collision)? This is a difficult problem,
but somewhat orthogonal to the issue, since it is also present in the
current pyFF form (currently, one of the entities would just
disappear).
* Can this be abused, if federation A has less strict requirements for
some entity attribute than federation B? Yes, possibly; this would
need some risk assessment by the working group. Some of the metadata
would not be affected, for example registrationAuthority.
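The thiss-mdq behaviour mentioned in the first point - deduplicate, but merge multivalued attributes - can be sketched as follows (illustrative Python; thiss-mdq itself is JavaScript, and the attribute names are made up):

```python
def dedup_merge(entities, multivalued=("entity_categories",)):
    """Collapse duplicate entityIDs into one record, taking the union
    of the listed multivalued attributes instead of dropping one copy."""
    out = {}
    for ent in entities:
        eid = ent["entityID"]
        if eid not in out:
            out[eid] = dict(ent)
        else:
            for key in multivalued:
                merged = set(out[eid].get(key, [])) | set(ent.get(key, []))
                out[eid][key] = sorted(merged)
    return list(out.values())
```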
In the end, we agreed that more discussion is needed to reach a
definitive conclusion, and that any solution is going to carry
problems that can at best be mitigated but not fully solved.
Best regards,
1.- https://github.com/IdentityPython/pyFF/issues/291
2.- https://github.com/IdentityPython/pyFF/issues/289
3.- https://github.com/enriquepablo/pyFF/commit/0fb326d6043c1a3c6c2bb9a431cf4a9…
--
Enrique Pérez Arnaud
Hi
I have produced a minimal POC that allows pyFF to load metadata from
different sources and produce a list of entities in which there may be
more than one entity with the same entityID (one for each source in
which it was present). It can be tested with the attachments in the
issue tracker.
1.- https://github.com/enriquepablo/pyFF/commit/0fb326d6043c1a3c6c2bb9a431cf4a9…
Best regards,
--
Enrique Pérez Arnaud
Attendees: Johan L, Shayna, Ivan, Mikael, Matthew, Enrique, Hannah
0 - Agenda bash
1 - Project review
a. General
- Moving project repos - all should be moving under idpy, but who will maintain the ones that are new and not going to be under other projects?
- Mikael will keep them floating - most (other than the ones being added to SATOSA) are considered POCs, and Sunet and SWAMID will use them as reference.
- Mikael will look into the process for bringing the repos under the idpy umbrella, described here:
https://github.com/IdentityPython/Governance/blob/master/idpy-projects.md
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
- Nikos will be putting up PRs with some new functionality.
- Everything should be under Roland's branch for the new repos
c. Satosa - https://github.com/IdentityPython/SATOSA
- Will be posting a new release after the call with the
ldap_attribute_store plugin updates.
d. pySAML2 - https://github.com/IdentityPython/pysaml2
- Will be creating a new release after the call to include:
- https://github.com/IdentityPython/pysaml2/pull/964
- https://github.com/IdentityPython/pysaml2/pull/897
- uses pydantic v1, but now we have pydantic v2, so we want to make sure there are no problems - there may be an issue with the Python version. Ivan is testing with 3.13. Mikael knows there is a breakage in pyFF with 3.13 that he thought might be related to pydantic.
- Next will look at some changes Giuseppe has prepared and is using in his fork around namespace names.
- https://github.com/IdentityPython/pysaml2/pull/625
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- pyFF: Mikael will be taking a look at the hashmark issue mentioned in
the last meeting. Ivan is looking into this as well.
- Mikael and Enrique are collaborating on the issue Enrique described
last week.
2 - AOB
- Matthew had posted some things on Slack about the attribute mapper,
but was able to figure out what he needed.
- SAML defines attributes - they are not just an identifier. There is the name, the friendly name, and the name format. The name format tells you how the name is structured - it is not really a plain string. It could be a URL or a URI, for example. Within the name you could have a URI with a hash symbol and a pointer, so you cannot just compare the values as strings. Parsing the objects the right way may show they are the same. The uniqueness of an attribute does not come from the name alone - you have to combine it with the name format.
- Ivan will try to answer this on Slack and give some examples
- Matthew is currently working on signing outgoing SAML requests - it is not working out of the box. He will gather his questions on this for another time.
- Matthew is also working on how to structure tests for an application that uses SAML, and uses JWTs after the SAML response. He would like to mock up a real-world application.
- The next goal is to be able to do integration testing, deploying an IdP that facilitates that.
- He is also doing all the same stuff with OpenID Connect. Still working on getting the proper configuration.
- Next week, Shayna will be out and Matthew has volunteered to take notes.
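The attribute-identity point above boils down to comparing the pair (name, name format) rather than the bare name. A minimal sketch (not pysaml2's actual attribute mapper API; the dict shape is assumed, the format URNs are the standard SAML 2.0 values):

```python
URI_FORMAT = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
BASIC_FORMAT = "urn:oasis:names:tc:SAML:2.0:attrname-format:basic"

def same_attribute(attr_a, attr_b):
    """A SAML attribute is identified by the pair (name, name format),
    not by the name alone: "mail" under the basic format and the mail
    OID under the URI format are different declarations, even though
    they usually carry the same value."""
    key_a = (attr_a["name"], attr_a["name_format"])
    key_b = (attr_b["name"], attr_b["name_format"])
    return key_a == key_b
```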