8 Commits

Author SHA1 Message Date
Frej Drejhammar
6700b164d0 Merge branch 'PR/293'
Closes #292
2022-10-23 14:47:04 +02:00
chrisjbillington
13c273f10c Resolve unicode escape sequences not being processed correctly
In `process_unicode_escape_sequences()`, any backslash escape sequences
in the original string are escaped upon the first
`.encode('unicode-escape')` and therefore round-trip the sequence of
`.encode('unicode-escape').decode('unicode-escape')`.

That is not what we want - we want these sequences to be passed-through
the `.encode` unchanged, so that they will be converted to the
character they represent upon `.decode()`.

This patch changes the `.encode()` step to pass through any ascii
characters unchanged, only escaping non-ascii characters. This ensures
any existing backslash escape sequences will be interpreted as the
character they represent upon `.decode()`.
2022-10-23 11:51:33 +11:00
Frej Drejhammar
667404e836 Merge branch 'PR291' 2022-09-21 18:31:16 +02:00
Nicolas Vanhoren
38e236962d Update README.md to change recommandation for crlf filtering 2022-09-21 01:37:39 +02:00
Frej Drejhammar
dbb8158527 Merge branch 'frej/submodule-doc-improvement' 2022-02-10 20:05:07 +01:00
Frej Drejhammar
bb0bcda7ba Merge branch 'frej/fix-re-future-warning' 2022-02-10 20:04:14 +01:00
Frej Drejhammar
838b654614 Remove inconsistencies from submodule documentation
The submodule documentation is not consistent with regards to the
example directory structure. Update the example to be consistent.

Closes #277.
2022-02-09 15:58:48 +01:00
Frej Drejhammar
f179afce65 Fix FutureWarning about nested sets in re
Since Python 3.7 the re module warns for syntax which could, in the
future, be misparsed as a nested set. Avoid this by escaping the
literal `[` we search for in the regexp.

Reported by Monte Davidoff @mndavidoff

Closes #269.
2022-02-09 15:37:29 +01:00
3 changed files with 29 additions and 20 deletions

View File

@@ -27,10 +27,10 @@ command line option.
## Example
Example mercurial repo folder structure (~/mercurial):
Example mercurial repo folder structure (~/mercurial) containing two subrepos:
src/...
subrepo/subrepo1
subrepo/subrepo2
subrepos/subrepo1
subrepos/subrepo2
### Setup
Create an empty new folder where all the converted git modules will be imported:
@@ -41,18 +41,18 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir submodule1
cd submodule1
git init
hg-fast-export.sh -r ~/mercurial/subrepo1
hg-fast-export.sh -r ~/mercurial/subrepos/subrepo1
cd ..
mkdir submodule2
cd submodule2
git init
hg-fast-export.sh -r ~/mercurial/subrepo2
hg-fast-export.sh -r ~/mercurial/subrepos/subrepo2
### Create mapping file
cd ~/imported-gits
cat > submodule-mappings << EOF
"subrepo/subrepo1"="../submodule1"
"subrepo/subrepo2"="../submodule2"
"subrepos/subrepo1"="../submodule1"
"subrepos/subrepo2"="../submodule2"
EOF
### Convert main repository
@@ -60,16 +60,16 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir git-main-repo
cd git-main-repo
git init
hg-fast-export.sh -r ~/mercurial --subrepo-map=../submodule-mappings
hg-fast-export.sh -r ~/mercurial --subrepo-map=~/imported-gits/submodule-mappings
### Result
The resulting repository will now contain the subrepo/subrepo1 and
subrepo/subrepo1 submodules. The created .gitmodules file will look
like:
The resulting repository will now contain the submodules at the paths
`subrepos/subrepo1` and `subrepos/subrepo2`. The created .gitmodules
file will look like:
[submodule "subrepo/subrepo1"]
path = subrepo/subrepo1
[submodule "subrepos/subrepo1"]
path = subrepos/subrepo1
url = ../submodule1
[submodule "subrepo/subrepo2"]
path = subrepo/subrepo2
[submodule "subrepos/subrepo2"]
path = subrepos/subrepo2
url = ../submodule2

View File

@@ -133,7 +133,10 @@ is to convert line endings in text files from CRLF to git's preferred LF:
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"
if [ "$3" == "1" ]; then cat; else dos2unix; fi
if [ "$3" == "1" ]; then cat; else dos2unix -q; fi
# -q option in call to dos2unix allows to avoid returning an
# error code when handling non-ascii based text files (like UTF-16
# encoded text files)
-- End of crlf-filter.sh --
```

View File

@@ -266,7 +266,7 @@ def sanitize_name(name,what="branch", mapping={}):
if not auto_sanitize:
return mapping.get(name,name)
n=mapping.get(name,name)
p=re.compile(b'([[ ~^:?\\\\*]|\.\.)')
p=re.compile(b'([\\[ ~^:?\\\\*]|\.\.)')
n=p.sub(b'_', n)
if n[-1:] in (b'/', b'.'): n=n[:-1]+b'_'
n=b'/'.join([dot(s) for s in n.split(b'/')])
@@ -434,9 +434,15 @@ def load_mapping(name, filename, mapping_is_raw):
def process_unicode_escape_sequences(s):
# Replace unicode escape sequences in the otherwise UTF8-encoded bytestring s with
# the UTF8-encoded characters they represent. We need to do an additional
# .decode('utf8').encode('unicode-escape') to convert any non-ascii characters into
# their escape sequences so that the subsequent .decode('unicode-escape') succeeds:
return s.decode('utf8').encode('unicode-escape').decode('unicode-escape').encode('utf8')
# .decode('utf8').encode('ascii', 'backslashreplace') to convert any non-ascii
# characters into their escape sequences so that the subsequent
# .decode('unicode-escape') succeeds:
return (
s.decode('utf8')
.encode('ascii', 'backslashreplace')
.decode('unicode-escape')
.encode('utf8')
)
def parse_quoted_line(line):
m=quoted_regexp.match(line)